[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999688#comment-15999688
 ] 

Devang Shah commented on ARROW-955:
---

That worked as expected. Thanks a lot for the excellent response, Wes! Truly 
appreciated.

Here's another quick question: the documentation at:

https://media.readthedocs.org/pdf/pyarrow/latest/pyarrow.pdf

is quite sparse in the read/write section. There's nothing about ParquetFile or 
ParquetReader, and in particular nothing about APIs like read_row_group().

Specifically about read_row_group(): are there interfaces to then extract a row 
at a time from the result of read_row_group()? A row group by default is 128MB, 
but may have been written out as larger (I think the recommendation is 1GB?) - 
so read_row_group() would return a row_group which would occupy 128MB or 1GB of 
RAM (depending on the parquet.block.size when the row group was written out). 
Am I right? If so, are there interfaces to then read a row at a time from the 
row group (returned via read_row_group())?
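
To make the question concrete, here is a minimal sketch of row-at-a-time iteration on top of read_row_group(). It assumes the file object exposes num_row_groups and read_row_group(i), and that the returned table has a to_pydict() method, as in current pyarrow releases; the build discussed in this thread may differ. The helper names iter_rows and iter_rows_from_path are hypothetical. Only one row group is held in memory at a time, but its decoded size can differ from its on-disk size, and converting it to Python objects adds further overhead.

```python
def iter_rows(parquet_file):
    """Yield one row (as a dict) at a time, holding one row group in memory."""
    for i in range(parquet_file.num_row_groups):
        # read_row_group() materializes the entire row group as a table.
        table = parquet_file.read_row_group(i)
        columns = table.to_pydict()  # {column name: list of values}
        names = list(columns)
        # Walk the column lists in lockstep to rebuild each row.
        for values in zip(*(columns[n] for n in names)):
            yield dict(zip(names, values))


def iter_rows_from_path(path):
    # Imported lazily so iter_rows() itself has no hard dependency on pyarrow.
    import pyarrow.parquet as pq
    return iter_rows(pq.ParquetFile(path))
```

Something like `for row in iter_rows_from_path("data.parquet"): ...` would then process rows one by one, where the file name is only a placeholder.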



> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the 
> new read_row_group() interface and in general, have access to the latest 
> versions. I ran into many issues during the build but was ultimately 
> successful (notes below). However, I am not able to import pyarrow.parquet 
> due to the following issue:
> >>> import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, 
> where also I posted this...but I think this forum is more direct and 
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to 
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations 
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake 
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static 
> -DPARQUET_ARROW=ON .
> make
> sudo make install (this installs the parquet libs in the standard system 
> location, /usr/local/lib, so that the pyarrow build below can find them)
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc 
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to 
> build the pyarrow.parquet package. However, when I now try to import the 
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by 
> searching for these issues...but really can't tell what. It took me almost a 
> whole day to get to the point where I can build pyarrow and parquet, and now 
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.
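
One way to narrow down an ImportError like the one above is to check which copy of the package Python actually resolves and whether any compiled extension modules sit next to it. A common cause, though only an assumption here, is that the interpreter picks up the source checkout (where the extensions were never built in place) instead of the installed copy. The sketch below uses only the Python 3 standard library; the function name describe_package is hypothetical.

```python
import importlib.machinery
import importlib.util
import os


def describe_package(name):
    """Report where a package is imported from and which compiled modules sit beside it."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not importable, or a plain module rather than a package
    location = list(spec.submodule_search_locations)[0]
    # Platform-specific suffixes of compiled extension modules (e.g. ".so").
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    compiled = sorted(f for f in os.listdir(location) if f.endswith(suffixes))
    return {"location": location, "compiled_modules": compiled}
```

If `describe_package("pyarrow")` reports a location inside the git checkout with no compiled modules listed, running Python from a different directory or rebuilding the extensions would be the next things to try.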



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Created] (ARROW-962) [Python] Add schema attribute to FileReader

2017-05-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-962:
--

 Summary: [Python] Add schema attribute to FileReader
 Key: ARROW-962
 URL: https://issues.apache.org/jira/browse/ARROW-962
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.4.0


This will help with API conformity between the Stream and File classes





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999655#comment-15999655
 ] 

Wes McKinney commented on ARROW-955:


From above, the command to build and install is:

{code}
python setup.py build_ext --build-type=release --with-parquet --with-jemalloc 
install
{code}



[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999651#comment-15999651
 ] 

Devang Shah edited comment on ARROW-955 at 5/7/17 2:00 AM:
---

How do I install it now that it's been built with --inplace? Should I just 
re-run the command as:

{code}
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-jemalloc --install
{code}

Or is there a different command, which just does install? Like:

{code}
python setup.py --install
{code}



was (Author: derringdo):
How do I install it now that it's been built with --inplace? Should I just 
re-run the command as:

{code}
c build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc 
--install
{code}

Or is there a different command, which just does install? Like:

{code}
python setup.py --install
{code}




[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999651#comment-15999651
 ] 

Devang Shah commented on ARROW-955:
---

How do I install it now that it's been built with --inplace? Should I just 
re-run the command as:

{code}
c build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet --with-jemalloc 
--install
{code}

Or is there a different command, which just does install? Like:

{code}
python setup.py --install
{code}




[jira] [Created] (ARROW-961) [Python] Rename InMemoryOutputStream to BufferOutputStream

2017-05-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-961:
--

 Summary: [Python] Rename InMemoryOutputStream to BufferOutputStream
 Key: ARROW-961
 URL: https://issues.apache.org/jira/browse/ARROW-961
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney
 Fix For: 0.4.0


Having this name difference does not seem especially helpful. We can maintain 
the existing name as an alias for the duration of 1 release. 





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999646#comment-15999646
 ] 

Wes McKinney commented on ARROW-955:


Your command is missing {{install}} at the end, which copies the built package 
into the environment's {{site-packages}} directory. 
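
To see where that {{install}} step would copy the package, a small sketch using only the standard library can print the interpreter in use and its platform-specific package directory. The function name install_target and the dictionary keys are hypothetical. On Debian and Ubuntu system Pythons the directory may be named dist-packages rather than site-packages.

```python
import sys
import sysconfig


def install_target():
    """Return the interpreter in use and the directory it installs compiled packages into."""
    return {
        "interpreter": sys.executable,
        # "platlib" is where packages containing compiled extensions are placed.
        "site_packages": sysconfig.get_paths()["platlib"],
    }
```

Running this with the conda environment activated, and again without it, should show two different interpreters and two different target directories.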



[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999644#comment-15999644
 ] 

Devang Shah commented on ARROW-955:
---

Here's the output of the successful "python setup.py ..." call for the pyarrow 
build (which used to fail until your fix):

{code}
(pyarrow-dev) derdo@prompt:~/repos/arrow/python$ python setup.py build_ext 
--build-type=$ARROW_BUILD_TYPE
--with-parquet --with-jemalloc --inplace
running build_ext
cmake  -DPYTHON_EXECUTABLE=/home/derdo/miniconda2/envs/pyarrow-dev/bin/python  
-DPYARROW_BUILD_PARQUET=on
-DPYARROW_BUILD_JEMALLOC=on -DCMAKE_BUILD_TYPE=release 
/home/derdo/repos/arrow/python
Configured for RELEASE build (set with cmake 
-DCMAKE_BUILD_TYPE={release,debug,...})
-- Build Type: RELEASE
INFOCompiler version: Using built-in specs.
COLLECT_GCC=/usr/bin/c++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 
4.8.4-2ubuntu1~14.04.3'
--with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs 
--enable-languages=c,c++,java,go,d,fortran,objc,obj-c++
--prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix 
--with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib
--enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug 
--enable-libstdcxx-time=yes
--enable-gnu-unique-object --disable-libmudflap --enable-plugin 
--with-system-zlib --disable-browser-plugin
--enable-java-awt=gtk --enable-gtk-cairo 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 
--with-arch-directory=amd64
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc 
--enable-multiarch --disable-werror --with-arch-32=i686
--with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic 
--enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.4 (Ubuntu 4.8.4-2ubuntu1~14.04.3)

INFOCompiler id: GNU
Selected compiler gcc 4.8.4
Using static linking for RELEASE builds
collect2 version 4.8.4
/usr/bin/ld --sysroot=/ --build-id --eh-frame-hdr -m elf_x86_64 
--hash-style=gnu --as-needed -dynamic-linker
/lib64/ld-linux-x86-64.so.2 -z relro 
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crt1.o
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crti.o 
/usr/lib/gcc/x86_64-linux-gnu/4.8/crtbegin.o
-L/usr/lib/gcc/x86_64-linux-gnu/4.8 
-L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu
-L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../../lib -L/lib/x86_64-linux-gnu 
-L/lib/../lib -L/usr/lib/x86_64-linux-gnu
-L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.8/../../.. --version 
-lstdc++ -lm -lgcc_s -lgcc -lc -lgcc_s -lgcc
/usr/lib/gcc/x86_64-linux-gnu/4.8/crtend.o 
/usr/lib/gcc/x86_64-linux-gnu/4.8/../../../x86_64-linux-gnu/crtn.o
Using ld linker
-- Build output directory: 
/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/release/
-- Searching for Python libs in
/home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config
-- Looking for python2.7
-- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so
-- Searching for Python libs in
/home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config
-- Looking for python2.7
-- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found the Arrow core library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
-- Found the Arrow Python library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
-- Found the Arrow jemalloc library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
Added shared library dependency arrow: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
Added shared library dependency arrow_python: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
Added shared library dependency parquet_arrow: 
/usr/local/lib/libparquet_arrow.so
Added shared library dependency arrow_jemalloc: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
-- Configuring done
-- Generating done
-- Build files have been written to: 
/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7
make
Scanning dependencies of target _parquet_pyx
[  4%] Compiling Cython CXX source for _parquet...
[  4%] Built target _parquet_pyx
Scanning dependencies of target _parquet
[  8%] Building CXX object 

[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999642#comment-15999642
 ] 

Wes McKinney commented on ARROW-955:


Can you show the console output of

{code}
python setup.py \
build_ext --build-type=release --with-parquet --with-jemalloc \
install
{code}

with that conda environment activated?

For your other questions, you're now in the domain of general Python package 
management and devops, which is not something we can help too much with. I 
recommend either building conda packages or binary wheels (e.g. using our 
manylinux1 toolchain), if using our released binary artifacts doesn't work for 
your use case. 



[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999635#comment-15999635
 ] 

Devang Shah commented on ARROW-955:
---

Also, once I've experimented with a conda package (in this case, the pyarrow 
package) within a conda environment, how do I install it for the whole system 
(i.e. outside the environment), so that anyone on the system can use the 
package without first activating a specific conda environment?




[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999631#comment-15999631
 ] 

Devang Shah commented on ARROW-955:
---

I followed the build-from-source instructions at:

https://github.com/apache/arrow/blob/master/python/doc/source/development.rst

So, I did a "source activate pyarrow-dev" before cloning the repos and building 
parquet-cpp, arrow and pyarrow exactly as instructed (with your fix which was 
needed to make the pyarrow build / install work). So do these steps not install 
into the pyarrow-dev environment?

If they do, then maybe when I activate this environment from a different 
terminal or ssh session, I need to set some environment variables to make this 
package usable/importable from this environment?
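
A quick way to check, from any new terminal or ssh session, whether the running interpreter really belongs to the activated environment is sketched below. It assumes that activating a conda environment sets the CONDA_DEFAULT_ENV and CONDA_PREFIX variables, which holds for conda versions I am aware of but may not hold for every release. The function name environment_report is hypothetical.

```python
import os
import sys


def environment_report(environ=None):
    """Summarize which Python environment the current process is using."""
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")
    return {
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "conda_prefix": conda_prefix,
        "interpreter_prefix": sys.prefix,
        # True only when the running interpreter lives inside the activated env.
        "interpreter_in_env": bool(conda_prefix)
        and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix),
    }
```

If "interpreter_in_env" is False in the new session, the environment was probably not activated there, or a different python comes first on the PATH.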



> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the 
> new read_row_group() interface and in general, have access to the latest 
> versions. I ran into many issues during the build but was ultimately 
> successful (notes below). However, I am not able to import pyarrow.parquet 
> due to the following issue:
> >>> import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, 
> where I also posted this, but I think this forum is more direct and 
> appropriate, so I am re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to 
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations 
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake 
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static 
> -DPARQUET_ARROW=ON .
> make
> sudo make install (this installs the parquet libs in the standard system 
> location, /usr/local/lib, so that the pyarrow build below can find the 
> parquet libs)
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc 
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to 
> build the pyarrow.parquet package. However, when I now try to import the 
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by 
> searching for these issues...but really can't tell what. It took me almost a 
> whole day to get to the point where I can build pyarrow and parquet, and now 
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999624#comment-15999624
 ] 

Wes McKinney commented on ARROW-955:


In the {{pyarrow-dev}} case, it looks like the package was not installed in 
that environment. 



[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999620#comment-15999620
 ] 

Devang Shah commented on ARROW-955:
---

The same thing in the conda env which has the binary download works as 
expected, and shows that the package is coming from the activated environment:

{code}
derdo@prompt:~$ source activate wfparq

(wfparq) derdo@prompt:~$ python
Python 2.7.13 | packaged by conda-forge | (default, May  2 2017, 12:48:11) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.parquet
>>> pyarrow.parquet


{code}



[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999618#comment-15999618
 ] 

Devang Shah commented on ARROW-955:
---

Thanks, but I am running into a problem: when I activate the source-build conda 
environment from a different terminal, I can't import pyarrow.parquet at all:

{code}
source activate pyarrow-dev
(pyarrow-dev) derdo@prompt:~$ python
Python 2.7.13 | packaged by conda-forge | (default, May  2 2017, 12:48:11) 
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow.parquet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named pyarrow.parquet
>>> 

{code}



[jira] [Commented] (ARROW-300) [Format] Add buffer compression option to IPC file format

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999611#comment-15999611
 ] 

Wes McKinney commented on ARROW-300:


I'm sorry for the delay. With the 0.3 Arrow release done, it would be good to 
make a push on compression and encoding. 

How about we start a Google Document that supports public comments and you can 
give edit support to whomever you like? Once we agree on the design, one of us 
can make a pull request containing the Flatbuffer metadata for the compression 
/ encoding details. Does that sound good?

> [Format] Add buffer compression option to IPC file format
> -
>
> Key: ARROW-300
> URL: https://issues.apache.org/jira/browse/ARROW-300
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Format
>Reporter: Wes McKinney
>
> It may be useful, if data is to be sent over the wire, to compress the data 
> buffers themselves as they are being written in the file layout.
> I would propose that we keep this extremely simple with a global buffer 
> compression setting in the file Footer. Probably only two compressors worth 
> supporting out of the box would be zlib (higher compression ratios) and lz4 
> (better performance).
> What does everyone think?
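
The ratio-versus-speed trade-off mentioned in the description can be sketched with Python's standard library alone. Only zlib is shown, because lz4 is not part of the standard library. The sample buffer and the helper name compress_report are illustrative assumptions, and nothing here reflects the eventual Arrow IPC design.

```python
import time
import zlib


def compress_report(buf, level):
    """Compress buf with zlib at the given level; return (ratio, seconds)."""
    start = time.perf_counter()
    compressed = zlib.compress(buf, level)
    elapsed = time.perf_counter() - start
    # Round-trip check: decompression must give back the original bytes.
    assert zlib.decompress(compressed) == buf
    return len(buf) / len(compressed), elapsed


if __name__ == "__main__":
    # A regular buffer standing in for a column of fixed-width integers.
    data = b"".join(i.to_bytes(8, "little") for i in range(100000))
    for level in (1, 6, 9):
        ratio, seconds = compress_report(data, level)
        print("level %d: ratio %.2f, %.4f s" % (level, ratio, seconds))
```

Timings depend on the machine and the data, so the numbers are only useful for comparing levels against each other on the same buffer.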





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999609#comment-15999609
 ] 

Wes McKinney commented on ARROW-955:


If you're going to work with both development builds and released binary 
artifacts, it's good practice to work in conda environments, so you would do 
development in a different environment from the one where you installed the 
pyarrow package from conda-forge. You can check which pyarrow gets imported 
from the Python shell:

{code}
In [1]: import pyarrow

In [2]: pyarrow
Out[2]: 
{code}
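
The same check can be scripted. The sketch below imports a module by name, reports the file it was loaded from, and tests whether that file lies under sys.prefix (the root of the active environment). The helper names are hypothetical, and the stdlib module json is used only so the example runs without pyarrow installed. Note that a development install may legitimately point at the source checkout rather than at the environment, so a False result is a hint, not proof of a problem.

```python
import importlib
import os
import sys


def module_location(name):
    """Import a module by name and return the file it was loaded from."""
    module = importlib.import_module(name)
    # Built-in modules have no __file__, so fall back to None.
    return getattr(module, "__file__", None)


def is_inside_active_env(path):
    """True if path lies under sys.prefix, the root of the active environment."""
    if path is None:
        return False
    prefix = os.path.realpath(sys.prefix)
    return os.path.realpath(path).startswith(prefix + os.sep)


if __name__ == "__main__":
    target = "json"  # replace with "pyarrow" in an environment that has it
    location = module_location(target)
    print(target, "->", location)
    print("inside active env:", is_inside_active_env(location))
```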



[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999607#comment-15999607
 ] 

Devang Shah commented on ARROW-955:
---

BTW, on the same machine, in a different conda env, I installed version 0.3.0 
through conda-forge (the binary download). So, if I am in the conda env where I 
did the build from source, invoking python there and importing pyarrow.parquet 
should give me the pyarrow built from source, correct? How do I double-check 
that I am getting the right pyarrow when I import it in python in that conda 
env?

Thanks a lot for your prompt help.




[jira] [Commented] (ARROW-957) [Doc] Add HDFS and Windows documents to doxygen output

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999605#comment-15999605
 ] 

Wes McKinney commented on ARROW-957:


Sounds good to me. 

> [Doc] Add HDFS and Windows documents to doxygen output
> --
>
> Key: ARROW-957
> URL: https://issues.apache.org/jira/browse/ARROW-957
> Project: Apache Arrow
>  Issue Type: Task
>  Components: C++
>Reporter: Uwe L. Korn
>Assignee: Uwe L. Korn
>
> Currently these documents are not rendered on the website. I would move them 
> to the {{apidoc/}} folder and link to them in the main doxygen page. Probably 
> this is the point where we also would move {{apidoc/}} back to {{docs/}}.





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999604#comment-15999604
 ] 

Wes McKinney commented on ARROW-955:


Those tests are expected failures, so all is good. 



[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999598#comment-15999598
 ] 

Devang Shah edited comment on ARROW-955 at 5/6/17 9:59 PM:
---

Yes! Thanks a million. However, a couple of tests fail:

{code}
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found the Arrow core library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
-- Found the Arrow Python library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
-- Found the Arrow jemalloc library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
Added shared library dependency arrow: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
Added shared library dependency arrow_python: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
Added shared library dependency parquet_arrow: 
/usr/local/lib/libparquet_arrow.so
Added shared library dependency arrow_jemalloc: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
-- Configuring done
-- Generating done
-- Build files have been written to: 
/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7
make
Scanning dependencies of target _parquet_pyx
[  4%] Compiling Cython CXX source for _parquet...
[  4%] Built target _parquet_pyx
Scanning dependencies of target _parquet
[  8%] Building CXX object CMakeFiles/_parquet.dir/_parquet.cxx.o
[ 12%] Linking CXX shared module release/_parquet.so
[ 12%] Built target _parquet
Scanning dependencies of target _error_pyx
[ 16%] Compiling Cython CXX source for _error...
[ 16%] Built target _error_pyx
Scanning dependencies of target _error
[ 20%] Building CXX object CMakeFiles/_error.dir/_error.cxx.o
[ 25%] Linking CXX shared module release/_error.so
[ 25%] Built target _error
Scanning dependencies of target _jemalloc_pyx
[ 29%] Compiling Cython CXX source for _jemalloc...
[ 29%] Built target _jemalloc_pyx
Scanning dependencies of target _jemalloc
[ 33%] Building CXX object CMakeFiles/_jemalloc.dir/_jemalloc.cxx.o
[ 37%] Linking CXX shared module release/_jemalloc.so
[ 37%] Built target _jemalloc
Scanning dependencies of target _table_pyx
[ 41%] Compiling Cython CXX source for _table...
[ 41%] Built target _table_pyx
Scanning dependencies of target _table
[ 45%] Building CXX object CMakeFiles/_table.dir/_table.cxx.o
[ 50%] Linking CXX shared module release/_table.so
[ 50%] Built target _table
Scanning dependencies of target _config_pyx
[ 54%] Compiling Cython CXX source for _config...
[ 54%] Built target _config_pyx
Scanning dependencies of target _config
[ 58%] Building CXX object CMakeFiles/_config.dir/_config.cxx.o
[ 62%] Linking CXX shared module release/_config.so
[ 62%] Built target _config
Scanning dependencies of target _memory_pyx
[ 66%] Compiling Cython CXX source for _memory...
[ 66%] Built target _memory_pyx
Scanning dependencies of target _memory
[ 70%] Building CXX object CMakeFiles/_memory.dir/_memory.cxx.o
[ 75%] Linking CXX shared module release/_memory.so
[ 75%] Built target _memory
Scanning dependencies of target _array_pyx
[ 79%] Compiling Cython CXX source for _array...
[ 79%] Built target _array_pyx
Scanning dependencies of target _array
[ 83%] Building CXX object CMakeFiles/_array.dir/_array.cxx.o
[ 87%] Linking CXX shared module release/_array.so
[ 87%] Built target _array
Scanning dependencies of target _io_pyx
[ 91%] Compiling Cython CXX source for _io...
[ 91%] Built target _io_pyx
Scanning dependencies of target _io
[ 95%] Building CXX object CMakeFiles/_io.dir/_io.cxx.o
[100%] Linking CXX shared module release/_io.so
[100%] Built target _io
('Moving built C-extension', 'release/_array.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_array.so')
('Moving built C-extension', 'release/_config.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_config.so')
('Moving built C-extension', 'release/_error.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_error.so')
('Moving built C-extension', 'release/_io.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_io.so')
('Moving built C-extension', 'release/_jemalloc.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_jemalloc.so')
('Moving built C-extension', 'release/_memory.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_memory.so')
('Moving built C-extension', 'release/_parquet.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_parquet.so')
('Moving built C-extension', 'release/_table.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_table.so')
(pyarrow-dev) derdo@prompt:~/repos/arrow/python$ py.test pyarrow
=== test session starts ===
platform linux2 -- Python 2.7.13, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/derdo/repos/arrow/python, inifile:
collected 210 items


[jira] [Updated] (ARROW-909) libjemalloc.so.2: cannot open shared object file:

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-909:
---
Fix Version/s: 0.4.0

> libjemalloc.so.2: cannot open shared object file: 
> --
>
> Key: ARROW-909
> URL: https://issues.apache.org/jira/browse/ARROW-909
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: linux centos
>Reporter: Abdul Rahman
>Assignee: Uwe L. Korn
>  Labels: pyarrow
> Fix For: 0.4.0
>
>
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File 
> "/home/default/src/venv/lib/python2.7/site-packages/pyarrow-0.2.1.dev244+g14bec24-py2.7-linux-x86_64.egg/pyarrow/__init__.py",
>  line 28, in <module>
>     import pyarrow._config
> ImportError: libjemalloc.so.2: cannot open shared object file: No such file 
> or directory
> $LD_LIBRARY_PATH has libarrow_jemalloc.a along with other libraries including 
> libarrow.so,  libparquet.so, libparquet_arrow.so. Pyarrow was built using 
> with-jemalloc and parquet-cpp was cmake-d with 
> -DPARQUET_ARROW=ON  
> Also, I noticed that the arrow/python documentation has been cleaned up and 
> the installation instructions now cover the conda approach only. Is this the 
> only supported way going forward?
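
An error like the one above means the dynamic loader could not find a shared object at import time. A static archive such as libarrow_jemalloc.a on LD_LIBRARY_PATH does not satisfy a runtime dependency on libjemalloc.so.2. As a rough first check, Python's ctypes.util.find_library can report whether a shared library is discoverable at all. This is only a sketch: the helper name is hypothetical, the library names are taken from this report, and the search behaviour of find_library differs between platforms and Python versions.

```python
import ctypes.util


def locate_shared_library(short_name):
    """Return the name the system resolves for short_name, or None."""
    # find_library takes the name without the "lib" prefix or ".so" suffix.
    return ctypes.util.find_library(short_name)


if __name__ == "__main__":
    for name in ("jemalloc", "arrow", "parquet"):
        found = locate_shared_library(name)
        print("%s: %s" % (name, found if found else "not found"))
```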





[jira] [Resolved] (ARROW-909) libjemalloc.so.2: cannot open shared object file:

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-909.

Resolution: Fixed

Issue resolved by pull request 651
[https://github.com/apache/arrow/pull/651]

> libjemalloc.so.2: cannot open shared object file: 
> --
>
> Key: ARROW-909
> URL: https://issues.apache.org/jira/browse/ARROW-909
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: linux centos
>Reporter: Abdul Rahman
>Assignee: Uwe L. Korn
>  Labels: pyarrow
>





[jira] [Updated] (ARROW-947) [Python] Improve execution time of manylinux1 build

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-947:
---
Fix Version/s: 0.4.0

> [Python] Improve execution time of manylinux1 build
> ---
>
> Key: ARROW-947
> URL: https://issues.apache.org/jira/browse/ARROW-947
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.3.0
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
> Fix For: 0.4.0
>
>
> Perhaps we could have the same testing benefits by limiting the matrix of 
> builds? Pulling the Docker image takes about 90 seconds, but the build itself 
> takes 25 minutes or more. 





[jira] [Updated] (ARROW-856) CmakeError by Unknown compiler.

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-856:
---
Fix Version/s: 0.4.0

> CmakeError by Unknown compiler. 
> 
>
> Key: ARROW-856
> URL: https://issues.apache.org/jira/browse/ARROW-856
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: YJ
>Assignee: Uwe L. Korn
> Fix For: 0.4.0
>
>
> From: https://github.com/ray-project/ray/issues/468
> [root@SZV1000268092 python]# LANG=C gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla 
> --enable-bootstrap --enable-shared --enable-threads=posix 
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit 
> --disable-libunwind-exceptions --enable-gnu-unique-object 
> --enable-linker-build-id --with-linker-hash-style=gnu 
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin 
> --enable-initfini-array --disable-libgcj 
> --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install
>  
> --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install
>  --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
> Result:
> INFO GNU
> CMake Error at cmake_modules/CompilerInfo.cmake:62 (message):
> Unknown compiler. Version info is just the above.
> Error
> /usr/bin/c++ -g -O3 -march=native -mtune=native -DCXX_SUPPORTS_ALTIVEC 
> -maltivec -o CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o -c 
> /home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp/src.cxx
> c++: error: unrecognized command line option '-maltivec'
> make[1]: *** [CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o] Error 1
> make[1]: Leaving directory 
> `/home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp'
> make: *** [cmTryCompileExec1115247767/fast] Error 2
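
The "Unknown compiler" message above is raised when the build script cannot classify the compiler from its version banner. As a rough illustration only (this is not the logic in cmake_modules/CompilerInfo.cmake; the function name and the patterns are hypothetical), classifying a banner such as the `gcc -v` output quoted above might be sketched in Python like this:

```python
import re


def classify_compiler(banner):
    """Return (family, version) parsed from a compiler version banner.

    Hypothetical helper for illustration. Returns ("unknown", None) when
    no pattern matches, which is the situation the CMake error reports.
    """
    patterns = [
        ("clang", r"clang version (\d+(?:\.\d+)+)"),
        ("gcc", r"gcc version (\d+(?:\.\d+)+)"),
    ]
    for family, pattern in patterns:
        match = re.search(pattern, banner)
        if match:
            return family, match.group(1)
    return "unknown", None


# Last two lines of the banner quoted in the issue description.
banner = (
    "Thread model: posix\n"
    "gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)"
)
print(classify_compiler(banner))
```

A banner that matches none of the patterns falls through to `("unknown", None)`, so a detection step of this shape is only as good as the patterns it knows about.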





[jira] [Resolved] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-899.

Resolution: Fixed

Issue resolved by pull request 652
[https://github.com/apache/arrow/pull/652]

> [Docs] Add CHANGELOG for 0.3.0
> --
>
> Key: ARROW-899
> URL: https://issues.apache.org/jira/browse/ARROW-899
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>
> See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG





[jira] [Resolved] (ARROW-856) CmakeError by Unknown compiler.

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-856.

Resolution: Fixed

Issue resolved by pull request 650
[https://github.com/apache/arrow/pull/650]

> CmakeError by Unknown compiler. 
> 
>
> Key: ARROW-856
> URL: https://issues.apache.org/jira/browse/ARROW-856
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: YJ
>Assignee: Uwe L. Korn
>





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999598#comment-15999598
 ] 

Devang Shah commented on ARROW-955:
---

Yes! Thanks a million. However, a couple of tests fail:

(code)
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found the Arrow core library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
-- Found the Arrow Python library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
-- Found the Arrow jemalloc library: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
Added shared library dependency arrow: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow.so
Added shared library dependency arrow_python: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_python.so
Added shared library dependency parquet_arrow: 
/usr/local/lib/libparquet_arrow.so
Added shared library dependency arrow_jemalloc: 
/home/derdo/miniconda2/envs/pyarrow-dev/lib/libarrow_jemalloc.so
-- Configuring done
-- Generating done
-- Build files have been written to: 
/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7
make
Scanning dependencies of target _parquet_pyx
[  4%] Compiling Cython CXX source for _parquet...
[  4%] Built target _parquet_pyx
Scanning dependencies of target _parquet
[  8%] Building CXX object CMakeFiles/_parquet.dir/_parquet.cxx.o
[ 12%] Linking CXX shared module release/_parquet.so
[ 12%] Built target _parquet
Scanning dependencies of target _error_pyx
[ 16%] Compiling Cython CXX source for _error...
[ 16%] Built target _error_pyx
Scanning dependencies of target _error
[ 20%] Building CXX object CMakeFiles/_error.dir/_error.cxx.o
[ 25%] Linking CXX shared module release/_error.so
[ 25%] Built target _error
Scanning dependencies of target _jemalloc_pyx
[ 29%] Compiling Cython CXX source for _jemalloc...
[ 29%] Built target _jemalloc_pyx
Scanning dependencies of target _jemalloc
[ 33%] Building CXX object CMakeFiles/_jemalloc.dir/_jemalloc.cxx.o
[ 37%] Linking CXX shared module release/_jemalloc.so
[ 37%] Built target _jemalloc
Scanning dependencies of target _table_pyx
[ 41%] Compiling Cython CXX source for _table...
[ 41%] Built target _table_pyx
Scanning dependencies of target _table
[ 45%] Building CXX object CMakeFiles/_table.dir/_table.cxx.o
[ 50%] Linking CXX shared module release/_table.so
[ 50%] Built target _table
Scanning dependencies of target _config_pyx
[ 54%] Compiling Cython CXX source for _config...
[ 54%] Built target _config_pyx
Scanning dependencies of target _config
[ 58%] Building CXX object CMakeFiles/_config.dir/_config.cxx.o
[ 62%] Linking CXX shared module release/_config.so
[ 62%] Built target _config
Scanning dependencies of target _memory_pyx
[ 66%] Compiling Cython CXX source for _memory...
[ 66%] Built target _memory_pyx
Scanning dependencies of target _memory
[ 70%] Building CXX object CMakeFiles/_memory.dir/_memory.cxx.o
[ 75%] Linking CXX shared module release/_memory.so
[ 75%] Built target _memory
Scanning dependencies of target _array_pyx
[ 79%] Compiling Cython CXX source for _array...
[ 79%] Built target _array_pyx
Scanning dependencies of target _array
[ 83%] Building CXX object CMakeFiles/_array.dir/_array.cxx.o
[ 87%] Linking CXX shared module release/_array.so
[ 87%] Built target _array
Scanning dependencies of target _io_pyx
[ 91%] Compiling Cython CXX source for _io...
[ 91%] Built target _io_pyx
Scanning dependencies of target _io
[ 95%] Building CXX object CMakeFiles/_io.dir/_io.cxx.o
[100%] Linking CXX shared module release/_io.so
[100%] Built target _io
('Moving built C-extension', 'release/_array.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_array.so')
('Moving built C-extension', 'release/_config.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_config.so')
('Moving built C-extension', 'release/_error.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_error.so')
('Moving built C-extension', 'release/_io.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_io.so')
('Moving built C-extension', 'release/_jemalloc.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_jemalloc.so')
('Moving built C-extension', 'release/_memory.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_memory.so')
('Moving built C-extension', 'release/_parquet.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_parquet.so')
('Moving built C-extension', 'release/_table.so', 'to build path', 
'/home/derdo/repos/arrow/python/pyarrow/_table.so')
(pyarrow-dev) derdo@prompt:~/repos/arrow/python$ py.test pyarrow
=== test session starts ===
platform linux2 -- Python 2.7.13, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /home/derdo/repos/arrow/python, inifile:
collected 210 items

pyarrow/tests/test_array.py ...

[jira] [Created] (ARROW-960) [Python] Add source build guide for macOS + Homebrew

2017-05-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-960:
--

 Summary: [Python] Add source build guide for macOS + Homebrew
 Key: ARROW-960
 URL: https://issues.apache.org/jira/browse/ARROW-960
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney


This should cover Homebrew-installed Python and installing pyarrow in a 
virtualenv, as an alternative to the current conda-based instructions.





[jira] [Created] (ARROW-959) [Python] Add source build guide for CentOS 6 (with devtoolset) and CentOS 7

2017-05-06 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-959:
--

 Summary: [Python] Add source build guide for CentOS 6 (with 
devtoolset) and CentOS 7
 Key: ARROW-959
 URL: https://issues.apache.org/jira/browse/ARROW-959
 Project: Apache Arrow
  Issue Type: Improvement
  Components: Python
Reporter: Wes McKinney








[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999595#comment-15999595
 ] 

Wes McKinney commented on ARROW-955:


See https://github.com/apache/arrow/pull/653. I will leave this JIRA open to 
add a build guide for Ubuntu 14.04 and/or 16.04. 

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the 
> new read_row_group() interface and in general, have access to the latest 
> versions. I ran into many issues during the build but was ultimately 
> successful (notes below). However, I am not able to import pyarrow.parquet 
> due to the following issue:
> >>> import pyarrow.parquet
> Traceback (most recent call last):
> File "<stdin>", line 1, in <module>
> File "pyarrow/__init__.py", line 28, in <module>
> import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, 
> where I also posted this, but I think this forum is more direct and 
> appropriate, so I am re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to 
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations 
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake 
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static 
> -DPARQUET_ARROW=ON .
> make
> sudo make install (this installs the parquet libs in the standard system 
> location, /usr/local/lib, so that the pyarrow build (see below) can find 
> them)
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc 
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to 
> build the pyarrow.parquet package. However, when I now try to import the 
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by 
> searching for these issues...but really can't tell what. It took me almost a 
> whole day to get to the point where I can build pyarrow and parquet, and now 
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999593#comment-15999593
 ] 

Wes McKinney commented on ARROW-955:


I see the problem: the TOOLCHAIN variables we added do not set up the 
library search paths for the Python build. 

Can you confirm that setting

{code}
export ARROW_HOME=$CONDA_PREFIX
export PARQUET_HOME=$CONDA_PREFIX
{code}

fixes the problem? I will write a patch to fix the documentation. 
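
Before re-running the Python build, a quick sanity check of these two variables can save a failed cmake run. The following is a minimal Python sketch, assuming the libraries are installed under a `lib/` directory inside each prefix; the helper name `check_build_env` is hypothetical and is not part of pyarrow or its build scripts:

```python
import os


def check_build_env(env, names=("ARROW_HOME", "PARQUET_HOME")):
    """Return a list of problems found in the given environment mapping.

    Assumption: each prefix is expected to contain a lib/ directory.
    """
    problems = []
    for name in names:
        prefix = env.get(name)
        if not prefix:
            problems.append("%s is not set" % name)
        elif not os.path.isdir(os.path.join(prefix, "lib")):
            problems.append("%s=%s has no lib/ directory" % (name, prefix))
    return problems


# Report on the current shell environment; prints nothing if all is well.
for problem in check_build_env(os.environ):
    print(problem)
```

Passing the environment in as a mapping keeps the check easy to try against a plain dict before trusting it on a real shell session.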

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>





[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999575#comment-15999575
 ] 

Wes McKinney edited comment on ARROW-955 at 5/6/17 9:38 PM:


I was able to download the binary package 0.3.0 from conda, and check that 
read_row_group is available in the ParquetReader class:

>>> print dir(pa.ParquetReader)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', 
'__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', 
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'column_name_idx', 
'metadata', 'num_row_groups', 'open', 'read_all', 'read_column', 
'read_row_group', 'set_num_threads']
>>>

However, I do need the build from source to work, so I again tried the 
instructions from:
https://github.com/apache/arrow/blob/master/python/doc/source/development.rst

But this failed in the penultimate step:

{code}
python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-jemalloc --inplace
running build_ext
creating build
creating build/temp.linux-x86_64-2.7
...
...
-- Searching for Python libs in
/home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config
-- Looking for python2.7
-- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.26")
-- Checking for module 'arrow'
--   No package 'arrow' found
CMake Error at cmake_modules/FindArrow.cmake:106 (message):
  Could not find the Arrow library.  Looked for headers in , and for libs in
Call Stack (most recent call first):
  CMakeLists.txt:234 (find_package)


-- Configuring incomplete, errors occurred!
See also 
"/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeOutput.log".
See also 
"/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeError.log".
error: command 'cmake' failed with exit status 1

-- The full output of the entire sequence of steps is below 
-
(pyarrow-dev) [prompt]:~$ mkdir repos
(pyarrow-dev) [prompt]:~$ date
Sat May  6 12:56:26 PDT 2017
(pyarrow-dev) [prompt]:~$ cd repos
(pyarrow-dev) [prompt]:~/repos$ git clone https://github.com/apache/arrow.git
Cloning into 'arrow'...
remote: Counting objects: 10468, done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 10468 (delta 5), reused 1 (delta 1), pack-reused 10446
Receiving objects: 100% (10468/10468), 4.52 MiB | 0 bytes/s, done.
Resolving deltas: 100% (6827/6827), done.
Checking connectivity... done.
(pyarrow-dev) [prompt]:~/repos$ git clone 
https://github.com/apache/parquet-cpp.git
Cloning into 'parquet-cpp'...
remote: Counting objects: 4022, done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 4022 (delta 3), reused 0 (delta 0), pack-reused 4010
Receiving objects: 100% (4022/4022), 1.70 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2916/2916), done.
Checking connectivity... done.
(pyarrow-dev) [prompt]:~/repos$ ls -l
total 8
drwxrwxr-x 13 derdo derdo 4096 May  6 12:56 arrow
drwxrwxr-x 13 derdo derdo 4096 May  6 12:57 parquet-cpp
(pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TYPE=release
(pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
(pyarrow-dev) [prompt]:~/repos$ echo $CONDA_PREFIX
/home/derdo/miniconda2/envs/pyarrow-dev
(pyarrow-dev) [prompt]:~/repos$ export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX
(pyarrow-dev) [prompt]:~/repos$ pwd
/home/derdo/repos
(pyarrow-dev) [prompt]:~/repos$ ls
arrow  parquet-cpp
(pyarrow-dev) [prompt]:~/repos$ ls arrow
appveyor.yml  ci   format       java         NOTICE.txt  site
c_glib        cpp  header       js           python
CHANGELOG.md  dev  integration  LICENSE.txt  README.md
(pyarrow-dev) [prompt]:~/repos$ ls arrow/cpp
apidoc         CMakeLists.txt  doc        src
build-support  cmake_modules   README.md  thirdparty
(pyarrow-dev) [prompt]:~/repos$ mkdir !$/build
mkdir arrow/cpp/build
(pyarrow-dev) [prompt]:~/repos$ pushd !$
pushd arrow/cpp/build
~/repos/arrow/cpp/build ~/repos
(pyarrow-dev) [prompt]:~/repos/arrow/cpp/build$ cmake 
-DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE
-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DARROW_PYTHON=on -DARROW_BUILD_TESTS=OFF 
..
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI 

[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999575#comment-15999575
 ] 

Devang Shah commented on ARROW-955:
---

I was able to download the binary package 0.3.0 from conda, and check that 
read_row_group is available in the ParquetReader class:

>>> print dir(pa.ParquetReader)
['__class__', '__delattr__', '__doc__', '__format__', '__getattribute__', 
'__hash__', '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', 
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'column_name_idx', 
'metadata', 'num_row_groups', 'open', 'read_all', 'read_column', 
'read_row_group', 'set_num_threads']
>>>

However, I do need the build from source to work, so I again tried the 
instructions from:
https://github.com/apache/arrow/blob/master/python/doc/source/development.rst

But this failed in the penultimate step:

python setup.py build_ext --build-type=$ARROW_BUILD_TYPE --with-parquet 
--with-jemalloc --inplace
running build_ext
creating build
creating build/temp.linux-x86_64-2.7
...
...
-- Searching for Python libs in
/home/derdo/miniconda2/envs/pyarrow-dev/lib64;/home/derdo/miniconda2/envs/pyarrow-dev/lib;/home/derdo/miniconda2/envs/pyarrow-dev/lib/python2.7/config
-- Looking for python2.7
-- Found Python lib /home/derdo/miniconda2/envs/pyarrow-dev/lib/libpython2.7.so
-- Found the Parquet library: /usr/local/lib/libparquet.so
-- Found the Parquet Arrow library: /usr/local/lib
-- Found PkgConfig: /usr/bin/pkg-config (found version "0.26")
-- Checking for module 'arrow'
--   No package 'arrow' found
CMake Error at cmake_modules/FindArrow.cmake:106 (message):
  Could not find the Arrow library.  Looked for headers in , and for libs in
Call Stack (most recent call first):
  CMakeLists.txt:234 (find_package)


-- Configuring incomplete, errors occurred!
See also 
"/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeOutput.log".
See also 
"/home/derdo/repos/arrow/python/build/temp.linux-x86_64-2.7/CMakeFiles/CMakeError.log".
error: command 'cmake' failed with exit status 1

-- The full output of the entire sequence of steps is below 
-
(pyarrow-dev) [prompt]:~$ mkdir repos
(pyarrow-dev) [prompt]:~$ date
Sat May  6 12:56:26 PDT 2017
(pyarrow-dev) [prompt]:~$ cd repos
(pyarrow-dev) [prompt]:~/repos$ git clone https://github.com/apache/arrow.git
Cloning into 'arrow'...
remote: Counting objects: 10468, done.
remote: Compressing objects: 100% (21/21), done.
remote: Total 10468 (delta 5), reused 1 (delta 1), pack-reused 10446
Receiving objects: 100% (10468/10468), 4.52 MiB | 0 bytes/s, done.
Resolving deltas: 100% (6827/6827), done.
Checking connectivity... done.
(pyarrow-dev) [prompt]:~/repos$ git clone 
https://github.com/apache/parquet-cpp.git
Cloning into 'parquet-cpp'...
remote: Counting objects: 4022, done.
remote: Compressing objects: 100% (11/11), done.
remote: Total 4022 (delta 3), reused 0 (delta 0), pack-reused 4010
Receiving objects: 100% (4022/4022), 1.70 MiB | 0 bytes/s, done.
Resolving deltas: 100% (2916/2916), done.
Checking connectivity... done.
(pyarrow-dev) [prompt]:~/repos$ ls -l
total 8
drwxrwxr-x 13 derdo derdo 4096 May  6 12:56 arrow
drwxrwxr-x 13 derdo derdo 4096 May  6 12:57 parquet-cpp
(pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TYPE=release
(pyarrow-dev) [prompt]:~/repos$ export ARROW_BUILD_TOOLCHAIN=$CONDA_PREFIX
(pyarrow-dev) [prompt]:~/repos$ echo $CONDA_PREFIX
/home/derdo/miniconda2/envs/pyarrow-dev
(pyarrow-dev) [prompt]:~/repos$ export PARQUET_BUILD_TOOLCHAIN=$CONDA_PREFIX
(pyarrow-dev) [prompt]:~/repos$ pwd
/home/derdo/repos
(pyarrow-dev) [prompt]:~/repos$ ls
arrow  parquet-cpp
(pyarrow-dev) [prompt]:~/repos$ ls arrow
appveyor.yml  ci   format       java         NOTICE.txt  site
c_glib        cpp  header       js           python
CHANGELOG.md  dev  integration  LICENSE.txt  README.md
(pyarrow-dev) [prompt]:~/repos$ ls arrow/cpp
apidoc         CMakeLists.txt  doc        src
build-support  cmake_modules   README.md  thirdparty
(pyarrow-dev) [prompt]:~/repos$ mkdir !$/build
mkdir arrow/cpp/build
(pyarrow-dev) [prompt]:~/repos$ pushd !$
pushd arrow/cpp/build
~/repos/arrow/cpp/build ~/repos
(pyarrow-dev) [prompt]:~/repos/arrow/cpp/build$ cmake 
-DCMAKE_BUILD_TYPE=$ARROW_BUILD_TYPE
-DCMAKE_INSTALL_PREFIX=$CONDA_PREFIX -DARROW_PYTHON=on -DARROW_BUILD_TESTS=OFF 
..
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- 

[jira] [Commented] (ARROW-813) [Python] setup.py sdist must also bundle dependent cmake modules

2017-05-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999568#comment-15999568
 ] 

Uwe L. Korn commented on ARROW-813:
---

The simplest fix here could be to symlink the necessary modules into 
{{python/cmake_module}}. The {{sdist}} command should then include a copy of 
the modules (and not a symlink to nowhere).
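
To illustrate the difference between archiving a symlink and archiving a copy of its target, here is a small Python sketch using the standard `tarfile` module with `dereference=True`. It is an analogy only: it does not show how `sdist` itself is implemented, and the directory and file names (`cpp_cmake_modules`, `python_pkg`, `FindExample.cmake`) are made up for the demo.

```python
import os
import tarfile
import tempfile


def archive_with_real_files(src_dir, tar_path):
    """Write src_dir to tar_path, storing symlink targets as regular files."""
    # dereference=True makes tarfile follow symlinks and store their contents.
    with tarfile.open(tar_path, "w:gz", dereference=True) as tar:
        tar.add(src_dir, arcname=os.path.basename(src_dir))


def demo():
    work = tempfile.mkdtemp()
    shared = os.path.join(work, "cpp_cmake_modules")
    pkg = os.path.join(work, "python_pkg")
    os.makedirs(shared)
    os.makedirs(pkg)
    with open(os.path.join(shared, "FindExample.cmake"), "w") as f:
        f.write("# placeholder module\n")
    # Symlink the shared module into the package tree.
    os.symlink(os.path.join(shared, "FindExample.cmake"),
               os.path.join(pkg, "FindExample.cmake"))
    tar_path = os.path.join(work, "pkg.tar.gz")
    archive_with_real_files(pkg, tar_path)
    with tarfile.open(tar_path) as tar:
        member = tar.getmember("python_pkg/FindExample.cmake")
        # (is a regular file, is a symlink)
        return member.isfile(), member.issym()


print(demo())
```

With `dereference=False` (the default) the archive would carry the link itself, which dangles once the tarball is unpacked away from the C++ tree.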

> [Python] setup.py sdist must also bundle dependent cmake modules
> 
>
> Key: ARROW-813
> URL: https://issues.apache.org/jira/browse/ARROW-813
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Affects Versions: 0.2.0
>Reporter: Wes McKinney
>
> The pyarrow tarball from sdist cannot be built currently because it depends 
> on files from the C++ directory





[jira] [Commented] (ARROW-909) libjemalloc.so.2: cannot open shared object file:

2017-05-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999551#comment-15999551
 ] 

Uwe L. Korn commented on ARROW-909:
---

[~abdulrahman004] Did you have any special compilation/linking options set? 
Normally {{pyarrow/_config.so}} should not link to {{libjemalloc.so.2}}. If you 
are able to run {{lddtree pyarrow/_config.so}}, it would really help me to 
understand where the linkage is coming from.

I opened PR https://github.com/apache/arrow/pull/651 to address the initial 
problem: when jemalloc is built as an external project, it should be 
statically linked, because the shared library is not installed by {{make install}}.
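
If {{lddtree}} is not available, plain `ldd` output can be scanned for unresolved libraries instead. The Python sketch below assumes the usual `name => not found` line format printed by `ldd`; the helper name is hypothetical and the sample text is made up for illustration, not taken from the reporter's machine.

```python
import re


def missing_shared_libs(ldd_output):
    """Return names of libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        match = re.match(r"\s*(\S+)\s+=>\s+not found", line)
        if match:
            missing.append(match.group(1))
    return missing


# Made-up sample in the format ldd prints; addresses are placeholders.
sample = (
    "\tlinux-vdso.so.1 (0x00007ffd0b3f2000)\n"
    "\tlibarrow.so.0 => /usr/local/lib/libarrow.so.0 (0x00007f3c1a000000)\n"
    "\tlibjemalloc.so.2 => not found\n"
)
print(missing_shared_libs(sample))
```

In practice the input would come from running `ldd pyarrow/_config.so` and capturing its standard output.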

> libjemalloc.so.2: cannot open shared object file: 
> --
>
> Key: ARROW-909
> URL: https://issues.apache.org/jira/browse/ARROW-909
> Project: Apache Arrow
>  Issue Type: Bug
> Environment: linux centos
>Reporter: Abdul Rahman
>  Labels: pyarrow
>
> >>> import pyarrow
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "/home/default/src/venv/lib/python2.7/site-packages/pyarrow-0.2.1.dev244+g14bec24-py2.7-linux-x86_64.egg/pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: libjemalloc.so.2: cannot open shared object file: No such file or directory
> $LD_LIBRARY_PATH has libarrow_jemalloc.a along with other libraries, including 
> libarrow.so, libparquet.so, and libparquet_arrow.so. pyarrow was built using 
> with-jemalloc, and parquet-cpp was configured with cmake using 
> -DPARQUET_ARROW=ON.
> Also, I noticed that the arrow/python documentation has been cleaned up, with 
> the installation instructions now covering the conda approach only. Is this 
> the only supported way going forward?





[jira] [Assigned] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney reassigned ARROW-899:
--

Assignee: Wes McKinney

> [Docs] Add CHANGELOG for 0.3.0
> --
>
> Key: ARROW-899
> URL: https://issues.apache.org/jira/browse/ARROW-899
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>
> See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG





[jira] [Updated] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-899:
---
Fix Version/s: (was: 0.3.0)
   0.4.0

> [Docs] Add CHANGELOG for 0.3.0
> --
>
> Key: ARROW-899
> URL: https://issues.apache.org/jira/browse/ARROW-899
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
> Fix For: 0.4.0
>
>
> See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG





[jira] [Updated] (ARROW-899) [Docs] Add CHANGELOG for 0.3.0

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-899:
---
Summary: [Docs] Add CHANGELOG for 0.3.0  (was: [Docs] Add CHANGELOG)

> [Docs] Add CHANGELOG for 0.3.0
> --
>
> Key: ARROW-899
> URL: https://issues.apache.org/jira/browse/ARROW-899
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Wes McKinney
> Fix For: 0.4.0
>
>
> See e.g. https://github.com/apache/aurora/blob/master/CHANGELOG





[jira] [Resolved] (ARROW-670) Arrow 0.3 release

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-670.

Resolution: Fixed

> Arrow 0.3 release
> -
>
> Key: ARROW-670
> URL: https://issues.apache.org/jira/browse/ARROW-670
> Project: Apache Arrow
>  Issue Type: Task
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.3.0
>
>
> As we near the next development milestone, please link issues that block the 
> release so we can keep track of what needs to be done





[jira] [Updated] (ARROW-532) [Python] Expand pyarrow.parquet documentation for 0.3 release

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-532:
---
Fix Version/s: (was: 0.3.0)
   0.4.0

> [Python] Expand pyarrow.parquet documentation for 0.3 release
> -
>
> Key: ARROW-532
> URL: https://issues.apache.org/jira/browse/ARROW-532
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Reporter: Wes McKinney
> Fix For: 0.4.0
>
>
> Follow up to ARROW-531





[jira] [Updated] (ARROW-446) [Python] Document NativeFile interfaces, HDFS client in Sphinx

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-446:
---
Fix Version/s: (was: 0.3.0)
   0.4.0

> [Python] Document NativeFile interfaces, HDFS client in Sphinx
> --
>
> Key: ARROW-446
> URL: https://issues.apache.org/jira/browse/ARROW-446
> Project: Apache Arrow
>  Issue Type: New Feature
>  Components: Python
>Reporter: Wes McKinney
>Assignee: Wes McKinney
> Fix For: 0.4.0
>
>






[jira] [Updated] (ARROW-944) Python: Compat broken for pandas==0.18.1

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-944?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-944:
---
Fix Version/s: 0.4.0

> Python: Compat broken for pandas==0.18.1
> 
>
> Key: ARROW-944
> URL: https://issues.apache.org/jira/browse/ARROW-944
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: Python
>Reporter: Uwe L. Korn
>Assignee: Jeff Reback
> Fix For: 0.4.0
>
>
> The following failed for me with {{pandas==0.18.1}}:
> {code}
> In [1]: from pandas.core.dtypes import DatetimeTZDtype
> ---
> ImportError   Traceback (most recent call last)
>  in ()
> ----> 1 from pandas.core.dtypes import DatetimeTZDtype
> ImportError: No module named dtypes
> {code}
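
One generic way to tolerate a symbol moving between modules across library versions is to try several import locations in order. The sketch below is illustrative only: import_first is a hypothetical helper, and the pandas module paths in PANDAS_DTYPE_CANDIDATES are assumptions that should be checked against the pandas versions in use (the location for 0.18.1 is not stated in this thread).

```python
import importlib


def import_first(candidates):
    """Return the first importable attribute from (module_name, attr) pairs."""
    errors = []
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr)
        except (ImportError, AttributeError) as exc:
            errors.append("{}.{}: {}".format(module_name, attr, exc))
    raise ImportError(
        "none of the candidates could be imported: " + "; ".join(errors)
    )


# Assumed locations, not taken from the issue; verify before relying on them.
PANDAS_DTYPE_CANDIDATES = [
    ("pandas.api.types", "DatetimeTZDtype"),
    ("pandas.core.dtypes.dtypes", "DatetimeTZDtype"),
]
```

Usage would be DatetimeTZDtype = import_first(PANDAS_DTYPE_CANDIDATES). If every location fails, the raised ImportError lists each attempt, which makes version mismatches easier to report.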





[jira] [Resolved] (ARROW-956) remove pandas pre-0.20.0 compat

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-956.

Resolution: Fixed

Issue resolved by pull request 649
[https://github.com/apache/arrow/pull/649]

> remove pandas pre-0.20.0 compat
> ---
>
> Key: ARROW-956
> URL: https://issues.apache.org/jira/browse/ARROW-956
> Project: Apache Arrow
>  Issue Type: Task
>  Components: Python
>Reporter: Jeff Reback
>Assignee: Jeff Reback
>Priority: Trivial
>
> xref to ARROW-879





[jira] [Commented] (ARROW-856) CmakeError by Unknown compiler.

2017-05-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999535#comment-15999535
 ] 

Uwe L. Korn commented on ARROW-856:
---

PR: https://github.com/apache/arrow/pull/650

> CmakeError by Unknown compiler. 
> 
>
> Key: ARROW-856
> URL: https://issues.apache.org/jira/browse/ARROW-856
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++, Python
>Reporter: YJ
>Assignee: Uwe L. Korn
>
> From: https://github.com/ray-project/ray/issues/468
> [root@SZV1000268092 python]# LANG=C gcc -v
> Using built-in specs.
> COLLECT_GCC=gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.8.5/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man 
> --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla 
> --enable-bootstrap --enable-shared --enable-threads=posix 
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit 
> --disable-libunwind-exceptions --enable-gnu-unique-object 
> --enable-linker-build-id --with-linker-hash-style=gnu 
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto --enable-plugin 
> --enable-initfini-array --disable-libgcj 
> --with-isl=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/isl-install
>  
> --with-cloog=/builddir/build/BUILD/gcc-4.8.5-20150702/obj-x86_64-redhat-linux/cloog-install
>  --enable-gnu-indirect-function --with-tune=generic --with-arch_32=x86-64 
> --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC)
> Result:
> INFO GNU
> CMake Error at cmake_modules/CompilerInfo.cmake:62 (message):
> Unknown compiler. Version info is just the above.
> Error
> /usr/bin/c++ -g -O3 -march=native -mtune=native -DCXX_SUPPORTS_ALTIVEC 
> -maltivec -o CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o -c 
> /home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp/src.cxx
> c++: error: unrecognized command line option '-maltivec'
> make[1]: *** [CMakeFiles/cmTryCompileExec1115247767.dir/src.cxx.o] Error 1
> make[1]: Leaving directory 
> `/home/dl/yangjie/ray/src/numbuf/thirdparty/arrow/cpp/build/CMakeFiles/CMakeTmp'
> make: *** [cmTryCompileExec1115247767/fast] Error 2





[jira] [Commented] (ARROW-947) [Python] Improve execution time of manylinux1 build

2017-05-06 Thread Uwe L. Korn (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999474#comment-15999474
 ] 

Uwe L. Korn commented on ARROW-947:
---

PR: https://github.com/apache/arrow/pull/648 (down to 14min)

> [Python] Improve execution time of manylinux1 build
> ---
>
> Key: ARROW-947
> URL: https://issues.apache.org/jira/browse/ARROW-947
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
>Affects Versions: 0.3.0
>Reporter: Wes McKinney
>Assignee: Uwe L. Korn
>
> Perhaps we could have the same testing benefits by limiting the matrix of 
> builds? Pulling the Docker image takes about 90 seconds, but the build itself 
> takes 25 minutes or more. 





[jira] [Comment Edited] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999469#comment-15999469
 ] 

Wes McKinney edited comment on ARROW-955 at 5/6/17 3:51 PM:


Yes, 0.3.0 includes that function. 

Since I don't have access to your environment, and you aren't pasting a 
reproducible set of steps or console output, it's very difficult for me to 
debug. 

I strongly recommend starting from a basic Miniconda installation (see 
https://conda.io/miniconda.html, do not use Ubuntu's system Python) and 
installing from conda-forge. If you can't get that working please report back 
and provide complete details (console output). 


was (Author: wesmckinn):
Yes, 0.3.0 includes that function. 

Since I don't have access to your environment, and you aren't pasting a 
reproducible set of steps or console output, it's very difficult for me to 
debug. 

I strongly recommend starting from a basic Miniconda installation (not Ubuntu's 
system Python) and installing from conda-forge. If you can't get that working 
please report back and provide complete details (console output). 

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>
> I built pyarrow, arrow, and parquet-cpp from source - so that I could use the 
> new read_row_group() interface and in general, have access to the latest 
> versions. I ran into many issues during the build but was ultimately 
> successful (notes below). However, I am not able to import pyarrow.parquet 
> due to the following issue:
> >>> import pyarrow.parquet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "pyarrow/__init__.py", line 28, in <module>
>     import pyarrow._config
> ImportError: No module named _config
> This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, 
> where also I posted this...but I think this forum is more direct and 
> appropriate - so re-posting here.
> I used instructions at https://arrow.apache.org/docs/python/install.html to 
> build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations 
> (I view them as possibly bugs in the instructions):
> arrow/cpp build:
> export ARROW_HOME=$HOME/local
> I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake 
> command (besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)
> parquet-cpp build:
> export ARROW_HOME=$HOME/local
> cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static 
> -DPARQUET_ARROW=ON .
> make
> sudo make install   (this installs the parquet libs in the standard system 
> location, /usr/local/lib, so that the pyarrow build below can find them)
> pyarrow build:
> export ARROW_HOME=$HOME/local (not a deviation; just repeating here)
> export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest
> sudo python setup.py build_ext --with-parquet --with-jemalloc 
> --build-type=release install
> sudo python setup.py install
> (sudo is needed to install in /usr/local/lib/python2.7/dist-packages )
> These are the steps and modifications to the instructions needed for me to 
> build the pyarrow.parquet package. However, when I now try to import the 
> package I get the error specified above.
> Maybe I did something wrong in my steps which I kind of put together by 
> searching for these issues...but really can't tell what. It took me almost a 
> whole day to get to the point where I can build pyarrow and parquet, and now 
> I can't use what I built.
> Any comments, help appreciated! Thanks in advance.





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999469#comment-15999469
 ] 

Wes McKinney commented on ARROW-955:


Yes, 0.3.0 includes that function. 

Since I don't have access to your environment, and you aren't pasting a 
reproducible set of steps or console output, it's very difficult for me to 
debug. 

I strongly recommend starting from a basic Miniconda installation (not Ubuntu's 
system Python) and installing from conda-forge. If you can't get that working 
please report back and provide complete details (console output). 

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>





[jira] [Commented] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Devang Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15999466#comment-15999466
 ] 

Devang Shah commented on ARROW-955:
---

Does the 0.3.0 release export read_row_group()? That is what I am interested 
in.

Also, I tried the conda instructions from:

https://github.com/apache/arrow/blob/master/python/doc/source/development.rst

but I couldn't even get this to build (the last setup.py step wasn't working), 
so I switched to what seemed to be simpler instructions at:

https://arrow.apache.org/docs/python/install.html

which I was able to build as stated above, but which then failed to run.

This is the first time I am using conda, so I am very unfamiliar with it. I'd 
really appreciate any help getting to the next step: running 
pyarrow.parquet on my current non-conda build from source. It may be a simple 
matter of configuration that is eluding me...

Thanks a lot for your response!




> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>





[jira] [Created] (ARROW-957) [Doc] Add HDFS and Windows documents to doxygen output

2017-05-06 Thread Uwe L. Korn (JIRA)
Uwe L. Korn created ARROW-957:
-

 Summary: [Doc] Add HDFS and Windows documents to doxygen output
 Key: ARROW-957
 URL: https://issues.apache.org/jira/browse/ARROW-957
 Project: Apache Arrow
  Issue Type: Task
  Components: C++
Reporter: Uwe L. Korn
Assignee: Uwe L. Korn


Currently these documents are not rendered on the website. I would move them to 
the {{apidoc/}} folder and link to them in the main doxygen page. Probably this 
is the point where we also would move {{apidoc/}} back to {{docs/}}.





[jira] [Resolved] (ARROW-939) Fix division by zero for zero-dimensional Tensors

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney resolved ARROW-939.

Resolution: Fixed

Issue resolved by pull request 634
[https://github.com/apache/arrow/pull/634]

> Fix division by zero for zero-dimensional Tensors
> -
>
> Key: ARROW-939
> URL: https://issues.apache.org/jira/browse/ARROW-939
> Project: Apache Arrow
>  Issue Type: Bug
>  Components: C++
>Affects Versions: 0.3.0
>Reporter: Philipp Moritz
>Priority: Minor
>
> see https://github.com/ray-project/ray/issues/500
> The division "remaining /= dimsize" in cpp/src/arrow/tensor.cc:45 raises a 
> division by zero exception if dimsize = 0.
> This was found by https://github.com/stephanie-wang.
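
To illustrate the failure mode described above, here is a minimal Python sketch of a row-major stride computation that divides a running product by each dimension size, with a guard for zero-sized dimensions. It is not the code from cpp/src/arrow/tensor.cc or from the fix in PR 634; the function name and the fallback stride value are assumptions.

```python
def row_major_strides(shape, itemsize):
    """Compute byte strides for a row-major (C-order) array."""
    remaining = itemsize
    for dim in shape:
        remaining *= dim

    if remaining == 0:
        # At least one dimension has size zero, so "remaining //= dim"
        # below would divide by zero. The array holds no elements, so any
        # stride is valid; itemsize is used here as an arbitrary choice.
        return [itemsize] * len(shape)

    strides = []
    for dim in shape:
        remaining //= dim
        strides.append(remaining)
    return strides
```

Without the early return, a shape such as (4, 0, 5) would reach the division with dim equal to 0 and raise ZeroDivisionError.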



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Updated] (ARROW-955) [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-955:
---
Summary: [Docs] Guide for building Python from source on Ubuntu 14.04 LTS 
without conda  (was: ImportError: No module named _config)

> [Docs] Guide for building Python from source on Ubuntu 14.04 LTS without conda
> --
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>





[jira] [Created] (ARROW-956) remove pandas pre-0.20.0 compat

2017-05-06 Thread Jeff Reback (JIRA)
Jeff Reback created ARROW-956:
-

 Summary: remove pandas pre-0.20.0 compat
 Key: ARROW-956
 URL: https://issues.apache.org/jira/browse/ARROW-956
 Project: Apache Arrow
  Issue Type: Task
  Components: Python
Reporter: Jeff Reback
Assignee: Jeff Reback
Priority: Trivial


xref to ARROW-879





[jira] [Updated] (ARROW-955) ImportError: No module named _config

2017-05-06 Thread Wes McKinney (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wes McKinney updated ARROW-955:
---
Issue Type: Improvement  (was: Bug)

> ImportError: No module named _config
> 
>
> Key: ARROW-955
> URL: https://issues.apache.org/jira/browse/ARROW-955
> Project: Apache Arrow
>  Issue Type: Improvement
>  Components: Python
> Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
> Python 2.7.6
>Reporter: Devang Shah
>





[jira] [Created] (ARROW-955) ImportError: No module named _config

2017-05-06 Thread Devang Shah (JIRA)
Devang Shah created ARROW-955:
-

 Summary: ImportError: No module named _config
 Key: ARROW-955
 URL: https://issues.apache.org/jira/browse/ARROW-955
 Project: Apache Arrow
  Issue Type: Bug
  Components: Python
 Environment: Ubuntu - 3.19.0-80-generic #88~14.04.1-Ubuntu
Python 2.7.6
Reporter: Devang Shah
Priority: Blocker



I built pyarrow, arrow, and parquet-cpp from source - so that I could use the 
new read_row_group() interface and in general, have access to the latest 
versions. I ran into many issues during the build but was ultimately successful 
(notes below). However, I am not able to import pyarrow.parquet due to the 
following issue:

>>> import pyarrow.parquet
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/__init__.py", line 28, in <module>
    import pyarrow._config
ImportError: No module named _config

This is similar to an issue reported in github/conda-forge/pyarrow-feedstock, 
where also I posted this...but I think this forum is more direct and 
appropriate - so re-posting here.

I used instructions at https://arrow.apache.org/docs/python/install.html to 
build arrow/cpp, parquet-cpp, and then pyarrow, with the following deviations 
(I view them as possibly bugs in the instructions):

arrow/cpp build:
export ARROW_HOME=$HOME/local
I had to specify -DARROW_PYTHON=on and -DPARQUET_ARROW=ON to the cmake command 
(besides the -DCMAKE_INSTALL_PREFIX=$ARROW_HOME)

parquet-cpp build:

export ARROW_HOME=$HOME/local

cmake -DARROW_HOME=$HOME/local -DPARQUET_ARROW_LINKAGE=static 
-DPARQUET_ARROW=ON .
make

sudo make install   (this installs the parquet libs in the standard system 
location, /usr/local/lib, so that the pyarrow build below can find them)

pyarrow build:

export ARROW_HOME=$HOME/local (not a deviation; just repeating here)

export LD_LIBRARY_PATH=$HOME/local/lib:$HOME/parquet4/parquet-cpp/build/latest

sudo python setup.py build_ext --with-parquet --with-jemalloc 
--build-type=release install

sudo python setup.py install

(sudo is needed to install in /usr/local/lib/python2.7/dist-packages )

These are the steps and modifications to the instructions needed for me to 
build the pyarrow.parquet package. However, when I now try to import the 
package I get the error specified above.

Maybe I did something wrong in my steps which I kind of put together by 
searching for these issues...but really can't tell what. It took me almost a 
whole day to get to the point where I can build pyarrow and parquet, and now I 
can't use what I built.

Any comments, help appreciated! Thanks in advance.
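
One thing that may be worth checking, as a guess rather than a confirmed diagnosis: the traceback shows a relative path (pyarrow/__init__.py), which could mean Python picked up a pyarrow source directory that does not contain the compiled _config extension. The hypothetical helper below lists compiled-extension files for a given module stem inside a package directory.

```python
import glob
import os


def find_extension_files(package_dir, stem):
    """List compiled-extension file names for `stem` inside package_dir."""
    # Extension modules are typically named like _config.so or
    # _config.<abi-tag>.so on Linux, and _config*.pyd on Windows.
    patterns = [stem + "*.so", stem + "*.pyd"]
    found = []
    for pattern in patterns:
        found.extend(glob.glob(os.path.join(package_dir, pattern)))
    return sorted(os.path.basename(path) for path in found)
```

On Python 3, importlib.util.find_spec("pyarrow").origin gives the path of the __init__.py that would be imported without executing it; passing its directory to find_extension_files(..., "_config") shows whether a built _config module sits next to it. An empty list would point to importing from a tree where the extension was never built or installed.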





[jira] [Resolved] (ARROW-929) Move KEYS file to SVN, remove from git

2017-05-06 Thread Uwe L. Korn (JIRA)

 [ 
https://issues.apache.org/jira/browse/ARROW-929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe L. Korn resolved ARROW-929.
---
Resolution: Fixed

Issue resolved by pull request 646
[https://github.com/apache/arrow/pull/646]

> Move KEYS file to SVN, remove from git
> --
>
> Key: ARROW-929
> URL: https://issues.apache.org/jira/browse/ARROW-929
> Project: Apache Arrow
>  Issue Type: Improvement
>Reporter: Wes McKinney
>Assignee: Wes McKinney
>



