Yes, I am following the instructions on arrow.apache.org for building from
source in the Python development docs. I am pretty sure I ran the build
commands as shown in the tutorial, meaning with Parquet on, but I will do a
make clean, run the build again, and let you know if I get a different
result.
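
Before rebuilding, one quick way to see whether the installed C++ build
already contains the Parquet module is to look for its shared library next
to libarrow.so. Below is a minimal sketch, not an official Arrow tool. It
assumes a Linux build installed under $ARROW_HOME with
CMAKE_INSTALL_LIBDIR=lib (~/build_arrow/dist in the log below), and that the
Parquet and Dataset modules install as libparquet.so and
libarrow_dataset.so. The helper name missing_arrow_libs is made up for
illustration.

```python
import os
from pathlib import Path

# Assumed shared-library base names for the optional C++ modules on Linux.
MODULE_LIBS = {
    "parquet": "libparquet",
    "dataset": "libarrow_dataset",
}


def missing_arrow_libs(arrow_home, modules=("parquet", "dataset")):
    """Return the modules whose shared library is not under <arrow_home>/lib."""
    lib_dir = Path(arrow_home).expanduser() / "lib"
    missing = []
    for module in modules:
        # Match libparquet.so as well as versioned names like libparquet.so.700.
        if not any(lib_dir.glob(MODULE_LIBS[module] + ".so*")):
            missing.append(module)
    return missing


if __name__ == "__main__":
    # Fall back to the install prefix seen in the build log if ARROW_HOME is unset.
    home = os.environ.get("ARROW_HOME", "~/build_arrow/dist")
    print("missing C++ modules:", missing_arrow_libs(home) or "none")
```

If parquet is reported as missing, the C++ build was probably configured
without -DARROW_PARQUET=ON, or it was installed to a different prefix than
the one the Python build is searching.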

Chris

On Fri, Jan 28, 2022, 00:35 Weston Pace <[email protected]> wrote:

> How did you build Arrow C++?  That error most likely means that the
> C++ parquet module was not turned on when the C++ was built.  For
> example, in [1] an example command to build C++ is:
>
> ---
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
>
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_WITH_BZ2=ON \
>       -DARROW_WITH_ZLIB=ON \
>       -DARROW_WITH_ZSTD=ON \
>       -DARROW_WITH_LZ4=ON \
>       -DARROW_WITH_SNAPPY=ON \
>       -DARROW_WITH_BROTLI=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> ---
>
> Specifically the -DARROW_PARQUET=ON part tells the build to build the
> parquet module.  The Arrow C++ implementation is broken up into a
> bunch of small modules.  When we build for pip/conda we normally turn
> on a "stock set" of modules so users that get pyarrow from those
> sources don't often have to worry about this detail.
>
> Another option is to disable parquet support when you build python.
> Similar to C++, the python module is also broken up into smaller
> submodules.  I'm guessing you were following our guides and you ran
> `export PYARROW_WITH_PARQUET=1` which tells the python build to build
> the parquet module.  You could set that to 0 and the python build
> would not build the parquet module.  However, given your original plan
> was to play with datasets you probably want to build both the parquet
> module and the datasets module.  I'd recommend you include
> -DARROW_PYTHON=ON, -DARROW_PARQUET=ON, and -DARROW_DATASET=ON in your
> cmake build.
>
> When building python you can either run both `export
> PYARROW_WITH_PARQUET=1` and `export PYARROW_WITH_DATASET=1` or you can
> run the following build command:
>
>     python setup.py build_ext --inplace --with-parquet --with-dataset
>
> The `--with-dataset` flag achieves the same thing as `export
> PYARROW_WITH_DATASET=1`.
>
> [1] https://arrow.apache.org/docs/developers/python.html#build-and-test
>
> On Thu, Jan 27, 2022 at 6:17 PM Chris Nyland <[email protected]> wrote:
> >
> >
> > Good guess. Yes, I am working on my beater laptop, an old ThinkPad
> > X200. I started working through the compile instructions and was going
> > along pretty well until I got to the step that builds the Python
> > extensions. When I run those commands I get a message saying,
> > basically, that it can't find Parquet. Full output is below.
> >
> > Any ideas? I did look in the CMakeOutput.log but didn't see anything
> > that made obvious sense to me.
> >
> > The result of running
> >
> > python setup.py build_ext --inplace
> >
> > running build_ext
> > -- Running cmake for pyarrow
> > cmake -DPYTHON_EXECUTABLE=~/build_arrow/pyarrow/bin/python
> > -DPython3_EXECUTABLE=~/build_arrow/pyarrow/bin/python ""
> > -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_FLIGHT=off
> > -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off
> > -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=on
> > -DPYARROW_BUILD_PLASMA=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off
> > -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off
> > -DPYARROW_BUNDLE_BOOST=off -DPYARROW_GENERATE_COVERAGE=off
> > -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on
> > -DCMAKE_BUILD_TYPE=release ~/build_arrow/arrow/python
> > -- System processor: x86_64
> > -- Arrow build warning level: PRODUCTION
> > Using ld linker
> > Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
> > -- Build Type: RELEASE
> > -- Generator: Unix Makefiles
> > -- Build output directory: ~/build_arrow/arrow/python/build/temp.linux-x86_64-3.7/release
> > -- Searching for Python libs in ~/build_arrow/pyarrow/lib64;~/build_arrow/pyarrow/lib;/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu
> > -- Looking for python3.7m
> > -- Found Python lib /usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.so
> > -- Searching for Python libs in ~/build_arrow/pyarrow/lib64;~/build_arrow/pyarrow/lib;/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu
> > -- Looking for python3.7m
> > -- Found Python lib /usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.so
> > -- Arrow version: 7.0.0 (HOME: ~/build_arrow/dist)
> > -- Arrow SO and ABI version: 700
> > -- Arrow full SO version: 700.0.0
> > -- Found the Arrow core shared library: ~/build_arrow/dist/lib/libarrow.so
> > -- Found the Arrow core import library: ~/build_arrow/dist/lib/libarrow.so
> > -- Found the Arrow core static library: ~/build_arrow/dist/lib/libarrow.a
> > -- Found the Arrow Python by HOME: ~/build_arrow/dist
> > -- Found the Arrow Python shared library: ~/build_arrow/dist/lib/libarrow_python.so
> > -- Found the Arrow Python import library: ~/build_arrow/dist/lib/libarrow_python.so
> > -- Found the Arrow Python static library: ~/build_arrow/dist/lib/libarrow_python.a
> > CMake Error at /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
> >   Could NOT find Parquet (missing: PARQUET_INCLUDE_DIR PARQUET_LIB_DIR
> >   PARQUET_SO_VERSION)
> > Call Stack (most recent call first):
> >   /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
> >   cmake_modules/FindParquet.cmake:115 (find_package_handle_standard_args)
> >   CMakeLists.txt:447 (find_package)
> >
> >
> > -- Configuring incomplete, errors occurred!
> > See also "~/build_arrow/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
> > error: command '/usr/bin/cmake' failed with exit code 1
> >
> > On Tue, Jan 25, 2022 at 12:27 AM Weston Pace <[email protected]> wrote:
> >>
> >> Your problem is probably old hardware, specifically an older CPU.  Pip
> >> builds rely on popcnt (which I think arrived on Intel alongside SSE4.2?)
> >>
> >> I'm pretty sure you are right that you can compile from source and be
> >> ok.  It's a performance / portability tradeoff that has to be made when
> >> packaging prebuilt binaries.
> >>
> >> On Mon, Jan 24, 2022, 6:18 PM Chris Nyland <[email protected]> wrote:
> >>>
> >>> Hello,
> >>>
> >>> I was just taking a look at pyarrow in my off hours. I was trying to
> >>> write a partitioned dataset based on the birthdays example in the
> >>> pyarrow cookbook. However, when I run the script no data is written
> >>> and an "Illegal Instruction" message prints to the screen; no
> >>> exception is raised. I installed the pyarrow manylinux x86_64 version
> >>> 6.0.1 wheel via pip for Python 3.7 in a virtual environment. I suspect
> >>> that if I built pyarrow myself it would work. It doesn't look too
> >>> terribly difficult, but it is still kind of a drag since I was looking
> >>> to make some quick progress on an off-hours project.
> >>>
> >>> If anyone has any ideas on what else it could be, I would like to try
> >>> them before building the library myself. Also, is this a pretty
> >>> typical issue to run into? At work I primarily do Python on Windows
> >>> and really haven't had any build issues there since the Python 2.7
> >>> days.
> >>>
> >>> Thanks
> >>>
> >>> Chris
>
