Yes, I am following the build-from-source instructions in the Python development section on arrow.apache.org. I am pretty sure I ran the make command as indicated in the tutorial, meaning with Parquet on, but I will do a make clean, go back and run it again, and let you know if I get a different result.

Chris

On Fri, Jan 28, 2022, 00:35 Weston Pace wrote:

> How did you build Arrow C++? That error most likely means that the
> C++ parquet module was not turned on when the C++ was built. For
> example, in [1] an example command to build C++ is:
>
> ---
>
> mkdir arrow/cpp/build
> pushd arrow/cpp/build
>
> cmake -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \
>       -DCMAKE_INSTALL_LIBDIR=lib \
>       -DARROW_WITH_BZ2=ON \
>       -DARROW_WITH_ZLIB=ON \
>       -DARROW_WITH_ZSTD=ON \
>       -DARROW_WITH_LZ4=ON \
>       -DARROW_WITH_SNAPPY=ON \
>       -DARROW_WITH_BROTLI=ON \
>       -DARROW_PARQUET=ON \
>       -DARROW_PYTHON=ON \
>       -DARROW_BUILD_TESTS=ON \
>       ..
> make -j4
> make install
> popd
>
> ---
>
> Specifically the -DARROW_PARQUET=ON part tells the build to build the
> parquet module. The Arrow C++ implementation is broken up into a
> bunch of small modules. When we build for pip/conda we normally turn
> on a "stock set" of modules so users that get pyarrow from those
> sources don't often have to worry about this detail.
>
> Another option is to disable parquet support when you build python.
> Similar to C++, the python module is also broken up into smaller
> submodules. I'm guessing you were following our guides and you ran
> `export PYARROW_WITH_PARQUET=1` which tells the python build to build
> the parquet module. You could set that to 0 and the python build
> would not build the parquet module. However, given your original plan
> was to play with datasets you probably want to build both the parquet
> module and the datasets module. I'd recommend you include
> -DARROW_PYTHON=ON, -DARROW_PARQUET=ON, and -DARROW_DATASET=ON in your
> cmake build.
>
> When building python you can either run both
> `export PYARROW_WITH_PARQUET=1` and `export PYARROW_WITH_DATASET=1`
> or you can run the following build command:
>
> python setup.py build_ext --inplace --with-parquet --with-dataset
>
> The `--with-dataset` flag achieves the same thing as
> `export PYARROW_WITH_DATASET=1`.
>
> [1] https://arrow.apache.org/docs/developers/python.html#build-and-test
>
> On Thu, Jan 27, 2022 at 6:17 PM Chris Nyland wrote:
> >
> > Good guess yes I am working on my beater laptop which is an old thinkpad x200. So I started running down the compile instructions and I was going along pretty good till I got to the point to build the Python extensions. When I run those commands I get a message basically that it can't find parquet. Full output is below.
> >
> > Any ideas? I did look in the CMakeOutput.log but didn't see anything that made obvious sense to me.
> >
> > The result of running
> >
> > python setup.py build_ext --inplace
> >
> > running build_ext
> > -- Running cmake for pyarrow
> > cmake -DPYTHON_EXECUTABLE=~/build_arrow/pyarrow/bin/python
> >   -DPython3_EXECUTABLE=~/build_arrow/pyarrow/bin/python ""
> >   -DPYARROW_BUILD_CUDA=off -DPYARROW_BUILD_FLIGHT=off
> >   -DPYARROW_BUILD_GANDIVA=off -DPYARROW_BUILD_DATASET=off
> >   -DPYARROW_BUILD_ORC=off -DPYARROW_BUILD_PARQUET=on
> >   -DPYARROW_BUILD_PLASMA=off -DPYARROW_BUILD_S3=off -DPYARROW_BUILD_HDFS=off
> >   -DPYARROW_USE_TENSORFLOW=off -DPYARROW_BUNDLE_ARROW_CPP=off
> >   -DPYARROW_BUNDLE_BOOST=off -DPYARROW_GENERATE_COVERAGE=off
> >   -DPYARROW_BOOST_USE_SHARED=on -DPYARROW_PARQUET_USE_SHARED=on
> >   -DCMAKE_BUILD_TYPE=release ~/build_arrow/arrow/python
> > -- System processor: x86_64
> > -- Arrow build warning level: PRODUCTION
> > Using ld linker
> > Configured for RELEASE build (set with cmake -DCMAKE_BUILD_TYPE={release,debug,...})
> > -- Build Type: RELEASE
> > -- Generator: Unix Makefiles
> > -- Build output directory: ~/build_arrow/arrow/python/build/temp.linux-x86_64-3.7/release
> > -- Searching for Python libs in ~/build_arrow/pyarrow/lib64;~/build_arrow/pyarrow/lib;/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu
> > -- Looking for python3.7m
> > -- Found Python lib /usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.so
> > -- Searching for Python libs in ~/build_arrow/pyarrow/lib64;~/build_arrow/pyarrow/lib;/usr/lib/python3.7/config-3.7m-x86_64-linux-gnu
> > -- Looking for python3.7m
> > -- Found Python lib /usr/lib/python3.7/config-3.7m-x86_64-linux-gnu/libpython3.7m.so
> > -- Arrow version: 7.0.0 (HOME: ~/build_arrow/dist)
> > -- Arrow SO and ABI version: 700
> > -- Arrow full SO version: 700.0.0
> > -- Found the Arrow core shared library: ~/build_arrow/dist/lib/libarrow.so
> > -- Found the Arrow core import library: ~/build_arrow/dist/lib/libarrow.so
> > -- Found the Arrow core static library: ~/build_arrow/dist/lib/libarrow.a
> > -- Found the Arrow Python by HOME: ~/build_arrow/dist
> > -- Found the Arrow Python shared library: ~/build_arrow/dist/lib/libarrow_python.so
> > -- Found the Arrow Python import library: ~/build_arrow/dist/lib/libarrow_python.so
> > -- Found the Arrow Python static library: ~/build_arrow/dist/lib/libarrow_python.a
> > CMake Error at /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:137 (message):
> >   Could NOT find Parquet (missing: PARQUET_INCLUDE_DIR PARQUET_LIB_DIR
> >   PARQUET_SO_VERSION)
> > Call Stack (most recent call first):
> >   /usr/share/cmake-3.13/Modules/FindPackageHandleStandardArgs.cmake:378 (_FPHSA_FAILURE_MESSAGE)
> >   cmake_modules/FindParquet.cmake:115 (find_package_handle_standard_args)
> >   CMakeLists.txt:447 (find_package)
> >
> >
> > -- Configuring incomplete, errors occurred!
> > See also "~/build_arrow/arrow/python/build/temp.linux-x86_64-3.7/CMakeFiles/CMakeOutput.log".
> > error: command '/usr/bin/cmake' failed with exit code 1
> >
> > On Tue, Jan 25, 2022 at 12:27 AM Weston Pace wrote:
> >>
> >> Your problem is probably old hardware, specifically an older CPU. Pip builds rely on popcnt (which I think is SSE4.1?)
> >>
> >> I'm pretty sure you are right that you can compile from source and be ok. It's a performance / portability tradeoff that has to be made when packaging prebuilt binaries.
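
The guess about an older CPU can be checked directly on Linux, where /proc/cpuinfo lists the feature flags the processor reports. The following is a minimal sketch, not part of the original thread: the helper name `parse_cpu_flags` is hypothetical, it assumes an x86 Linux system whose /proc/cpuinfo has a `flags` line, and the list of features it prints is only an illustration, since the thread does not confirm exactly which instructions the prebuilt wheels require.

```python
from pathlib import Path


def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text.

    Hypothetical helper: only the first "flags" line is read, which is
    enough when all cores report the same features.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        flags = parse_cpu_flags(cpuinfo.read_text())
        # Feature names as they appear in /proc/cpuinfo on x86 Linux.
        for feature in ("popcnt", "sse4_1", "sse4_2"):
            print(feature, "yes" if feature in flags else "no")
    else:
        print("/proc/cpuinfo not found; this sketch assumes Linux")
```

If a feature that the prebuilt binary uses shows up as "no", an "Illegal Instruction" crash with no Python exception would be consistent with that.
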
> >>
> >> On Mon, Jan 24, 2022, 6:18 PM Chris Nyland wrote:
> >>>
> >>> Hello,
> >>>
> >>> I was just taking a look at pyarrow in my off hours. I was trying to write a partitioned data set based on the birthdays example in the pyarrow cook book. However when I run the script I get no data written and a "Illegal Instruction" message prints to screen, no exception is raised. I installed the pyarrow manylinux x86_64 version 6.0.1 wheel via pip for Python 3.7 using a virtual environment. I suspect that if I build pyarrow myself it would work, it doesn't look too terribly difficult, but it is still kind of a drag since I was looking to make some quick progress on an off hours project.
> >>>
> >>> If anyone has any ideas on what else it would be I would like to try it before building the library myself. Also is this a pretty typical issue to run into? At work I primarily do Python on Windows and really haven't had any build issues there since the Python 2.7 days.
> >>>
> >>> Thanks
> >>>
> >>> Chris
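
For the "Could NOT find Parquet" error discussed in this thread, one quick check before rebuilding is whether a Parquet library was actually installed under the Arrow install prefix. The following is a minimal sketch, not part of the original thread: the helper name `find_arrow_libs` is hypothetical, and it assumes that the `ARROW_HOME` environment variable points at the install prefix and that libraries are placed in its `lib` subdirectory (as `-DCMAKE_INSTALL_LIBDIR=lib` would do on Linux).

```python
import os
from pathlib import Path


def find_arrow_libs(arrow_home,
                    names=("arrow", "arrow_python", "parquet", "arrow_dataset")):
    """Map each library name to the matching file names in <arrow_home>/lib.

    Hypothetical helper: an empty list for "parquet" would suggest the C++
    build was configured without -DARROW_PARQUET=ON (or was not installed).
    """
    lib_dir = Path(arrow_home).expanduser() / "lib"
    found = {}
    for name in names:
        if lib_dir.is_dir():
            # Matches e.g. libparquet.so, libparquet.so.700, libparquet.a
            found[name] = sorted(p.name for p in lib_dir.glob(f"lib{name}.*"))
        else:
            found[name] = []
    return found


if __name__ == "__main__":
    home = os.environ.get("ARROW_HOME")
    if home is None:
        print("ARROW_HOME is not set")
    else:
        for name, files in find_arrow_libs(home).items():
            print(f"{name}: {', '.join(files) if files else 'NOT FOUND'}")
```

If `parquet` is reported as not found while `arrow` and `arrow_python` are present, that matches the log above, where cmake locates libarrow and libarrow_python under the same prefix but cannot find Parquet.
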
