Re: Helping new contributors get started [was Re: Renaming master branch, removing blacklist/whitelist]

2020-06-20 Thread Suvayu Ali
Hi Wes, others, Thank you for taking the time to draft a long response. On Sat, Jun 20, 2020 at 3:57 PM Wes McKinney wrote: > > From a purely factual view, the project is successfully attracting and > supporting contributors. Over 500 different people have contributed to > the project (more

Re: Renaming master branch, removing blacklist/whitelist

2020-06-19 Thread Suvayu Ali
Hi all, (sorry if this is a duplicate post, I always have trouble posting to this list) On Fri, Jun 19, 2020 at 5:54 PM Todd Hendricks wrote: > > I'm a black data scientist. For whatever it's worth, I have never taken > offense to the term "Master" branch, as I have never interpreted it to have

[jira] [Created] (ARROW-6577) Dependency conflict in conda packages

2019-09-17 Thread Suvayu Ali (Jira)
Suvayu Ali created ARROW-6577: Summary: Dependency conflict in conda packages Key: ARROW-6577 URL: https://issues.apache.org/jira/browse/ARROW-6577 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS] Passing the torch on Python wheel (binary) maintenance

2019-07-15 Thread Suvayu Ali
Hi Wes, others, A few thoughts from a user. Firstly, I completely understand your frustration. I myself have delved into a bit of packaging for many scientific computing packages, like ROOT from CERN, although not at the scale of users that you face here. AIU, wheels are a Python-first

[jira] [Created] (ARROW-5871) Can't import pyarrow 0.14.0 due to mismatching libcrypt

2019-07-07 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-5871: Summary: Can't import pyarrow 0.14.0 due to mismatching libcrypt Key: ARROW-5871 URL: https://issues.apache.org/jira/browse/ARROW-5871 Project: Apache Arrow Issue

Re: CMake refactor Heads-up

2019-03-17 Thread Suvayu Ali
On Sat, Mar 16, 2019 at 04:31:32PM +0100, Uwe L. Korn wrote: > > > 4. AFAIU, the pyarrow build expects the libraries in > > $CMAKE_INSTALL_PREFIX/lib. This will never be accepted by a distro. I do > > realise this one is probably hard to resolve, given how the builds are > > setup at the

[jira] [Created] (ARROW-4930) Remove LIBDIR assumptions in Python build

2019-03-17 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-4930: Summary: Remove LIBDIR assumptions in Python build Key: ARROW-4930 URL: https://issues.apache.org/jira/browse/ARROW-4930 Project: Apache Arrow Issue Type

Re: CMake refactor Heads-up

2019-03-16 Thread Suvayu Ali
Hi Uwe, On Sat, Mar 16, 2019 at 04:31:32PM +0100, Uwe L. Korn wrote: > > > 2. I don't know if this is intentional, but jemalloc and rapidjson aren't > > detected on my system. > > > > Not sure if it is detected or not as this is missing from the log above. That > log lists only the bundled

Re: CMake refactor Heads-up

2019-03-16 Thread Suvayu Ali
Hello Uwe, On Fri, Mar 15, 2019 at 10:38:32AM -0400, Uwe L. Korn wrote: > > we have merged the CMake refactor yesterday > https://github.com/apache/arrow/pull/3688 and this means that the build > system behaves a bit different. The main differences are: > A few more comments: 1. There was an

Re: CMake refactor Heads-up

2019-03-16 Thread Suvayu Ali
On Sat, Mar 16, 2019 at 04:35:32AM -0400, Uwe L. Korn wrote: > Hello Suvayu, > > On Fri, Mar 15, 2019, at 10:08 PM, Suvayu Ali wrote: > > Is there a recommended way to choose between Python versions? Fedora > > repos often provide several Python versions, e.g. on F28 I h

Re: CMake refactor Heads-up

2019-03-16 Thread Suvayu Ali
On Sat, Mar 16, 2019 at 11:03:27AM +0530, Ravindra Pindikura wrote: > On Sat, Mar 16, 2019 at 2:38 AM Suvayu Ali > wrote: > > > Secondly, I was trying to compile with Gandiva enabled. But it seems the > > LLVM requirement has gone up to 7.0 (available only on F29 onwards

Re: CMake refactor Heads-up

2019-03-15 Thread Suvayu Ali
Hi Uwe, On Fri, Mar 15, 2019 at 10:38:32AM -0400, Uwe L. Korn wrote: > > we have merged the CMake refactor yesterday > https://github.com/apache/arrow/pull/3688 and this means that the build > system behaves a bit different. The main differences are: That's a lot of work! Thank you very much

[jira] [Created] (ARROW-4814) [Python] Exception when writing nested columns that are tuples to parquet

2019-03-10 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-4814: Summary: [Python] Exception when writing nested columns that are tuples to parquet Key: ARROW-4814 URL: https://issues.apache.org/jira/browse/ARROW-4814 Project: Apache

Re: Distributing Arrow in Debian and Fedora

2019-02-08 Thread Suvayu Ali
Hello Todd, On Fri, Feb 08, 2019 at 11:41:05AM -0500, Todd Rme wrote: > On 2019/02/02 10:00:37, Suvayu Ali wrote: > > > 1. Fedora doesn't allow including external dependencies. In my experience > > building arrow on Fedora, the way external deps like Protobuf, Thrift,

Re: Distributing Arrow in Debian and Fedora

2019-02-02 Thread Suvayu Ali
Hi Javier, On Sat, Feb 02, 2019 at 11:52:10AM -0800, Javier Luraschi wrote: > Thanks for the additional info, it's really helpful to hear your thoughts > and potential > issues we might need to resolve. > > The R package is planning to support feature flags, so a start, I'm hoping > we can >

Re: Distributing Arrow in Debian and Fedora

2019-02-02 Thread Suvayu Ali
Hello Javier, On Fri, Feb 01, 2019 at 06:51:34PM -0800, Javier Luraschi wrote: > Hi, in order to make Arrow available to the R community through CRAN (R's > package archive), we need to get the Arrow binaries submitted to the Debian >

Tests fail with PyArrow

2019-01-13 Thread Suvayu Ali
Hi, I'm on Fedora 28 with Python 3.6.7, and I'm building under a virtual environment with pip. I am building the tag apache-arrow-0.12.0, and my steps are as follows. $ cmake -G Ninja -DCMAKE_BUILD_TYPE=Debug \ -DCMAKE_INSTALL_PREFIX=$ARROW_HOME \ -DCMAKE_INSTALL_LIBDIR=lib \

Re: Compiling Arrow for RaspberryPi

2019-01-09 Thread Suvayu Ali
Hello Uwe, Wes, others, On Wed, Jan 9, 2019 at 10:08 AM Uwe L. Korn wrote: > > for arrow-cpp it is definitely possible to cross-compile on the desktop as it is > using standard CMake for the build. There are a lot of guides available for > doing cross compilation with CMake. This may work but I

Compiling Arrow for RaspberryPi

2019-01-08 Thread Suvayu Ali
Hi everyone, I wanted to run a long-running data collection process on an RPi. But it has proven difficult to install pyarrow with pip as it still needs to compile. Is it possible to cross-compile it on my desktop? If so, could someone point me in the right direction? Cheers, --

Re: Access Gandiva filter result by array index

2018-12-14 Thread Suvayu Ali
Hi Ravindra, On Fri, Dec 14, 2018 at 01:11:02PM +0530, Ravindra Pindikura wrote: > > > > But I can't access the elements of the selection vector! Since it is > > declared > > as std::shared_ptr, the Value(..) method isn't found. I had > > filled it with SelectionVector::MakeInt16(..),

Access Gandiva filter result by array index

2018-12-13 Thread Suvayu Ali
Hi everyone, Maybe I'm missing something obvious, but for the life of me, I can't figure out how I can access the elements of an array after a Gandiva filter operation. I have linked a minimal example at the end which I compile like this: $ /usr/lib64/ccache/g++ -g -Wall -m64 -std=c++17

[jira] [Created] (ARROW-3874) [Gandiva] Cannot build: LLVM not detected

2018-11-25 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-3874: Summary: [Gandiva] Cannot build: LLVM not detected Key: ARROW-3874 URL: https://issues.apache.org/jira/browse/ARROW-3874 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-3806) [Python] When converting nested types to pandas, use tuples

2018-11-16 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-3806: Summary: [Python] When converting nested types to pandas, use tuples Key: ARROW-3806 URL: https://issues.apache.org/jira/browse/ARROW-3806 Project: Apache Arrow

[jira] [Created] (ARROW-3792) [PARQUET] Segmentation fault when writing empty RecordBatches

2018-11-14 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-3792: Summary: [PARQUET] Segmentation fault when writing empty RecordBatches Key: ARROW-3792 URL: https://issues.apache.org/jira/browse/ARROW-3792 Project: Apache Arrow

[jira] [Created] (ARROW-1956) Support reading specific partitions from a partitioned parquet dataset

2017-12-28 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-1956: Summary: Support reading specific partitions from a partitioned parquet dataset Key: ARROW-1956 URL: https://issues.apache.org/jira/browse/ARROW-1956 Project: Apache Arrow

Re: Installing PyArrow on Amazon Linux

2017-07-02 Thread Suvayu Ali
Hello Uwe, On Sun, Jul 02, 2017 at 02:15:38PM +0200, Uwe L. Korn wrote: > > 1. Your pip is too old, you need at least 8.1.2 That was it :). Thanks a lot! Cheers, -- Suvayu Open source is the future. It sets us free.
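
The reply above hinges on pip being at least 8.1.2. As a rough illustration only, the check can be sketched in Python with the standard library. The helper names parse_version and pip_is_new_enough are hypothetical and are not part of pip or pyarrow; the version string would come from something like `pip --version`.

```python
import re

# Minimum pip version quoted in the reply above.
MIN_PIP = (8, 1, 2)

def parse_version(text):
    """Turn a dotted version string such as '8.1.2' into a tuple of ints.

    Hypothetical helper: keeps at most the first three numeric components.
    """
    return tuple(int(part) for part in re.findall(r"\d+", text)[:3])

def pip_is_new_enough(version_text, minimum=MIN_PIP):
    """Return True if version_text is at least the given minimum version."""
    return parse_version(version_text) >= minimum
```

If the check fails, upgrading pip inside the environment (for example with `python -m pip install --upgrade pip`) is the usual remedy before retrying the pyarrow install.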

Installing PyArrow on Amazon Linux

2017-07-02 Thread Suvayu Ali
Hi Arrow devs, I'm not sure if this is the correct place to ask, if not, please point me in the right direction. I wanted to use the HDFS client with PySpark (for now). My Spark cluster is on Amazon EMR, so the nodes use Amazon Linux (2017.03). On my dev machine (Fedora 25) with Python 3.5.1,