Abdeali Kothari created ARROW-4890:
--
Summary: Spark+Arrow Grouped pandas UDAF - read length must be positive or -1
Key: ARROW-4890
URL: https://issues.apache.org/jira/browse/ARROW-4890
Project: Apache Arrow
Hi, any help on this would be much appreciated.
I've not been able to figure out any reason for this to happen yet.
On Sat, Mar 2, 2019, 11:50 Abdeali Kothari wrote:
> Hi Li Jin, thanks for the note.
>
> I get this error only for larger data - when I reduce the number of
> records o
> k 2.3). I forgot whether there is binary
> incompatibility between these versions and pyarrow 0.12.
>
> On Fri, Mar 1, 2019 at 3:32 PM Abdeali Kothari
> wrote:
>
> > Forgot to mention: The above testing is with 0.11.1
> > I tried 0.12.1 as you suggested - and am getting the
at 1:57 AM Abdeali Kothari wrote:
> That was spot on!
> I had 3 columns with 80 characters => 80*21*10^6 bytes = 1.56 GB
> I removed these columns and replaced each with 10 doubleType columns (so
> it would still be 80 bytes of data) - and this error didn't come up anymore.
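The arithmetic above can be sketched as a quick back-of-envelope check (a minimal illustration, not Arrow code; the ~2 GB figure is the limit of a 32-bit-addressed Arrow buffer, and the doubling-on-grow behavior is an assumption inferred from the `OversizedAllocationException` in the stack trace below):

```python
# Back-of-envelope sizes from the thread (illustrative only).
ROWS = 21 * 10**6                    # ~21 million records
ARROW_LIMIT = 2**31 - 1              # ~2 GB: max a 32-bit-offset buffer can address

string_col = ROWS * 80               # data buffer of one 80-character string column
double_col = ROWS * 8                # one float64 column: 8 bytes per row

print(string_col / 2**30)            # ~1.56 GiB -- under the limit itself...
print(string_col * 2 > ARROW_LIMIT)  # ...but doubling the buffer on grow overflows
print(double_col / 2**30)            # ~0.16 GiB -- comfortably small
```

This would explain why the 1.56 GB string columns failed while ten 8-byte double columns (the same 80 bytes per row) did not: each double column stays far below the limit on its own.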
>
exact size of your columns. We support 2G
> per column, if it is only 1.5G, then there is probably a rounding error in
> Arrow. Alternatively, you might also be in luck that the following
> patch
> https://github.com/apache/arrow/commit/bfe6865ba8087a46bd7665679e48af3a77987cef
>
> Try splitting your DataFrame
> into more partitions before applying the UDAF.
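Why repartitioning helps can be illustrated with a rough model (plain Python, not Spark; it assumes each partition's data is converted to roughly one Arrow batch, and reuses the hypothetical row and byte counts from earlier in the thread). In PySpark this would amount to calling `df.repartition(n)` before the `groupBy(...).apply(...)`:

```python
# Illustration: more partitions => smaller per-batch Arrow column buffers.
ROWS = 21 * 10**6                    # total records
BYTES_PER_ROW = 80                   # the 80-character string column from above
LIMIT = 2**31 - 1                    # ~2 GB per Arrow column buffer

def max_batch_bytes(rows, bytes_per_row, num_partitions):
    """Largest column buffer any single partition must hold."""
    rows_per_part = -(-rows // num_partitions)   # ceiling division
    return rows_per_part * bytes_per_row

print(max_batch_bytes(ROWS, BYTES_PER_ROW, 1))   # 1,680,000,000: close to the limit
print(max_batch_bytes(ROWS, BYTES_PER_ROW, 8))   # 210,000,000: comfortable headroom
```

The partition count here is arbitrary; the point is only that the per-column buffer shrinks linearly with the number of partitions.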
>
> Cheers
> Uwe
>
> On Fri, Mar 1, 2019, at 9:09 AM, Abdeali Kothari wrote:
> > I was using arrow with spark+python and when I'm trying some pandas-UDAF
I was using Arrow with Spark + Python, and when trying some pandas-UDAF
functions I get this error:
org.apache.arrow.vector.util.OversizedAllocationException: Unable to expand
the buffer
at
org.apache.arrow.vector.BaseVariableWidthVector.reallocDataBuffer(BaseVariableWidthVector.java:457)