Thanks for bringing this up. I don't have a strong opinion either way.
Personally, I find it more convenient for users to follow a conversation on
the (user) mailing list than in JIRA. It also lets other users who might
have an idea jump in and comment. JIRA, on the other hand, excels at
tracking issues, so it is probably better suited to deeper, more focused
discussions or to checking progress.

For the upcoming discussions, if broader participation is the goal, the
mailing list is likely the place to go for iterating on ideas; salient
points from those threads can then be carried over to JIRA. The user list
seems a good choice for your list above. Feel free to go back and forth
here, especially since your points concern the overall user experience.

-Hanifi

On Mon, Feb 8, 2016 at 11:38 AM, John Omernik <[email protected]> wrote:

> Thanks Hanifi, do you think the user list or the JIRA will be the best
> place to track my questions? I have quite a few, and I am going to try to
> impose some "typical" user restrictions on how I can handle the data. For
> example, how would a user who may not know string manipulation split those
> lists? Yes, I could write a Python or Spark job, but if this is generated
> data, that could lead to some very interesting conversations with users.
> The reason I say this is that I don't want to appear to be ignoring advice
> or asking inane questions; rather, I am trying to figure out how we can do
> a few things:
>
> 1. Keep troubleshooting in Drill if at all possible.
> 2. Increase the effectiveness of error messages.
> 3. Increase the number of use cases for Drill.
> 4. Make the user experience for Drill outstanding.
>
> Please bear with my questions with that in mind. So, to my first question:
> should we keep things on the list here (where many users may be able to
> read, see, and learn from them), or should the back and forth happen in
> the JIRA?
>
> Thanks again,
>
> John
>
> On Mon, Feb 8, 2016 at 1:05 PM, Hanifi Gunes <[email protected]> wrote:
>
> > Thanks for the feedback. Yep, my answer was much more dev focused than
> > user focused.
> >
> > The error is a manifestation of extremely wide columns in your dataset.
> > I would recommend splitting the list if that's an option.
> >
> > Assuming the problem column is a list of integers, as below,
> >
> > {
> >   "wide": [1, 2, ..., N]
> > }
> >
> > after splitting it should look like
> >
> > {
> >   "wide0": [1, 2, ..., X],
> >   "wide1": [Y, ..., Z],
> >   ...
> >   "wideN": [T, ..., N]
> > }
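> >
> > If it helps, here is a rough Python sketch of one way to do that split
> > outside Drill. It is untested and makes assumptions: newline-delimited
> > JSON, every record carrying the "wide" column, and placeholder file
> > names and chunk size that you would adapt to your data.
> >
> > import json
> >
> > CHUNK = 1000  # elements per output column; pick a size that suits you
> >
> > with open("input.json") as src, open("split.json", "w") as dst:
> >     for line in src:
> >         rec = json.loads(line)
> >         wide = rec.pop("wide")  # assumes the column is always present
> >         # re-emit the list as wide0, wide1, ... of CHUNK elements each
> >         for i in range(0, len(wide), CHUNK):
> >             rec["wide%d" % (i // CHUNK)] = wide[i:i + CHUNK]
> >         dst.write(json.dumps(rec) + "\n")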
> >
> > Enhancing the error reporting with the file and column name sounds like
> > a good idea. I filed [1] to track this.
> >
> > Thanks.
> >
> > 1: https://issues.apache.org/jira/browse/DRILL-4371
> >
> >
> > On Fri, Feb 5, 2016 at 6:28 PM, John Omernik <[email protected]> wrote:
> >
> > > Excuse my basic questions: when you say "we," are you referring to the
> > > Drill developers? And what is Integer.MAX_VALUE bytes? Is that a
> > > query-time setting? A drillbit setting? Is it editable? How does that
> > > value get interpreted for complex data types (objects and arrays)?
> > >
> > > Not only would the column be helpful, but the source file as well (if
> > > this is an individual record issue... or is this a cumulative error,
> > > where the sum of the lengths of multiple records of a column is what
> > > exceeds the limit?).
> > >
> > >
> > > Any thoughts on how, as a user, I could address this in my dataset?
> > >
> > > Thanks!
> > >
> > > On Friday, February 5, 2016, Hanifi Gunes <[email protected]> wrote:
> > >
> > > > You see this exception because one of the columns in your dataset is
> > > > larger than an individual DrillBuf can store. The hard limit is
> > > > Integer.MAX_VALUE bytes (2^31 - 1, roughly 2 GiB). When we try to
> > > > expand one of the buffers, we notice that the allocation request is
> > > > oversized and fail the query. It would be nice if the error message
> > > > contained the column that raised this issue, though.
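> > > >
> > > > Conceptually, the check is along these lines (a loose Python sketch
> > > > of the idea only, not Drill's actual code):
> > > >
> > > > MAX_BUFFER_BYTES = 2**31 - 1  # Integer.MAX_VALUE, roughly 2 GiB
> > > >
> > > > def expand_buffer(requested_bytes):
> > > >     # an oversized request fails the query instead of allocating
> > > >     if requested_bytes > MAX_BUFFER_BYTES:
> > > >         raise MemoryError("Unable to expand the buffer. "
> > > >                           "Max allowed buffer size is reached.")
> > > >     # ...otherwise the underlying buffer grows as usual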
> > > >
> > > > On Fri, Feb 5, 2016 at 1:39 PM, John Omernik <[email protected]> wrote:
> > > >
> > > > > Any thoughts on how to troubleshoot this? (I have some fat JSON
> > > > > data going into the buffers, apparently.) It's not huge data, just
> > > > > wide/complex (total size is 1.4 GB). Are there any settings I can
> > > > > use to work through these errors?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > John
> > > > >
> > > > > Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand
> > > > > the buffer. Max allowed buffer size is reached.
> > > > >
> > > > > Fragment 1:11
> > > > >
> > > > > [Error Id: db21dea0-ddd7-4fcf-9fea-b5031e358dad on node1
> > > > >
> > > > >   (org.apache.drill.exec.exception.OversizedAllocationException)
> > > > > Unable to expand the buffer. Max allowed buffer size is reached.
> > > > >     org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> > > > >     org.apache.drill.exec.vector.UInt1Vector$Mutator.setValueCount():469
> > > > >     org.apache.drill.exec.vector.complex.ListVector$Mutator.setValueCount():324
> > > > >     org.apache.drill.exec.physical.impl.ScanBatch.next():247
> > > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > > > >     org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > > > >     org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> > > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > > >     org.apache.drill.exec.test.generated.StreamingAggregatorGen1931.doWork():172
> > > > >     org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():167
> > > > >     org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> > > > >     org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> > > > >     org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> > > > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> > > > >     org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> > > > >     java.security.AccessController.doPrivileged():-2
> > > > >     javax.security.auth.Subject.doAs():415
> > > > >     org.apache.hadoop.security.UserGroupInformation.doAs():1595
> > > > >     org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> > > > >     org.apache.drill.common.SelfCleaningRunnable.run():38
> > > > >     java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> > > > >     java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> > > > >     java.lang.Thread.run():745 (state=,code=0)
> > > > >
> > > >
> > >
> > >
> > > --
> > > Sent from my iThing
> > >
> >
>
