I think this is likely related to: https://issues.apache.org/jira/browse/DRILL-4410
A fix has been merged for this. Can you try from the tip of master and
see if this is resolved?

thanks,
Jacques

--
Jacques Nadeau
CTO and Co-Founder, Dremio

On Mon, Mar 7, 2016 at 6:04 AM, Vince Gonzalez <[email protected]> wrote:

> Hanifi,
>
> I just bumped into this as well.
>
> Error: SYSTEM ERROR: OversizedAllocationException: Unable to expand the
> buffer. Max allowed buffer size is reached.
>
> Earlier in the thread you say:
>
> > You see this exception because one of the columns in your dataset is
> > larger than an individual DrillBuf could store. The hard limit
> > is Integer.MAX_VALUE bytes.
>
> So is it right to say the maximum buffer size is 2GB? I'm getting this
> exception on a data set whose *total* size is less than 1GB, as
> reported by "du -sh" on the top-level directory I am querying. So I'm
> confused.
>
> I have a guess as to which column in my dataset is causing the problem.
> It's likely a substantial JSON document that comes from a file, and the
> size of that file varies widely. I process the file into a dictionary
> in Python before writing it to my workspace in a format that works but
> for this issue. The largest of these documents weighs in at only 320KB.
>
> I could go down the path of reshaping the large document so that Drill
> sees multiple columns, but I don't see how I can be sure that will
> work, since all of the columns in my data are so far below
> Integer.MAX_VALUE bytes.
>
> Is there any other recommendation you can make, apart from further
> ETLing the data?
>
> --vince
>
> ----
> Vince Gonzalez
> Systems Engineer
> 212.694.3879
>
> mapr.com
>
> On Mon, Feb 8, 2016 at 2:05 PM, Hanifi Gunes <[email protected]> wrote:
>
> > Thanks for the feedback. Yep, my answer was much more dev-focused
> > than user-focused.
> >
> > The error is a manifestation of extremely wide columns in your
> > dataset. I would recommend splitting the list if that's an option.
> >
> > Assuming the problem column is a list of integers as below
> >
> > {
> >   "wide": [1,2,.....N]
> > }
> >
> > after splitting it should look like
> >
> > {
> >   "wide0": [1,2,.....X],
> >   "wide1": [Y,.......Z],
> >   ...
> >   "wideN": [T,.......N]
> > }
> >
> > Sounds like a good idea to enhance the error reporting with the file
> > & column name. Filed [1] to track this.
> >
> > Thanks.
> >
> > 1: https://issues.apache.org/jira/browse/DRILL-4371
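Since Vince's pipeline already shapes the data in Python, a minimal
sketch of the split Hanifi suggests above might look like the
following. The chunk size, field names, and file paths are illustrative
assumptions, not anything Drill prescribes, and it assumes the input is
newline-delimited JSON (one object per line):

    import json

    def split_wide_list(record, key="wide", chunk=10000):
        # Replace one oversized list field with several smaller ones,
        # mirroring the wide0/wide1/... layout suggested above.
        values = record.pop(key)
        for i in range(0, len(values), chunk):
            record[f"{key}{i // chunk}"] = values[i:i + chunk]
        return record

    # Hypothetical input/output paths; adjust to your workspace.
    with open("wide.json") as src, open("narrow.json", "w") as dst:
        for line in src:
            json.dump(split_wide_list(json.loads(line)), dst)
            dst.write("\n")

One caveat: rows whose lists differ greatly in length will produce
different sets of wideN columns, and Drill's schema discovery may or
may not handle that gracefully, so padding every record out to a fixed
set of columns is worth considering.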
> >
> > On Fri, Feb 5, 2016 at 6:28 PM, John Omernik <[email protected]> wrote:
> >
> > > Excuse my basic questions: when you say "we", are you referring to
> > > the Drill developers? And what is Integer.MAX_VALUE bytes? Is that
> > > a query-time setting? A drillbit setting? Is it editable? How does
> > > that value get interpreted for complex data types (objects and
> > > arrays)?
> > >
> > > Not only would the column be helpful, but the source file as well
> > > (if this is an individual-record issue... or is this a cumulative
> > > error, where the sum of the lengths of multiple records of a
> > > column is at issue?).
> > >
> > > Thoughts on how, as a user, I could address this in my dataset?
> > >
> > > Thanks!
> > >
> > > On Friday, February 5, 2016, Hanifi Gunes <[email protected]> wrote:
> > >
> > > > You see this exception because one of the columns in your
> > > > dataset is larger than an individual DrillBuf could store. The
> > > > hard limit is Integer.MAX_VALUE bytes. Around the time we are
> > > > trying to expand one of the buffers, we notice the allocation
> > > > request is oversized and fail the query. It would be nice if the
> > > > error message contained the column that raised this issue,
> > > > though.
> > > >
> > > > On Fri, Feb 5, 2016 at 1:39 PM, John Omernik <[email protected]>
> > > > wrote:
> > > >
> > > > > Any thoughts on how to troubleshoot this? (I have some fat
> > > > > JSON data going into the buffers, apparently.) It's not huge
> > > > > data, just wide/complex (total size is 1.4 GB). Any thoughts
> > > > > on how to troubleshoot, or settings I can use to work through
> > > > > these errors?
> > > > >
> > > > > Thanks!
> > > > >
> > > > > John
> > > > >
> > > > > Error: SYSTEM ERROR: OversizedAllocationException: Unable to
> > > > > expand the buffer. Max allowed buffer size is reached.
> > > > >
> > > > > Fragment 1:11
> > > > >
> > > > > [Error Id: db21dea0-ddd7-4fcf-9fea-b5031e358dad on node1]
> > > > >
> > > > > (org.apache.drill.exec.exception.OversizedAllocationException)
> > > > > Unable to expand the buffer. Max allowed buffer size is reached.
> > > > >   org.apache.drill.exec.vector.UInt1Vector.reAlloc():214
> > > > >   org.apache.drill.exec.vector.UInt1Vector$Mutator.setValueCount():469
> > > > >   org.apache.drill.exec.vector.complex.ListVector$Mutator.setValueCount():324
> > > > >   org.apache.drill.exec.physical.impl.ScanBatch.next():247
> > > > >   org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > > >   org.apache.drill.exec.record.AbstractRecordBatch.next():109
> > > > >   org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> > > > >   org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():132
> > > > >   org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > > >   org.apache.drill.exec.record.AbstractRecordBatch.next():119
> > > > >   org.apache.drill.exec.test.generated.StreamingAggregatorGen1931.doWork():172
> > > > >   org.apache.drill.exec.physical.impl.aggregate.StreamingAggBatch.innerNext():167
> > > > >   org.apache.drill.exec.record.AbstractRecordBatch.next():162
> > > > >   org.apache.drill.exec.physical.impl.BaseRootExec.next():104
> > > > >   org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():93
> > > > >   org.apache.drill.exec.physical.impl.BaseRootExec.next():94
> > > > >   org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():256
> > > > >   org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():250
> > > > >   java.security.AccessController.doPrivileged():-2
> > > > >   javax.security.auth.Subject.doAs():415
> > > > >   org.apache.hadoop.security.UserGroupInformation.doAs():1595
> > > > >   org.apache.drill.exec.work.fragment.FragmentExecutor.run():250
> > > > >   org.apache.drill.common.SelfCleaningRunnable.run():38
> > > > >   java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> > > > >   java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> > > > >   java.lang.Thread.run():745 (state=,code=0)
> > >
> > > --
> > > Sent from my iThing
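For later readers, the numbers in this thread are worth pinning down.
Integer.MAX_VALUE is 2^31 - 1 = 2,147,483,647 bytes, so the hard cap
is roughly 2 GiB per individual buffer, that is, per column vector,
not per dataset. And if UInt1Vector.reAlloc (the top frame of the
stack trace) grows a buffer by doubling it, then a buffer that has
reached 1 GiB cannot expand again without breaching the cap, which
would explain hitting this error with well under 2 GB of raw data. A
back-of-envelope in plain Python, with the doubling behavior taken as
an assumption from the stack trace rather than a verified reading of
the Drill source:

    INT_MAX = 2**31 - 1                 # Integer.MAX_VALUE in bytes
    print(INT_MAX / 2**30)              # ~2.0, the "2GB" cap in GiB

    # How many of Vince's largest (320KB) documents fit in one buffer,
    # assuming they all land in the same column vector of one batch:
    print(INT_MAX // (320 * 1024))      # 6553

    # If each expansion doubles the buffer, growth stops at 2^30 bytes,
    # because the next doubling would exceed the cap:
    size = 4096                         # assumed initial allocation
    while size * 2 <= INT_MAX:
        size *= 2
    print(size / 2**30)                 # 1.0 GiB

None of this rules out the plain bug Jacques points to in DRILL-4410;
it only shows why a dataset much smaller than 2 GB can still trip the
limit.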
