@Akif, right. Although this issue manifests more than one problem, the major cause of this behavior seems to be that copying nested data per row overflows the underlying buffer, a known problem with flatten. We should be able to fix this soon.
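For anyone wondering where the negative initialCapacity in Akif's stack trace (quoted at the bottom of this thread) comes from: below is a minimal sketch, assuming a buffer whose capacity is an int that is doubled until it covers the requested size. This is illustrative only, not Drill's actual allocator code, and the class and method names are made up for the example. Once the bytes needed for the per-row copy of the nested lists push the request past Integer.MAX_VALUE, the doubled int wraps around to -2147483648 and the allocation is rejected with exactly that value in the message.

    // Illustrative only: shows how doubling an int capacity past
    // Integer.MAX_VALUE wraps to -2147483648, the value seen in the
    // "initialCapacity: -2147483648 (expectd: 0+)" error quoted below.
    public class CapacityOverflowSketch {

        // Grow a capacity by doubling until it can hold neededBytes.
        static int grow(int capacity, long neededBytes) {
            while (capacity < neededBytes) {
                capacity = capacity * 2;   // (1 << 30) * 2 wraps to Integer.MIN_VALUE
                if (capacity < 0) {
                    // Passing the wrapped value on is what produces the
                    // IllegalArgumentException seen in the original report.
                    throw new IllegalArgumentException(
                        "initialCapacity: " + capacity + " (expected: 0+)");
                }
            }
            return capacity;
        }

        public static void main(String[] args) {
            System.out.println(grow(1 << 20, 1L << 20));    // fine: prints 1048576
            try {
                grow(1 << 20, 3L << 30);                    // ~3 GB requested, wraps negative
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());         // initialCapacity: -2147483648 ...
            }
        }
    }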
On Mon, Jun 22, 2015 at 9:49 AM, Andries Engelbrecht <[email protected]> wrote:

I was not able to read the JSON doc in the link with either VisualJSON or some online JSON editors. Are you sure the JSON document is structured properly?

-Andries

On Jun 20, 2015, at 2:37 AM, Akif Khan <[email protected]> wrote:

Hi

I found out that Drill's flattening fails when the nesting is too large. You can find the JSON on which it fails here:
https://gist.github.com/anonymous/d18a5da201a995084c1b

When I ran the query

    select flatten(campaign['funders'])['user_id'] from `crowd/xal2.json`;

it failed, while it works perfectly on smaller nested JSON.

On Sat, Jun 20, 2015 at 12:51 AM, Jason Altekruse <[email protected]> wrote:

The allocation that is failing is not the data actually required for the flatten operation, but the unneeded copy of all of the lists. If we remove this from the plan, a lot more flatten queries will execute successfully. We still don't have a solution for a single list that does not fit in the max allocation size for a buffer, but that is a larger issue that needs to be addressed with some additional design work.

On Fri, Jun 19, 2015 at 11:57 AM, Hanifi Gunes <[email protected]> wrote:

Jason pointed out a possible infinite loop problem where the requested allocation size exceeds the maximum allowed, so we will have to address that before checking this in.

It is not entirely clear to me from the description of DRILL-3323 what the problem and the proposal are. Is the issue solely targeting the redundant vector copy? And how is that contributing to the manifestation of the original problem?

-Hanifi

On Fri, Jun 19, 2015 at 10:17 AM, Jason Altekruse <[email protected]> wrote:

The patch is currently in review; I don't think it will necessarily fix this issue. I have been looking into issues with flatten and just opened a new JIRA that I think will actually address your problem. This is a fairly low-level issue with how flatten is currently being planned.

https://issues.apache.org/jira/browse/DRILL-3323

Are the lists that you are trying to flatten very large? If so, your failure is likely caused by the problem I just filed this JIRA against. I hope we can get a fix for it into the 1.1 release.

On Fri, Jun 19, 2015 at 1:41 AM, Akif Khan <[email protected]> wrote:

Hi All,

Thanks for the response. @Hanifi Gunes, I wanted to ask whether the patch is still being worked on or has already been released; I couldn't see any patch on the JIRA dashboard.

Regards,
Akif Khan

On Fri, Jun 19, 2015 at 1:26 AM, Hanifi Gunes <[email protected]> wrote:

The patch is in progress and should be checked in soon. It would be great if you could apply it and battle-test it.

-Hanifi

On Thu, Jun 18, 2015 at 9:18 AM, Abdel Hakim Deneche <[email protected]> wrote:

Hey Akif,

There is a known issue that looks similar to the error you reported:

DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>

There is already a patch in review to fix it, and it may fix your issue or at the very least give you a more meaningful error message. You could either wait until the patch is merged into master or try it yourself and see whether the issue is fixed.

Thanks!

-Abdelhakim Deneche

On Thu, Jun 18, 2015 at 5:35 AM, Akif Khan <[email protected]> wrote:

Hi

I am re-posting my query as there weren't any responses earlier. Please tell me why this error happens and whether it can be avoided, or whether it is due to bad data.

I ran the query below and got this error. I have an Amazon AWS cluster with four nodes, each with 32 GB RAM and 8 cores, running Ubuntu with Hadoop FS and ZooKeeper installed:

Query:

    select flatten(campaign['funders'])['user_id'] from `new_crowdfunding`;

The structure of the new_crowdfunding table is as follows:
https://gist.github.com/akifkhan/d864ad9dcf5be712ff24

Error after running for 40 seconds and printing various user_ids:

    java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
    java.lang.IllegalArgumentException: initialCapacity: -2147483648 (expectd: 0+)

    Fragment 0:0

    [Error Id: 4fa13e31-ad84-42c6-aa50-c80c92ab026d on hadoop-slave1:31010]
      at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
      at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
      at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
      at sqlline.SqlLine.print(SqlLine.java:1583)
      at sqlline.Commands.execute(Commands.java:852)
      at sqlline.Commands.sql(Commands.java:751)
      at sqlline.SqlLine.dispatch(SqlLine.java:738)
      at sqlline.SqlLine.begin(SqlLine.java:612)
      at sqlline.SqlLine.start(SqlLine.java:366)

Regards,
Akif Khan
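A side note for anyone who wants to sanity-check the document itself, independently of Drill (Andries' question about whether the JSON is well formed): the sketch below does the equivalent of select flatten(campaign['funders'])['user_id'] over a single record using Jackson. The field names campaign, funders and user_id come from the query in the thread; the sample record values and the class name are made up for illustration, and this is not how Drill reads JSON internally.

    // Standalone equivalent of: select flatten(campaign['funders'])['user_id']
    // Field names come from the query above; the sample record is assumed.
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ManualFlattenSketch {
        public static void main(String[] args) throws Exception {
            String record =
                "{\"campaign\": {\"funders\": ["
              + "  {\"user_id\": 101, \"amount\": 25},"
              + "  {\"user_id\": 102, \"amount\": 40}"
              + "]}}";

            JsonNode root = new ObjectMapper().readTree(record);
            // flatten(campaign['funders']) emits one row per list element;
            // ['user_id'] then projects a single field out of each element.
            for (JsonNode funder : root.path("campaign").path("funders")) {
                System.out.println(funder.path("user_id").asLong());
            }
        }
    }

If this loop works on the full document but the Drill query still fails, that points back at the per-list allocation limit discussed above rather than at malformed data.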
