A quick update: I am able to run your failing query to completion with D-2851 applied. If you want early access, please pull the branch from my GitHub[1] and let me know if you hit any other problems.

1: https://github.com/hnfgns/incubator-drill/tree/DRILL-2851
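For context, the "initialCapacity: -2147483648" failure quoted at the bottom of this thread is the signature of a signed 32-bit capacity overflow: -2147483648 is Integer.MIN_VALUE, i.e. 2^31 wrapped negative. A minimal standalone sketch of the arithmetic (plain Java, not Drill internals; the 3 GiB request size is illustrative):

    public class CapacityOverflowDemo {
        public static void main(String[] args) {
            // Growable buffers commonly double a signed 32-bit capacity
            // until it covers the requested size. 3 GiB is illustrative.
            long required = 3L * 1024 * 1024 * 1024;
            int capacity = 1;
            while (capacity > 0 && capacity < required) {
                capacity <<= 1; // doubling 1 << 30 gives Integer.MIN_VALUE
            }
            // Prints "initialCapacity: -2147483648", the same value as in
            // the error reported below.
            System.out.println("initialCapacity: " + capacity);
        }
    }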
On Mon, Jun 22, 2015 at 11:48 AM, Hanifi Gunes <[email protected]> wrote:

> @Akif, right. Although this issue manifests more than one problem, the
> major cause of this behavior seems to be that copying nested data per row
> overflows the underlying buffer, a problem that is known with flatten. We
> should be able to fix this soon.
>
> On Mon, Jun 22, 2015 at 9:49 AM, Andries Engelbrecht <[email protected]> wrote:
>
>> I was not able to read the JSON doc in the link with either VisualJSON
>> or some online JSON editors. Are you sure the JSON document is
>> structured properly?
>>
>> —Andries
>>
>> On Jun 20, 2015, at 2:37 AM, Akif Khan <[email protected]> wrote:
>>
>>> Hi
>>>
>>> I found out that Drill's flatten fails when the nesting is too large.
>>> You can find the JSON on which it fails here:
>>> https://gist.github.com/anonymous/d18a5da201a995084c1b
>>>
>>> When I ran the query
>>>   select flatten(campaign['funders'])['user_id'] from `crowd/xal2.json`;
>>> it failed, while it works perfectly on smaller nested JSON.
>>>
>>> On Sat, Jun 20, 2015 at 12:51 AM, Jason Altekruse <[email protected]> wrote:
>>>
>>>> The allocation that is failing is not for the data actually required
>>>> by the flatten operation, but for an unneeded copy of all of the
>>>> lists. If we remove that copy from the plan, a lot more flatten
>>>> queries will execute successfully. We still don't have a solution for
>>>> a single list that does not fit in the max allocation size for a
>>>> buffer, but that is a larger issue that needs to be addressed with
>>>> some additional design work.
>>>>
>>>> On Fri, Jun 19, 2015 at 11:57 AM, Hanifi Gunes <[email protected]> wrote:
>>>>
>>>>> Jason pointed out a possible indefinite-loop problem where the
>>>>> requested allocation size exceeds the max allowed, so we will have
>>>>> to address that before checking it in.
>>>>>
>>>>> It is not entirely clear to me from the description of D-3323 what
>>>>> the problem and the proposal are. Is the issue solely targeting the
>>>>> redundant vector copy? And how does that contribute to the
>>>>> manifestation of the original problem?
>>>>>
>>>>> -Hanifi
>>>>>
>>>>> On Fri, Jun 19, 2015 at 10:17 AM, Jason Altekruse <[email protected]> wrote:
>>>>>
>>>>>> The patch is currently in review; I don't think it will necessarily
>>>>>> fix this issue. I have been looking into issues with flatten and
>>>>>> just opened a new JIRA that I think will actually address yours.
>>>>>> This is a fairly low-level issue with how flatten is currently
>>>>>> being planned.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/DRILL-3323
>>>>>>
>>>>>> Are the lists that you are trying to flatten very large? That would
>>>>>> make it likely that your failure is caused by the problem I just
>>>>>> filed this JIRA against. I hope we can get a fix for it into the
>>>>>> 1.1 release.
>>>>>>
>>>>>> On Fri, Jun 19, 2015 at 1:41 AM, Akif Khan <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Thanks for the responses. @Hanifi Gunes, I wanted to ask whether
>>>>>>> the patch is still being worked on or has already been released; I
>>>>>>> couldn't see any patch on the JIRA dashboard.
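A note on the indefinite-loop concern Hanifi raises above: if a doubling allocator clamps each step at a maximum allocation size, any request above that maximum can never be satisfied and the loop never exits. A standalone sketch of that failure mode (the cap and sizes here are illustrative, not Drill's actual constants):

    public class ReallocLoopHazard {
        // Hypothetical per-buffer cap; Drill's real limit may differ.
        static final long MAX_ALLOC = 1L << 30;

        public static void main(String[] args) {
            long requested = 3L * 1024 * 1024 * 1024; // above the cap
            long capacity = 256;
            int rounds = 0;
            // Doubling toward a request larger than the cap: without an
            // explicit requested > MAX_ALLOC check, this cannot terminate.
            while (capacity < requested) {
                capacity = Math.min(capacity << 1, MAX_ALLOC);
                if (++rounds > 64) { // guard so the demo itself terminates
                    System.out.println("stuck at " + capacity
                            + " while needing " + requested);
                    return;
                }
            }
        }
    }

A generator for a document large enough to probe these limits appears at the end of this thread.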
>>>>>>> On Fri, Jun 19, 2015 at 1:26 AM, Hanifi Gunes <[email protected]> wrote:
>>>>>>>
>>>>>>>> The patch is in progress and should be checked in soon. It would
>>>>>>>> be great if you could apply it and battle-test it.
>>>>>>>>
>>>>>>>> -Hanifi
>>>>>>>>
>>>>>>>> On Thu, Jun 18, 2015 at 9:18 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Akif,
>>>>>>>>>
>>>>>>>>> There is a known issue that looks similar to the error you
>>>>>>>>> reported:
>>>>>>>>>
>>>>>>>>> DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>
>>>>>>>>>
>>>>>>>>> There is already a patch in review to fix this, and it may fix
>>>>>>>>> your issue or at the very least give you a more meaningful error
>>>>>>>>> message. You could either wait until the patch is merged into
>>>>>>>>> master or try it yourself and see whether the issue has been
>>>>>>>>> fixed.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> On Thu, Jun 18, 2015 at 5:35 AM, Akif Khan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> I am re-posting my query as there weren't any responses earlier.
>>>>>>>>>> Could you please tell me why this error happens and whether it
>>>>>>>>>> can be avoided? Or is it due to bad data?
>>>>>>>>>>
>>>>>>>>>> I ran the query below and got the following error. I have an
>>>>>>>>>> Amazon AWS cluster with four nodes, each with 32 GB of RAM and
>>>>>>>>>> 8 cores, running Ubuntu with Hadoop FS and ZooKeeper installed.
>>>>>>>>>>
>>>>>>>>>> *Query*: select flatten(campaign['funders'])['user_id'] from
>>>>>>>>>> `new_crowdfunding`;
>>>>>>>>>>
>>>>>>>>>> The *structure of the new_crowdfunding table* is as follows:
>>>>>>>>>> https://gist.github.com/akifkhan/d864ad9dcf5be712ff24
>>>>>>>>>>
>>>>>>>>>> *Error after running for 40 seconds and printing various
>>>>>>>>>> user_ids*:
>>>>>>>>>>
>>>>>>>>>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
>>>>>>>>>> java.lang.IllegalArgumentException: initialCapacity: -2147483648
>>>>>>>>>> (expectd: 0+)
>>>>>>>>>>
>>>>>>>>>> Fragment 0:0
>>>>>>>>>>
>>>>>>>>>> [Error Id: 4fa13e31-ad84-42c6-aa50-c80c92ab026d on hadoop-slave1:31010]
>>>>>>>>>> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>>>>>>>>>> at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
>>>>>>>>>> at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
>>>>>>>>>> at sqlline.SqlLine.print(SqlLine.java:1583)
>>>>>>>>>> at sqlline.Commands.execute(Commands.java:852)
>>>>>>>>>> at sqlline.Commands.sql(Commands.java:751)
>>>>>>>>>> at sqlline.SqlLine.dispatch(SqlLine.java:738)
>>>>>>>>>> at sqlline.SqlLine.begin(SqlLine.java:612)
>>>>>>>>>> at sqlline.SqlLine.start(SqlLine.java:366)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Abdelhakim Deneche
>>>>>>>>>
>>>>>>>>> Software Engineer
>>>>>>>>>
>>>>>>>>> <http://www.mapr.com/>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>>
>>>>>>> *Akif Khan*
>>>>>>> *InnovAccer Inc.*
>>>>>>> *www.innovaccer.com <http://www.innovaccer.com>*
>>>>>>> *+91 8802290360*
>>> --
>>> Regards
>>>
>>> *Akif Khan*
>>> *InnovAccer Inc.*
>>> *www.innovaccer.com <http://www.innovaccer.com>*
>>> *+91 8802290360*
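Jason's question above about list size can be checked directly. Below is a hypothetical generator (the file name, user_id values, and list length are all illustrative) that emits a single record with the same shape the failing query reads, campaign.funders[].user_id:

    import java.io.IOException;
    import java.io.PrintWriter;

    public class GenerateBigFunders {
        public static void main(String[] args) throws IOException {
            int funders = 1_000_000; // list length is illustrative
            try (PrintWriter out = new PrintWriter("big_funders.json")) {
                // One record whose campaign.funders list is very large,
                // matching select flatten(campaign['funders'])['user_id']
                out.print("{\"campaign\":{\"funders\":[");
                for (int i = 0; i < funders; i++) {
                    if (i > 0) out.print(',');
                    out.print("{\"user_id\":\"user" + i + "\"}");
                }
                out.print("]}}");
            }
        }
    }

If flatten succeeds on small generated lists and fails only past some length, that supports the diagnosis in this thread that the per-row copy of the whole list, rather than the data flatten actually needs, is what overflows the buffer.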
