A quick update: I am able to run your failing query to completion with D-2851 applied. If you want early access, please pull the branch from my GitHub[1] and let me know if you hit any other problems.

1: https://github.com/hnfgns/incubator-drill/tree/DRILL-2851
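For context, the "initialCapacity: -2147483648" failure quoted at the bottom of this thread is the signature of a signed 32-bit capacity overflow: -2147483648 is Integer.MIN_VALUE, i.e. 2^31 wrapped negative. A minimal standalone sketch of the arithmetic (plain Java, not Drill internals; the 3 GiB request size is illustrative):

    public class CapacityOverflowDemo {
        public static void main(String[] args) {
            // Growable buffers commonly double a signed 32-bit capacity
            // until it covers the requested size. 3 GiB is illustrative.
            long required = 3L * 1024 * 1024 * 1024;
            int capacity = 1;
            while (capacity > 0 && capacity < required) {
                capacity <<= 1; // doubling 1 << 30 gives Integer.MIN_VALUE
            }
            // Prints "initialCapacity: -2147483648", the same value as in
            // the error reported below.
            System.out.println("initialCapacity: " + capacity);
        }
    }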
On Mon, Jun 22, 2015 at 11:48 AM, Hanifi Gunes <[email protected]> wrote:

> @Akif, right. Although this issue manifests more than one problem, the
> major cause of this behavior seems to be that copying nested data per row
> overflows the underlying buffer, a problem that is known with flatten. We
> should be able to fix this soon.
>
> On Mon, Jun 22, 2015 at 9:49 AM, Andries Engelbrecht <[email protected]> wrote:
>
>> I was not able to read the JSON doc in the link with either VisualJSON
>> or some online JSON editors. Are you sure the JSON document is
>> structured properly?
>>
>> —Andries
>>
>> On Jun 20, 2015, at 2:37 AM, Akif Khan <[email protected]> wrote:
>>
>>> Hi
>>>
>>> I found out that Drill's flatten fails when the nesting is too large.
>>> You can find the JSON on which it fails here:
>>> https://gist.github.com/anonymous/d18a5da201a995084c1b
>>>
>>> When I ran the query
>>>   select flatten(campaign['funders'])['user_id'] from `crowd/xal2.json`;
>>> it failed, while it works perfectly on smaller nested JSON.
>>>
>>> On Sat, Jun 20, 2015 at 12:51 AM, Jason Altekruse <[email protected]> wrote:
>>>
>>>> The allocation that is failing is not for the data actually required
>>>> by the flatten operation, but for an unneeded copy of all of the
>>>> lists. If we remove that copy from the plan, a lot more flatten
>>>> queries will execute successfully. We still don't have a solution for
>>>> a single list that does not fit in the max allocation size for a
>>>> buffer, but that is a larger issue that needs to be addressed with
>>>> some additional design work.
>>>>
>>>> On Fri, Jun 19, 2015 at 11:57 AM, Hanifi Gunes <[email protected]> wrote:
>>>>
>>>>> Jason pointed out a possible indefinite-loop problem where the
>>>>> requested allocation size exceeds the max allowed, so we will have
>>>>> to address that before checking it in.
>>>>>
>>>>> It is not entirely clear to me from the description of D-3323 what
>>>>> the problem and the proposal are. Is the issue solely targeting the
>>>>> redundant vector copy? And how does that contribute to the
>>>>> manifestation of the original problem?
>>>>>
>>>>> -Hanifi
>>>>>
>>>>> On Fri, Jun 19, 2015 at 10:17 AM, Jason Altekruse <[email protected]> wrote:
>>>>>
>>>>>> The patch is currently in review; I don't think it will necessarily
>>>>>> fix this issue. I have been looking into issues with flatten and
>>>>>> just opened a new JIRA that I think will actually address yours.
>>>>>> This is a fairly low-level issue with how flatten is currently
>>>>>> being planned.
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/DRILL-3323
>>>>>>
>>>>>> Are the lists that you are trying to flatten very large? That would
>>>>>> make it likely that your failure is caused by the problem I just
>>>>>> filed this JIRA against. I hope we can get a fix for it into the
>>>>>> 1.1 release.
>>>>>>
>>>>>> On Fri, Jun 19, 2015 at 1:41 AM, Akif Khan <[email protected]> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> Thanks for the responses. @Hanifi Gunes, I wanted to ask whether
>>>>>>> the patch is still being worked on or has already been released; I
>>>>>>> couldn't see any patch on the JIRA dashboard.
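A note on the indefinite-loop concern Hanifi raises above: if a doubling allocator clamps each step at a maximum allocation size, any request above that maximum can never be satisfied and the loop never exits. A standalone sketch of that failure mode (the cap and sizes here are illustrative, not Drill's actual constants):

    public class ReallocLoopHazard {
        // Hypothetical per-buffer cap; Drill's real limit may differ.
        static final long MAX_ALLOC = 1L << 30;

        public static void main(String[] args) {
            long requested = 3L * 1024 * 1024 * 1024; // above the cap
            long capacity = 256;
            int rounds = 0;
            // Doubling toward a request larger than the cap: without an
            // explicit requested > MAX_ALLOC check, this cannot terminate.
            while (capacity < requested) {
                capacity = Math.min(capacity << 1, MAX_ALLOC);
                if (++rounds > 64) { // guard so the demo itself terminates
                    System.out.println("stuck at " + capacity
                            + " while needing " + requested);
                    return;
                }
            }
        }
    }

A generator for a document large enough to probe these limits appears at the end of this thread.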
>>>>>>> On Fri, Jun 19, 2015 at 1:26 AM, Hanifi Gunes <[email protected]> wrote:
>>>>>>>
>>>>>>>> The patch is in progress and should be checked in soon. It would
>>>>>>>> be great if you could apply it and battle-test it.
>>>>>>>>
>>>>>>>> -Hanifi
>>>>>>>>
>>>>>>>> On Thu, Jun 18, 2015 at 9:18 AM, Abdel Hakim Deneche <[email protected]> wrote:
>>>>>>>>
>>>>>>>>> Hey Akif,
>>>>>>>>>
>>>>>>>>> There is a known issue that looks similar to the error you
>>>>>>>>> reported:
>>>>>>>>>
>>>>>>>>> DRILL-2851 <https://issues.apache.org/jira/browse/DRILL-2851>
>>>>>>>>>
>>>>>>>>> There is already a patch in review to fix this, and it may fix
>>>>>>>>> your issue or at the very least give you a more meaningful error
>>>>>>>>> message. You could either wait until the patch is merged into
>>>>>>>>> master or try it yourself and see whether the issue has been
>>>>>>>>> fixed.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>>
>>>>>>>>> On Thu, Jun 18, 2015 at 5:35 AM, Akif Khan <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi
>>>>>>>>>>
>>>>>>>>>> I am re-posting my query as there weren't any responses earlier.
>>>>>>>>>> Could you please tell me why this error happens and whether it
>>>>>>>>>> can be avoided? Or is it due to bad data?
>>>>>>>>>>
>>>>>>>>>> I ran the query below and got the following error. I have an
>>>>>>>>>> Amazon AWS cluster with four nodes, each with 32 GB of RAM and
>>>>>>>>>> 8 cores, running Ubuntu with Hadoop FS and ZooKeeper installed.
>>>>>>>>>>
>>>>>>>>>> *Query*: select flatten(campaign['funders'])['user_id'] from
>>>>>>>>>> `new_crowdfunding`;
>>>>>>>>>>
>>>>>>>>>> The *structure of the new_crowdfunding table* is as follows:
>>>>>>>>>> https://gist.github.com/akifkhan/d864ad9dcf5be712ff24
>>>>>>>>>>
>>>>>>>>>> *Error after running for 40 seconds and printing various
>>>>>>>>>> user_ids*:
>>>>>>>>>>
>>>>>>>>>> java.lang.RuntimeException: java.sql.SQLException: SYSTEM ERROR:
>>>>>>>>>> java.lang.IllegalArgumentException: initialCapacity: -2147483648
>>>>>>>>>> (expectd: 0+)
>>>>>>>>>>
>>>>>>>>>> Fragment 0:0
>>>>>>>>>>
>>>>>>>>>> [Error Id: 4fa13e31-ad84-42c6-aa50-c80c92ab026d on hadoop-slave1:31010]
>>>>>>>>>> at sqlline.IncrementalRows.hasNext(IncrementalRows.java:73)
>>>>>>>>>> at sqlline.TableOutputFormat$ResizingRowsProvider.next(TableOutputFormat.java:85)
>>>>>>>>>> at sqlline.TableOutputFormat.print(TableOutputFormat.java:116)
>>>>>>>>>> at sqlline.SqlLine.print(SqlLine.java:1583)
>>>>>>>>>> at sqlline.Commands.execute(Commands.java:852)
>>>>>>>>>> at sqlline.Commands.sql(Commands.java:751)
>>>>>>>>>> at sqlline.SqlLine.dispatch(SqlLine.java:738)
>>>>>>>>>> at sqlline.SqlLine.begin(SqlLine.java:612)
>>>>>>>>>> at sqlline.SqlLine.start(SqlLine.java:366)
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>>
>>>>>>>>> Abdelhakim Deneche
>>>>>>>>>
>>>>>>>>> Software Engineer
>>>>>>>>>
>>>>>>>>> <http://www.mapr.com/>
>>>>>>>
>>>>>>> --
>>>>>>> Regards
>>>>>>>
>>>>>>> *Akif Khan*
>>>>>>> *InnovAccer Inc.*
>>>>>>> *www.innovaccer.com <http://www.innovaccer.com>*
>>>>>>> *+91 8802290360*
>>> --
>>> Regards
>>>
>>> *Akif Khan*
>>> *InnovAccer Inc.*
>>> *www.innovaccer.com <http://www.innovaccer.com>*
>>> *+91 8802290360*
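Jason's question above about list size can be checked directly. Below is a hypothetical generator (the file name, user_id values, and list length are all illustrative) that emits a single record with the same shape the failing query reads, campaign.funders[].user_id:

    import java.io.IOException;
    import java.io.PrintWriter;

    public class GenerateBigFunders {
        public static void main(String[] args) throws IOException {
            int funders = 1_000_000; // list length is illustrative
            try (PrintWriter out = new PrintWriter("big_funders.json")) {
                // One record whose campaign.funders list is very large,
                // matching select flatten(campaign['funders'])['user_id']
                out.print("{\"campaign\":{\"funders\":[");
                for (int i = 0; i < funders; i++) {
                    if (i > 0) out.print(',');
                    out.print("{\"user_id\":\"user" + i + "\"}");
                }
                out.print("]}}");
            }
        }
    }

If flatten succeeds on small generated lists and fails only past some length, that supports the diagnosis in this thread that the per-row copy of the whole list, rather than the data flatten actually needs, is what overflows the buffer.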
