Hi Praveen, Can you also share what is the schema of your entire dataset and in what format it's stored?
Thanks, Sorabh On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua <ku...@apache.org> wrote: > You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh > I suspect that your 5MB JSON data might be having a lot of objects, which > need to be serialized in memory. > > FLATTEN has the problem that it replicates the data parent data for each > child node that is being flattened into a row... so the resulting data > being constructed in memory can grow significantly. > One way to work around (not elegant, but worth trying) would be to > generate intermediate flatten data and write temporary (if not using WebUI) > tables and keep flattening out those records until you have a fully > flattened dataset to work with directly. > > On 2/11/2019 10:37:58 PM, PRAVEEN DEVERACHETTY <pravee...@gmail.com> > wrote: > Thnks a lot Kunal. I am looking into that. I have one observation. > > With out flatten also, i tried to run a query of size 5MB, it is taking 5GB > of heap? how do i control heap? Are there any settings i can modify. i am > reading a lot, but nothing is working for me. It would be helpful how to > control heap, i modified memory parameters based on the documentation, it > is not working yet. it would be really helpful if i get some help in this > regard. Thanks in advance. > > Regards > Praveen > > On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua wrote: > > > This is a good starting point for understanding LATERAL-UNNEST and how it > > compares to the FLATTEN operator. > > > > https://drill.apache.org/docs/lateral-join/ > > > > > > On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote: > > Thanks Kunal. > > i am not getting how to use lateral-unrest as dataset does not have child > > rows. All data is in array of json objects(as mentioned below). There are > > two json objects separated by comma and enclosed in squre bracket. > > > > > [{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}] > > > > We are using drill from Java. Through a rest invocation. Not using json > > files. All data is sent over post as string. We are using convert_from > > function in the query to convert into json objects. As we are sending > array > > of json objects, using FLATTEN operator to convert into multiple rows. is > > there any way to avoid Flatten, as we see huge spike for 54MB data, going > > to 24GB and still failing with heap error. not sure what is wrong. Can i > > use FLATTEN on the entire data set? There are almost 54K records that is > > getting FLATTENED. > > > > example query: 1)first converted into array of json objects 2) flatten to > > convert into multiple rows > > select ems.* from (select flatten(t.jdata) as record from (select > > > > > convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]') > > as jdata) as t) ems > > > > > > On Sat, Feb 9, 2019 at 1:37 AM Kunal Khatua wrote: > > > > > The memory (heap) would climb as it tries to flatten the JSON data. > Have > > > you tried looking at Drill's LateralJoin-Unnest feature? It was meant > to > > > address memory issues for some use cases of the FLATTEN operator. > > > > > > On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote: > > > I am running a query with UNION ALL. as below > > > > > > select > > > from ( select FLATTEN(t.jdata) as record from > > > ((select convert_from(json string, json) union all > > > (select conver_from(json_string,json) union all > > > ... > > > ) as jdata) ) as t) ems > > > > > > Reason for giving union all is because we are invoking a call using > rest > > > app, there is limitation of 20,000 when we use convert_from function. > Our > > > heap size is 8GB, server is 8core. From profiling, it shows this > > perticula > > > query spikes from 100MB to 8GB continuously. is there anything i am > > > doing wrong?. > > > > > > Thanks, > > > Prveen > > > > > >