Re: Performance issue

2019-02-14 Thread PRAVEEN DEVERACHETTY
On Thu, Feb 14, 2019 at 7:10 AM PRAVEEN DEVERACHETTY wrote: > Hi Sorabh, > > Here is the sample query passed using the REST API. This query is sent as the request body when submitting the job (REST). The convert_from function converts the JSON string to a JSON object. Then FLATTEN is applied to the result set defined in you

RE: Performance issue

2019-02-13 Thread Lee, David
orn": null, "died": null, "adopted": {"by": "Daddy Warbucks", "when": "1934-01-04"}}, ] Changing the memory structure on the fly with copying, transforms, etc. is very expensive, especially if this type of operation is repeated. B

Re: Performance issue

2019-02-13 Thread PRAVEEN DEVERACHETTY
Hi Sorabh, Here is the sample query passed using the REST API. This query is sent as the request body when submitting the job (REST). The convert_from function converts the JSON string to a JSON object. Then FLATTEN is applied to the result set defined in your query. Please let me know if anything else is required. Our main goal SELECT

Re: Performance issue

2019-02-13 Thread Sorabh Hamirwasia
Hi Praveen, I am probably missing something here, because I don't understand how you are feeding data to Drill in memory using the REST API. As you mentioned, data has to be stored on disk or in some DB for Drill to fetch it. Can you please share the query profile for your query? P.S. Attachments are

Re: Performance issue

2019-02-13 Thread PRAVEEN DEVERACHETTY
As I understand it, Apache Drill works only on a file store. Please let me know whether I can create a plugin for the following use case: 1. Create a JSON object and push it to Apache Drill in memory (cache). I can create the JSON object in Java, and if any API is available from Drill to push this

Re: Performance issue

2019-02-12 Thread PRAVEEN DEVERACHETTY
Hi Sorabh, The data is in JSON string format, sent over the REST API. We use the convert_from function to convert the JSON string to a JSON array and then flatten the resulting array into multiple rows. The data is not stored on disk. All data is in memory. Thanks, Praveen On Tue, Feb 12, 2019 at 11:49 PM Sorabh

Re: Performance issue

2019-02-12 Thread PRAVEEN DEVERACHETTY
Our JSON data has 5000 objects, and each object has around 40 attributes. Our data does not have any child rows; the reason we are using FLATTEN is that we are sending the data using the REST API POST method. We use the CONVERT_FROM function to format it into JSON in memory (no storage plugin), as it is an

Re: Performance issue

2019-02-12 Thread Sorabh Hamirwasia
Hi Praveen, Can you also share the schema of your entire dataset and the format it's stored in? Thanks, Sorabh On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua wrote: > You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh > I suspect that your 5MB JSON data might be

Re: Performance issue

2019-02-12 Thread Kunal Khatua
You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh. I suspect that your 5MB JSON data might have a lot of objects, which need to be serialized in memory. FLATTEN has the problem that it replicates the parent data for each child node that is being flattened into a
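The replication Kunal describes can be illustrated with a toy model. This is a minimal Python sketch, not Drill's implementation: the `flatten` helper and the `batch`/`items` field names are hypothetical and exist only to show how every parent field is copied once per array element.

```python
def flatten(rows, array_field):
    """Toy model of a flatten step: emit one output row per array element.

    Hypothetical helper for illustration only; Drill's FLATTEN operator
    works on its own in-memory vectors, not on Python dicts.
    """
    out = []
    for row in rows:
        # Every non-array (parent) field is copied into each output row.
        parent = {k: v for k, v in row.items() if k != array_field}
        for child in row[array_field]:
            out.append({**parent, array_field: child})
    return out


rows = [{"batch": "a", "items": [1, 2, 3]}]
flat = flatten(rows, "items")
for r in flat:
    print(r)
```

One input row with a three-element array becomes three output rows, each carrying its own copy of `batch`. With wide parent rows and long arrays, that copying is where the memory goes.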

Re: Performance issue

2019-02-11 Thread PRAVEEN DEVERACHETTY
Thanks a lot, Kunal. I am looking into that. I have one observation. Even without FLATTEN, I tried to run a query of size 5MB, and it is taking 5GB of heap. How do I control the heap? Are there any settings I can modify? I am reading a lot, but nothing is working for me. It would be helpful to know how to control

Re: Performance issue

2019-02-11 Thread Kunal Khatua
This is a good starting point for understanding LATERAL-UNNEST and how it compares to the FLATTEN operator. https://drill.apache.org/docs/lateral-join/ On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote: Thanks Kunal. I don't understand how to use LATERAL-UNNEST, as the dataset does not have child

Re: Performance issue

2019-02-11 Thread PRAVEEN DEVERACHETTY
Thanks Kunal. I don't understand how to use LATERAL-UNNEST, as the dataset does not have child rows. All the data is in an array of JSON objects (as mentioned below). There are two JSON objects separated by a comma and enclosed in square brackets.

Re: Performance issue

2019-02-08 Thread Kunal Khatua
The memory (heap) would climb as it tries to flatten the JSON data. Have you tried looking at Drill's LateralJoin-Unnest feature? It was meant to address memory issues for some use cases of the FLATTEN operator. On 2/8/2019 5:17:01 AM, PRAVEEN DEVERACHETTY wrote: I am running a query with

Performance issue

2019-02-08 Thread PRAVEEN DEVERACHETTY
I am running a query with UNION ALL, as below: select from ( select FLATTEN(t.jdata) as record from ((select convert_from(json_string, json) union all (select convert_from(json_string, json) union all ... ) as jdata) ) as t) ems The reason for using UNION ALL is that we are invoking a call using
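The approach described in this message (a JSON string embedded in the query text, converted with convert_from and then flattened) might be assembled along these lines. This is only a sketch under assumptions: the `build_drill_request` helper is hypothetical, it uses a single convert_from rather than the UNION ALL chain, the `FROM (VALUES(1))` clause is added because the original query text is incomplete, and the body follows the `queryType`/`query` shape of Drill's REST query endpoint. It builds the request body only and sends nothing.

```python
import json


def build_drill_request(records):
    """Build a request body that embeds the records as a JSON string literal.

    Hypothetical helper; the query shape is an assumption based on the
    description in this thread, not the poster's exact query.
    """
    json_text = json.dumps(records)
    # Escape single quotes so the JSON text is a valid SQL string literal.
    sql_literal = json_text.replace("'", "''")
    query = (
        "SELECT FLATTEN(t.jdata) AS record "
        "FROM (SELECT CONVERT_FROM('" + sql_literal + "', 'JSON') AS jdata "
        "FROM (VALUES(1))) AS t"
    )
    return {"queryType": "SQL", "query": query}


body = build_drill_request([{"id": 1, "name": "O'Hara"}, {"id": 2, "name": "Lee"}])
print(json.dumps(body, indent=2))
```

Because the whole dataset travels inside the SQL text, the query string grows with the data, which may be relevant to the heap usage discussed in the replies.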