Re: Performance issue

2019-02-14 Thread PRAVEEN DEVERACHETTY
On Thu, Feb 14, 2019 at 7:10 AM PRAVEEN DEVERACHETTY 
wrote:

> Hi Sorabh,
>
> Here is the sample query passed using the REST API. The query is sent as
> the body of a REST submit-job call. The convert_from function converts
> the JSON string into a JSON object, and FLATTEN is then applied to the
> result set defined in the query. Please let me know if anything else is
> required. Our main goal
>
> SELECT
>   'EMPSLS' as recordType,
>   ems.record.revenueCenterName as rvcName,
>   ems.record.revenueCenterNum as rvcNum,
>   ems.record.revenueCenterMasterName as rvcMasterName
> FROM (
>
> select FLATTEN(t.jdata) as record from
> (
> (select
> convert_from('[{"revenueCenterName":"Restaurant","revenueCenterNum":1,"revenueCenterMasterName":null,"revenueCenterMasterNum":null},
> {"revenueCenterName":"Restaurant","revenueCenterNum":2,"revenueCenterMasterName":null,"revenueCenterMasterNum":null}]',
> 'json') as jdata)
> ) as t) ems
>
> On Wed, Feb 13, 2019 at 11:18 PM Sorabh Hamirwasia 
> wrote:
>
>> Hi Praveen,
>> I am probably missing something here, because I don't understand how you
>> are feeding data to Drill in memory using the REST API. As you mentioned,
>> data has to be stored on disk or in some DB for Drill to fetch it. Can
>> you please share the query profile for your query?
>>
>> P.S. Attachments are not allowed through this mailing list. Would be great
>> if you can upload the profile somewhere (dropbox, gDrive, etc) and share
>> the link with us.
>>
>> Thanks,
>> Sorabh
>>
>> On Wed, Feb 13, 2019 at 7:58 AM PRAVEEN DEVERACHETTY > >
>> wrote:
>>
>> > As per my understanding, Apache Drill is based on a file store only.
>> > Please help me determine whether I can create a plugin for the
>> > following use case:
>> > 1. Create a JSON object and push it to Apache Drill in memory (cache).
>> > I can create the JSON object in Java; is there any API available in
>> > Drill to push this object into memory?
>> > 2. Read the object from memory (cache) and execute queries against
>> > that JSON object in memory, instead of from the file system or any
>> > data store.
>> >
>> > Is it possible to do this with Apache Drill? It would be really
>> > helpful to understand the changes I need to make.
>> >
>> > thanks,
>> > Praveen
>> >
>> > On Wed, Feb 13, 2019 at 11:46 AM PRAVEEN DEVERACHETTY <
>> pravee...@gmail.com
>> > >
>> > wrote:
>> >
>> > > Hi Sorabh, the data is in JSON string format, sent over the REST
>> > > API. We use the convert_from function to convert the JSON string
>> > > into a JSON array, and FLATTEN to expand the resulting array into
>> > > multiple rows. The data is not stored on disk; all data is in
>> > > memory.
>> > >
>> > > Thanks,
>> > > Praveen
>> > >
>> > > On Tue, Feb 12, 2019 at 11:49 PM Sorabh Hamirwasia <
>> > > sohami.apa...@gmail.com> wrote:
>> > >
>> > >> Hi Praveen,
>> > >> Can you also share what is the schema of your entire dataset and in
>> what
>> > >> format it's stored?
>> > >>
>> > >> Thanks,
>> > >> Sorabh
>> > >>
>> > >> On Tue, Feb 12, 2019 at 10:02 AM Kunal Khatua 
>> wrote:
>> > >>
>> > >> > You'll need to edit the memory settings in
>> > >> > DRILL_HOME/conf/drill-env.sh.
>> > >> > I suspect that your 5MB of JSON data might contain a lot of
>> > >> > objects, which need to be serialized in memory.
>> > >> >
>> > >> > FLATTEN has the problem that it replicates the parent data for
>> > >> > each child node being flattened into a row... so the resulting
>> > >> > data being constructed in memory can grow significantly.
>> > >> > One way to work around this (not elegant, but worth trying) would
>> > >> > be to generate intermediate flattened data, write it to temporary
>> > >> > (if not using the WebUI) tables, and keep flattening out those
>> > >> > records until you have a fully flattened dataset to work with
>> > >> > directly.
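
[Editor's note: Kunal's point about FLATTEN replicating parent data can be seen with a tiny sketch; this is plain Python standing in for the Drill operator, not Drill's actual implementation.]

```python
def flatten(rows, array_col):
    """Mimic Drill's FLATTEN: emit one output row per element of
    array_col, copying every parent column into each output row."""
    out = []
    for row in rows:
        # Parent columns are duplicated once per child element.
        parent = {k: v for k, v in row.items() if k != array_col}
        for item in row[array_col]:
            out.append({**parent, array_col: item})
    return out

rows = [{"store": "A", "big_blob": "x" * 1000, "sales": [1, 2, 3]}]
flat = flatten(rows, "sales")
# The 1000-char parent blob is copied into all 3 output rows,
# so memory grows with (parent size x number of children).
```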
>> > >> >
>> > >> > On 2/11/2019 10:37:58 PM, PRAVEEN DEVERACHETTY <
>> pravee...@gmail.com>
>> > >> > wrote:
>> > >> > Thanks a lot, Kunal. I am looking into that. I have one
>> > >> > observation.
>> > >> >
>> > >> > Even without FLATTEN, I tried to run a query over 5MB of data and
>> > >> > it is taking 5GB of heap. How do I control the heap? Are there
>> > >> > any settings I can modify? I am reading a lot, but nothing is
>> > >> > working for me. I modified the memory parameters based on the
>> > >> > documentation, but it is not working yet. It would be really
>> > >> > helpful to get some help in this regard. Thanks in advance.
>> > >> >
>> > >> > Regards
>> > >> > Praveen
>> > >> >
>> > >> > On Tue, Feb 12, 2019 at 11:18 AM Kunal Khatua wrote:
>> > >> >
>> > >> > > This is a good starting point for understanding LATERAL-UNNEST
>> and
>> > >> how it
>> > >> > > compares to the FLATTEN operator.
>> > >> > >
>> > >> > > https://drill.apache.org/docs/lateral-join/
>> > >> > >
>> > >> > >
>> > >> > > On 2/11/2019 9:03:42 PM, PRAVEEN DEVERACHETTY wrote:
>> > >> > > Thanks, Kunal.
>> > >> > > I am not getting how to use LATERAL-UNNEST
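
[Editor's note: the drill-env.sh file Kunal mentions is where the Drillbit's heap and direct memory are set. A sketch of the relevant lines follows; the values shown are illustrative, not recommendations.]

```shell
# DRILL_HOME/conf/drill-env.sh -- illustrative values only
export DRILL_HEAP="8G"                 # JVM heap for the Drillbit
export DRILL_MAX_DIRECT_MEMORY="10G"   # off-heap memory for value vectors
```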

RE: Performance issue

2019-02-13 Thread Lee, David
These days:

I use Python to read a JSONL file line by line into a Python dictionary.
Convert the data (flatten, etc.) into a tabular record set.
Write the tabular data to Parquet.
Read the Parquet using tools like Drill, etc.
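
The workflow above can be sketched in a few lines; the final Parquet step assumes pandas and pyarrow are installed, and the file names are hypothetical:

```python
import json

def flatten_record(obj, prefix=""):
    """Flatten one nested JSON object into a flat dict (child keys
    joined with '.'), suitable for a tabular record set."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten_record(value, name + "."))
        else:
            flat[name] = value
    return flat

def jsonl_to_records(path):
    """Read a JSONL file line by line, flattening each record."""
    with open(path) as f:
        return [flatten_record(json.loads(line)) for line in f if line.strip()]

# Hand-off to Drill: write the tabular records to Parquet
# (assumes pandas + pyarrow are available):
# import pandas as pd
# pd.DataFrame(jsonl_to_records("people.jsonl")).to_parquet("people.parquet")
```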

JSON support is lacking in Drill and other tools because these SQL query
tools are designed around structured data. JSON is only semi-structured,
with a lot of gotchas.

You need a lot more flexibility to handle exceptions.

If you have in one json file something like:

[
{"name": "bob", "born": "1945-01-23", "died": "2019-04-03"},
{"name": "billy", "born": "1945-01-23", "died": "2019-04-03"},
{"name": "brian", "born": "1945-01-23", "died": "2019-04-03"}
]

This looks pretty clean and structured, but what if in a second file you have:

[
{"name": "moe", "born": "1945-01-23", "died": null},
{"name": "curly", "born": "1945-01-23", "died": null},
{"name": "larry", "born": "1945-01-23", "died": null}
]

What happens if you have a file with a lot of missing or extra data? Orphans...

[
{"name": "oliver", "born": null, "died": null, "adopted": null},
{"name": "annie", "born": null, "died": null, "adopted": {"by": "Daddy
Warbucks", "when": "1934-01-04"}}
]

Changing the memory structure on the fly with copying, transforms, etc. is
very expensive, especially if this type of operation is repeated.
Better to define up front what is possible. We really need to bring
something like https://json-schema.org/ into the mix.
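
As a minimal sketch of the idea, here is a toy record check in the spirit of JSON Schema; a real deployment would use the jsonschema package against a proper https://json-schema.org/ schema, and the field names below are just the ones from the examples above:

```python
# Toy stand-in for a JSON Schema validator: required keys, optional
# keys with expected types, and everything else flagged as an orphan.
REQUIRED = {"name": str}
OPTIONAL = {"born": str, "died": str, "adopted": dict}

def check_record(rec):
    """Return a list of problems found in one record."""
    problems = []
    for key, typ in REQUIRED.items():
        if key not in rec:
            problems.append(f"missing required key: {key}")
        elif not isinstance(rec[key], typ):
            problems.append(f"bad type for {key}")
    for key, value in rec.items():
        if key in REQUIRED or value is None:
            continue  # nulls are allowed for optional fields
        if key not in OPTIONAL:
            problems.append(f"unexpected key: {key}")
        elif not isinstance(value, OPTIONAL[key]):
            problems.append(f"bad type for {key}")
    return problems
```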

Streaming JSON data and dealing with records one at a time is a lot faster
and more memory-efficient.


Re: Performance issue

2019-02-13 Thread PRAVEEN DEVERACHETTY

Re: Performance issue

2019-02-13 Thread Sorabh Hamirwasia
Hi Praveen,
I am probably missing something here, because I don't understand how you
are feeding data to Drill in memory using the REST API. As you mentioned,
data has to be stored on disk or in some DB for Drill to fetch it. Can you
please share the query profile for your query?

P.S. Attachments are not allowed through this mailing list. Would be great
if you can upload the profile somewhere (dropbox, gDrive, etc) and share
the link with us.

Thanks,
Sorabh


Re: Performance issue

2019-02-13 Thread PRAVEEN DEVERACHETTY
As per my understanding, Apache Drill is based on a file store only. Please
help me determine whether I can create a plugin for the following use case:
1. Create a JSON object and push it to Apache Drill in memory (cache). I
can create the JSON object in Java; is there any API available in Drill to
push this object into memory?
2. Read the object from memory (cache) and execute queries against that
JSON object in memory, instead of from the file system or any data store.

Is it possible to do this with Apache Drill? It would be really helpful to
understand the changes I need to make.

thanks,
Praveen


Re: Performance issue

2019-02-12 Thread PRAVEEN DEVERACHETTY
Hi Sorabh, the data is in JSON string format, sent over the REST API. We
use the convert_from function to convert the JSON string into a JSON array,
and FLATTEN to expand the resulting array into multiple rows. The data is
not stored on disk; all data is in memory.

Thanks,
Praveen



Re: Performance issue

2019-02-12 Thread PRAVEEN DEVERACHETTY
Our JSON data has 5,000 objects, and each object has around 40 attributes.
Our data does not have any child rows; the reason we use FLATTEN is that we
send the data using a REST API POST. We use the CONVERT_FROM function to
parse it into JSON in memory (no storage plugin), and since it is an array
of JSON objects, we flatten it into multiple rows. On the flattened data,
we run the query described above.

Our data is already flat; there is no nested data. Yes, I am passing all
the configuration and monitoring with JMC profiling.

I suspect the CONVERT_FROM function is causing the spike? Not sure.

Our main requirement is to use Drill without any storage plugin, as having
data on disk leads to security issues. We are looking for a solution that
runs queries in memory: generate data from the DB, then pass that data as a
JSON string to Drill over the REST API, along with the ANSI SQL query on
the JSON string. Is there any way I can write a plugin for this response
and convert it to JSON? It looks like the convert_from function is not
working as planned for us.
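
[Editor's note: for context, the REST submission discussed in this thread goes to the Drillbit's query endpoint, `POST /query.json`. A standard-library sketch follows; the localhost:8047 URL assumes a default Drillbit, and `submit` obviously requires a running Drillbit to return anything.]

```python
import json
from urllib import request

def drill_payload(sql):
    """Request body for Drill's REST query endpoint (POST /query.json)."""
    return {"queryType": "SQL", "query": sql}

def submit(sql, host="http://localhost:8047"):
    """Submit a query to a running Drillbit and return the decoded rows."""
    req = request.Request(
        host + "/query.json",
        data=json.dumps(drill_payload(sql)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:  # requires a live Drillbit
        return json.load(resp)["rows"]
```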



Re: Performance issue

2019-02-12 Thread Sorabh Hamirwasia
Hi Praveen,
Can you also share the schema of your entire dataset and the format in which
it's stored?

Thanks,
Sorabh



Re: Performance issue

2019-02-12 Thread Kunal Khatua
You'll need to edit the memory settings in DRILL_HOME/conf/drill-env.sh.
I suspect that your 5MB of JSON data contains a lot of objects, which need
to be serialized in memory.

FLATTEN has the problem that it replicates the parent data for each child
node being flattened into a row, so the resulting data being constructed in
memory can grow significantly.
One way to work around this (not elegant, but worth trying) would be to
generate intermediate flattened data, write it to temporary tables (if not
using the WebUI), and keep flattening out those records until you have a
fully flattened dataset to work with directly.
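As a minimal sketch, the relevant overrides in DRILL_HOME/conf/drill-env.sh
look like this (the variable names follow the template that ships with Drill;
the sizes below are examples, not recommendations):

```shell
# Sketch of DRILL_HOME/conf/drill-env.sh memory overrides.
# Sizes are illustrative; tune them to your host.

# JVM heap, used by operators (such as FLATTEN) that build objects on-heap
export DRILL_HEAP=${DRILL_HEAP:-"8G"}

# Off-heap (direct) memory, used for value vectors and record batches
export DRILL_MAX_DIRECT_MEMORY=${DRILL_MAX_DIRECT_MEMORY:-"10G"}
```

The drillbit has to be restarted for these to take effect.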



Re: Performance issue

2019-02-11 Thread PRAVEEN DEVERACHETTY
Thanks a lot, Kunal. I am looking into that. I have one observation.

Even without FLATTEN, I tried to run a query of size 5MB and it is taking
5GB of heap. How do I control heap usage? Are there any settings I can
modify? I am reading a lot, but nothing is working for me. I modified the
memory parameters based on the documentation, but it is not working yet. It
would be really helpful to get some guidance on controlling heap. Thanks in
advance.

Regards
Praveen

>


Re: Performance issue

2019-02-11 Thread Kunal Khatua
This is a good starting point for understanding LATERAL-UNNEST and how it 
compares to the FLATTEN operator.

https://drill.apache.org/docs/lateral-join/
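
As a rough illustration (not from the thread), a FLATTEN query over the sample
data shown earlier could be rewritten with LATERAL and UNNEST roughly as
follows. The column names (Location, FirstName) come from the sample array in
this thread; the exact syntax should be checked against the linked docs:

```sql
-- Hedged sketch: FLATTEN rewritten as LATERAL ... UNNEST,
-- following the pattern in the Drill lateral-join documentation.
SELECT rec.loc, rec.fname
FROM (SELECT convert_from(
        '[{"Location":"100","FirstName":"test1"},
          {"Location":"101","FirstName":"test3"}]', 'json') AS jdata) src,
     LATERAL (SELECT u.el.Location AS loc, u.el.FirstName AS fname
              FROM UNNEST(src.jdata) u(el)) rec;
```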




Re: Performance issue

2019-02-11 Thread PRAVEEN DEVERACHETTY
Thanks, Kunal.
I am not getting how to use LATERAL-UNNEST, as the dataset does not have
child rows. All data is in an array of JSON objects (as shown below),
separated by commas and enclosed in square brackets.
[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}]

We are using Drill from Java, through a REST invocation, not JSON files. All
data is sent over POST as a string. We use the convert_from function in the
query to convert it into JSON objects, and since we are sending an array of
JSON objects, we use the FLATTEN operator to convert it into multiple rows.
Is there any way to avoid FLATTEN? We see a huge spike for 54MB of data,
going to 24GB and still failing with a heap error; not sure what is wrong.
Can I use FLATTEN on the entire dataset? There are almost 54K records being
FLATTENED.

Example query: 1) first convert into an array of JSON objects, 2) FLATTEN to
convert into multiple rows:
select ems.* from (select flatten(t.jdata) as record from (select
convert_from('[{"Location":"100","FirstName":"test1"},{"Location":"100","FirstName":"test2"},{"Location":"101","FirstName":"test3"}..]', 'json')
as jdata) as t) ems
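
For reference, submitting such a query through Drill's REST API is a plain
POST to /query.json with a {"queryType": "SQL", "query": ...} body. A minimal
stdlib-only sketch (the host and port are assumptions; 8047 is Drill's
default web port):

```python
import json
import urllib.request

def build_drill_payload(sql: str) -> dict:
    """Build the JSON body Drill's REST API expects for a SQL query."""
    return {"queryType": "SQL", "query": sql}

def submit_query(sql: str, host: str = "http://localhost:8047") -> dict:
    """POST the query to Drill's /query.json endpoint; return the parsed response."""
    req = urllib.request.Request(
        f"{host}/query.json",
        data=json.dumps(build_drill_payload(sql)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running drillbit, so not executed here):
# submit_query("SELECT 1")
```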


>


Re: Performance issue

2019-02-08 Thread Kunal Khatua
The memory (heap) would climb as it tries to flatten the JSON data. Have you 
tried looking at Drill's LateralJoin-Unnest feature? It was meant to address 
memory issues for some use cases of the FLATTEN operator.



Performance issue

2019-02-08 Thread PRAVEEN DEVERACHETTY
I am running a query with UNION ALL, as below:

select ems.*
from ( select FLATTEN(t.jdata) as record from
  ( select convert_from(json_string, 'json') as jdata union all
    select convert_from(json_string, 'json') union all
...
  ) as t ) ems

The reason for the UNION ALL is that we are invoking the call using a REST
app, and there is a limitation of 20,000 records when we use the
convert_from function. Our heap size is 8GB and the server has 8 cores.
Profiling shows this particular query spiking from 100MB to 8GB
continuously. Is there anything I am doing wrong?

Thanks,
Praveen
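
The batching described above (splitting the records into chunks because of
the stated 20,000-record limit per convert_from call, then stitching the
chunks together with UNION ALL) can be sketched as follows; build_union_query
and the chunk size are illustrative helpers, not Drill API:

```python
import json

BATCH_SIZE = 20000  # per-convert_from record limit mentioned in the thread

def build_union_query(records: list, batch_size: int = BATCH_SIZE) -> str:
    """Build one Drill query that converts records in batch_size chunks,
    joined with UNION ALL, then flattens the combined result."""
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    selects = [
        # Quote the JSON literal for SQL; escape any embedded single quotes.
        "select convert_from('{}', 'json') as jdata".format(
            json.dumps(batch).replace("'", "''"))
        for batch in batches
    ]
    inner = " union all ".join(selects)
    return ("select ems.* from (select flatten(t.jdata) as record "
            "from ({}) as t) ems".format(inner))
```

Each branch stays under the limit, while FLATTEN still sees the full dataset;
this does not, by itself, reduce the heap pressure discussed in the thread.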