RE: Drill error with large sort

2016-02-26 Thread Paul Friedman
Thanks for this; these parameters fixed it!

---Paul




Re: Drill error with large sort

2016-02-25 Thread Abdel Hakim Deneche
Not so short answer:

In Drill 1.5 (I assume you are using 1.5) we have an improved allocator
that better tracks how much memory each operator is using. In your case it
seems the data has very wide columns, which causes the Sort to choke on
the very first batch (1024 records taking up 224 MB!) because that is
well beyond its memory limit (around 178 MB in your particular case).
Drill uses a formula to compute this limit, and increasing the option
mentioned in the short answer below will raise it. More details here:

http://drill.apache.org/docs/configuring-drill-memory/
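
For reference, the allocator limit in the error (178,956,970 bytes) is
almost exactly the 2 GB default divided by 12, which is consistent with
the per-query budget being split across several sort allocations, so
raising the budget raises each sort's slice proportionally. A minimal
sketch of raising planner.memory.max_query_memory_per_node (the value is
in bytes; 8 GB here is an illustrative choice, not a tested
recommendation):

-- Raise the per-node, per-query memory budget for this session only
-- (8589934592 bytes = 8 GB):
ALTER SESSION SET `planner.memory.max_query_memory_per_node` = 8589934592;

-- Or persist it across sessions:
ALTER SYSTEM SET `planner.memory.max_query_memory_per_node` = 8589934592;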

On Thu, Feb 25, 2016 at 5:26 PM, Abdel Hakim Deneche <adene...@maprtech.com>
wrote:

> Short answer:
>
> increase the value of planner.memory.max_query_memory_per_node; by default
> it's set to 2 GB, so try setting it to 4 or even 8 GB. This should get the
> query to pass.


Re: Drill error with large sort

2016-02-25 Thread Jeff Maass

If you are open to changing the query:
  # try removing the functions on the 5th column
  # is there any way you could further limit the query?
  # does the query finish if you add a LIMIT / TOP clause? (see the sketch below)
  # what do the logs say?
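
As a rough illustration of the first and third suggestions, here is one
way the query (quoted below) could be pared down for a test run. This is
a sketch, not an equivalent query: it drops the converted points column
entirely (going a bit beyond just removing the functions on it) and caps
the output so the sort only has to keep a small result:

-- Same filters, minus the wide JSON conversion on the 5th column,
-- plus a LIMIT so the ORDER BY can be satisfied with a small top-N.
select probe_id, provider_id, is_moving, mode
from dfs.`/home/paul/data`
where
  start_lat between 24.4873780449008 and 60.0108911181433 and
  start_lon between -139.065890469841 and -52.8305074899881 and
  provider_id = '343' and
  -- hash-based sample: keeps roughly 1% of probe_ids
  mod(abs(hash(probe_id)), 100) = 0
order by probe_id, start_time
limit 100;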


From: Paul Friedman <paul.fried...@streetlightdata.com>
Sent: Thursday, February 25, 2016 7:07:12 PM
To: user@drill.apache.org
Subject: Drill error with large sort

I’ve got a query reading from a large directory of parquet files (41 GB)
and I’m consistently getting this error:

Error: RESOURCE ERROR: One or more nodes ran out of memory while executing
the query.

Unable to allocate sv2 for 1023 records, and not enough batchGroups to
spill.
batchGroups.size 0
spilledBatchGroups.size 0
allocated memory 224287987
allocator limit 178956970
Fragment 0:0

[Error Id: 878d604c-4656-4a5a-8b46-ff38a6ae020d on
chai.dev.streetlightdata.com:31010] (state=,code=0)

Direct memory is set to 48 GB and heap is 8 GB.

The query is:

select probe_id, provider_id, is_moving, mode,
       cast(convert_to(points, 'JSON') as varchar(1))
from dfs.`/home/paul/data`
where
  start_lat between 24.4873780449008 and 60.0108911181433 and
  start_lon between -139.065890469841 and -52.8305074899881 and
  provider_id = '343' and
  mod(abs(hash(probe_id)), 100) = 0
order by probe_id, start_time;

I’m also using the “example” drill-override configuration.

Any help would be appreciated.

Thanks.

---Paul