If you are open to changing the query:
# try removing the functions on the 5th column
# is there any way you could further limit the query?
# does the query finish if you add a limit / top clause?
# what do the logs say?
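For illustration only, here is a sketch of what the first and third suggestions might look like applied to the query quoted below — dropping the cast/convert_to functions on the fifth column (returning the raw `points` value instead) and adding a LIMIT to test whether the sort completes when bounded. The column names and path come from the original message; the limit value of 1000 is an arbitrary choice for testing:

```sql
-- Sketch only: the query from the quoted message, with two suggestions applied.
-- 1) Functions on the 5th column removed: raw `points` instead of
--    cast(convert_to(points, 'JSON') as varchar(...)).
-- 2) A LIMIT added to check whether the query finishes with a bounded sort
--    (1000 is an arbitrary test value, not from the thread).
select probe_id, provider_id, is_moving, mode, points
from dfs.`/home/paul/data`
where start_lat between 24.4873780449008 and 60.0108911181433
  and start_lon between -139.065890469841 and -52.8305074899881
  and provider_id = '343'
  and mod(abs(hash(probe_id)), 100) = 0
order by probe_id, start_time
limit 1000;
```

If this version completes, that points to the sort (or the large varchar materialization) as the memory consumer, rather than the Parquet scan itself.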
________________________________________
From: Paul Friedman <paul.fried...@streetlightdata.com>
Sent: Thursday, February 25, 2016 7:07:12 PM
To: user@drill.apache.org
Subject: Drill error with large sort

I’ve got a query reading from a large directory of Parquet files (41 GB) and I’m consistently getting this error:

Error: RESOURCE ERROR: One or more nodes ran out of memory while executing the query.
Unable to allocate sv2 for 1023 records, and not enough batchGroups to spill.
batchGroups.size 0
spilledBatchGroups.size 0
allocated memory 224287987
allocator limit 178956970
Fragment 0:0
[Error Id: 878d604c-4656-4a5a-8b46-ff38a6ae020d on chai.dev.streetlightdata.com:31010] (state=,code=0)

Direct memory is set to 48 GB and heap is 8 GB.

The query is:

select probe_id, provider_id, is_moving, mode,
       cast(convert_to(points, 'JSON') as varchar(100000000))
from dfs.`/home/paul/data`
where start_lat between 24.4873780449008 and 60.0108911181433
  and start_lon between -139.065890469841 and -52.8305074899881
  and provider_id = '343'
  and mod(abs(hash(probe_id)), 100) = 0
order by probe_id, start_time;

I’m also using the “example” drill-override configuration.

Any help would be appreciated.

Thanks.

---Paul
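One detail worth noting from the error text: the allocator limit of 178956970 bytes (about 170 MB) is the budget granted to the external sort operator for this fragment, which is far smaller than the 48 GB of direct memory configured on the node. In Drill of this vintage, that per-operator budget was derived from the `planner.memory.max_query_memory_per_node` session/system option rather than from total direct memory. A hedged sketch of raising it — the 8 GB value is illustrative, not something from the thread:

```sql
-- Sketch: increase the memory the planner divides among buffering operators
-- (such as the external sort) for a single query on each node.
-- 8589934592 bytes = 8 GB is an illustrative value, not from the thread.
alter session set `planner.memory.max_query_memory_per_node` = 8589934592;
```

Checking the query profile (or drillbit.log, as the reply suggests) would confirm whether the sort operator is the one hitting its allocator limit before resorting to option changes.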