Hi Sameera,

If a sorter (or a TOP-N node in your case) doesn't fit its input into
memory, it does a partial sort of the data that does fit (called a "run"
if I'm not mistaken) and writes it to disk. After this it loads another
subset of the data into memory, does a partial sort again, and spills that
to disk as well. It keeps doing this until it runs out of input rows. Once
that happens, a merge sort is applied to the partially sorted "runs".
Specifically, in your query each of the 6 executor nodes does these partial
sorts and then the merge, keeping 2 rows per executor because of the LIMIT
clause. The finishing step is a MERGING-EXCHANGE that receives 2 rows from
each of the 6 executors and produces the final result of the
"ORDER BY ... LIMIT 2".
As a result, the more memory you allocate to Impala, the bigger these
partially sorted "runs" can be and the easier the final merge becomes.
Additionally (I haven't checked this), I assume that with a LIMIT clause
Impala doesn't have to spill the whole run to disk, only the part that
survives the LIMIT clause (2 rows per run in this case), so the fewer runs
you have, the less disk IO you have to deal with.
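
Just to put rough numbers on it (a hypothetical back-of-envelope, assuming
a plain ORDER BY that really had to sort all 298GB and that the whole
mem_limit were available to the sorter, which it isn't):

import math
data_gb = 298                   # size of the input to the sort
for mem_gb in (1, 10):
    runs = math.ceil(data_gb / mem_gb)
    print(f"mem_limit={mem_gb}GB -> about {runs} runs to merge")
# -> about 298 runs vs. about 30 runs to merge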

Update: I gave this a second thought, and I assume that a top-n node (when
you have a LIMIT clause) doesn't even have to spill to disk at all, since
in your case it only has to keep the 2 rows that survive the LIMIT clause.
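
A minimal sketch of why that works (again just the general idea, nothing
Impala-specific): the operator only ever holds the n best rows seen so far,
e.g. in a small heap, so memory stays bounded no matter how much it scans:

import heapq

def top_n(stream, n):
    # Keep the n smallest keys seen so far in a max-heap (keys negated);
    # memory stays at n rows however many rows stream through, which is
    # why there is nothing to spill.
    heap = []
    for i, (key, row) in enumerate(stream):
        item = (-key, i, row)        # i breaks ties so rows never get compared
        if len(heap) < n:
            heapq.heappush(heap, item)
        elif item > heap[0]:         # new key beats the worst row we kept
            heapq.heapreplace(heap, item)
    return [row for _, _, row in sorted(heap, reverse=True)]

# e.g. top_n(((r["time_sequence"], r) for r in rows), 1) mimics
# "order by time_sequence limit 1"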

Hope this helps.

On Mon, Jul 18, 2022 at 8:26 PM sameera <sameeranadis...@gmail.com> wrote:

> Hi Team,
>
> I'm seeing this behaviour regarding memory utilization in Impala
> executors.
> I have a table with 298GB of Parquet data in S3, and I'm trying to scan all
> partitions and do a Top-N operation to understand the memory usage in the
> executors.
>
> When I set the executor startup option mem_limit to 1GB, the query takes
> 600s. With a 10GB mem_limit it completes within 200s while utilizing up to
> 10GB of memory. This could be because Impala decided to set
> NumScannerThreadsStarted to 1 with the 1GB mem_limit, whereas
> NumScannerThreadsStarted is 16 with the 10GB mem_limit setting.
>
> During both tests I didn't see any intermediate files created in the
> scratch dirs. My question is: how does Impala manage to sort the entire
> table when memory is limited to 1GB? Please help me understand how it
> works internally. Any design document would be really helpful.
>
> select * from execution_report_s3_lse order by time_sequence limit 1;
>
> Impala version 3.4
> 6 executor node cluster with dedicated coordinators.
> Node spec - 16 Core, 32 GB memory
>
> Thank you,
> Sameera.
>
