Gabor is right on the last point - the Top-N node is used when there is an
ORDER BY with a LIMIT <n> clause in the same SELECT block. Each thread
executing the Top-N node only needs to keep n rows from the input during
processing, regardless of the input size. This is very effective and
explains the results you're seeing.
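
To make that concrete, here is a minimal sketch of the idea in C++ (just an
illustration, not Impala's actual implementation, which works on materialized
tuples and is considerably more involved): keep a bounded heap of at most n
rows and drop everything else as it streams past.

#include <algorithm>
#include <cstdint>
#include <iostream>
#include <queue>
#include <vector>

// Return the n smallest values from the input, i.e. "ORDER BY v LIMIT n".
// Memory stays O(n) no matter how many rows stream through.
std::vector<int64_t> TopN(const std::vector<int64_t>& input, size_t n) {
  // Max-heap holding the best n rows seen so far; the worst kept row is on top.
  std::priority_queue<int64_t> heap;
  for (int64_t v : input) {
    if (heap.size() < n) {
      heap.push(v);
    } else if (v < heap.top()) {
      heap.pop();    // Evict the current worst row...
      heap.push(v);  // ...and keep the better one instead.
    }
    // Rows that don't beat heap.top() are dropped immediately.
  }
  std::vector<int64_t> result;
  while (!heap.empty()) {
    result.push_back(heap.top());
    heap.pop();
  }
  std::reverse(result.begin(), result.end());  // Ascending output order.
  return result;
}

int main() {
  std::vector<int64_t> rows = {42, 7, 99, 3, 15, 8};
  for (int64_t v : TopN(rows, 2)) std::cout << v << "\n";  // Prints 3 then 7.
}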

I don't think a design doc was ever written for the Top-N operator. The
comments in the implementation explain the algorithm at a high level and are
probably the best place to understand how it works:
https://github.com/apache/impala/blob/master/be/src/exec/topn-node.h#L120

"External Merge Sort for Top-K queries" by Chronis et al -
https://dl.acm.org/doi/pdf/10.1145/3318464.3389729 - has a good overview of
techniques for this problem in sections 1 and 2. Impala does the technique
from section 2.3 and falls back to 2.4 for larger limits, at least the last
time I looked. I think Gabor is suggesting something like 2.5, which is a
nice optimisation that Impala doesn't have, as far as I know.

On Tue, 19 Jul 2022 at 02:59, Gabor Kaszab <gaborkas...@cloudera.com> wrote:

> Hi Sameera,
>
> If a sorter (or a TOP-N node in your case) doesn't fit into memory then it
> does a partial sort of the data that fits into memory (producing what is
> called a "run", if I'm not mistaken) and writes it to disk. After this it
> loads another subset of the data into memory, does a partial sort again,
> and spills to disk. It does this until the node runs out of rows to
> process. Once this happens, a merging sort is applied to the partially
> sorted "runs".
> Specifically, in your query each of the 6 executor nodes does these partial
> sorts and then the merging sort, producing 2 rows per executor node due to
> the LIMIT clause. A finishing step is a MERGING-EXCHANGE that receives 2
> rows from each of the 6 executors and produces the final result of the
> "ORDER BY ... LIMIT 2".
> As a result, the more memory you allocate to Impala, the bigger these
> partially sorted "runs" can be and the easier the merging sort becomes.
> Additionally, I haven't checked, but I assume that if you have a LIMIT
> clause Impala doesn't have to spill the whole run to disk, just the part
> that survives the LIMIT clause (2 rows per run in this case), so the fewer
> runs you have, the less disk IO you have to deal with.
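>
> Very roughly, the shape of such a spilling sort is something like this
> (just a simplified sketch in C++ to illustrate the run/merge idea, not
> Impala's actual implementation, which spills through scratch files):
>
> #include <algorithm>
> #include <cstdio>
> #include <queue>
> #include <utility>
> #include <vector>
>
> // Phase 1: sort memory-sized chunks ("runs") one at a time. A real engine
> // would write each sorted run to a scratch file; here a vector stands in.
> std::vector<std::vector<int>> BuildSortedRuns(const std::vector<int>& input,
>                                               size_t max_rows_in_memory) {
>   std::vector<std::vector<int>> runs;
>   for (size_t i = 0; i < input.size(); i += max_rows_in_memory) {
>     size_t end = std::min(i + max_rows_in_memory, input.size());
>     std::vector<int> run(input.begin() + i, input.begin() + end);
>     std::sort(run.begin(), run.end());  // Partial sort of one chunk.
>     runs.push_back(std::move(run));     // "Spill" the sorted run.
>   }
>   return runs;
> }
>
> // Phase 2: merging sort over all runs, reading one row at a time from each,
> // and stopping early once LIMIT rows have been produced.
> std::vector<int> MergeRuns(const std::vector<std::vector<int>>& runs,
>                            size_t limit) {
>   // Min-heap of (value, (run index, position within run)).
>   using Entry = std::pair<int, std::pair<size_t, size_t>>;
>   std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> heap;
>   for (size_t r = 0; r < runs.size(); ++r) {
>     if (!runs[r].empty()) heap.push({runs[r][0], {r, 0}});
>   }
>   std::vector<int> out;
>   while (!heap.empty() && out.size() < limit) {
>     Entry e = heap.top();
>     heap.pop();
>     out.push_back(e.first);
>     size_t r = e.second.first, i = e.second.second;
>     if (i + 1 < runs[r].size()) heap.push({runs[r][i + 1], {r, i + 1}});
>   }
>   return out;
> }
>
> int main() {
>   std::vector<int> input = {9, 4, 7, 1, 8, 2, 6, 3, 5, 0};
>   auto runs = BuildSortedRuns(input, 4);  // Three runs of at most 4 rows.
>   for (int v : MergeRuns(runs, 2)) std::printf("%d\n", v);  // Prints 0, 1.
> }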
>
> Update: I gave this a second thought and I assume that a Top-N node (when
> you have a LIMIT clause) doesn't have to spill to disk at all, as in your
> case it only has to maintain the top 2 values due to the LIMIT clause.
>
> Hope this helps.
>
> On Mon, Jul 18, 2022 at 8:26 PM sameera <sameeranadis...@gmail.com> wrote:
>
>> Hi Team,
>>
>> I'm seeing some behaviour regarding memory utilization in Impala
>> executors that I'd like to understand.
>> I have a table with 298GB of Parquet data in S3, and I'm trying to scan
>> all partitions and do a Top-N operation to understand the memory usage in
>> the executors.
>>
>> When I set the executor startup option mem_limit to 1GB, the query takes
>> 600s. With a 10GB mem_limit it completes within 200s while utilizing up to
>> 10GB of memory. This could be because Impala decided to set
>> NumScannerThreadsStarted to 1 with the 1GB mem_limit, whereas
>> NumScannerThreadsStarted is 16 with the 10GB mem_limit setting.
>>
>> During both tests I didn't see any intermediate files created in the
>> scratch dirs. My question is: how does Impala manage to sort the entire
>> table when memory is limited to 1GB? Please help me understand how it
>> works internally. Any design document would be really helpful.
>>
>> select * from execution_report_s3_lse order by time_sequence limit 1;
>>
>> Impala version 3.4
>> 6 executor node cluster with dedicated coordinators.
>> Node spec - 16 Core, 32 GB memory
>>
>> Thank you,
>> Sameera.
>>
>
