Hi Community, I'm investigating the peak times at which the Impala daemons' memory is consumed, so that I can distribute my queries in the right way.
While looking into one scenario, I see one query with the stats below.

*** The query has several joins, and this is the reason why it takes so much time.

Duration: 4.1m
Rows Produced: 30102
Aggregate Peak Memory Usage: 6.5 GiB
Per Node Peak Memory Usage: 6.5 GiB
The total number of bytes read from HDFS: 727 MiB
Memory Accrual: 86228522168 byte seconds
Pool: default-pool
Query State: FINISHED
Threads: CPU Time: 8m

====================

I'm interested in understanding why Aggregate Peak Memory Usage and Per Node Peak Memory Usage are identical, given that the query profile shows it ran several fragments on different nodes. I also see that every time this query runs (it is a daily scheduled query), these 2 parameters have identical values.

What I'm trying to understand:

1) If the query has several fragments, shouldn't the 2 parameters be different?

2) Can this scenario happen because the number of bytes read from HDFS is small, which may cause all the data to be read from a single node?

3) Since the node with the peak memory is the coordinator, and the 2 parameters are identical, does this mean that the coordinator also executed most of the query?

4) The only case I can think of in which these 2 metrics have the same value is when, at a particular time, only one node was participating in the query coordination and execution (which will certainly be the coordinator node), and at that point the query reached its highest aggregate memory consumption. Is my assumption true?

5) If my previous assumption is true, is there any way to force the coordinator not to participate in the query execution? (Having 2-3 queries like this running at the same time will make them fail.)

6) While looking at the fragments, I see the following on one of them:

PeakMemoryUsage: 141.1 MiB
PerHostPeakMemUsage: 6.5 GiB

Does PeakMemoryUsage refer to the node that ran this specific fragment, and PerHostPeakMemUsage to the node with the highest peak memory across the cluster for this specific query?

-- 
Take Care
Fawze Abujaber
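
As a rough sanity check on the figures above, the short Python sketch below estimates the average aggregate memory from the Memory Accrual value. It assumes that Memory Accrual is the average aggregate memory multiplied by the query duration (the "byte seconds" unit suggests this, but it is an assumption here), and it treats the rounded "4.1m" duration as exact. All variable names are illustrative only.

```python
# Figures copied from the query stats quoted in this post.
memory_accrual_byte_seconds = 86_228_522_168
duration_seconds = 4.1 * 60      # "4.1m" is a rounded value, so this is approximate
aggregate_peak_gib = 6.5

MIB = 1024 ** 2
GIB = 1024 ** 3

# Assumption: accrual = average aggregate memory * duration,
# so dividing by the duration gives the average aggregate memory.
avg_bytes = memory_accrual_byte_seconds / duration_seconds
avg_mib = avg_bytes / MIB

# How far the reported peak sits above that estimated average.
peak_to_avg = (aggregate_peak_gib * GIB) / avg_bytes

print(f"estimated average aggregate memory: {avg_mib:.0f} MiB")
print(f"peak / average ratio: {peak_to_avg:.1f}x")
```

Under that assumption the average comes out at a few hundred MiB, far below the 6.5 GiB peak, which would suggest the peak is a short spike rather than sustained usage.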
