Hello, Kalesh! > > Hi Uladzislau, > > I've tried your patches (v3) on Android with 6.1.43 kernel. > > The test cycles 10 apps (including camera) sequentially for 100 > iterations. > > I've set rcu_normal to override the rcu_expedited in the boot > parameters: > > adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu > > rcupdate.rcu_normal=1 > rcupdate.rcu_expedited=1 > rcu_nocbs=0-7 > > > The configurations are: > > A - echo 0 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp > B - echo 1 >/sys/module/rcutree/parameters/rcu_normal_wake_from_gp > > Results: > > = APP LAUNCH TIME = > delta (B-A) ratio(%) > overall_app_launch_time(ms) -11399.00 -6.65 > > > == camera_launch_time > type delta(B-A %) A_count B_count > HOT -7.05 99 99 > COLD -6.33 1 1 > > If i interpret it correctly you also see that this series reduces a launch time by 6/7% on your app set. Is that correct?
> === Function Latencies ===
>
> Tracing synchronize_rcu_expedited. Hit Ctrl-C to exit
> Tracing synchronize_rcu_expedited. Hit Ctrl-C to exit
>
> nsec : count distribution
> nsec : count distribution
> 0 -> 1 : 0 |
> | 0 -> 1 : 0 |
> |
> 2 -> 3 : 0 |
> | 2 -> 3 : 0 |
> |
> 4 -> 7 : 0 |
> | 4 -> 7 : 0 |
> |
> 8 -> 15 : 0 |
> | 8 -> 15 : 0 |
> |
> 16 -> 31 : 0 |
> | 16 -> 31 : 0 |
> |
> 32 -> 63 : 0 |
> | 32 -> 63 : 0 |
> |
> 64 -> 127 : 0 |
> | 64 -> 127 : 0 |
> |
> 128 -> 255 : 0 |
> | 128 -> 255 : 0 |
> |
> 256 -> 511 : 0 |
> | 256 -> 511 : 0 |
> |
> 512 -> 1023 : 0 |
> | 512 -> 1023 : 0 |
> |
> 1024 -> 2047 : 0 |
> | 1024 -> 2047 : 0 |
> |
> 2048 -> 4095 : 0 |
> | 2048 -> 4095 : 0 |
> |
> 4096 -> 8191 : 0 |
> | 4096 -> 8191 : 0 |
> |
> 8192 -> 16383 : 0 |
> | 8192 -> 16383 : 0 |
> |
> 16384 -> 32767 : 0 |
> | 16384 -> 32767 : 0 |
> |
> 32768 -> 65535 : 0 |
> | 32768 -> 65535 : 0 |
> |
> 65536 -> 131071 : 0 |
> | 65536 -> 131071 : 0 |
> |
> 131072 -> 262143 : 0 |
> | 131072 -> 262143 : 0 |
> |
> 262144 -> 524287 : 0 |
> | 262144 -> 524287 : 0 |
> |
> 524288 -> 1048575 : 0 |
> | 524288 -> 1048575 : 0 |
> |
> 1048576 -> 2097151 : 0 |
> | 1048576 -> 2097151 : 0 |
> |
> 2097152 -> 4194303 : 0 |
> | 2097152 -> 4194303 : 0 |
> |
> 4194304 -> 8388607 : 871 |**
> | 4194304 -> 8388607 : 1180 |****
> |
> 8388608 -> 16777215 : 3204 |********
> | 8388608 -> 16777215 : 7020 |*************************
> |
> 16777216 -> 33554431 : 15013
> |****************************************| 16777216 -> 33554431 :
> 10952 |****************************************|
> Exiting trace of synchronize_rcu_expedited
> Exiting trace of synchronize_rcu_expedited
>
>
> Tracing synchronize_rcu. Hit Ctrl-C to exit
> Tracing synchronize_rcu. Hit Ctrl-C to exit
>
> nsec : count distribution
> nsec : count distribution
> 0 -> 1 : 0 |
> | 0 -> 1 : 0 |
> |
> 2 -> 3 : 0 |
> | 2 -> 3 : 0 |
> |
> 4 -> 7 : 0 |
> | 4 -> 7 : 0 |
> |
> 8 -> 15 : 0 |
> | 8 -> 15 : 0 |
> |
> 16 -> 31 : 0 |
> | 16 -> 31 : 0 |
> |
> 32 -> 63 : 0 |
> | 32 -> 63 : 0 |
> |
> 64 -> 127 : 0 |
> | 64 -> 127 : 0 |
> |
> 128 -> 255 : 0 |
> | 128 -> 255 : 0 |
> |
> 256 -> 511 : 0 |
> | 256 -> 511 : 0 |
> |
> 512 -> 1023 : 0 |
> | 512 -> 1023 : 0 |
> |
> 1024 -> 2047 : 0 |
> | 1024 -> 2047 : 0 |
> |
> 2048 -> 4095 : 0 |
> | 2048 -> 4095 : 0 |
> |
> 4096 -> 8191 : 0 |
> | 4096 -> 8191 : 0 |
> |
> 8192 -> 16383 : 0 |
> | 8192 -> 16383 : 0 |
> |
> 16384 -> 32767 : 0 |
> | 16384 -> 32767 : 0 |
> |
> 32768 -> 65535 : 0 |
> | 32768 -> 65535 : 0 |
> |
> 65536 -> 131071 : 0 |
> | 65536 -> 131071 : 0 |
> |
> 131072 -> 262143 : 0 |
> | 131072 -> 262143 : 0 |
> |
> 262144 -> 524287 : 0 |
> | 262144 -> 524287 : 0 |
> |
> 524288 -> 1048575 : 0 |
> | 524288 -> 1048575 : 0 |
> |
> 1048576 -> 2097151 : 0 |
> | 1048576 -> 2097151 : 0 |
> |
> 2097152 -> 4194303 : 0 |
> | 2097152 -> 4194303 : 0 |
> |
> 4194304 -> 8388607 : 861 |**
> | 4194304 -> 8388607 : 1136 |****
> |
> 8388608 -> 16777215 : 3111 |********
> | 8388608 -> 16777215 : 6320 |************************
> |
> 16777216 -> 33554431 : 13901
> |****************************************| 16777216 -> 33554431 :
> 10484 |****************************************|
> Exiting trace of synchronize_rcu
> Exiting trace of synchronize_rcu
>
Who is B and who is A?
>
> Interestingly I tried the same experiment without rcu_normal=1 (leaving
> rcu_expedited=1):
>
> adb shell cat /proc/cmdline | tr ' ' '\n' | grep rcu
> rcupdate.rcu_expedited=1
> rcu_nocbs=0-7
>
> In this case I also saw the -6 to -7% decrease in the app launch times
> but I don't have a good explanation why that would be? (The fucntion
> latency histograms in this case didn't show any significant difference).
> Do you have any insight why this may happen?
>
When rcu_expedited=1 is set and rcu_normal=0 is disabled. The
synchronize_rcu() call is converted into synchronize_rcu_expidited():
<snip>
void synchronize_rcu(void)
{
unsigned long flags;
struct rcu_node *rnp;
RCU_LOCKDEP_WARN(lock_is_held(&rcu_bh_lock_map) ||
lock_is_held(&rcu_lock_map) ||
lock_is_held(&rcu_sched_lock_map),
"Illegal synchronize_rcu() in RCU read-side critical
section");
if (!rcu_blocking_is_gp()) {
if (rcu_gp_is_expedited())
synchronize_rcu_expedited();
else
synchronize_rcu_normal();
return;
}
...
<snip>
rcu_gp_is_expidited() is true, so invoke "expedited" version.
I see some concerns in preferring an expedited version as a global
replacement. First of all it is related to latency sensitive workloads
because in order to expedite a grace period it sends out IPIs on all
online CPUs to force them to report a quiescent-state asap. I have not
investigated yet how it affects such workloads.
Therefore, in your case, you also see a performance boost of your app sets.
Thank you for looking at it!
--
Uladzislau Rezki
