Hi,
On Thu, Nov 01, 2007 at 09:37:11PM -0700, Arjan van de Ven wrote:
> Alexey Kuznetsov wrote:
> >And even more if hash table is large enough.
> >
> >Obviously, it should scan more than one hash bucket per tick to keep
> >timer frequency reasonable. The question is what is reasonable?
> >
>
> there are two possible answers for this ;)
>
> 1) For really low power systems, "once every 30 seconds" is nice :0
> But to be realistic, if you use round_jiffies(), it won't be all
> that bad
No, since we're talking about a huge hash table with a slightly
challenging timeout (9000ms/2 in my case) which applies to *each* entry,
this timer has to be invoked very regularly to be able to round-trip
through all entries in time.
> or answer 2
>
> 2) use deferrable timers. These are like normal timers, but don't
> happen when the system is totally idle, but just instead fire when you
> get out of idle (say, when a network interrupt happens). This could be
> the best of both worlds in this case; if there's no network traffic or
> any other activity, the timer doesn't happen, but if there's activity
> it'll happen as usual
Such a solution is not optimal, since even with light load you still
have a huge number of wakeups. It would be nice to find something
better than that, to reduce neigh table wakeups in general.
To add more meat to the discussion table:
I did this:
    /* Cycle through all hash buckets every base_reachable_time/2 ticks.
     * ARP entry timeouts range from 1/2 base_reachable_time to 3/2
     * base_reachable_time.
     */
    expire = tbl->parms.base_reachable_time >> 1;
    expire /= (tbl->hash_mask + 1);
//  if (!(now % 277))
//      printk(KERN_WARNING "base_reachable_time %i hash_mask %u "
//             "expire %lu HZ %i\n", tbl->parms.base_reachable_time,
//             tbl->hash_mask, expire, HZ);
    if (!expire)
        expire = 1;
and got:
base_reachable_time 9000 hash_mask 255 expire 17 HZ 300
each time.
Since base_reachable_time is 9000(ms) and we expect to have cycled
through all hash buckets every base_reachable_time/2 (__NOT__ ticks)
to ensure all hash array entries have been checked within the minimum
timeout, we expect to have a full hash cycle every
9000ms / 2 == 4500ms.
IOW, one hash array entry should be checked every
4500ms / (255 + 1) == 17.57ms.
Then why does my powertop measure around 17.7 wakeups per second and not
56.9 wakeups/sec???
Easy, that's because the neigh table "expire" value calculation currently
isn't based on HZ, but on a millisecond-sized value even though a timer expiry
calculation is supposed to be based on HZ.
Since I have HZ=300, this means that I have 300 / 17.57 == 17.07
wakeups/second.
And on HZ=100, this is 100 / 17.57 == 5.69 wakeups/second.
On HZ=1000, this would be 1000 / 17.57 == 56.91 wakeups/second (the
expected value for a millisecond timeout input value).
Now, pray tell, why would the number of wakeups *differ* between
different HZ configurations despite the neigh table timeouts being a
fixed ms value??
Because the neigh timer code is BUGGY since it doesn't take HZ into
account.
The funny thing is that this bug causes fewer wakeups to happen than
would be needed to ensure timely neigh timeout checking,
yet even those currently fewer wakeups are __way__ too many for me ;)
BTW, experimentally reducing the wakeups by omitting the
/ (hash_mask + 1) calculation doesn't yield any noticeable performance
gain when doing a
time gzip linux-2.6.0.tar.bz2
IOW, it's almost purely a power optimization, no performance gain.
So what I'd want is:
a) expiry time calc to be fixed to calculate it via HZ
b) an improved mechanism to be used (e.g. one which takes the actual
   number of hash chain items into account and then checks a number of
   buckets per timer tick)
a) can be done easily; any thoughts about b)?
Thanks,
Andreas Mohr