[PATCH] stopmachine: add stopmachine_timeout v4

2008-07-17 Thread Hidetoshi Seto
If stop_machine() is invoked while one of the online CPUs is locked up for some reason, stop_machine() cannot finish its work because the locked CPU cannot stop. This means all other healthy CPUs will be blocked indefinitely by one dead CPU. This patch allows stop_machine() to return -EBUSY with some printk
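
The behaviour described in this patch summary (give up waiting for an unresponsive CPU and return -EBUSY) can be illustrated with a minimal userspace sketch. This is not the patch's code: `wait_for_workers()`, `now_ms()`, the millisecond units, and the convention that a timeout of 0 means "wait forever" are all assumptions made for illustration, and an atomic counter stands in for the set of CPUs that have reported in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <time.h>

/* Monotonic clock in milliseconds (hypothetical userspace stand-in for jiffies). */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: wait until *ready equals expected.
 * Returns 0 on success, or -EBUSY once timeout_ms has elapsed.
 * Assumption: timeout_ms == 0 disables the timeout (wait forever).
 */
static int wait_for_workers(atomic_int *ready, int expected, long timeout_ms)
{
    assert(ready != NULL);
    long long deadline = now_ms() + timeout_ms;

    while (atomic_load(ready) != expected) {
        /* One stuck worker no longer blocks the caller indefinitely. */
        if (timeout_ms > 0 && now_ms() > deadline)
            return -EBUSY;

        struct timespec pause = { 0, 1000000 }; /* sleep 1 ms between polls */
        nanosleep(&pause, NULL);
    }
    return 0;
}
```

With a counter already at the expected value the helper returns 0 immediately; with a counter that never reaches it and a non-zero timeout, the caller gets -EBUSY back and can report the failure instead of hanging.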

Re: [PATCH] stopmachine: add stopmachine_timeout v2

2008-07-16 Thread Hidetoshi Seto
Max Krasnyansky wrote: I'd set the default to zero. I believe that's what Heiko suggested too. Oh, yes, you are right. I missed that suggestion. I'll post a fixed version soon. Wait a minute... Thanks, H.Seto

[PATCH] stopmachine: add stopmachine_timeout v3

2008-07-16 Thread Hidetoshi Seto
If stop_machine() is invoked while one of the online CPUs is locked up for some reason, stop_machine() cannot finish its work because the locked CPU cannot stop. This means all other healthy CPUs will be blocked indefinitely by one dead CPU. This patch allows stop_machine() to return -EBUSY with some printk

Re: [PATCH] stopmachine: add stopmachine_timeout v3

2008-07-16 Thread Hidetoshi Seto
Peter Zijlstra wrote: I really don't like this, it means the system is really screwed up and doesn't deserve to continue. It can be said that after the timeout we just go back to the previous state, where the machine is already limping (= partially screwed up) but still has some degree of performance. We might be able

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-16 Thread Christian Borntraeger
On Tuesday, 15 July 2008, Rusty Russell wrote: btw Rusty, I just had one of those "why didn't I think of that" moments. This is actually another way of handling my workload. I mean it certainly does not fix the root cause of the problems and we still need other things that we talked about

Re: [PATCH] stopmachine: add stopmachine_timeout v2

2008-07-16 Thread Jeremy Fitzhardinge
Hidetoshi Seto wrote: +#ifdef CONFIG_STOP_MACHINE +extern unsigned long stopmachine_timeout; +#endif No externs in C files. Put it in an appropriate header. I'll do a proper review soon. J

Re: [PATCH] stopmachine: add stopmachine_timeout v3

2008-07-16 Thread Peter Zijlstra
On Wed, 2008-07-16 at 15:51 +0900, Hidetoshi Seto wrote: If stop_machine() is invoked while one of the online CPUs is locked up for some reason, stop_machine() cannot finish its work because the locked CPU cannot stop. This means all other healthy CPUs will be blocked indefinitely by one dead CPU.

Re: [PATCH] stopmachine: add stopmachine_timeout v2

2008-07-16 Thread Jeremy Fitzhardinge
Hidetoshi Seto wrote: sysctl.c already has many externs... but I can fix at least the above. Yeah, but it's an ugly pattern we'd rather not encourage. I'll do a proper review soon. Is it better to postpone v4 until your comment comes? No. J

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-15 Thread Heiko Carstens
So I think monotonic wallclock time actually makes the most sense here. This is asking for trouble... a config option to disable this would be nice. But as I don't know which problem this patch originally addresses it might be that this is needed anyway. So let's see why we need it

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-15 Thread Rusty Russell
On Tuesday 15 July 2008 12:24:54 Max Krasnyansky wrote: Heiko Carstens wrote: On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote: This is asking for trouble... a config option to disable this would be nice. But as I don't know which problem this patch originally addresses

[PATCH] stopmachine: add stopmachine_timeout v2

2008-07-15 Thread Hidetoshi Seto
Thank you for the useful feedback! Here is the updated version. Could you put this on top of your patches, Rusty? Thanks, H.Seto If stop_machine() is invoked while one of the online CPUs is locked up for some reason, stop_machine() cannot finish its work because the locked CPU cannot stop. This means all

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Rusty Russell
On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote: On Monday, 14 July 2008, Hidetoshi Seto wrote: + /* Wait all others come to life */ + while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) { + if (time_is_before_jiffies(limit)) + goto

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Jeremy Fitzhardinge
Rusty Russell wrote: On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote: On Monday, 14 July 2008, Hidetoshi Seto wrote: + /* Wait all others come to life */ + while (cpus_weight(prepared_cpus) != num_online_cpus() - 1) { + if (time_is_before_jiffies(limit))

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Heiko Carstens
On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote: Rusty Russell wrote: On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote: On Monday, 14 July 2008, Hidetoshi Seto wrote: + /* Wait all others come to life */ + while (cpus_weight(prepared_cpus) !=

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Max Krasnyansky
Heiko Carstens wrote: On Mon, Jul 14, 2008 at 11:56:18AM -0700, Jeremy Fitzhardinge wrote: Rusty Russell wrote: On Monday 14 July 2008 21:51:25 Christian Borntraeger wrote: On Monday, 14 July 2008, Hidetoshi Seto wrote: + /* Wait all others come to life */ + while

Re: [PATCH] stopmachine: add stopmachine_timeout

2008-07-14 Thread Max Krasnyansky
Hidetoshi Seto wrote: Heiko Carstens wrote: Hmm.. probably a stupid question: but what could happen that a real cpu (not virtual) becomes unresponsive so that it won't schedule a MAX_RT_PRIO-1 prioritized task for 5 seconds? The original problem (once I heard and easily reproduced) was