Re: Fw: suspicious behaviour in pcwd driver.

2005-08-31 Thread Wim Van Sebroeck
Hi Dave,

> drivers/char/watchdog/pcwd.c does this if it detects
> a temperature out of range..
> 
> if (temp_panic) {
> printk (KERN_INFO PFX "Temperature overheat trip!\n");
> machine_power_off();
> }
> 
> Two problems here are..
> 
> 1. machine_power_off() isn't exported on ppc64. (patch below)
> 2. that printk will never hit the logs, so the admin will just find
> a powered off box with no idea what happened.
> Should we at least sync block devices before doing the power off ?

First you need to enable the "temp_panic" by setting the WDIOS_TEMPPANIC 
option flag. (I'm curious who actually uses this and when they use this, but 
that is something totally different). And then you need to read the card's
status before this piece of code will be triggered.
So in my opinion this isn't used a lott and is simply an option to protect 
your hardware. But your comment is valid: chances that the warning is written 
to the log files is very small.
So I think that it might indeed be better to sync most devices if that can be 
done in a few seconds...

Greetings,
Wim.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Fw: suspicious behaviour in pcwd driver.

2005-08-31 Thread Wim Van Sebroeck
Hi Dave,

 drivers/char/watchdog/pcwd.c does this if it detects
 a temperature out of range..
 
 if (temp_panic) {
 printk (KERN_INFO PFX Temperature overheat trip!\n);
 machine_power_off();
 }
 
 Two problems here are..
 
 1. machine_power_off() isn't exported on ppc64. (patch below)
 2. that printk will never hit the logs, so the admin will just find
 a powered off box with no idea what happened.
 Should we at least sync block devices before doing the power off ?

First you need to enable the temp_panic by setting the WDIOS_TEMPPANIC 
option flag. (I'm curious who actually uses this and when they use this, but 
that is something totally different). And then you need to read the card's
status before this piece of code will be triggered.
So in my opinion this isn't used a lott and is simply an option to protect 
your hardware. But your comment is valid: chances that the warning is written 
to the log files is very small.
So I think that it might indeed be better to sync most devices if that can be 
done in a few seconds...

Greetings,
Wim.

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: suspicious behaviour in pcwd driver.

2005-08-26 Thread Pavel Machek
Hi!


>  > 2. that printk will never hit the logs, so the admin will just find
>  > a powered off box with no idea what happened.
>  > Should we at least sync block devices before doing the power off ?
> 
> AFAICS, this is still a problem with kernel_power_off() though ?
> 

Look at how acpi does this; we probably want to trigger clean shutdown.
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms 

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: suspicious behaviour in pcwd driver.

2005-08-26 Thread Pavel Machek
Hi!


   2. that printk will never hit the logs, so the admin will just find
   a powered off box with no idea what happened.
   Should we at least sync block devices before doing the power off ?
 
 AFAICS, this is still a problem with kernel_power_off() though ?
 

Look at how acpi does this; we probably want to trigger clean shutdown.
-- 
64 bytes from 195.113.31.123: icmp_seq=28 ttl=51 time=448769.1 ms 

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: suspicious behaviour in pcwd driver.

2005-08-22 Thread Benjamin Herrenschmidt

> Export machine_power_off() on ppc64, as the pcwd watchdog driver needs it.
> 
> Signed-off-by: Dave Jones <[EMAIL PROTECTED]>
> 
> --- linux-2.6.12/arch/ppc64/kernel/setup.c~   2005-08-09 17:37:36.0 
> -0400
> +++ linux-2.6.12/arch/ppc64/kernel/setup.c2005-08-09 17:37:53.0 
> -0400
> @@ -706,6 +706,7 @@ void machine_power_off(void)
>   local_irq_disable();
>   while (1) ;
>  }
> +EXPORT_SYMBOL(machine_power_off);
>  
>  void machine_halt(void)
>  {
> 

In fact, we need that for the G5 thermal driver too. I wonder why/how
this export got removed ... Some over-zealous janitors ?

Hrm... /me plays with gitk

Ahhh, ok, so that is this patch:

<<
machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with.  Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.
>>

Well, I think for now, it's safe for therm_pm72 to call
machine_power_off() in case of critical overtemp. I'll have a look at
kernel_* equivalents later.

Can you still slip that patch into 2.6.13 ?

Ben.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: suspicious behaviour in pcwd driver.

2005-08-22 Thread Dave Jones
On Mon, Aug 22, 2005 at 02:30:06PM -0400, Dave Jones wrote:
 > drivers/char/watchdog/pcwd.c does this if it detects
 > a temperature out of range..
 > 
 > if (temp_panic) {
 > printk (KERN_INFO PFX "Temperature overheat trip!\n");
 > machine_power_off();
 > }
 > 
 > Two problems here are..
 > 
 > 1. machine_power_off() isn't exported on ppc64. (patch below)

I was looking at an old tree, and this is now kernel_power_off()
so this isn't a problem for pcwd, however the export is still needed
for drivers/macintosh/therm_pm72.c

 > 2. that printk will never hit the logs, so the admin will just find
 > a powered off box with no idea what happened.
 > Should we at least sync block devices before doing the power off ?

AFAICS, this is still a problem with kernel_power_off() though ?

Dave

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


suspicious behaviour in pcwd driver.

2005-08-22 Thread Dave Jones
drivers/char/watchdog/pcwd.c does this if it detects
a temperature out of range..

if (temp_panic) {
printk (KERN_INFO PFX "Temperature overheat trip!\n");
machine_power_off();
}

Two problems here are..

1. machine_power_off() isn't exported on ppc64. (patch below)
2. that printk will never hit the logs, so the admin will just find
a powered off box with no idea what happened.
Should we at least sync block devices before doing the power off ?

Dave



Export machine_power_off() on ppc64, as the pcwd watchdog driver needs it.

Signed-off-by: Dave Jones <[EMAIL PROTECTED]>

--- linux-2.6.12/arch/ppc64/kernel/setup.c~ 2005-08-09 17:37:36.0 
-0400
+++ linux-2.6.12/arch/ppc64/kernel/setup.c  2005-08-09 17:37:53.0 
-0400
@@ -706,6 +706,7 @@ void machine_power_off(void)
local_irq_disable();
while (1) ;
 }
+EXPORT_SYMBOL(machine_power_off);
 
 void machine_halt(void)
 {


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


suspicious behaviour in pcwd driver.

2005-08-22 Thread Dave Jones
drivers/char/watchdog/pcwd.c does this if it detects
a temperature out of range..

if (temp_panic) {
printk (KERN_INFO PFX Temperature overheat trip!\n);
machine_power_off();
}

Two problems here are..

1. machine_power_off() isn't exported on ppc64. (patch below)
2. that printk will never hit the logs, so the admin will just find
a powered off box with no idea what happened.
Should we at least sync block devices before doing the power off ?

Dave



Export machine_power_off() on ppc64, as the pcwd watchdog driver needs it.

Signed-off-by: Dave Jones [EMAIL PROTECTED]

--- linux-2.6.12/arch/ppc64/kernel/setup.c~ 2005-08-09 17:37:36.0 
-0400
+++ linux-2.6.12/arch/ppc64/kernel/setup.c  2005-08-09 17:37:53.0 
-0400
@@ -706,6 +706,7 @@ void machine_power_off(void)
local_irq_disable();
while (1) ;
 }
+EXPORT_SYMBOL(machine_power_off);
 
 void machine_halt(void)
 {


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: suspicious behaviour in pcwd driver.

2005-08-22 Thread Dave Jones
On Mon, Aug 22, 2005 at 02:30:06PM -0400, Dave Jones wrote:
  drivers/char/watchdog/pcwd.c does this if it detects
  a temperature out of range..
  
  if (temp_panic) {
  printk (KERN_INFO PFX Temperature overheat trip!\n);
  machine_power_off();
  }
  
  Two problems here are..
  
  1. machine_power_off() isn't exported on ppc64. (patch below)

I was looking at an old tree, and this is now kernel_power_off()
so this isn't a problem for pcwd, however the export is still needed
for drivers/macintosh/therm_pm72.c

  2. that printk will never hit the logs, so the admin will just find
  a powered off box with no idea what happened.
  Should we at least sync block devices before doing the power off ?

AFAICS, this is still a problem with kernel_power_off() though ?

Dave

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: suspicious behaviour in pcwd driver.

2005-08-22 Thread Benjamin Herrenschmidt

 Export machine_power_off() on ppc64, as the pcwd watchdog driver needs it.
 
 Signed-off-by: Dave Jones [EMAIL PROTECTED]
 
 --- linux-2.6.12/arch/ppc64/kernel/setup.c~   2005-08-09 17:37:36.0 
 -0400
 +++ linux-2.6.12/arch/ppc64/kernel/setup.c2005-08-09 17:37:53.0 
 -0400
 @@ -706,6 +706,7 @@ void machine_power_off(void)
   local_irq_disable();
   while (1) ;
  }
 +EXPORT_SYMBOL(machine_power_off);
  
  void machine_halt(void)
  {
 

In fact, we need that for the G5 thermal driver too. I wonder why/how
this export got removed ... Some over-zealous janitors ?

Hrm... /me plays with gitk

Ahhh, ok, so that is this patch:


machine_restart, machine_halt and machine_power_off are machine
specific hooks deep into the reboot logic, that modules
have no business messing with.  Usually code should be calling
kernel_restart, kernel_halt, kernel_power_off, or
emergency_restart. So don't export machine_restart,
machine_halt, and machine_power_off so we can catch buggy users.


Well, I think for now, it's safe for therm_pm72 to call
machine_power_off() in case of critical overtemp. I'll have a look at
kernel_* equivalents later.

Can you still slip that patch into 2.6.13 ?

Ben.


-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/