Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread TimO

Vojtech Pavlik wrote:
> 
> I'm *not* sure. It just looks like a reasonable explanation. It doesn't
> happen on Intel chips and older VIA chips, it only happens on new VIA
> chips, and the code is the same all the time. Also, it happens both with
> 2.2 and 2.4 kernels ...
> 
> --
> Vojtech Pavlik
> SuSE Labs
>

Do you have a method guaranteed to reproduce this?  I have a newer VIA
chipset and haven't (yet) observed this problem.

Host bridge: VIA Technologies, Inc. VT8371 [KX133] (rev 2).
PCI bridge: VIA Technologies, Inc. VT8371 [KX133 AGP] (rev 0).
ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 34).
IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 16).
Bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 48).
Multimedia audio controller: VIA Technologies, Inc. AC97 Audio Controller (rev 32).


===
-- TimO
==++==
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Vojtech Pavlik

On Fri, Oct 27, 2000 at 02:04:58PM +0200, [EMAIL PROTECTED] wrote:

> > Interesting. If it's caused by SCSI as well (might be), then it's not
> > caused by heavy IDE activity; rather, it could be heavy
> > BusMastering activity (the IDE chip does BM as well).
> > 
> > I'm still wondering if it could be a Linux kernel bug (bad/concurrent
> > accesses to the i8253 registers); this has to be checked.
> > 
> 
> How sure are you that the chip is actually buggy? I ran into something
> similar a while ago, when I mixed up the two arguments to an outb in a driver
> and ended up writing MYPORT into the timer instead of 0x40 into MYPORT.

I'm *not* sure. It just looks like a reasonable explanation. It doesn't
happen on Intel chips and older VIA chips, it only happens on new VIA
chips, and the code is the same all the time. Also, it happens both with
2.2 and 2.4 kernels ...

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread bart

On 26 Oct, Vojtech Pavlik wrote:
> On Thu, Oct 26, 2000 at 04:42:31PM +0200, Yoann Vandoorselaere wrote:
> 
>> > On Thu, Oct 26, 2000 at 04:20:43PM +0200, Yoann Vandoorselaere wrote:
>> > 
>> > > ...
>> > > 
>> > > Do you have any idea what the relation is between time and this chip?
>> > > 
>> > > Also, I've been experiencing the problem for several months on my
>> > > workstation and I could never find where it was coming from...
>> > > How did you do it?
>> > 
>> > Well, it integrates both the i8253 PIT and the vt82c586 IDE controller.
>> > 
>> > I first located the wrong time was coming from gettimeofday() and not
>> > from the other sources of time the kernel provides. And then I was
>> > tracking the problem (which actually is an underflow - the chip bug
>> > causes some time offset variables go negative - 0x microseconds
>> > is about 1:20 hours). And this way I got to the spot where the patch
>> > cures the problem.
>> 
>> OK, here is what I experienced:
>> 
>> First, what is strange is that:
>> - I'm using SCSI
>> - I just have an IDE disk for mp3.
>> The IDE subsystem is never used heavily...
>> 
>> I experienced the problem after some time of
>> heavy SCSI I/O: my screen under X went black (like with DPMS).
>> When I moved the mouse, the image came back
>> for < 1 second, then the screen went black again...
>> 
>> The only fix was to kill X and then reboot.
>> 
>> Anyway, thanks for your explanation...
>> I'll send feedback on this patch ASAP.
> 
> Interesting. If it's caused by SCSI as well (might be), then it's not
> caused by heavy IDE activity; rather, it could be heavy
> BusMastering activity (the IDE chip does BM as well).
> 
> I'm still wondering if it could be a Linux kernel bug (bad/concurrent
> accesses to the i8253 registers); this has to be checked.
> 

How sure are you that the chip is actually buggy? I ran into something
similar a while ago, when I mixed up the two arguments to an outb in a driver
and ended up writing MYPORT into the timer instead of 0x40 into MYPORT.
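
To make the mix-up concrete: Linux's outb() takes the value first and the
port second, so swapping the arguments of a write of 0x40 to a driver port
sends the (truncated) port number to port 0x40, which is PIT channel 0.
The sketch below only simulates this in user space. fake_outb(), the
fake_ports array and the MYPORT value of 0x300 are hypothetical stand-ins
chosen for illustration, not code from any driver.

```c
#include <assert.h>
#include <stdint.h>

#define PIT_CH0 0x40   /* i8253 channel 0 data port */
#define MYPORT  0x300  /* hypothetical driver port, chosen for illustration */

static uint8_t fake_ports[0x400]; /* simulated I/O space, not real hardware */

/* Same argument order as Linux outb(): value first, port second. */
static void fake_outb(uint8_t value, uint16_t port)
{
    assert(port < sizeof fake_ports);
    fake_ports[port] = value;
}

/* Intended call: write 0x40 to MYPORT. */
static void write_correct(void)
{
    fake_outb(0x40, MYPORT);
}

/* Swapped call: MYPORT truncated to 8 bits (0x00) lands in the timer port. */
static void write_swapped(void)
{
    fake_outb((uint8_t)MYPORT, 0x40);
}
```

With the swapped call, the driver port is never touched and the timer's
data port receives a stray byte, which would look much like a timer that
reprograms itself.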

Bart
-- 
Bart Hartgers - TUE Eindhoven 
Get my GPG key at http://etpmod.phys.tue.nl/bart/pubkey.gpg 




Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Vojtech Pavlik

On Fri, Oct 27, 2000 at 01:16:34PM +0200, Yoann Vandoorselaere wrote:

> > Which part of the chipset do you mean? The PIT (programmable interval
> > timer)? That one has been standard since XT times. The rest of the ISA bridge?
> > Maybe, but that's mostly BIOS work and shouldn't impact the PIT
> > under sane conditions.
> 
> What is strange is that a number of people seem to be hit by this
> problem... And if VIA hasn't corrected it, it's probably because
> they are not aware of it...
> 
> I think that if such a problem occurred under Windows
> (thinking of the Windows user base), VIA would already have been in touch.

It can't happen under Windows, because the Windows timer runs at about
18 Hz (timer programmed to 65535), while Linux uses 100 Hz (timer
programmed to approx. 11920). So when the bug makes the timer unprogram
itself to 65535, only Linux notices; Windows can't.
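
The arithmetic behind those numbers can be checked with a small sketch.
It assumes the PIT input clock of 1193180 Hz and the rounding formula
(CLOCK_TICK_RATE + HZ/2) / HZ that the i386 kernel headers of that era
used for LATCH; the helper names are made up for this example. Under
those assumptions the exact 100 Hz divisor comes out as 11932, in line
with the approximate figure quoted above.

```c
#include <assert.h>

/* PIT input clock in Hz (assumed value, as in the i386 headers of that era). */
#define CLOCK_TICK_RATE 1193180UL

/* Divisor (latch value) for a requested tick rate, rounded to nearest. */
static unsigned long pit_latch(unsigned long hz)
{
    assert(hz > 0);
    return (CLOCK_TICK_RATE + hz / 2) / hz;
}

/* Interrupt rate in millihertz that a given divisor produces. */
static unsigned long pit_rate_millihz(unsigned long latch)
{
    assert(latch > 0);
    return (CLOCK_TICK_RATE * 1000UL) / latch;
}
```

pit_latch(100) gives the divisor for a 100 Hz tick, and
pit_rate_millihz(65535) gives roughly 18.2 Hz. A kernel that still
assumes 100 Hz after the divisor has jumped to 65535 would therefore see
ticks arrive about 5.5 times too slowly.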

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Fri, Oct 27, 2000 at 12:58:12PM +0200, Yoann Vandoorselaere wrote:
> 
> > > > > So this is not our problem here. Anyway I guess it's time to hunt for
> > > > > i8259 accesses in the kernel that lack the necessary spinlock, even when
> > > > > they're not probably the cause of the problem we see here.
> > > > 
> > > > BTW what about trying to modify your work-around code to make it
> > > > attempt to read the timer again? This way we could test whether it was
> > > > a race condition during timer read or really timer jumping to a bogus
> > > > value.
> > > 
> > > Actually if I don't reprogram the timer (and just ignore the value for
> > > example), the work-around code keeps being called again and again very
> > > often (between 1x/minute to 100x/second) after the first failure, even
> > > when the system is idle.
> > > 
> > > When reprogramming, next failure happens only after stressing the system
> > > again.
> > > 
> > > So it's not just a race, the impact of the failure on the chip is
> > > permanent and stays till it's reprogrammed.
> > 
> > Are you sure there is not an error in the way the 
> > chipset is programmed ?
> 
> Which part of the chipset do you mean? The PIT (programmable interval
> timer)? That one has been standard since XT times. The rest of the ISA bridge?
> Maybe, but that's mostly BIOS work and shouldn't impact the PIT
> under sane conditions.

What is strange is that a number of people seem to be hit by this
problem... And if VIA hasn't corrected it, it's probably because
they are not aware of it...

I think that if such a problem occurred under Windows
(thinking of the Windows user base), VIA would already have been in touch.

-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
Tiniest "mesures unities?"
- lenght : millimeter
- volume : milliliter
- intelligence : military man



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Vojtech Pavlik

On Fri, Oct 27, 2000 at 12:58:12PM +0200, Yoann Vandoorselaere wrote:

> > > > So this is not our problem here. Anyway I guess it's time to hunt for
> > > > i8259 accesses in the kernel that lack the necessary spinlock, even when
> > > > they're not probably the cause of the problem we see here.
> > > 
> > > BTW what about trying to modify your work-around code to make it
> > > attempt to read the timer again? This way we could test whether it was
> > > a race condition during timer read or really timer jumping to a bogus
> > > value.
> > 
> > Actually if I don't reprogram the timer (and just ignore the value for
> > example), the work-around code keeps being called again and again very
> > often (between 1x/minute to 100x/second) after the first failure, even
> > when the system is idle.
> > 
> > When reprogramming, next failure happens only after stressing the system
> > again.
> > 
> > So it's not just a race, the impact of the failure on the chip is
> > permanent and stays till it's reprogrammed.
> 
> Are you sure there is not an error in the way the 
> chipset is programmed ?

Which part of the chipset do you mean? The PIT (programmable interval
timer)? That one has been standard since XT times. The rest of the ISA bridge?
Maybe, but that's mostly BIOS work and shouldn't impact the PIT
under sane conditions.

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Fri, Oct 27, 2000 at 12:02:20PM +0200, Martin Mares wrote:
> 
> > > So this is not our problem here. Anyway I guess it's time to hunt for
> > > i8259 accesses in the kernel that lack the necessary spinlock, even when
> > > they're not probably the cause of the problem we see here.
> > 
> > BTW what about trying to modify your work-around code to make it
> > attempt to read the timer again? This way we could test whether it was
> > a race condition during timer read or really timer jumping to a bogus
> > value.
> 
> Actually if I don't reprogram the timer (and just ignore the value for
> example), the work-around code keeps being called again and again very
> often (between 1x/minute to 100x/second) after the first failure, even
> when the system is idle.
> 
> When reprogramming, next failure happens only after stressing the system
> again.
> 
> So it's not just a race, the impact of the failure on the chip is
> permanent and stays till it's reprogrammed.

Are you sure there is not an error in the way the
chipset is programmed?

-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
"Programming is a race between programmers, who try and make more and more
idiot-proof software, and universe, which produces more and more remarkable
idiots. Until now, universe leads the race"  -- R. Cook



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Vojtech Pavlik

On Fri, Oct 27, 2000 at 12:02:20PM +0200, Martin Mares wrote:

> > So this is not our problem here. Anyway I guess it's time to hunt for
> > i8259 accesses in the kernel that lack the necessary spinlock, even when
> > they're not probably the cause of the problem we see here.
> 
> BTW what about trying to modify your work-around code to make it
> attempt to read the timer again? This way we could test whether it was
> a race condition during timer read or really timer jumping to a bogus
> value.

Actually if I don't reprogram the timer (and just ignore the value for
example), the work-around code keeps being called again and again very
often (between 1x/minute to 100x/second) after the first failure, even
when the system is idle.

When reprogramming, next failure happens only after stressing the system
again.

So it's not just a race, the impact of the failure on the chip is
permanent and stays till it's reprogrammed.
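
For readers following along, here is a rough user-space sketch of the
kind of check being discussed: read the counter, read it once more to
rule out a bad read (the test suggested above), and reprogram the divisor
only if the value is still out of range. The fake_pit structure and all
function names are hypothetical; this simulates the idea and is not the
actual patch. LATCH is assumed to be 11932, the divisor for a 100 Hz tick.

```c
#include <assert.h>
#include <stdint.h>

#define LATCH 11932u  /* assumed divisor for a 100 Hz tick */

/* Simulated PIT channel 0: a stand-in for the real ports 0x40/0x43. */
struct fake_pit {
    uint32_t reload;      /* programmed divisor; 65536 models the lost setting */
    uint32_t count;       /* current down-counter value */
    unsigned reprograms;  /* how many times the divisor was written */
};

static void pit_program(struct fake_pit *pit, uint32_t reload)
{
    pit->reload = reload;
    pit->count = reload - 1;
    pit->reprograms++;
}

/* Stand-in for latching the counter and reading its two bytes. */
static uint32_t pit_read_count(const struct fake_pit *pit)
{
    return pit->count;
}

/*
 * A count above LATCH is impossible while the divisor is still LATCH.
 * Read once more to rule out a bad read; if the value is still out of
 * range, assume the divisor was lost and reprogram it.
 * Returns a count that is safe to use for the time offset calculation.
 */
static uint32_t read_count_checked(struct fake_pit *pit)
{
    uint32_t count = pit_read_count(pit);

    if (count > LATCH)
        count = pit_read_count(pit);  /* second read */

    if (count > LATCH) {
        pit_program(pit, LATCH);      /* restore the 100 Hz divisor */
        count = LATCH - 1;
    }
    return count;
}
```

If the failure were only a race during the read, the second read would
return a sane value and no reprogramming would happen. The behaviour
described above (the check firing again and again until the timer is
reprogrammed) corresponds to the second branch.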

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-27 Thread Martin Mares

Hi!

> So this is not our problem here. Anyway I guess it's time to hunt for
> i8259 accesses in the kernel that lack the necessary spinlock, even when
> they're not probably the cause of the problem we see here.

BTW what about trying to modify your work-around code to make it
attempt to read the timer again? This way we could test whether it was
a race condition during timer read or really timer jumping to a bogus
value.

Have a nice fortnight
-- 
Martin `MJ' Mares <[EMAIL PROTECTED]> <[EMAIL PROTECTED]> http://atrey.karlin.mff.cuni.cz/~mj/
"This line is umop apisdn."



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 11:24:38PM +0200, Yoann Vandoorselaere wrote:
> Vojtech Pavlik <[EMAIL PROTECTED]> writes:
> 
> > On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:
> > 
> > > yop, I've done:
> > > 
> > > make -j10 World 
> > > in the xfree tree and simultaneously:
> > > 
> > > while true; do make dep && make clean && make bzImage; done
> > > in the kernel tree
> > 
> > Now it'd be nice to verify that the problem also happens when the system
> > is not running out of memory (which -j10 quite likely causes, I think) ...
> 
> Nope, my system was loaded, but was usable
> (at least until the problem occurred)...

Good to know.

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:
> 
> > yop, I've done:
> > 
> > make -j10 World 
> > in the xfree tree and simultaneously:
> > 
> > while true; do make dep && make clean && make bzImage; done
> > in the kernel tree
> 
> Now it'd be nice to verify that the problem also happens when the system
> is not running out of memory (which -j10 quite likely causes, I think) ...

Nope, my system was loaded, but was usable
(at least until the problem occurred)...

Athlon 750 with 128mb of ram and 103mb of swap.

-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
   An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:

> yop, I've done:
> 
> make -j10 World 
> in the xfree tree and simultaneously:
> 
> while true; do make dep && make clean && make bzImage; done
> in the kernel tree

Now it'd be nice to verify that the problem also happens when the system
is not running out of memory (which -j10 quite likely causes, I think) ...

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Thu, Oct 26, 2000 at 10:11:54PM +0200, Yoann Vandoorselaere wrote:
> 
> > > > > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > > > > to the timer. It writes 0 to the control-word for timer 0. This
> > > > > > does the following:
> > > > [Snipped...]
> > > > >  
> > > > > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > > > > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > > > > include/linux/ide.h.
> > > > > 
> > > > > So this is not our problem here. Anyway I guess it's time to hunt for
> > > > > i8259 accesses in the kernel that lack the necessary spinlock, even when
> > > > > they're not probably the cause of the problem we see here.
> > > > 
> > > > Okay, good.
> > > 
> > > Ok, here is a list of places within the kernel that access the PIT
> > > timer, plus the method of locking (i386 arch only):
> > 
> > [...]
> > 
> > Ok, I just tested if the problem was always present without
> > the IDE subsystem...
> > 
> > The answer is it is not... so it isn't an IDE problem.
> 
> Uh, guess too many negations. You wanted to say that the problem was
> present even when you disabled the IDE subsystem, right?

yop

> 
> So now it seems that possibly enough PCI traffic / busmastering traffic
> can cause the problem ...

yop, I've done:

make -j10 World 
in the xfree tree and simultaneously:

while true; do make dep && make clean && make bzImage; done
in the kernel tree


-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
   An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 10:11:54PM +0200, Yoann Vandoorselaere wrote:

> > > > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > > > to the timer. It writes 0 to the control-word for timer 0. This
> > > > > does the following:
> > > [Snipped...]
> > > >  
> > > > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > > > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > > > include/linux/ide.h.
> > > > 
> > > > So this is not our problem here. Anyway I guess it's time to hunt for
> > > > i8259 accesses in the kernel that lack the necessary spinlock, even when
> > > > they're not probably the cause of the problem we see here.
> > > 
> > > Okay, good.
> > 
> > Ok, here is a list of places within the kernel that access the PIT
> > timer, plus the method of locking (i386 arch only):
> 
> [...]
> 
> Ok, I just tested if the problem was always present without
> the IDE subsystem...
> 
> The answer is it is not... so it isn't an IDE problem.

Uh, guess too many negations. You wanted to say that the problem was
present even when you disabled the IDE subsystem, right?

So now it seems that possibly enough PCI traffic / busmastering traffic
can cause the problem ...

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Thu, Oct 26, 2000 at 01:42:29PM -0400, Richard B. Johnson wrote:
> 
> > > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > > to the timer. It writes 0 to the control-word for timer 0. This
> > > > does the following:
> > [Snipped...]
> > >  
> > > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > > include/linux/ide.h.
> > > 
> > > So this is not our problem here. Anyway I guess it's time to hunt for
> > > i8253 accesses in the kernel that lack the necessary spinlock, even though
> > > they're probably not the cause of the problem we see here.
> > 
> > Okay, good.
> 
> Ok, here is a list of places within the kernel that access the PIT
> timer, plus the method of locking (i386 arch only):

[...]

Ok, I just tested if the problem was always present without
the IDE subsystem...

The answer is it is not... so it isn't an IDE problem.

-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
   An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 01:42:29PM -0400, Richard B. Johnson wrote:

> > > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > > to the timer. It writes 0 to the control-word for timer 0. This
> > > does the following:
> [Snipped...]
> >  
> > Well, at least on 2.4.0-test9, the above timing code is #ifed to
> > DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> > include/linux/ide.h.
> > 
> > So this is not our problem here. Anyway I guess it's time to hunt for
> > i8253 accesses in the kernel that lack the necessary spinlock, even though
> > they're probably not the cause of the problem we see here.
> 
> Okay, good.

Ok, here is a list of places within the kernel that access the PIT
timer, plus the method of locking (i386 arch only):

Usage:                                          Lock method:

arch/i386/kernel/time.c:170:                    spin_lock()
arch/i386/kernel/time.c:491:                    spin_lock()
arch/i386/kernel/time.c:575:                    none (init)
arch/i386/kernel/i8259.c:491:                   none (init)
arch/i386/kernel/apm.c:871:                     cli()
arch/i386/kernel/apic.c:398:                    spin_lock_irqsave()

drivers/char/vt.c:121:                          cli()
drivers/char/ftape/lowlevel/ftape-calibr.c:80:  cli()
drivers/char/ftape/lowlevel/ftape-calibr.c:99:  cli()
drivers/char/joystick/analog.c:142:             cli() __cli()
drivers/char/joystick/gameport.c:66:            cli()
drivers/ide/hd.c:137:                           cli()
drivers/ide/ide.c:206:                          __cli()

I guess we'll need to fix this. While races here are not likely (the
most likely is a beep by vt.c at a wrong moment), they're possible.

However, these don't seem to be the cause of the problem we see here
anyway.
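
For illustration only: the list mixes spin_lock(), cli() and __cli(),
and callers that use different methods do not necessarily exclude one
another. A minimal user-space sketch of the alternative, where every
caller takes one shared lock around the whole latch-and-read sequence,
might look like the following. The fake_pit structure, the function
names and the atomic_flag lock are all assumptions made for this sketch,
not kernel code; in the kernel the equivalent would presumably be a
single spinlock taken with spin_lock_irqsave() by every caller.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Simulated PIT counter 0; stands in for ports 0x43/0x40 (hypothetical). */
struct fake_pit {
    uint16_t count;     /* current count */
    uint16_t latch;     /* value captured by the latch command */
    int      high_next; /* byte flip-flop: 0 = next read returns the low byte */
};

/* One lock shared by every user of the timer (name is an assumption). */
static atomic_flag pit_lock = ATOMIC_FLAG_INIT;

void pit_lock_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&pit_lock, memory_order_acquire))
        ;
}

void pit_lock_release(void)
{
    atomic_flag_clear_explicit(&pit_lock, memory_order_release);
}

/* Simulated "write latch command to the control port". */
void fake_pit_latch(struct fake_pit *p)
{
    p->latch = p->count;
}

/* Simulated "read data port": low byte first, then high byte. */
uint8_t fake_pit_read(struct fake_pit *p)
{
    uint8_t b = p->high_next ? (uint8_t)(p->latch >> 8)
                             : (uint8_t)(p->latch & 0xff);
    p->high_next = !p->high_next;
    return b;
}

/*
 * Latch and read both bytes under the shared lock. Without the lock,
 * a second user reading one byte in between would flip the byte
 * flip-flop and both users would get garbage.
 */
uint16_t pit_read_count(struct fake_pit *p)
{
    uint16_t lo, hi;

    pit_lock_acquire();
    fake_pit_latch(p);
    lo = fake_pit_read(p);
    hi = fake_pit_read(p);
    pit_lock_release();

    return (uint16_t)((hi << 8) | lo);
}
```

The point of the sketch is only that the three port accesses form one
critical section, and that it helps only if all users agree on the same
lock.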

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Richard B. Johnson

On Thu, 26 Oct 2000, Vojtech Pavlik wrote:

> On Thu, Oct 26, 2000 at 12:04:21PM -0400, Richard B. Johnson wrote:
> 
> > ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> > to the timer. It writes 0 to the control-word for timer 0. This
> > does the following:
[Snipped...]
>  
> Well, at least on 2.4.0-test9, the above timing code is #ifed to
> DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
> include/linux/ide.h.
> 
> So this is not our problem here. Anyway I guess it's time to hunt for
> i8253 accesses in the kernel that lack the necessary spinlock, even though
> they're probably not the cause of the problem we see here.

Okay, good.

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 12:04:21PM -0400, Richard B. Johnson wrote:

> ../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
> to the timer. It writes 0 to the control-word for timer 0. This
> does the following:
> 
> o Selects timer 0.
> o Latches the timer.
> o Selects mode 0.
> o Programs it to a 16 bit counter.
> 
> The result is a latched (stopped) counter. Bits 5 and 4 should have been
> selected. Then you read bits 0-7 from 0x40, followed by bits 8-15  from
> the same port.
> 
> Also, there is no spin-lock protecting access to these ports. If anybody
> else is mucking with the timer, all bets are off.
 
Well, at least on 2.4.0-test9, the above timing code is #ifed to
DISK_RECOVERY_TIME > 0, which in turn is #defined to 0 in
include/linux/ide.h.

So this is not our problem here. Anyway I guess it's time to hunt for
i8253 accesses in the kernel that lack the necessary spinlock, even though
they're probably not the cause of the problem we see here.

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Richard B. Johnson

On 26 Oct 2000, Yoann Vandoorselaere wrote:
> Vojtech Pavlik <[EMAIL PROTECTED]> writes:
[Snipped...]

../drivers/block/ide.c, line 162, on version 2.2.17 does bad things
to the timer. It writes 0 to the control-word for timer 0. This
does the following:

o   Selects timer 0.
o   Latches the timer.
o   Selects mode 0.
o   Programs it to a 16 bit counter.

The result is a latched (stopped) counter. Bits 5 and 4 should have been
selected. Then you read bits 0-7 from 0x40, followed by bits 8-15  from
the same port.

Also, there is no spin-lock protecting access to these ports. If anybody
else is mucking with the timer, all bets are off.
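
As a rough illustration of the decoding above, the control word can be
split into its fields as below. The bit layout follows the Intel
8253/8254 datasheet; the struct and function names are made up for this
sketch. With access field 00 the word is a counter-latch command, and
the chip ignores the remaining low bits.

```c
#include <stdint.h>

/* Fields of an i8253/i8254 control word (written to port 0x43). */
struct pit_ctrl {
    unsigned counter; /* bits 7-6: counter select */
    unsigned access;  /* bits 5-4: 0 = latch, 1 = LSB, 2 = MSB, 3 = LSB then MSB */
    unsigned mode;    /* bits 3-1: operating mode */
    unsigned bcd;     /* bit 0: 0 = 16-bit binary, 1 = BCD */
};

/* Split a control word into its four fields. */
struct pit_ctrl pit_decode(uint8_t cw)
{
    struct pit_ctrl c;

    c.counter = (cw >> 6) & 0x3;
    c.access  = (cw >> 4) & 0x3;
    c.mode    = (cw >> 1) & 0x7;
    c.bcd     = cw & 0x1;
    return c;
}

/* Combine the two bytes read from the data port: bits 0-7, then bits 8-15. */
uint16_t pit_combine(uint8_t lsb, uint8_t msb)
{
    return (uint16_t)(((unsigned)msb << 8) | lsb);
}
```

For comparison, pit_decode(0x34) gives counter 0, access 3 (bits 5 and 4
both set), mode 2, binary.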

Cheers,
Dick Johnson

Penguin : Linux version 2.2.17 on an i686 machine (801.18 BogoMips).

"Memory is like gasoline. You use it up when you are running. Of
course you get it all back when you reboot..."; Actual explanation
obtained from the Micro$oft help desk.





Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Thu, Oct 26, 2000 at 04:42:31PM +0200, Yoann Vandoorselaere wrote:
> 
> > > On Thu, Oct 26, 2000 at 04:20:43PM +0200, Yoann Vandoorselaere wrote:
> > > 
> > > > ...
> > > > 
> > > > Have you any idea what is the relation between time and this chip ?
> > > > 
> > > > Also, I've been experiencing the problem for several months on my
> > > > workstation and I could never find where it was coming from...
> > > > How did you do it?
> > > 
> > > Well, it integrates both the i8253 PIT and the vt82c586 IDE controller.
> > > 
> > > I first located the wrong time was coming from gettimeofday() and not
> > > from the other sources of time the kernel provides. And then I was
> > > tracking the problem (which actually is an underflow - the chip bug
> > > causes some time offset variables go negative - 0x microseconds
> > > is about 1:20 hours). And this way I got to the spot where the patch
> > > cures the problem.
> > 
> > Ok, here is what I experienced :
> > 
> > First what is strange is that :
> > - I'm using SCSI
> > - I just have an IDE disk for mp3.
> > The IDE subsystem is never used heavily...
> > 
> > I've experienced the problem after some time of 
> > heavy scsi IO, my screen under X was going black (like with dpms)
> > When I was moving the mouse, the image was coming back
> > for < 1 seconds, then black screen...
> > 
> > The only fix was to kill X then to reboot.
> > 
> > Anyway, thanks for your explanation...
> > I'll give feedback on this patch ASAP.
> 
> Interesting. If it's caused by SCSI as well (might be), then it's not
> caused by heavy IDE activity; it could instead be caused by heavy
> BusMastering activity (the IDE chip does BM as well).
> 
> I'm still wondering if it could be a Linux kernel bug (bad/concurrent
> accesses to the i8253 registers), this has to be checked.

An easy way to verify the problem is to start 'dbench 128'.
I'm going to do that with and without the IDE subsystem to see what
happens.
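
A small watcher can run alongside the load to catch the symptom
described in the quoted text (gettimeofday() returning a wrong time).
This is only a sketch: the function names and the threshold are
assumptions, and a loaded system may legitimately show steps of many
milliseconds while the watcher is scheduled out.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/time.h>

/* Microseconds from prev to cur; negative if time went backwards. */
int64_t tv_delta_us(const struct timeval *prev, const struct timeval *cur)
{
    return ((int64_t)cur->tv_sec - (int64_t)prev->tv_sec) * 1000000
         + ((int64_t)cur->tv_usec - (int64_t)prev->tv_usec);
}

/* Returns 1 if the step is backwards or larger than max_step_us. */
int is_time_jump(int64_t delta_us, int64_t max_step_us)
{
    return delta_us < 0 || delta_us > max_step_us;
}

/*
 * Poll gettimeofday() `iterations` times, print every suspicious step
 * and return how many were seen.
 */
long watch_clock(long iterations, int64_t max_step_us)
{
    struct timeval prev, cur;
    long jumps = 0;
    long i;

    gettimeofday(&prev, NULL);
    for (i = 0; i < iterations; i++) {
        int64_t d;

        gettimeofday(&cur, NULL);
        d = tv_delta_us(&prev, &cur);
        if (is_time_jump(d, max_step_us)) {
            printf("jump of %lld us at %ld.%06ld\n",
                   (long long)d, (long)cur.tv_sec, (long)cur.tv_usec);
            jumps++;
        }
        prev = cur;
    }
    return jumps;
}
```

Calling watch_clock() from main() with a large iteration count and a
generous threshold (say one second) while dbench runs should show
whether the clock steps by an implausible amount.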

-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
I worry about my child and the Internet all the time, even though she's too 
young to have logged on yet. Here's what I worry about. I worry that 10 or 15 
years from now, she will come to me and say 'Daddy, where were you when they 
took freedom of the press away from the Internet?'  -- Mike Godwin



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 04:42:31PM +0200, Yoann Vandoorselaere wrote:

> > On Thu, Oct 26, 2000 at 04:20:43PM +0200, Yoann Vandoorselaere wrote:
> > 
> > > ...
> > > 
> > > Have you any idea what is the relation between time and this chip ?
> > > 
> > > Also, I've been experiencing the problem for several months on my
> > > workstation and I could never find where it was coming from...
> > > How did you do it?
> > 
> > Well, it integrates both the i8253 PIT and the vt82c586 IDE controller.
> > 
> > I first located the wrong time was coming from gettimeofday() and not
> > from the other sources of time the kernel provides. And then I was
> > tracking the problem (which actually is an underflow - the chip bug
> > causes some time offset variables go negative - 0x microseconds
> > is about 1:20 hours). And this way I got to the spot where the patch
> > cures the problem.
> 
> Ok, here is what I experienced :
> 
> First what is strange is that :
> - I'm using SCSI
> - I just have an IDE disk for mp3.
> The IDE subsystem is never used heavily...
> 
> I've experienced the problem after some time of 
> heavy scsi IO, my screen under X was going black (like with dpms)
> When I was moving the mouse, the image was coming back
> for < 1 seconds, then black screen...
> 
> The only fix was to kill X then to reboot.
> 
> Anyway, thanks for your explanation...
> I'll give feedback on this patch ASAP.

Interesting. If it's caused by SCSI as well (might be), then it's not
caused by heavy IDE activity; it could instead be caused by heavy
BusMastering activity (the IDE chip does BM as well).

I'm still wondering if it could be a Linux kernel bug (bad/concurrent
accesses to the i8253 registers), this has to be checked.

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:

> yop, I've done:
> 
> make -j10 World
> in the xfree tree and simultaneously:
> 
> while true; do make dep && make clean && make bzImage; done
> in the kernel tree

Now it'd be nice to verify that the problem also happens when the system
is not running out of memory (which -j10 quite likely causes, I think) ...

-- 
Vojtech Pavlik
SuSE Labs



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Yoann Vandoorselaere

Vojtech Pavlik <[EMAIL PROTECTED]> writes:

> On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:
> 
> > yop, I've done:
> > 
> > make -j10 World
> > in the xfree tree and simultaneously:
> > 
> > while true; do make dep && make clean && make bzImage; done
> > in the kernel tree
> 
> Now it'd be nice to verify that the problem also happens when the system
> is not running out of memory (which -j10 quite likely causes, I think) ...

Nope, my system was loaded but still usable
(at least until the problem occurred)...

Athlon 750 with 128 MB of RAM and 103 MB of swap.

-- 
-- Yoann http://www.mandrakesoft.com/~yoann/
   An engineer from NVidia, while asking him to release cards specs said :
"Actually, we do write our drivers without documentation."



Re: Possible critical VIA vt82c686a chip bug (private question)

2000-10-26 Thread Vojtech Pavlik

On Thu, Oct 26, 2000 at 11:24:38PM +0200, Yoann Vandoorselaere wrote:
> Vojtech Pavlik <[EMAIL PROTECTED]> writes:
> 
> > On Thu, Oct 26, 2000 at 11:05:04PM +0200, Yoann Vandoorselaere wrote:
> > 
> > > yop, I've done:
> > > 
> > > make -j10 World
> > > in the xfree tree and simultaneously:
> > > 
> > > while true; do make dep && make clean && make bzImage; done
> > > in the kernel tree
> > 
> > Now it'd be nice to verify that the problem also happens when the system
> > is not running out of memory (which -j10 quite likely causes, I think) ...
> 
> Nope, my system was loaded but still usable
> (at least until the problem occurred)...

Good to know.

-- 
Vojtech Pavlik
SuSE Labs