Re: Problem with ata layer in 2.6.24

2008-02-01 Thread Tejun Heo
Kasper Sandberg wrote:
 to put some timeline perspective into this.
 i believe it was in 2005 i assembled the system, and when i realized it
 was faulty, on the old ide driver, i stopped using it - that might have been
 in the beginning of 2006. then for almost a year i wasn't using it, hoping
 to somehow fix it, but in january 2007 i think it was, at least in the
 very beginning of 2007, i hit upon the idea of trying libata, and ever
 since the system has been running 24/7 - producing these errors around 2
 times a day.
 
 i have reported my problems to lkml multiple times, but nothing has
 happened. i also tried to approach jgarzik directly, but he was not
 interested.
 
 i really hope this can be solved now, it's a huge problem
 
 my fileserver has an asus k8v motherboard, with via chipset (k8t880 i
 think it is, or something like it). currently using the promise
 controller again (strangely enough all the timeouts seem to happen here,
 and when the ITE was on, there, not the onboard one), in conjunction
 with the onboard via.

Timeouts are nasty to debug.  They can be caused by a whole range of
different problems including transmission errors, bad power, a faulty
drive, a mishandled media error, IRQ misrouting or a dumb hardware bug.
It's basically 'uh... I told the controller to do something but it never
called me back'.

If you see timeouts on multiple devices connected to different
controllers, chances are that the problem is somewhere else.  The
most likely culprit is bad power.  Please...

* Post the result of 'lspci -nn' and kernel log including full boot log
and error messages.

* Try to isolate the problem, i.e. does removing several drives fix
the problem?  If the problem is localized to a certain device,
what happens if you move it?  Does the problem follow the drive or stay
with the port?  If the failing drives are SATA, it's a good idea to
power some of the failing drives with a separate PSU and see whether
anything is different.

Trying to isolate the hardware problem teaches us more about the
error condition, and even when the problem turns out not to be a hardware
problem, it gives us much deeper insight into it and clues
regarding where to look.
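
As a starting point for the isolation step, it can help to count libata
exception messages per port in a saved kernel log, to see whether the errors
cluster on one port or spread across controllers. The following is only a
minimal Python sketch. It assumes the "ataN.NN: exception Emask ..." message
format quoted elsewhere in this thread; the function name is hypothetical and
the sample log (including the ata3 entry and the later timestamps) is made up
for illustration.

```python
import re
from collections import Counter

# Matches libata exception lines such as
# "ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen".
EXCEPTION_RE = re.compile(r"\b(ata\d+)(?:\.\d+)?: exception Emask")


def count_exceptions_by_port(log_text):
    """Return a Counter mapping port name (e.g. 'ata1') to exception count."""
    counts = Counter()
    for line in log_text.splitlines():
        match = EXCEPTION_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample log; only the first line's format is taken from the thread.
sample = """\
[42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[42461.916005] ata1: soft resetting link
[42999.000001] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
[43500.000002] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
"""

for port, n in sorted(count_exceptions_by_port(sample).items()):
    print(port, n)
```

If the counts follow a drive when it is moved to another port, the drive or
its cabling is suspect; if they stay with the port, look at the controller.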

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
 say, would be really welcome and, I'm guessing, widely used.

We don't see very many libata problems at the distro level and they for
the most part boil down to

- error messages looking different - Most bugs I get are things like
media errors (timeout looks different, UNC report looks different)

- broken hardware - I've closed a whole raft of bugs that turn out to be
new PC systems where even the BIOS doesn't see the drives

- faulty hardware being picked up because we actually do real error
checking now. We now check for and give some devices more slack while
still doing error checking. Both IDE layers also added blacklists for
stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

- sata_nv with 4GB of RAM, known, being worked on, no old IDE driver
anyway
 
- pata_ali MWDMA with ATAPI, PIO works fine, all a bit of a mystery and
as it affects only a few chip variants hard to figure out. Workaround
libata.dma=1

- CS handling. On a few boxes using cable select (particularly on one
drive and not the other) shows up a problem, normally a failed SRST.
That's still under investigation.

- Promise timeouts. The old IDE times out then polls the device and finds
the IRQ was never sent and then recovers so the user sees a short stall
but no errors. The new libata doesn't do this and pdc202xx_old thus
produces some error messages on some boxes. Backup polling is on my todo
list.


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Alan Cox wrote:
 not one problem but lots---is sufficiently widespread that a Mini HOWTO,
 say, would be really welcome and, I'm guessing, widely used.

We don't see very many libata problems at the distro level and they for
the most part boil down to

- error messages looking different - Most bugs I get are things like
media errors (timeout looks different, UNC report looks different)

- broken hardware - I've closed a whole raft of bugs that turn out to be
new PC systems where even the BIOS doesn't see the drives

- faulty hardware being picked up because we actually do real error
checking now. We now check for and give some devices more slack while
still doing error checking. Both IDE layers also added blacklists for
stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

- sata_nv with 4GB of RAM, known, being worked on, no old IDE driver
anyway

- pata_ali MWDMA with ATAPI, PIO works fine, all a bit of a mystery and
as it affects only a few chip variants hard to figure out. Workaround
libata.dma=1

- CS handling. On a few boxes using cable select (particularly on one
drive and not the other) shows up a problem, normally a failed SRST.
That's still under investigation.

- Promise timeouts. The old IDE times out then polls the device and finds
the IRQ was never sent and then recovers so the user sees a short stall
but no errors. The new libata doesn't do this and pdc202xx_old thus
produces some error messages on some boxes. Backup polling is on my todo
list.

I have not had a problem, no errors at all, since I rebooted to 
2.6.24-rc8 with the added argument in the kernel line in grub 
(from dmesg):
[0.00] Kernel command line: ro root=/dev/VolGroup00/LogVol00 
acpi_use_timer_override rhgb quiet

which causes dmesg to log, some time later:

[   27.581823] ENABLING IO-APIC IRQs
[   27.582014] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
[   27.592017] ..MP-BIOS bug: 8254 timer not connected to IO-APIC
[   27.592068] ...trying to set up timer (IRQ0) through the 8259A ...  failed.
[   27.592071] ...trying to set up timer as Virtual Wire IRQ... works.
[   27.703623] Brought up 1 CPUs

This was about noonish yesterday, and the logs have been silent 
regarding this 'exception Emask' error since then.  The drive itself
has also passed a smartctl -t long test with no errors since then.

Now, the last boot that had the problem was to 2.6.24, which did 
NOT have that 'acpi_use_timer_override' argument, and its dmesg logged:

[   24.934176] ENABLING IO-APIC IRQs
[   24.934367] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
[   25.045973] Brought up 1 CPUs

Now, my question is, did the use of that argument, while it looked
like it failed, cause the setup code to do something correct that
the default path didn't do?  Is this the clue we're all looking for?
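
To make the comparison concrete, the two '..TIMER:' lines quoted above can be
diffed field by field. This is just a small Python sketch, not part of any
kernel tool, and the helper names are made up for illustration.

```python
import re


def parse_timer_line(line):
    """Extract the key=value fields of a '..TIMER:' dmesg line as strings."""
    _, _, fields = line.partition("..TIMER:")
    return dict(re.findall(r"(\w+)=(\S+)", fields))


def diff_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# The two lines from the dmesg excerpts in this message.
with_override = "[   27.582014] ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1"
without_override = "[   24.934367] ..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1"

print(diff_fields(parse_timer_line(with_override),
                  parse_timer_line(without_override)))
```

On these two lines the only field that differs is pin1 (2 with the argument,
0 without it).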

Since libata is apparently the path taken by TPTB, I'm going to build
and boot to a 2.6.24 using libata, but add that argument to grub's kernel
line in only one of 2 copies of that stanza.

Wish me luck.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
The intelligence of any discussion diminishes with the square of the
number of participants.
-- Adam Walinsky


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 As slight change here, I was going to use the same .config as 2.6.24-rc8, but 
 just discovered that neither rc8 nor final is finding the drivers for my

If it is not finding a driver that is nothing to do with libata. It means
it's not being loaded by the distribution, or the distribution kernel is
too old (2.6.22) for the hardware - in which case see the Fedora respins
which are on 2.6.23.something right now.

Alan


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
  
 Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.

Depends how the memory is mapped. Any memory physically above the 4GB
boundary

Alan


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread rgheck

Alan Cox wrote:
not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
say, would be really welcome and, I'm guessing, widely used.



We don't see very many libata problems at the distro level and they for
the most part boil down to

- sata_nv with 4GB of RAM, known, being worked on, no old IDE driver
anyway
  

Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.

Richard



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mikael Pettersson
Gene Heskett writes:
  On Tuesday 29 January 2008, Alan Cox wrote:
   As slight change here, I was going to use the same .config as 2.6.24-rc8,
   but just discovered that neither rc8 nor final is finding the drivers for
   my
  
  If it is not finding a driver that is nothing to do with libata. It means
  it's not being loaded by the distribution, or the distribution kernel is
  too old (2.6.22) for the hardware - in which case see the Fedora respins
  which are on 2.6.23.something right now.
  
  Alan
  
  Home built kernel Alan.  But you are as good as anyone to tell me what I 
  need to turn on in order for this dvdwriter to be enabled:
  [   28.862478] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
  
  [   28.908647] ata2.00: limited to UDMA/33 due to 40-wire cable
  [   29.081253] ata2.00: configured for UDMA/33
  
  it has had several 80 wire cables tried, hasn't fixed this, and does not
  seem to affect its operation when it does work.
  
  [   29.132405] scsi 1:0:0:0: CD-ROM LITE-ON  DVDRW SHM-165H6S HS06 PQ: 0 ANSI: 5
  
  [   43.450795] scsi 1:0:0:0: Attached scsi generic sg1 type 5
  ---
  No further mention of it in dmesg, and k3b cannot find the drive at any 
  /dev/sgX address.
  
  .config attached, what else do I need to turn on?

...

  # CONFIG_BLK_DEV_SR is not set

For starters, enable CONFIG_BLK_DEV_SR.


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Mark Lord wrote:
Gene Heskett wrote:
..
 Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number
 when dmesg says its found ok at ata2.00?  I've turned on an option that
 says something about using the bios for device access this build, but I'll
 be surprised if that's it. :)

..

It should show up as /dev/scd0 or something very similar.

Tisn't.  Darnit.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
clock speed


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mark Lord

Gene Heskett wrote:

..
Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number 
when dmesg says its found ok at ata2.00?  I've turned on an option that says 
something about using the bios for device access this build, but I'll be 
surprised if that's it. :)

..

It should show up as /dev/scd0 or something very similar.



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Alan Cox wrote:
 not one problem but lots---is sufficiently widespread that a Mini HOWTO,
 say, would be really welcome and, I'm guessing, widely used.

We don't see very many libata problems at the distro level and they for
the most part boil down to

- error messages looking different - Most bugs I get are things like
media errors (timeout looks different, UNC report looks different)

- broken hardware - I've closed a whole raft of bugs that turn out to be
new PC systems where even the BIOS doesn't see the drives

- faulty hardware being picked up because we actually do real error
checking now. We now check for and give some devices more slack while
still doing error checking. Both IDE layers also added blacklists for
stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

- sata_nv with 4GB of RAM, known, being worked on, no old IDE driver
anyway

- pata_ali MWDMA with ATAPI, PIO works fine, all a bit of a mystery and
as it affects only a few chip variants hard to figure out. Workaround
libata.dma=1

- CS handling. On a few boxes using cable select (particularly on one
drive and not the other) shows up a problem, normally a failed SRST.
That's still under investigation.

- Promise timeouts. The old IDE times out then polls the device and finds
the IRQ was never sent and then recovers so the user sees a short stall
but no errors. The new libata doesn't do this and pdc202xx_old thus
produces some error messages on some boxes. Backup polling is on my todo
list.

A slight change here: I was going to use the same .config as 2.6.24-rc8, but 
just discovered that neither rc8 nor final is finding the drivers for my
dvd writer while using libata, so it's not usable.  So I've enabled a couple of 
things in the 2.6.24 build that aren't in the 2.6.24-rc8.  When I find the 
magic twanger, I'll rebuild -rc8 with it too.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
River: He didn't lie down.  They never lie down.
--Serenity


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Jeff Garzik wrote:
Gene Heskett wrote:
 Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number
 when dmesg says its found ok at ata2.00?  I've turned on an option that
 says something about using the bios for device access this build, but I'll
 be surprised if that's it. :)

I think you mean /dev/scdx not /dev/sdx.  Make sure you have the 'sr'
driver compiled and load (CONFIG_BLK_DEV_SR).

That menu item COULD be moved, I don't have any REAL scsi stuff, so I didn't 
look there.  My bad, with help from hiding it like that. :-)

The bios-for-dev-access thing definitely won't help, and may hurt (by
taking over the device you wanted to test).

Ok, if BLK_DEV_SR fails, I'll take that back out.  I'm heating the room making 
kernels here. :)

Thanks Jeff.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Life sucks, but death doesn't put out at all.
-- Thomas J. Kopp


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Mikael Pettersson wrote:
Gene Heskett writes:
  On Tuesday 29 January 2008, Alan Cox wrote:
   As slight change here, I was going to use the same .config as
   2.6.24-rc8, but just discovered that neither rc8 nor final is finding
   the drivers for my
  
  If it is not finding a driver that is nothing to do with libata. It
   means it's not being loaded by the distribution, or the distribution
   kernel is too old (2.6.22) for the hardware - in which case see the
   Fedora respins which are on 2.6.23.something right now.
  
  Alan
 
  Home built kernel Alan.  But you are as good as anyone to tell me what I
  need to turn on in order for this dvdwriter to be enabled:
  [   28.862478] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max
  UDMA/66 
  [   28.908647] ata2.00: limited to UDMA/33 due to 40-wire cable
  [   29.081253] ata2.00: configured for UDMA/33
  
  it has had several 80 wire cables tried, hasn't fixed this, and does not
  seem to affect its operation when it does work.
  
  [   29.132405] scsi 1:0:0:0: CD-ROM LITE-ON  DVDRW SHM-165H6S HS06 PQ: 0 ANSI: 5
  [   43.450795] scsi 1:0:0:0: Attached scsi generic sg1 type 5
  ---
  No further mention of it in dmesg, and k3b cannot find the drive at any
  /dev/sgX address.
 
  .config attached, what else do I need to turn on?

...

  # CONFIG_BLK_DEV_SR is not set

For starters, enable CONFIG_BLK_DEV_SR.

That could stand to be moved or renamed, it is well buried in the menu for the 
REAL scsi stuffs, which I don't have any of.  Enabled & building now.  
Thanks.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
An air of FRENCH FRIES permeates my nostrils!!


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread rgheck

Mark Lord wrote:

Gene Heskett wrote:

..
Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' 
number when dmesg says its found ok at ata2.00?  I've turned on an 
option that says something about using the bios for device access 
this build, but I'll be surprised if that's it. :)

..

It should show up as /dev/scd0 or something very similar.


Does it appear as /dev/sr0? Try ll /dev/s* and see what you get.

Anyway, these /dev/ entries are produced by udev, not by libata.

rh



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Florian Attenberger wrote:
On Mon, 28 Jan 2008 14:13:21 -0500

Gene Heskett [EMAIL PROTECTED] wrote:
  I had to reboot early this morning due to a freezeup, and I had a
  bunch of these in the messages log:
  ==
  Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
  Jan 27 19:42:11 coyote kernel: [42461.915974]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
  Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
  Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
  Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
  Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
  Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
  Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  ===

I had this error too, or maybe only a similar one, and another; I no
longer have the error output of either lying around, so I'm posting both
fixes that I found here on lkml:
1) disabling NCQ like this:
echo 1 > /sys/block/sda/device/queue_depth

Interesting..

2) this patch: libata_drain_fifo_on_stuck_drq_hsm.patch
( applies to 2.6.24 too )

Signed-off-by: Mark Lord [EMAIL PROTECTED]
---

--- old/drivers/ata/libata-sff.c   2007-09-28 09:29:22.0 -0400
+++ linux/drivers/ata/libata-sff.c 2007-09-28 09:39:44.0 -0400
@@ -420,6 +420,28 @@
   ap->ops->irq_on(ap);
 }

+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+  u8 stat = ata_chk_status(ap);
+  /*
+   * Try to clear stuck DRQ if necessary,
+   * by reading/discarding up to two sectors worth of data.
+   */
+  if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+    unsigned int i;
+    unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+    printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+           limit);
+    for (i = 0; i < limit ; ++i) {
+      ioread16(ap->ioaddr.data_addr);
+      if (!(ata_chk_status(ap) & ATA_DRQ))
+        break;
+    }
+    printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+  }
+}
+
 /**
  *ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  *@ap: port to handle error for
@@ -476,7 +498,7 @@
   }

   ata_altstatus(ap);
-  ata_chk_status(ap);
+  ata_drain_fifo(ap, qc);
   ap->ops->irq_clear(ap);

   spin_unlock_irqrestore(ap->lock, flags);
-

This too.  Thanks Florian.  I'll keep these in mind as there may be more than 
one cat in need of skinning here.

See a couple of posts I made to lkml this morning for the investigation I'm 
doing re the kernel argument 'acpi_use_timer_override', experimental builds 
under way right now.

Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number 
when dmesg says its found ok at ata2.00?  I've turned on an option that says 
something about using the bios for device access this build, but I'll be 
surprised if that's it. :)

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Ah, sweet Springtime, when a young man lightly turns his fancy over!


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Daniel Barkalow wrote:
On Tue, 29 Jan 2008, Gene Heskett wrote:
 For starters, enable CONFIG_BLK_DEV_SR.

 That could stand to be moved or renamed, it is well buried in the menu for
 the REAL scsi stuffs, which I don't have any of.  Enabled & building now.

The "SCSI support type (disk, tape, CD-ROM)" section of that menu actually
applies to all ATA-command-set devices that don't use the old IDE code.
For example, usb-storage uses "SCSI disk" out of that section, and
I've only seen "Probe all LUNs on each SCSI device" be needed for a
particular USB card reader with two slots. At this point, most of the
things in the kernel that refer to "SCSI" probably should say "storage" (or
"ATA", really, but that would make the acronyms confusing).

Incidentally, you should be able to save debugging time for problems like
missing sr by building it as a module, which will build really quickly
and not require a reboot to test.

   -Daniel
*This .sig left intentionally blank*

I did, Daniel, but while that has worked, its not been 100% foolproof in the 
past, so I just waste the 9 minutes building a new kernel as cheap insurance.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Mal: If it's Alliance trouble you got, you might want to consider another
ship. Some onboard here fought for the Independents.
--Episode #8, Out of Gas


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 things in the kernel that refer to SCSI probably should say storage (or 
 ATA, really, but that would make the acronyms confusing).

SCSI is a command protocol. It is what your CD-ROM drive and USB storage
devices talk (albeit with a bit of an accent).

Alan


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox

 Don't know. Is there an easy way to find out?

E820 map on boot shows you I think.
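
For anyone who wants to check this mechanically, here is a rough Python
sketch that sums the usable RAM reported above the 4GB boundary in a saved
boot log. It assumes the map is printed as "BIOS-e820: <start> - <end> (type)"
lines with hexadecimal addresses, which may differ between kernel versions;
the function name is hypothetical and the sample map is invented for
illustration, not taken from any machine in this thread.

```python
import re

FOUR_GB = 1 << 32

# Assumed line format: " BIOS-e820: 0000000000100000 - 00000000cfee0000 (usable)"
E820_RE = re.compile(
    r"BIOS-e820:\s+([0-9a-fA-F]+)\s+-\s+([0-9a-fA-F]+)\s+\((\w[\w ]*)\)"
)


def usable_bytes_above_4gb(log_text):
    """Sum the bytes of 'usable' E820 ranges that lie above the 4GB boundary."""
    total = 0
    for start, end, kind in E820_RE.findall(log_text):
        if kind != "usable":
            continue
        start, end = int(start, 16), int(end, 16)
        # Count only the part of the range at or above 4GB.
        total += max(0, end - max(start, FOUR_GB))
    return total


# Invented sample map for illustration only.
sample = """\
 BIOS-e820: 0000000000000000 - 000000000009f800 (usable)
 BIOS-e820: 0000000000100000 - 00000000cfee0000 (usable)
 BIOS-e820: 00000000cfee0000 - 00000000cfee3000 (ACPI NVS)
 BIOS-e820: 0000000100000000 - 0000000130000000 (usable)
"""

print(usable_bytes_above_4gb(sample) // (1 << 20), "MiB usable above 4GB")
```

A non-zero result means some RAM is mapped physically above the 4GB boundary.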

 By the way, and on a totally different subject. I wonder if this:
 MODULE_DESCRIPTION("low-level driver for AMD PATA IDE");
 mightn't be changed to something like:
 MODULE_DESCRIPTION("low-level driver for AMD and nVidia PATA IDE");

Fair point. I'll add that so people can find the early Nvidia stuff.

Alan


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Gene Heskett wrote:

 For starters, enable CONFIG_BLK_DEV_SR.
 
 That could stand to be moved or renamed, it is well buried in the menu for
 the REAL scsi stuffs, which I don't have any of.  Enabled & building now.

The "SCSI support type (disk, tape, CD-ROM)" section of that menu actually
applies to all ATA-command-set devices that don't use the old IDE code.
For example, usb-storage uses "SCSI disk" out of that section, and
I've only seen "Probe all LUNs on each SCSI device" be needed for a
particular USB card reader with two slots. At this point, most of the
things in the kernel that refer to "SCSI" probably should say "storage" (or
"ATA", really, but that would make the acronyms confusing).

Incidentally, you should be able to save debugging time for problems like 
missing sr by building it as a module, which will build really quickly 
and not require a reboot to test.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread rgheck

Alan Cox wrote:
  
  

Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.



Depends how the memory is mapped. Any memory physically above the 4GB
boundary

  

Don't know. Is there an easy way to find out?

By the way, and on a totally different subject. I wonder if this:
MODULE_DESCRIPTION("low-level driver for AMD PATA IDE");
mightn't be changed to something like:
MODULE_DESCRIPTION("low-level driver for AMD and nVidia PATA IDE");
It took a fair bit of digging in /sys/ to figure out why I was loading 
pata_amd.


Richard



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Jeff Garzik

Gene Heskett wrote:
Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number 
when dmesg says its found ok at ata2.00?  I've turned on an option that says 
something about using the bios for device access this build, but I'll be 
surprised if that's it. :)


I think you mean /dev/scdx not /dev/sdx.  Make sure you have the 'sr' 
driver compiled and load (CONFIG_BLK_DEV_SR).


The bios-for-dev-access thing definitely won't help, and may hurt (by 
taking over the device you wanted to test).


Jeff




Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

  things in the kernel that refer to SCSI probably should say storage (or 
  ATA, really, but that would make the acronyms confusing).
 
 SCSI is a command protocol. It is what your CD-ROM drive and USB storage
 devices talk (albeit with a bit of an accent).

Among other things, yes. But SCSI standards also specify electrical 
interfaces that aren't at all related to the electrical interfaces used by 
a lot of devices, and a lot of the places the kernel uses the term suggest 
that it's also talking about the electrical interface (or, at least, 
connector shape). For example, it's misleading to talk about "SCSI CDROM 
support" meaning the command protocol when hardly anybody has ever seen a 
CDROM drive that doesn't use the SCSI command protocol, but most people 
know about both SCSI-connector and PATA-connector CDROM drives.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mark Lord

rgheck wrote:

Alan Cox wrote:
not one problem but lots---is sufficiently widespread that a Mini 
HOWTO, say, would be really welcome and, I'm guessing, widely used.



We don't see very many libata problems at the distro level and they for
the most part boil down to

- sata_nv with 4GB of RAM, known, being worked on, no old IDE driver
anyway
  

Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 4GB.

..

For all practical purposes, most memory over 3GB (or sometimes even 2GB)
on a 32-bit x86 system is treated as >4GB by the motherboard.

Because it's not the amount of *memory* that matters so much,
but rather the amount of *used address space*.  Video cards,
PCI devices, other motherboard resources etc.. can all subtract
from the available address space, leaving much less than 4GB
for your RAM. 
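
A back-of-the-envelope sketch of that arithmetic, in Python. It assumes the
address space reserved for devices sits directly below the 4GB boundary and
that the firmware remaps the displaced RAM above it (some boards simply lose
that memory instead). The function name and the 1 GiB hole size are
illustrative assumptions, not measurements.

```python
GIB = 1 << 30


def split_ram(installed_bytes, mmio_hole_bytes, limit=4 * GIB):
    """Return (bytes_below_limit, bytes_above_limit).

    Assumes the device hole sits just under `limit` and that the
    firmware remaps any RAM it displaces to addresses above `limit`.
    """
    room_below = limit - mmio_hole_bytes
    below = min(installed_bytes, room_below)
    return below, installed_bytes - below


# Example: 4 GiB installed, a hypothetical 1 GiB reserved for devices.
below, above = split_ram(4 * GIB, 1 * GIB)
print(below // GIB, "GiB below 4GB,", above // GIB, "GiB remapped above 4GB")
```

Under these assumptions a machine with exactly 4 GiB installed still ends up
with memory above the 4GB boundary, which is why the amount of RAM alone does
not answer the question.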


-ml



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mark Lord

Gene Heskett wrote:

On Tuesday 29 January 2008, Mark Lord wrote:

Gene Heskett wrote:

..
Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx' number
when dmesg says its found ok at ata2.00?  I've turned on an option that
says something about using the bios for device access this build, but I'll
be surprised if that's it. :)

..

It should show up as /dev/scd0 or something very similar.


Tisn't.  Darnit.

..

It requires CONFIG_SCSI, CONFIG_BLK_DEV_SD, CONFIG_BLK_DEV_SR, in the kernel 
.config.

The _SR one (SCSI Reader) is for CD/DVD support.
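
A quick way to check a .config for these three options is sketched below in
Python. The helper name is made up, and the sample .config text is an
illustrative fragment rather than the file attached in this thread; only the
option names come from the message above.

```python
REQUIRED = ("CONFIG_SCSI", "CONFIG_BLK_DEV_SD", "CONFIG_BLK_DEV_SR")


def missing_options(config_text, required=REQUIRED):
    """Return the required options not set to 'y' or 'm' in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            # Comments, including "# CONFIG_FOO is not set", count as disabled.
            continue
        name, sep, value = line.partition("=")
        if sep and value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Illustrative fragment only.
sample_config = """\
CONFIG_SCSI=y
CONFIG_BLK_DEV_SD=y
# CONFIG_BLK_DEV_SR is not set
"""

print(missing_options(sample_config))
```

For the fragment above it reports CONFIG_BLK_DEV_SR as the missing option.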

Cheers



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

  not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
  say, would be really welcome and, I'm guessing, widely used.
 
 We don't see very many libata problems at the distro level and they for
 the most part boil down to
 
 - error messages looking different - Most bugs I get are things like
 media errors (timeout looks different, UNC report looks different)

The SCSI error reporting really ought to include a simple interpretation
of the error for end users ("The drive doesn't support this command", "A
sector's data got lost", "The drive timed out", "The drive failed", "The
drive is entirely gone"). There's too much similarity between the message
you get when you try a SMART test that doesn't apply to the drive and what
you get when the drive is broken.

 - faulty hardware being picked up because we actually do real error
 checking now. We now check for and give some devices more slack while
 still doing error checking. Both IDE layers also added blacklists for
 stuff like the TSScorp DVD drives. Qemu has now had its bugs patched.

I think this is the big source of unhappy users (and, of course, they all 
look the same and the reports stay findable by Google, so it looks a lot 
worse than it is). People getting this problem in distro kernels probably 
really do want to have a way to report it with enough detail from logs to 
get it dealt with and then switch back to old IDE until the fix propagates 
through.

And it's possible that the error recovery is suboptimal in some cases. It 
seems to like resetting drives too much; perhaps if it keeps seeing the 
same problem and resetting the drive, it should decide that the drive's 
error reporting is just bad and just ignore that error like the old IDE 
did (but, in this case, after saying what it's doing).

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 The SCSI error reporting really ought to include a simple interpretation 
 of the error for end users ("The drive doesn't support this command", "A 
 sector's data got lost", "The drive timed out", "The drive failed", "The 
 drive is entirely gone"). There's too much similarity between the message 
 you get when you try a SMART test that doesn't apply to the drive and what 
 you get when the drive is broken.

That would be the SCSI verbose messages option. I think the Eric
Youngdale consortium added it about Linux 1.2. Nowadays it's always built
that way.

 And it's possible that the error recovery is suboptimal in some cases. It 
 seems to like resetting drives too much; perhaps if it keeps seeing the 
 same problem and resetting the drive, it should decide that the drive's 
 error reporting is just bad and just ignore that error like the old IDE 
 did (but, in this case, after saying what it's doing).

Nothing like casually praying the users data hasn't gone for a walk is
there. If we don't act on them the users don't report them until
something really bad occurs so that isn't an option.

Alan


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Jeff Garzik wrote:
Gene Heskett wrote:
 On Tuesday 29 January 2008, Jeff Garzik wrote:
 Gene Heskett wrote:
 Does anyone know why my dvdwriter isn't being assigned a '/dev/sdx'
 number when dmesg says its found ok at ata2.00?  I've turned on an
 option that says something about using the bios for device access this
 build, but I'll be surprised if that's it. :)

 I think you mean /dev/scdx not /dev/sdx.  Make sure you have the 'sr'
 driver compiled and loaded (CONFIG_BLK_DEV_SR).

 That menu item COULD be moved, I don't have any REAL scsi stuff, so I
 didn't look there.  My bad, with help from hiding it like that. :-)

 The bios-for-dev-access thing definitely won't help, and may hurt (by
 taking over the device you wanted to test).

 Ok, if BLK_DEV_SR fails, I'll take that back out.  I'm heating the room
 making kernels here. :)

I can say with 100% certainty that 'sr' is required in order to use your
dvd writer with libata.  :)

   Jeff

And as usual, you are 100% correct, thanks.

And now back to our regularly scheduled testing for 'exception Emask' 
errors. :)

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Main's Law:
For every action there is an equal and opposite government program.


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 That could stand to be moved or renamed, it is well buried in the menu for 
 the 
 REAL scsi stuffs, which I don't have any of.  

Yes you do - USB storage and ATAPI are SCSI


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mark Lord

rgheck wrote:

Mark Lord wrote:

rgheck wrote:

Alan Cox wrote:
not one problem but lots---is sufficiently widespread that a Mini 
HOWTO, say, would be really welcome and, I'm guessing, widely used.



We don't see very many libata problems at the distro level and they for
the most part boil down to

- sata_nv with >4GB of RAM, known, being worked on, no old IDE driver
anyway
  
Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 
4GB.

..

For all practical purposes, most memory over 3GB (or sometimes even 2GB)
on a 32-bit x86 system is treated as >4GB by the motherboard.

Because it's not the amount of *memory* that matters so much,
but rather the amount of *used address space*.  Video cards,
PCI devices, other motherboard resources etc.. can all subtract
from the available address space, leaving much less than 4GB
for your RAM.


Right. So it looks like I do have this issue, though I haven't seen any 
actual problems on 24. Is there a known workaround?

..

For now, the workaround is to not enable the RAM above 4GB.
Your kernel .config file should therefore have these two lines:

CONFIG_HIGHMEM4G=y
# CONFIG_HIGHMEM64G is not set

Later, once the issue is fixed at the driver level (soon),
you can get your high memory back again by enabling CONFIG_HIGHMEM64G,
though this will cost a few percent of performance in the extra
page table overhead it creates.
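
To confirm which of the high-memory choices a given .config actually selects,
something like the following could be used. This is a sketch under stated
assumptions: the function names are hypothetical, and it assumes the 32-bit x86
choice group of that era (CONFIG_NOHIGHMEM / CONFIG_HIGHMEM4G /
CONFIG_HIGHMEM64G), of which exactly one should be "=y".

```python
def highmem_choice(config_text):
    """Return 'off', '4G' or '64G' depending on which highmem option is enabled."""
    choices = {
        "CONFIG_NOHIGHMEM": "off",
        "CONFIG_HIGHMEM4G": "4G",
        "CONFIG_HIGHMEM64G": "64G",
    }
    selected = []
    for line in config_text.splitlines():
        # "# CONFIG_FOO is not set" has no '=', so it is skipped here.
        name, sep, value = line.strip().partition("=")
        if sep and value == "y" and name in choices:
            selected.append(choices[name])
    if len(selected) != 1:
        raise ValueError("expected exactly one highmem choice, found %r" % selected)
    return selected[0]


def workaround_in_place(config_text):
    """True when RAM above 4GB is not enabled, i.e. the workaround described above."""
    return highmem_choice(config_text) != "64G"


if __name__ == "__main__":
    text = "CONFIG_HIGHMEM4G=y\n# CONFIG_HIGHMEM64G is not set\n"
    print(highmem_choice(text), workaround_in_place(text))
```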

Cheers


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread rgheck

Mark Lord wrote:

rgheck wrote:

Alan Cox wrote:
not one problem but lots---is sufficiently widespread that a Mini 
HOWTO, say, would be really welcome and, I'm guessing, widely used.



We don't see very many libata problems at the distro level and they for
the most part boil down to

- sata_nv with >4GB of RAM, known, being worked on, no old IDE driver
anyway
  
Is this >4GB or >=4GB? I've seen contradictory reports, and I've got 
4GB.

..

For all practical purposes, most memory over 3GB (or sometimes even 2GB)
on a 32-bit x86 system is treated as >4GB by the motherboard.

Because it's not the amount of *memory* that matters so much,
but rather the amount of *used address space*.  Video cards,
PCI devices, other motherboard resources etc.. can all subtract
from the available address space, leaving much less than 4GB
for your RAM.


Right. So it looks like I do have this issue, though I haven't seen any 
actual problems on 24. Is there a known workaround?


rh



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Daniel Barkalow
On Tue, 29 Jan 2008, Alan Cox wrote:

  The SCSI error reporting really ought to include a simple interpretation 
  of the error for end users ("The drive doesn't support this command", "A 
  sector's data got lost", "The drive timed out", "The drive failed", "The 
  drive is entirely gone"). There's too much similarity between the message 
  you get when you try a SMART test that doesn't apply to the drive and what 
  you get when the drive is broken.
 
 That would be the SCSI verbose messages option. I think the Eric
 Youngdale consortium added it about Linux 1.2. Nowadays it's always built
 that way.

I've seen a lot of verbosity out of SCSI messages, but I haven't seen a 
straightforward interpretation of the problem in there. It's all 
information useful for debugging, not information useful for system 
administration.

  And it's possible that the error recovery is suboptimal in some cases. It 
  seems to like resetting drives too much; perhaps if it keeps seeing the 
  same problem and resetting the drive, it should decide that the drive's 
  error reporting is just bad and just ignore that error like the old IDE 
  did (but, in this case, after saying what it's doing).
 
 Nothing like casually praying the users data hasn't gone for a walk is
 there. If we don't act on them the users don't report them until
 something really bad occurs so that isn't an option.

On the other hand, bringing the system down because a device is 
misbehaving is a poor idea. I've personally recovered most of the data off 
of a dying drive because the system was willing to let me keep using the 
drive anyway; IIRC, the drive didn't work at all after a reboot, so I 
would have lost all the data instead of only a little had the system 
insisted on a perfectly functioning drive in order to use it at all.

There ought to be some middle ground between doing nothing until the 
computer really breaks and breaking the computer before then, but that's 
an issue not specific to libata.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 I've seen a lot of verbosity out of SCSI messages, but I haven't seen a 
 straightforward interpretation of the problem in there. It's all 
 information useful for debugging, not information useful for system 
 administration.

It tells you what is going on. Unfortunately that frequently requires
some basic knowledge of how to interpret the error report. Drive
interface behaviour simply doesn't boil down to a fault light on the
dashboard or a "tighten the cable". For most common fault types you'll
get errors most administrators should find meaningful - like "Media error"

 On the other hand, bringing the system down because a device is 
 misbehaving is a poor idea. I've personally recovered most of the data off 

Hence we have RAID and SATA hotplug.

Alan


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Gene Heskett
On Tuesday 29 January 2008, Alan Cox wrote:
 That could stand to be moved or renamed, it is well buried in the menu for
 the REAL scsi stuffs, which I don't have any of.

Yes you do - USB storage and ATAPI are SCSI

By the linux software definition maybe.  But I've defined scsi as that which 
uses a 50 wire cable using 50 contact centronics connectors since the 
mid '70's, and which often needs a ready supply of nubile virgins to 
sacrifice to make it work, particularly with the old resistor pack 
terminations & psu's whose 5 volt line is only 4.85 volts due to old age.  
That's what I call REAL scsi.  It's also a REAL PITA if the terms aren't 
active.

You can call what you are doing 'scsi' because you are using much the same 
command structure, and that is good, but it's not the real thing with all its 
hardware warts and/or capabilities.  For one thing, this version usually 
works. :)

Furinstance, you can tell 2 scsi devices on the same controller to talk to 
each other, moving files from one to the other, and the host controller can 
then go to sleep & the cpu isn't involved until the devices send it a wakeup 
to advise the controller that the transfer has been done, and the controller 
may or may not then interrupt and advise the cpu.  You can do that with 
separate controllers too as long as they have a compatible DMA channel 
available to both.

I doubt libata has that capability now, or ever will, cuz these ide/atapi 
devices are generally dumber than rocks about that.  But any device claiming 
to be scsi-II is supposed to be able to do those sorts of things while the 
cpu is off crunching numbers for BOINC or whatever.

But that puts my mild objections to classifying this as 'scsi' in a more 
understandable context.  :-)

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
When some people decide it's time for everyone to make big changes,
it means that they want you to change first.


Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Alan Cox
 By the linux software definition maybe.  But I've defined scsi as that which 
 uses a 50 wire cable using 50 contact centronics connectors since the 
 mid '70's, and which often needs a ready supply of nubile virgins t

25, 50 or 68, with multiple voltage levels, plus of course it might be
over fibre or copper FC loop and ..

SCSI is a protocol.



Re: Problem with ata layer in 2.6.24

2008-01-29 Thread rgheck


Gene,

If you still want to try it, I did manage to get the old IDE subsystem 
working. The issue with pata_amd concerns modprobe.conf. You probably 
have an alias to it there, as Fedora seems to insert these. (I don't 
know if they're actually needed or not.) If you comment out that line, 
then mkinitrd will run successfully, and you can try it that way.


By the way, is there an easy way to use different modprobe.conf files 
with different kernels?


Do make sure that you're building whatever drivers you need for your 
particular IDE chipset. (This is under IDE chipset support.) I suppose 
it's safe to build them all as modules. You may also want to compile 
ide-scsi (SCSI emulation support), as some older CD drives seem to need 
this, in the form of an hdx=ide-scsi command line option.


Richard

Gene Heskett wrote:

On Tuesday 29 January 2008, Alan Cox wrote:
  

That could stand to be moved or renamed, it is well buried in the menu for
the REAL scsi stuffs, which I don't have any of.
  

Yes you do - USB storage and ATAPI are SCSI



By the linux software definition maybe.  But I've defined scsi as that which 
uses a 50 wire cable using 50 contact centronics connectors since the 
mid '70's, and which often needs a ready supply of nubile virgins to 
sacrifice to make it work, particularly with the old resistor pack 
terminations & psu's whose 5 volt line is only 4.85 volts due to old age.  
That's what I call REAL scsi.  It's also a REAL PITA if the terms aren't 
active.


You can call what you are doing 'scsi' because you are using much the same 
command structure, and that is good, but it's not the real thing with all its 
hardware warts and/or capabilities.  For one thing, this version usually 
works. :)


Furinstance, you can tell 2 scsi devices on the same controller to talk to 
each other, moving files from one to the other, and the host controller can 
then go to sleep & the cpu isn't involved until the devices send it a wakeup 
to advise the controller that the transfer has been done, and the controller 
may or may not then interrupt and advise the cpu.  You can do that with 
separate controllers too as long as they have a compatible DMA channel 
available to both.


I doubt libata has that capability now, or ever will, cuz these ide/atapi 
devices are generally dumber than rocks about that.  But any device claiming 
to be scsi-II is supposed to be able to do those sorts of things while the 
cpu is off crunching numbers for BOINC or whatever.


But that puts my mild objections to classifying this as 'scsi' in a more 
understandable context.  :-)


  




Re: Problem with ata layer in 2.6.24

2008-01-29 Thread Mark Lord

Gene Heskett wrote:


I doubt libata has that capability now, or ever will, cuz these ide/atapi 
devices are generally dumber than rocks about that.  But any device claiming 
to be scsi-II is supposed to be able to do those sorts of things while the 
cpu is off crunching numbers for BOINC or whatever.

..

The CD/DVD drives are all MMC devices internally, which means they speak
a SCSI command protocol, regardless of the electrical or optical interface.

Linux is software, and the software protocol is exactly the same for them,
no matter what the cable/bus type happens to be.
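
To make that concrete: a SCSI command is just a small block of bytes (a CDB),
and the same bytes are handed to the device whatever carries them. The sketch
below builds two standard CDBs, INQUIRY (opcode 0x12) and READ(10) (opcode
0x28); the helper names are made up for illustration, and nothing here talks
to real hardware or reflects how the kernel itself builds commands.

```python
import struct


def inquiry_cdb(allocation_length=36):
    """6-byte INQUIRY CDB asking for `allocation_length` bytes of standard data."""
    if not 0 <= allocation_length <= 0xFF:
        raise ValueError("this sketch only fills the low allocation-length byte")
    return bytes([0x12, 0x00, 0x00, 0x00, allocation_length, 0x00])


def read10_cdb(lba, blocks):
    """10-byte READ(10) CDB: opcode, flags, 32-bit LBA, group, 16-bit length, control."""
    return struct.pack(">BBIBHB", 0x28, 0x00, lba, 0x00, blocks, 0x00)


if __name__ == "__main__":
    # The same CDB bytes would travel inside an ATAPI packet, a USB storage
    # wrapper or a parallel SCSI bus; only the transport around them differs.
    print(inquiry_cdb().hex())
    print(read10_cdb(lba=16, blocks=1).hex())
```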

Cheers


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Gene Heskett wrote:
[    0.00] If you got timer trouble try acpi_use_timer_override
This is from the dmesg of my previous post.

Can anyone tell me what it actually means?


-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I have a simple rule in life: If I don't understand something, it must be bad.

- Linus Torvalds


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Peter Zijlstra wrote:
On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
 1. Wrong mailing list; use linux-ide (@vger) instead.

What, and keep all us other interested people in the dark?

As a test, I tried rebooting to the latest fedora kernel and found it kills X, 
so I'm back to the second to last fedora version ATM, and the 
third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first two 
completed with no errors.

I've added the linux-ide list to refresh those people of the problem, 
the logs are being spammed by this message stanza:

Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
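
For anyone trying to read the "cmd" line in a stanza like the one above, here
is a small decoding sketch. It assumes the field order libata uses when it
prints the taskfile (command/feature:nsect:lbal:lbam:lbah/ then the five "hob"
bytes in the same order, then /device), and it only knows the names of four
DMA read/write opcodes; both the helper name and the returned dictionary are
made up for illustration.

```python
import re

CMD_RE = re.compile(
    r"cmd\s+(?P<command>[0-9a-f]{2})"
    r"/(?P<low>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<hob>(?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/(?P<device>[0-9a-f]{2})"
)

COMMAND_NAMES = {
    0x25: "READ DMA EXT",
    0x35: "WRITE DMA EXT",
    0xC8: "READ DMA",
    0xCA: "WRITE DMA",
}
LBA48_COMMANDS = {0x25, 0x35}


def decode_cmd(line):
    """Decode the taskfile in a libata 'cmd ...' log line, or return None."""
    m = CMD_RE.search(line)
    if m is None:
        return None
    command = int(m.group("command"), 16)
    _feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in m.group("low").split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in m.group("hob").split(":")
    )
    device = int(m.group("device"), 16)
    if command in LBA48_COMMANDS:
        # 48-bit addressing: the "hob" bytes carry the upper halves.
        sectors = ((hob_nsect << 8) | nsect) or 65536
        lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
               | (lbah << 16) | (lbam << 8) | lbal)
    else:
        # 28-bit addressing: the top four LBA bits live in the device register.
        sectors = nsect or 256
        lba = ((device & 0x0F) << 24) | (lbah << 16) | (lbam << 8) | lbal
    return {
        "name": COMMAND_NAMES.get(command, "unknown (0x%02x)" % command),
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,
    }


if __name__ == "__main__":
    print(decode_cmd("ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out"))
```

Read this way, the command above is a WRITE DMA EXT of 344 sectors, which is
176128 bytes and agrees with the "dma 176128 out" at the end of the same line.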


And it just did it again, using the fedora kernel but without logging 
anything at all when it froze.  In other words I had to reboot between 
the word list and the word to above.  So now I'm booted to 2.6.24-rc7.

Before it crashes again, here is the dmesg:
[0.00] Linux version 2.6.24-rc7 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Jan 14 10:00:40 EST 2008
[0.00] BIOS-provided physical RAM map:
[0.00]  BIOS-e820:  - 0009f800 (usable)
[0.00]  BIOS-e820: 0009f800 - 000a (reserved)
[0.00]  BIOS-e820: 000f - 0010 (reserved)
[0.00]  BIOS-e820: 0010 - 3fff (usable)
[0.00]  BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
[0.00]  BIOS-e820: 3fff3000 - 4000 (ACPI data)
[0.00]  BIOS-e820: fec0 - fec01000 (reserved)
[0.00]  BIOS-e820: fee0 - fee01000 (reserved)
[0.00]  BIOS-e820:  - 0001 (reserved)
[0.00] 127MB HIGHMEM available.
[0.00] 896MB LOWMEM available.
[0.00] Entering add_active_range(0, 0, 262128) 0 entries of 256 used
[0.00] Zone PFN ranges:
[0.00]   DMA 0 - 4096
[0.00]   Normal   4096 -   229376
[0.00]   HighMem229376 -   262128
[0.00] Movable zone start PFN for each node
[0.00] early_node_map[1] active PFN ranges
[0.00] 0:0 -   262128
[0.00] On node 0 totalpages: 262128
[0.00]   DMA zone: 32 pages used for memmap
[0.00]   DMA zone: 0 pages reserved
[0.00]   DMA zone: 4064 pages, LIFO batch:0
[0.00]   Normal zone: 1760 pages used for memmap
[0.00]   Normal zone: 223520 pages, LIFO batch:31
[0.00]   HighMem zone: 255 pages used for memmap
[0.00]   HighMem zone: 32497 pages, LIFO batch:7
[0.00]   Movable zone: 0 pages used for memmap
[0.00] DMI 2.2 present.
[0.00] ACPI: RSDP 000F7220, 0014 (r0 Nvidia)
[0.00] ACPI: RSDT 3FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
[0.00] ACPI: FACP 3FFF3040, 0074 (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
[0.00] ACPI: DSDT 3FFF30C0, 4CC4 (r1 NVIDIA AWRDACPI 1000 MSFT 10E)
[0.00] ACPI: FACS 3FFF, 0040
[0.00] ACPI: APIC 3FFF7DC0, 006E (r1 Nvidia AWRDACPI 42302E31 AWRD 0)
[0.00] Nvidia board detected. Ignoring ACPI timer override.
[0.00] If you got timer trouble try acpi_use_timer_override
[0.00] ACPI: PM-Timer IO Port: 0x4008
[0.00] ACPI: Local APIC address 0xfee0
[0.00] ACPI: LAPIC (acpi_id[0x00] lapic_id[0x00] enabled)
[0.00] Processor #0 6:10 APIC version 16
[0.00] ACPI: LAPIC_NMI (acpi_id[0x00] high edge lint[0x1])
[0.00] ACPI: IOAPIC (id[0x02] address[0xfec0] gsi_base[0])
[0.00] IOAPIC[0]: apic_id 2, version 17, address 0xfec0, GSI 0-23
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[0.00] ACPI: BIOS IRQ0 pin2 override ignored.
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 14 global_irq 14 high edge)
[0.00] ACPI: INT_SRC_OVR (bus 0 bus_irq 15 

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Gene Heskett wrote:
On Monday 28 January 2008, Zan Lynx wrote:
On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote:
 On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
  On Monday 28 January 2008, Mikael Pettersson wrote:
  Unfortunately we also see:
[   48.285456] nvidia: module license 'NVIDIA' taints kernel.
[   48.549725] ACPI: PCI Interrupt :02:00.0[A] -> Link [APC4] -> GSI 19 (level, high) -> IRQ 20
[   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
  
  We have no way of debugging that module, so please try 2.6.24 without
   it.
 
  Sorry, I can't do this and have a working machine.  The nv driver has
  suffered bit rot or something since the FC2 days when it COULD run a
  19" crt at 1600x1200, and will not drive this 20" wide screen lcd
  1680x1050 monitor at more than 800x600, which is absolutely butt ugly
  fuzzy, looking like a jpg compressed to 10%.  The system is not usable
  on a day-to-day basis without the nvidia driver.

 You should probably give the nouveau[1] driver a try, if only for
 testing purposes; if you are running an NV4x (G6x or G7x) card in
 particular, it works a lot better than the nv driver for 2d support.

 1. http://nouveau.freedesktop.org/wiki/InstallNouveau

But nouveau is much less stable than nv.  For testing purposes, go with
stable.

I believe at this point, its moot.  I captured quite a few instances of that
error message while rebooting the last time, all of which occurred long
before I logged in and did a startx (I boot to runlevel 3 here), so the
kernel was NOT tainted at that point.  That dmesg has been posted and some
questions asked.

As this has gone on for a while, it seems to me that with 14,800 google hits
on this problem, Linus should call a halt until this is found and fixed. 
 But I'm not Linus.  I'm also locking up for 30 at a time,  probably ready
 for reboot #7 today.

I'm not sure why it won't run his screen though.  I can use nv to run a
1920x1200 laptop LCD.  It *is* dog slow (although nouveau was not any
better with a NV17 / 440-Go -- render support for AA fonts seems to be
missing), but it does work.

I've been trying to run a long selftest on that drive, but the constant 
reboots are fscking that up.  I have attached the last smartctl -a output, 
indicating that the test was aborted probably from all the resets that are 
being issued, the last one froze me for around 5 minutes but I haven't 
rebooted yet.  Its attached.  Can anyone see if there is actually anything 
wrong with the drive?  If a boot will last long enough for the -t long to 
complete, then it passes with no errors, but this was interrupted now for the 
3rd time.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Well begun is half done.
-- Aristotle
smartctl version 5.37 [i386-redhat-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD2000JB-00EVA0
Serial Number:    WD-WMAEH2782398
Firmware Version: 15.05R15
User Capacity:    200,049,647,616 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Mon Jan 28 12:39:08 2008 EST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85)	Offline data collection activity
	was aborted by an interrupting command from host.
	Auto Offline Data Collection: Enabled.
Self-test execution status:  ( 249)	Self-test routine in progress...
	90% of test remaining.
Total time to complete Offline 
data collection: 		 (6942) seconds.
Offline data collection
capabilities: 			 (0x79) SMART execute Offline immediate.
	No Auto Offline data collection support.
	Suspend Offline collection upon new
	command.
	Offline surface scan supported.
	Self-test supported.
	Conveyance Self-test supported.
	Selective Self-test supported.
SMART capabilities:(0x0003)	Saves SMART data before entering
	power-saving mode.
	Supports SMART auto save timer.
Error logging capability:(0x01)	Error logging supported.
	No General Purpose Logging support.
Short self-test routine 
recommended polling time: 	 (   2) minutes.
Extended self-test routine
recommended polling time: 	 (  88) minutes.
Conveyance self-test routine
recommended polling time: 	 (   5) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME  FLAG VALUE WORST THRESH TYPE  UPDATED  
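
The "Self-test execution status: ( 249)" value in the output above packs two
things into one byte. As a sketch, assuming the usual ATA SMART layout (upper
four bits are the status code, lower four bits are the remaining work in tens
of percent), and with only the handful of status codes named that I am sure
of, it can be unpacked like this. The function name is hypothetical.

```python
STATUS_TEXT = {
    0x0: "completed without error, or no self-test has ever been run",
    0x1: "aborted by the host",
    0x2: "interrupted by the host with a hard or soft reset",
    0xF: "self-test routine in progress",
}


def decode_self_test_status(value):
    """Split the SMART self-test execution status byte into (text, percent remaining)."""
    if not 0 <= value <= 0xFF:
        raise ValueError("status is a single byte")
    code = value >> 4                 # upper nibble: what happened
    remaining = (value & 0x0F) * 10   # lower nibble: tens of percent still to do
    return STATUS_TEXT.get(code, "status code 0x%X" % code), remaining


if __name__ == "__main__":
    print(decode_self_test_status(249))
```

For 249 (0xF9) this gives "in progress" with 90% remaining, matching what
smartctl printed next to the number.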

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

Added Alan to CC: list.


[   30.703188] scsi0 : pata_amd
[   30.709313] scsi1 : pata_amd
[   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
[   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
[   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
[   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48 
[   30.871629] ata1.00: configured for UDMA/100

..

Gene, please confirm with us that your primary/master hard drive (above)
is connected with an 80-wire UDMA cable, as opposed to the older 40-wire cables.


[   31.195305] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
[   31.243813] ata2.01: ATA-7: MAXTOR STM3320620A, 3.AAE, max UDMA/100
[   31.243816] ata2.01: 625142448 sectors, multi 16: LBA48 
[   31.243825] ata2.00: limited to UDMA/33 due to 40-wire cable

[   31.417074] ata2.00: configured for UDMA/33
[   31.451769] ata2.01: configured for UDMA/100

..

That looks like an unrelated bug to me: the driver says 40-wire cable
but then goes and chooses UDMA/100 on one of the drives.

Alan?
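
A sketch of how one might spot this pattern mechanically in a boot log. It
assumes only the two message formats quoted above, and relies on the fact that
master and slave on a PATA port share one cable, so a 40-wire limit reported
for one device ought to apply to the other as well. The function name is made
up for illustration.

```python
import re

LIMIT_RE = re.compile(r"(ata\d+)\.(\d+): limited to UDMA/(\d+) due to 40-wire cable")
CONF_RE = re.compile(r"(ata\d+)\.(\d+): configured for UDMA/(\d+)")


def cable_mismatches(lines):
    """Return (device, configured mode, cable limit) where the mode exceeds the limit."""
    limits = {}      # port name -> lowest cable limit reported on that port
    configured = {}  # (port, device number) -> last configured UDMA mode
    for line in lines:
        m = LIMIT_RE.search(line)
        if m:
            port, limit = m.group(1), int(m.group(3))
            limits[port] = min(limit, limits.get(port, limit))
            continue
        m = CONF_RE.search(line)
        if m:
            configured[(m.group(1), m.group(2))] = int(m.group(3))
    return sorted(
        ("%s.%s" % (port, dev), mode, limits[port])
        for (port, dev), mode in configured.items()
        if port in limits and mode > limits[port]
    )


if __name__ == "__main__":
    log = [
        "[   30.871629] ata1.00: configured for UDMA/100",
        "[   31.243825] ata2.00: limited to UDMA/33 due to 40-wire cable",
        "[   31.417074] ata2.00: configured for UDMA/33",
        "[   31.451769] ata2.01: configured for UDMA/100",
    ]
    print(cable_mismatches(log))
```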




Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

Gene Heskett wrote:

Greeting;

I had to reboot early this morning due to a freezeup, and I had a bunch of 
these in the messages log:
==
Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
Jan 27 19:42:11 coyote kernel: [42461.915974]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
===
That one showed up about 2 hours ago, so I expect I'll be locked up again 
before I've managed a 24 hour uptime.  This drive passed
a 'smartctl -t long /dev/sda' with flying colors after the reboot
this morning.

Two instances were logged after I had rebooted to 2.6.24 from 2.6.24-rc8:

Jan 24 20:46:33 coyote kernel: [0.00] Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Thu Jan 24 20:17:55 EST 2008

Jan 27 02:28:29 coyote kernel: [193207.445158] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 02:28:29 coyote kernel: [193207.445170] ata1.00: cmd 35/00:08:f9:24:0a/00:00:17:00:00/e0 tag 0 dma 4096 out
Jan 27 02:28:29 coyote kernel: [193207.445172]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 02:28:29 coyote kernel: [193207.445175] ata1.00: status: { DRDY }
Jan 27 02:28:29 coyote kernel: [193207.445202] ata1: soft resetting link
Jan 27 02:28:29 coyote kernel: [193207.607384] ata1.00: configured for UDMA/100
Jan 27 02:28:29 coyote kernel: [193207.607399] ata1: EH complete
Jan 27 02:28:29 coyote kernel: [193207.609681] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 02:28:29 coyote kernel: [193207.619277] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 02:28:29 coyote kernel: [193207.649041] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Jan 27 02:30:06 coyote kernel: [193304.336929] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 02:30:06 coyote kernel: [193304.336940] ata1.00: cmd ca/00:20:69:22:a6/00:00:00:00:00/e7 tag 0 dma 16384 out
Jan 27 02:30:06 coyote kernel: [193304.336942]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 02:30:06 coyote kernel: [193304.336945] ata1.00: status: { DRDY }
Jan 27 02:30:06 coyote kernel: [193304.336972] ata1: soft resetting link
Jan 27 02:30:06 coyote kernel: [193304.499210] ata1.00: configured for UDMA/100
Jan 27 02:30:06 coyote kernel: [193304.499226] ata1: EH complete
Jan 27 02:30:06 coyote kernel: [193304.499714] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 02:30:06 coyote kernel: [193304.499857] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 02:30:06 coyote kernel: [193304.502315] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

None were logged during the time I was running an -rc7 or -rc8.
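
For anyone reading along, the cmd/res dumps in the stanzas above can be decoded by hand. What follows is a rough sketch, not something posted in the thread. It assumes the usual libata field order (command/feature:nsect:lbal:lbam:lbah, then the five high-order bytes, then the device register), and the function name decode_taskfile is made up for illustration. The caller has to say whether the command was LBA48 (0x35 WRITE DMA EXT, 0x25 READ DMA EXT) or LBA28 (0xca WRITE DMA, 0xc8 READ DMA).

```python
import re

def decode_taskfile(dump, lba48):
    """Decode a libata taskfile dump such as '35/00:08:f9:24:0a/00:00:17:00:00/e0'.

    Returns (first_byte, second_byte, lba, sector_count). For a 'cmd' dump the
    first two bytes are command and feature; for a 'res' dump they are the
    status and error registers. Field order is an assumption stated above.
    """
    fields = [int(x, 16) for x in re.split(r"[/:]", dump.strip())]
    if len(fields) != 12:
        raise ValueError("expected 12 hex bytes in taskfile dump")
    (first, second, nsect, lbal, lbam, lbah,
     _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah, device) = fields

    low24 = (lbah << 16) | (lbam << 8) | lbal
    if lba48:
        # 48-bit address: high-order bytes supply bits 24..47.
        lba = (hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24) | low24
        # A count of 0 means 65536 sectors for 48-bit commands.
        count = ((hob_nsect << 8) | nsect) or 65536
    else:
        # 28-bit address: low nibble of the device register is bits 24..27.
        lba = ((device & 0x0F) << 24) | low24
        # A count of 0 means 256 sectors for 28-bit commands.
        count = nsect or 256
    return first, second, lba, count


if __name__ == "__main__":
    # The WRITE DMA EXT that timed out at 02:28:29.
    cmd, feature, lba, count = decode_taskfile(
        "35/00:08:f9:24:0a/00:00:17:00:00/e0", lba48=True)
    print(hex(cmd), lba, count, count * 512)
```

The byte count (sectors times 512) should agree with the "dma NNNN out" figure on the same log line, which makes a handy sanity check on the decoding.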

The previous hits on this resulted in the udma speed being downgraded till it 
was actually running in pio just before the freeze that required the hardware 
reset button.

I'll reboot to -rc8 right now and resume.  If it's the drive, I should see it.
If not, then 2.6.24 is where I'll point the finger.

..

The only libata change I can see that could possibly affect your setup,
is this one here, which went in sometime between -rc7 and -final:

--- linux-2.6.24-rc7/drivers/ata/libata-eh.c 2008-01-06 16:45:38.0 -0500
+++ linux-2.6.24/drivers/ata/libata-eh.c 2008-01-24 17:58:37.0 -0500
@@ -1733,11 +1733,15 @@
 		ehc->i.action &= ~ATA_EH_PERDEV_MASK;
 	}
 
-	/* consider speeding down */
+	/* propagate timeout to host link */
+	if ((all_err_mask & AC_ERR_TIMEOUT) && !ata_is_host_link(link))
+		ap->link.eh_context.i.err_mask |= AC_ERR_TIMEOUT;
+

It looks pretty innocent to me, though.
If you want to try reverting just that change
(comment out the two lines and rebuild),
then that might provide useful information here.

If -final is still b0rked even with those two lines changed back,
then I suspect you're just getting lucky when switching between
the -rc7/-rc8 kernel and the -final kernel.

Lucky in a bad way, that is.

The real test would be to rebuild the kernel without libata,
and *with* the old IDE 

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Richard Heck wrote:
I've recently seen this kind of error myself, under Fedora 8, using the
Fedora 2.6.23 kernels: I'd see a train of the same sort of error:
  Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
  Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)

usually associated with the optical drive, and then it seems as if the
whole SATA subsystem would lock up, and the machine then becomes
useless: I get journal commit errors if I'm lucky; if I'm not, it just
locks up. My system is also using the pata_amd driver.

I have not seen these sorts of errors with the 2.6.24 kernels.

Richard Heck

Unforch, this is my only bootable drive, and it's raising hell with things, 
about 6 hardware reset initiated reboots so far today since 6:15 am.  If it 
persists I'll go see if Circuit City still has any pata drives left, as this 
mobo won't boot from a sata card.

Gene Heskett wrote:
 On Monday 28 January 2008, Peter Zijlstra wrote:
 On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
 1. Wrong mailing list; use linux-ide (@vger) instead.

 What, and keep all us other interested people in the dark?

 As a test, I tried rebooting to the latest fedora kernel and found it
 kills X, so I'm back to the second to last fedora version ATM, and the
 third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first
 two completed with no errors.

 I've added the linux-ide list to refresh those people of the problem,
 the logs are being spammed by this message stanza:

  Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
  Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
  Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
  Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
  Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
  Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
  Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
  Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


 And it just did it again, using the fedora kernel but without logging
 anything at all when it froze.  In other words I had to reboot between
 the word "list" and the word "to" above.  So now I'm booted to 2.6.24-rc7.

 Before it crashes again, here is the dmesg:
 [0.00] Linux version 2.6.24-rc7 ([EMAIL PROTECTED]) (gcc
 version 4.1.2 20070925 (Red Hat 4.1.2-33)) #1 SMP Mon Jan 14 10:00:40 EST
 2008
 [0.00] BIOS-provided physical RAM map:
 [0.00]  BIOS-e820:  - 0009f800 (usable)
 [0.00]  BIOS-e820: 0009f800 - 000a (reserved)
 [0.00]  BIOS-e820: 000f - 0010 (reserved)
 [0.00]  BIOS-e820: 0010 - 3fff (usable)
 [0.00]  BIOS-e820: 3fff - 3fff3000 (ACPI NVS)
 [0.00]  BIOS-e820: 3fff3000 - 4000 (ACPI data)
 [0.00]  BIOS-e820: fec0 - fec01000 (reserved)
 [0.00]  BIOS-e820: fee0 - fee01000 (reserved)
 [0.00]  BIOS-e820:  - 0001 (reserved)
 [0.00] 127MB HIGHMEM available.
 [0.00] 896MB LOWMEM available.
 [0.00] Entering add_active_range(0, 0, 262128) 0 entries of 256 used
 [0.00] Zone PFN ranges:
 [0.00]   DMA 0 - 4096
 [0.00]   Normal   4096 -   229376
 [0.00]   HighMem229376 -   262128
 [0.00] Movable zone start PFN for each node
 [0.00] early_node_map[1] active PFN ranges
 [0.00] 0:0 -   262128
 [0.00] On node 0 totalpages: 262128
 [0.00]   DMA zone: 32 pages used for memmap
 [0.00]   DMA zone: 0 pages reserved
 [0.00]   DMA zone: 4064 pages, LIFO batch:0
 [0.00]   Normal zone: 1760 pages used for memmap
 [0.00]   Normal zone: 223520 pages, LIFO batch:31
 [0.00]   HighMem zone: 255 pages used for memmap
 [0.00]   HighMem zone: 32497 pages, LIFO batch:7
 [0.00]   Movable zone: 0 pages used for memmap
 [0.00] DMI 2.2 present.
 [0.00] ACPI: RSDP 000F7220, 0014 (r0 Nvidia)
 [0.00] ACPI: RSDT 3FFF3000, 002C (r1 Nvidia AWRDACPI 42302E31 AWRD

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Mikael Pettersson wrote:
Gene Heskett writes:
  On Monday 28 January 2008, Peter Zijlstra wrote:
  On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
   1. Wrong mailing list; use linux-ide (@vger) instead.
  
  What, and keep all us other interested people in the dark?
 
  As a test, I tried rebooting to the latest fedora kernel and found it
  kills X, so I'm back to the second to last fedora version ATM, and the
  third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first
  two completed with no errors.
 
  I've added the linux-ide list to refresh those people of the problem,
  the logs are being spammed by this message stanza:
 
   Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
   Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
   Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
   Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
   Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
   Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
   Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
   Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
   Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
   Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

It's not obvious from this incomplete dmesg log what HW or driver
is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
it should be pata_amd driving a WDC disk:
  [   30.702887] pata_amd :00:09.0: version 0.3.10
  [   30.703052] PCI: Setting latency timer of device :00:09.0 to 64
  [   30.703188] scsi0 : pata_amd
  [   30.709313] scsi1 : pata_amd
  [   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
  [   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
  [   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
  [   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
  [   30.871629] ata1.00: configured for UDMA/100

Unfortunately we also see:
  [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
  [   48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20
  [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007

We have no way of debugging that module, so please try 2.6.24 without it.

Sorry, I can't do this and have a working machine.  The nv driver has suffered 
bit rot or something since the FC2 days when it COULD run a 19" crt at 
1600x1200, and will not drive this 20" wide screen lcd 1680x1050 monitor at 
more than 800x600, which is absolutely butt ugly fuzzy, looking like a jpg 
compressed to 10%.  The system is not usable on a day to day basis without the 
nvidia driver.

Fix the nv driver so it will run this screen at its native resolution and I'll 
be glad to run it even if it won't run google earth, which I do use from time 
to time.  Now, if in all the hits you can get from google on this, currently 
14,800 just for 'exception Emask', apparently caused by a timeout, if 100% of 
the complainers are running nvidia drivers also, then I see a legit 
complaint.  Again, fix the nv driver so it will run my screen & I'll be glad 
to switch.  I can see the reason, sure, but the machine must be capable of 
doing its common day to day stuff, while using that driver, like running kde 
for kmail, and browsers that work.

If the problems persist, please try to capture a complete log from the
failing kernel -- the interesting bits are everything from initial boot
up to and including the first few errors. You may need to increase the
kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).

If by log you mean /var/log/messages, I have several megabytes of those.
If you mean a live dmesg capture taken right now, it's attached. It contains 
several of these at the bottom.  I long ago made the kernel log buffer 
bigger, cuz it couldn't even show the start immediately after the boot, and 
even the dump to syslog was truncated.

There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.

That is what I was afraid of.  I've done some limited grepping in that branch 
of the kernel tree, and cannot seem to locate where this EH handler is being 
invoked from.

There are 2 lines of interest in the dmesg:

[0.00] Nvidia board detected. Ignoring ACPI timer override.
[0.00] If you got timer trouble try acpi_use_timer_override

But I have NDI what it means, kernel argument/xconfig option?

I've also done some googling, and it appears this problem is fairly 

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Zan Lynx

On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote:
 On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
  On Monday 28 January 2008, Mikael Pettersson wrote:
  Unfortunately we also see:
[   48.285456] nvidia: module license 'NVIDIA' taints kernel.
[   48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20
[   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
  
  We have no way of debugging that module, so please try 2.6.24 without it.
  
  Sorry, I can't do this and have a working machine.  The nv driver has
  suffered bit rot or something since the FC2 days when it COULD run a 19" crt
  at 1600x1200, and will not drive this 20" wide screen lcd 1680x1050 monitor
  at more than 800x600, which is absolutely butt ugly fuzzy, looking like a
  jpg compressed to 10%.  The system is not usable on a day to day basis
  without the nvidia driver.
 
 You should probably give the nouveau[1] driver a try, if only for
 testing purposes; if you are running an NV4x (G6x or G7x) card in
 particular, it works a lot better than the nv driver for 2d support.
 
 1. http://nouveau.freedesktop.org/wiki/InstallNouveau

But nouveau is much less stable than nv.  For testing purposes, go with
stable.

I'm not sure why it won't run his screen though.  I can use nv to run a
1920x1200 laptop LCD.  It *is* dog slow (although nouveau was not any
better with a NV17 / 440-Go -- render support for AA fonts seems to be
missing), but it does work.
-- 
Zan Lynx [EMAIL PROTECTED]




Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

Gene Heskett wrote:

On Monday 28 January 2008, Mark Lord wrote:
..

Another way is to use the make_bad_sector utility that
is included in the source tarball for hdparm-7.7, as follows:

  make_bad_sector --readback /dev/sda 474507


Apparently not in the rpm, darnit.

..

That's okay.  It should still be in the SRPM source file.
And it's a tiny download from sourceforge.net:

http://sourceforge.net/search/?type_of_search=soft&type_of_search=soft&words=hdparm

Cheers
-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Richard Heck

Daniel Barkalow wrote:
Can you switch back to old IDE to get your work done (and to make sure 
it's not a hardware issue that's developed recently)? 
I think it'd be really, REALLY helpful to a lot of people if you, or 
someone, could explain in moderate detail how this might be done. I 
tried doing it myself, but I'm not sufficiently expert at configuring 
kernels that I was ever able to figure out how to do it.


Obviously, the short version is: switch back to Fedora 6. But this kind 
of problem with libata---and yes, you're almost surely right that it's 
not one problem but lots---is sufficiently widespread that a Mini HOWTO, 
say, would be really welcome and, I'm guessing, widely used.


Richard



Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

Mark Lord wrote:

Gene Heskett wrote:

..
And so far no one has tried to comment on those 2 dmesg lines I've quoted a 
couple of times now, here's another:
[0.00] Nvidia board detected. Ignoring ACPI timer override.
[0.00] If you got timer trouble try acpi_use_timer_override
what the heck is that trying to tell me to do, in some sort of broken English?

..

I think it says this:

 If your system is misbehaving, then try adding the acpi_use_timer_override
 keyword to your kernel command line (/boot/grub/menu.lst) and see if it helps.

So, you can either hardcode it in /boot/grub/menu.lst (just add it to the end
of the first line you see there that begins with the word "kernel").

Or you can just try it temporarily at boot time (safer, but trickier),
by catching GRUB (the bootloader) before it actually loads Linux.

Usually there's some key or something it says you have 3 seconds to hit for
a menu, so do that, and then use the cursor keys to find the first kernel
line in that menu and hit "e" (edit) to go and add the
acpi_use_timer_override keyword to the end of that line (same as above).

..

Minor correction (having just tried it here):  once you see the GRUB (boot)
menu, hit the letter "e" to edit the first entry, then scroll to the kernel
line, and hit the letter "e" again to edit that line.  It should put you at
the end of the line, where you can just type a space and then
acpi_use_timer_override and then hit enter to finish the (temporary) edit.
Then hit "b" for boot.
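
For the permanent route (editing /boot/grub/menu.lst), here is a small sketch of the edit described above: append the keyword to the first line that begins with the word "kernel". It is only an illustration. The function name append_kernel_param and the sample file contents (kernel image name, root device) are invented, and it works on text in memory rather than touching the real file, so keep a backup and write the result back yourself if you use it.

```python
def append_kernel_param(menu_lst_text, param):
    """Return menu.lst text with `param` appended to the first 'kernel' line.

    Only the first line whose first word is 'kernel' is changed, and the
    parameter is not added twice if it is already there.
    """
    lines = menu_lst_text.splitlines()
    for i, line in enumerate(lines):
        words = line.split()
        if words and words[0] == "kernel":
            if param not in words[1:]:
                lines[i] = line.rstrip() + " " + param
            break
    else:
        # The loop finished without finding any kernel line.
        raise ValueError("no 'kernel' line found")
    return "\n".join(lines) + "\n"


# Hypothetical menu.lst contents, for illustration only.
SAMPLE = """default=0
timeout=3
title Linux 2.6.24
    root (hd0,0)
    kernel /vmlinuz-2.6.24 ro root=/dev/sda1
    initrd /initrd-2.6.24.img
"""

if __name__ == "__main__":
    print(append_kernel_param(SAMPLE, "acpi_use_timer_override"))
```

The temporary edit from the GRUB menu is still the safer first try, since a bad permanent edit has to be undone from a rescue disk.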

-ml


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

Alan Cox wrote:

On Mon, Jan 28, 2008 at 01:38:40PM -0500, Mark Lord wrote:

[   31.195305] ata2.00: ATAPI: LITE-ON DVDRW SHM-165H6S, HS06, max UDMA/66
[   31.243813] ata2.01: ATA-7: MAXTOR STM3320620A, 3.AAE, max UDMA/100
[   31.243816] ata2.01: 625142448 sectors, multi 16: LBA48 
[   31.243825] ata2.00: limited to UDMA/33 due to 40-wire cable

[   31.417074] ata2.00: configured for UDMA/33
[   31.451769] ata2.01: configured for UDMA/100

..

That looks like an unrelated bug to me: the driver says 40-wire cable
but then goes and chooses UDMA/100 on one of the drives.


We currently assume that
- If we have host side detecting 40 then we use 40
- If we have drive side detecting 40 use 40
- If we have drive side detecting 80 and host thinks 80 use 80

The case where the drives disagree isn't currently considered.

..

Ahh.  Tricky mess, that stuff.
I believe that if we have a drive that only sees 40W,
then it is probably best to restrict the other drive as well.

Just in case the drive that reports 40W cannot actually keep up
with the 80W timings, even when they're for the other drive.
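
To make the difference concrete, here is a toy model of the two policies discussed above: the per-device rules as listed, and the suggestion to restrict the whole port when any drive reports 40 wires. This is a sketch of the logic only and not libata code. The function names are made up, and the example assumes the host side reported 80 wires for the ata2 port in the log, which the log itself does not say.

```python
def cable_for_device(host_sees_80, drive_sees_80):
    """Per-device rule as listed: 80-wire only if host and drive both see 80."""
    return 80 if (host_sees_80 and drive_sees_80) else 40


def cables_per_device(host_sees_80, drives_see_80):
    """Apply the per-device rule to each drive on the port independently."""
    return [cable_for_device(host_sees_80, d) for d in drives_see_80]


def cables_port_wide(host_sees_80, drives_see_80):
    """Suggested variant: any drive seeing 40 wires restricts every drive."""
    cable = 80 if (host_sees_80 and all(drives_see_80)) else 40
    return [cable] * len(drives_see_80)


if __name__ == "__main__":
    # ata2.00 (DVDRW) reports 40-wire, ata2.01 (disk) reports 80-wire;
    # host side assumed to report 80-wire.
    drives = [False, True]
    print(cables_per_device(True, drives))  # drives disagree, treated separately
    print(cables_port_wide(True, drives))   # whole port limited to 40-wire
```

A 40-wire result caps the device at UDMA/33, which is what the log shows for ata2.00 while ata2.01 still got UDMA/100.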

That's my 2p.

Cheers


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Richard Heck wrote:

 Daniel Barkalow wrote:
  Can you switch back to old IDE to get your work done (and to make sure it's
  not a hardware issue that's developed recently)? 
 I think it'd be really, REALLY helpful to a lot of people if you, or someone,
 could explain in moderate detail how this might be done. I tried doing it
 myself, but I'm not sufficiently expert at configuring kernels that I was ever
 able to figure out how to do it.

As far as configuring the kernel, I can help:

Go to Device Drivers, ATA/ATAPI/MFM/RLL support, and turn on anything that 
looks relevant; go to Device Drivers, Serial ATA and Parallel ATA drivers, 
and turn off anything that's PATA and looks relevant.

(Whether a device uses IDE or PATA depends on which driver that supports 
the device is present and finds it first, not on any sort of global 
configuration, which is probably what tripped you up.)

Building this and installing it along with the appropriate initrd (which 
might be handled by Fedora's install scripts) will either get you back to 
old IDE or will make your kernel panic on boot, depending on whether you 
got it right (so make sure you can still boot the kernel you're sure of or 
something from a boot disk). This will also cause your hard drives to show 
up as different device nodes, so if your boot process doesn't mount by 
disk uuid but by some other feature (and I don't know what Fedora does), 
you'll also need to change it to something either stable across access 
methods or which works for the one you're now using.
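
Before rebuilding, it can help to check which of the two competing drivers a given .config actually enables. The sketch below is an illustration, not part of the advice above. The option names in the example (CONFIG_BLK_DEV_AMD74XX for old IDE, CONFIG_PATA_AMD for libata) are my guess at the relevant pair for an AMD/nVidia PATA controller, so substitute whatever matches your hardware; the function names are made up too.

```python
def enabled_options(config_text):
    """Return {option: 'y' or 'm'} for every option enabled in .config text."""
    opts = {}
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as '# CONFIG_FOO is not set' and are skipped.
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            if value in ("y", "m"):
                opts[name] = value
    return opts


def claimed_by(config_text, ide_option, pata_option):
    """Say which stack can claim the controller under this configuration."""
    opts = enabled_options(config_text)
    ide = ide_option in opts
    pata = pata_option in opts
    if ide and pata:
        return "either (whichever driver finds it first)"
    if ide:
        return "old IDE"
    if pata:
        return "libata"
    return "neither"


if __name__ == "__main__":
    sample = (
        "CONFIG_ATA=y\n"
        "CONFIG_PATA_AMD=y\n"
        "# CONFIG_BLK_DEV_AMD74XX is not set\n"
    )
    print(claimed_by(sample, "CONFIG_BLK_DEV_AMD74XX", "CONFIG_PATA_AMD"))
```

If the answer is "either", you are in exactly the situation described above, where the result depends on driver load order rather than on any global setting.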

 Obviously, the short version is: switch back to Fedora 6. But this kind of
 problem with libata---and yes, you're almost surely right that it's not one
 problem but lots---is sufficiently widespread that a Mini HOWTO, say, would be
 really welcome and, I'm guessing, widely used.

Fedora really ought to provide documentation, because there's some 
distro-specific stuff (like how you deal with the kernel's device node for 
the root partition changing), and they're using code by default that's at 
least somewhat documented as experimental (although it doesn't seem to be 
actually marked as experimental in all cases).

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Gene Heskett wrote:
While reading this msg as it came back, I locked up again and rebooted to 
2.6.24, and got lucky (maybe) as the attached dmesg will show quite a few 
instances of this LNNNGG before the nvidia driver is loaded to taint the 
kernel.  Have fun guys!
 
On Monday 28 January 2008, Mikael Pettersson wrote:
Gene Heskett writes:
  On Monday 28 January 2008, Peter Zijlstra wrote:
  On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
   1. Wrong mailing list; use linux-ide (@vger) instead.
  
  What, and keep all us other interested people in the dark?
 
  As a test, I tried rebooting to the latest fedora kernel and found it
  kills X, so I'm back to the second to last fedora version ATM, and the
  third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first
  two completed with no errors.
 
  I've added the linux-ide list to refresh those people of the problem,
  the logs are being spammed by this message stanza:
 
   Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
   Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
   Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
   Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
   Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
   Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
   Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
   Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
   Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
   Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

It's not obvious from this incomplete dmesg log what HW or driver
is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
it should be pata_amd driving a WDC disk:
  [   30.702887] pata_amd :00:09.0: version 0.3.10
  [   30.703052] PCI: Setting latency timer of device :00:09.0 to 64
  [   30.703188] scsi0 : pata_amd
  [   30.709313] scsi1 : pata_amd
  [   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
  [   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
  [   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
  [   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
  [   30.871629] ata1.00: configured for UDMA/100

Unfortunately we also see:
  [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
  [   48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20
  [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007

We have no way of debugging that module, so please try 2.6.24 without it.

Sorry, I can't do this and have a working machine.  The nv driver has
 suffered bit rot or something since the FC2 days when it COULD run a 19"
 crt at 1600x1200, and will not drive this 20" wide screen lcd 1680x1050
 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking
 like a jpg compressed to 10%.  The system is not usable on a day to day basis
 without the nvidia driver.

Fix the nv driver so it will run this screen at its native resolution and
 I'll be glad to run it even if it won't run google earth, which I do use
 from time to time.  Now, if in all the hits you can get from google on
 this, currently 14,800 just for 'exception Emask', apparently caused by a
 timeout, if 100% of the complainers are running nvidia drivers also, then I
 see a legit complaint.  Again, fix the nv driver so it will run my screen &
 I'll be glad to switch.  I can see the reason, sure, but the machine must
 be capable of doing its common day to day stuff, while using that driver,
 like running kde for kmail, and browsers that work.

If the problems persist, please try to capture a complete log from the
failing kernel -- the interesting bits are everything from initial boot
up to and including the first few errors. You may need to increase the
kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).

If by log you mean /var/log/messages, I have several megabytes of those.
If you mean a live dmesg capture taken right now, it's attached. It contains
several of these at the bottom.  I long ago made the kernel log buffer
bigger, cuz it couldn't even show the start immediately after the boot, and
even the dump to syslog was truncated.

There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.

That is what I was afraid of.  I've done some limited grepping in that
 branch of the kernel tree, and cannot seem to locate where this EH handler
 is being invoked from.

There is 2 

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mikael Pettersson
Gene Heskett writes:
  On Monday 28 January 2008, Peter Zijlstra wrote:
  On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
   1. Wrong mailing list; use linux-ide (@vger) instead.
  
  What, and keep all us other interested people in the dark?
  
  As a test, I tried rebooting to the latest fedora kernel and found it kills
  X, so I'm back to the second to last fedora version ATM, and the
  third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first two
  completed with no errors.
  
  I've added the linux-ide list to refresh those people of the problem, 
  the logs are being spammed by this message stanza:
  
   Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 
  SAct 0x0 SErr 0x0 action 0x2 frozen
  Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 
  35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
  Jan 28 04:46:25 coyote kernel: [26550.290029]  res 
  40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
  Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
  Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for 
  UDMA/100
  Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
  Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 
  512-byte hardware sectors (200050 MB)
  Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write 
  Protect is off
  Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: 
  enabled, read cache: enabled, doesn't 
  support DPO or FUA

It's not obvious from this incomplete dmesg log what HW or driver
is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
it should be pata_amd driving a WDC disk:

  [   30.702887] pata_amd :00:09.0: version 0.3.10
  [   30.703052] PCI: Setting latency timer of device :00:09.0 to 64
  [   30.703188] scsi0 : pata_amd
  [   30.709313] scsi1 : pata_amd
  [   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 
  14
  [   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 
  15
  [   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
  [   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48 
  [   30.871629] ata1.00: configured for UDMA/100

Unfortunately we also see:

  [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
  [   48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 
  (level, high) - IRQ 20
  [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 
  13 18:42:56 PST 2007

We have no way of debugging that module, so please try 2.6.24 without it.
If the problems persist, please try to capture a complete log from the
failing kernel -- the interesting bits are everything from initial boot
up to and including the first few errors. You may need to increase the
kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).
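
As a quick aid for picking a value: CONFIG_LOG_BUF_SHIFT is a power-of-two exponent, so the buffer is 2^shift bytes (17 gives 128 KB). The helper names below are made up for illustration; the sketch only does the arithmetic.

```python
def log_buf_bytes(shift):
    """Kernel log buffer size in bytes for a given CONFIG_LOG_BUF_SHIFT."""
    return 1 << shift


def min_shift_for(nbytes):
    """Smallest shift whose buffer holds at least nbytes of log text."""
    return max(nbytes - 1, 0).bit_length()


if __name__ == "__main__":
    for shift in (14, 17, 18):
        print(shift, log_buf_bytes(shift) // 1024, "KB")
    # A boot log of roughly 200000 bytes needs the next power of two above it.
    print(min_shift_for(200000))
```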

There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Jeff Garzik

Gene Heskett wrote:

Greeting;

I had to reboot early this morning due to a freezeup, and I had a 
bunch of these in the messages log:

==
Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 
0x0 SErr 0x0 action 0x2 frozen
Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd 
ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
Jan 27 19:42:11 coyote kernel: [42461.915974]  res 
40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 
512-byte hardware sectors (200050 MB)
Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect 
is off
Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

===
That one showed up about 2 hours ago, so I expect I'll be locked 
up again before I've managed a 24 hour uptime.  This drive passed
a 'smartctl -t long /dev/sda' with flying colors after the reboot
this morning.

Two instances were logged after I had rebooted to 2.6.24 from 2.6.24-rc8:

Jan 24 20:46:33 coyote kernel: [0.00] Linux version 2.6.24 ([EMAIL PROTECTED]) (gcc version 4.1.2 20070925 
(Red Hat 4.1.2-33)) #1 SMP Thu Jan 24 20:17:55 EST 2008


Jan 27 02:28:29 coyote kernel: [193207.445158] ata1.00: exception Emask 0x0 
SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 02:28:29 coyote kernel: [193207.445170] ata1.00: cmd 
35/00:08:f9:24:0a/00:00:17:00:00/e0 tag 0 dma 4096 out
Jan 27 02:28:29 coyote kernel: [193207.445172]  res 
40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 02:28:29 coyote kernel: [193207.445175] ata1.00: status: { DRDY }
Jan 27 02:28:29 coyote kernel: [193207.445202] ata1: soft resetting link
Jan 27 02:28:29 coyote kernel: [193207.607384] ata1.00: configured for UDMA/100
Jan 27 02:28:29 coyote kernel: [193207.607399] ata1: EH complete
Jan 27 02:28:29 coyote kernel: [193207.609681] sd 0:0:0:0: [sda] 390721968 
512-byte hardware sectors (200050 MB)
Jan 27 02:28:29 coyote kernel: [193207.619277] sd 0:0:0:0: [sda] Write Protect 
is off
Jan 27 02:28:29 coyote kernel: [193207.649041] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't 
support DPO or FUA

Jan 27 02:30:06 coyote kernel: [193304.336929] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
Jan 27 02:30:06 coyote kernel: [193304.336940] ata1.00: cmd ca/00:20:69:22:a6/00:00:00:00:00/e7 tag 0 dma 16384 out
Jan 27 02:30:06 coyote kernel: [193304.336942]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Jan 27 02:30:06 coyote kernel: [193304.336945] ata1.00: status: { DRDY }
Jan 27 02:30:06 coyote kernel: [193304.336972] ata1: soft resetting link
Jan 27 02:30:06 coyote kernel: [193304.499210] ata1.00: configured for UDMA/100
Jan 27 02:30:06 coyote kernel: [193304.499226] ata1: EH complete
Jan 27 02:30:06 coyote kernel: [193304.499714] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
Jan 27 02:30:06 coyote kernel: [193304.499857] sd 0:0:0:0: [sda] Write Protect is off
Jan 27 02:30:06 coyote kernel: [193304.502315] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA


None were logged during the time I was running an -rc7 or -rc8.

The previous hits on this resulted in the udma speed being downgraded 
till it was actually running in pio just before the freeze that 
required the hardware reset button.
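
A minimal sketch for spotting that kind of speed downgrade in a saved log. It only assumes the "ataN.NN: configured for MODE" wording seen in the stanzas above; the function name and the sample lines other than UDMA/100 are made up for illustration.

```python
import re

# Matches libata lines such as "ata1.00: configured for UDMA/100".
CONFIGURED = re.compile(r"(ata\d+\.\d+): configured for (\S+)")


def mode_changes(lines):
    """Return (device, old_mode, new_mode) each time a device's mode changes."""
    last = {}      # device -> most recently configured mode
    changes = []
    for line in lines:
        m = CONFIGURED.search(line)
        if not m:
            continue
        dev, mode = m.groups()
        if dev in last and last[dev] != mode:
            changes.append((dev, last[dev], mode))
        last[dev] = mode
    return changes


# Illustrative input: the first line is from the log above, the other
# two are invented to show what a downgrade would look like.
sample = [
    "Jan 27 02:28:29 coyote kernel: [193207.607384] ata1.00: configured for UDMA/100",
    "(invented) ata1.00: configured for UDMA/66",
    "(invented) ata1.00: configured for PIO4",
]
for dev, old, new in mode_changes(sample):
    print(f"{dev}: {old} -> {new}")
```

Feeding it the lines of /var/log/messages (for example `mode_changes(open(path))`) would list every step down, which helps show whether the drive was already degrading before the freeze.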


Unfortunately there are 1001 different causes for timeouts, so we need 
to drill down into the hardware, libata version, and ACPI version (most 
notably).




I'll reboot to -rc8 right now and resume.  If its the drive, I should see it.
If not, then 2.6.24 is where I'll point the finger.


There was also an ACPI update, which always affects interrupt handling 
(whose symptom can sometimes be a timeout).


Definitely interested in test results from what you describe.

Jeff


-
To unsubscribe from this list: send the line unsubscribe linux-ide in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Mark Lord wrote:
 [   64.037975] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
 [   64.038102] ata1.00: BMDMA stat 0x65
 [   64.038227] ata1.00: cmd c8/00:58:89:3d:07/00:00:00:00:00/e0 tag 0 dma 45056 in
 [   64.038229]  res 51/40:58:8b:3d:07/00:00:00:00:00/e0 Emask 0x9 (media error)
 [   64.038432] ata1.00: status: { DRDY ERR }
 [   64.038555] ata1.00: error: { UNC }
 [   64.050125] ata1.00: configured for UDMA/100
 [   64.050134] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
 [   64.050138] sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
 [   64.050142] Descriptor sense data with sense descriptors (in hex):
 [   64.050143] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
 [   64.050149] 00 07 3d 8b
 [   64.050152] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4
 [   64.050155] end_request: I/O error, dev sda, sector 474507

..

This error looks somewhat different from the samples posted earlier.
This one is quite definitively a bad sector.

It should also show up in smartctl -a -data /dev/sda (near the bottom)
if SMART was enabled on this drive at boot.

It does not unforch.

You could try reading that specific sector again just to make sure.
One way is to figure out how to use dd for this.
[EMAIL PROTECTED] ~]# dd if=/dev/sda bs=512 skip=474506 count=3
[binary sector data, including KDE library names such as libkdecorations.so.1.0.0 and libtaskmanager.so.1.0.0]
3+0 records in
3+0 records out
1536 bytes (1.5 kB) copied, 6.1403e-05 s, 25.0 MB/s
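
With bs=512, skip=474506 and count=3 that dd run covers sectors 474506 through 474508, so the suspect sector 474507 is the middle one. A rough Python equivalent is sketched below; the function names are made up, and like dd it goes through the normal kernel read path, so a repeat read may be served from cache rather than from the platter (the 6.1403e-05 s elapsed time above hints that this may have happened here).

```python
import os

SECTOR_SIZE = 512  # matches "512-byte hardware sectors" in the kernel log


def sectors_covered(skip, count):
    """LBAs touched by dd with bs=512 skip=<skip> count=<count>."""
    return list(range(skip, skip + count))


def read_sectors(path, first_lba, count=1):
    """Read `count` sectors starting at `first_lba`.

    Raises OSError if the kernel reports an I/O error or the read is short.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, count * SECTOR_SIZE, first_lba * SECTOR_SIZE)
    finally:
        os.close(fd)
    if len(data) != count * SECTOR_SIZE:
        raise OSError(f"short read: got {len(data)} bytes")
    return data


print(sectors_covered(474506, 3))  # [474506, 474507, 474508]
```

Calling `read_sectors("/dev/sda", 474507)` as root would attempt just the one sector; an OSError on every attempt points the same way as a consistent dd failure.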

Another way is to use the make_bad_sector utility that
is included in the source tarball for hdparm-7.7, as follows:

   make_bad_sector --readback /dev/sda 474507

Apparently not in the rpm, darnit.

(when invoked as above, it does *not* make a bad sector; no worries).

If it reports an I/O error consistently on that, then the sector is
indeed faulty, and its contents have long been lost.

You can repair the bad sector (but not the original contents) like this:

   make_bad_sector --rewrite /dev/sda 474507

Cheers

I'm going up to Clarksburg this afternoon to see if I can find a couple of 
drives: one a 2.5" drive bigger than 40GB for my 2.5" Maxtor USB housing, and 
another PATA drive big enough to run this thing, and just re-install the 
December respin after I save as much of this as I can; there's nearly 50GB 
here now.

Maybe it won't be so fscking picky about the next drive.

I was hoping someone could look at that last dmesg I attached, but apparently 
everybody is blinded by unrelated details, as that bad sector may have been 
transient, caused by the multiple hardware reset type reboots so far today :(

The last 3 reboots have interrupted a 'smartctl -t long /dev/sda' in 
progress. :(

If I reconvert to non libata, can I do that only for the pata drives of which 
there are 3 here including the dvd writer, and still use libata for the lone 
sata drive left?

And can I do that without mucking with the device map, which will make 
amanda/tar attempt to do a level 0 on the whole system if its changed.  I see 
the drives are at 254 again, when are they going to be given a stable device 
address out of the LANANA experimental group so we can reboot without mucking 
with that and driving tar crazy?

Thanks everybody.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
I just had my entire INTESTINAL TRACT coated with TEFLON!


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

[   64.037975] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
[   64.038102] ata1.00: BMDMA stat 0x65
[   64.038227] ata1.00: cmd c8/00:58:89:3d:07/00:00:00:00:00/e0 tag 0 dma 45056 
in
[   64.038229]  res 51/40:58:8b:3d:07/00:00:00:00:00/e0 Emask 0x9 
(media error)
[   64.038432] ata1.00: status: { DRDY ERR }
[   64.038555] ata1.00: error: { UNC }
[   64.050125] ata1.00: configured for UDMA/100
[   64.050134] sd 0:0:0:0: [sda] Result: hostbyte=0x00 driverbyte=0x08
[   64.050138] sd 0:0:0:0: [sda] Sense Key : 0x3 [current] [descriptor]
[   64.050142] Descriptor sense data with sense descriptors (in hex):
[   64.050143] 72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00 
[   64.050149] 00 07 3d 8b 
[   64.050152] sd 0:0:0:0: [sda] ASC=0x11 ASCQ=0x4

[   64.050155] end_request: I/O error, dev sda, sector 474507

..

This error looks somewhat different from the samples posted earlier.
This one is quite definitively a bad sector.
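
The failing sector number can be read straight out of the "res" line. The sketch below assumes the usual libata print order (command or status, then feature/nsect/lbal/lbam/lbah, then the five high-order bytes in the same order, then the device register); the helper names are made up. The assumption is at least consistent with this log, since the decoded values match "dma 45056" and "sector 474507".

```python
def parse_tf(tf):
    """Split a libata cmd/res taskfile string into named byte fields."""
    first, cur, hob, dev = tf.split("/")
    feature, nsect, lbal, lbam, lbah = (int(x, 16) for x in cur.split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(x, 16) for x in hob.split(":")
    )
    return {
        "command": int(first, 16),  # status register on a "res" line
        "feature": feature,         # error register on a "res" line
        "nsect": nsect, "lbal": lbal, "lbam": lbam, "lbah": lbah,
        "hob_nsect": hob_nsect, "hob_lbal": hob_lbal,
        "hob_lbam": hob_lbam, "hob_lbah": hob_lbah,
        "device": int(dev, 16),
    }


def lba28(t):
    """28-bit LBA: low nibble of device, then lbah, lbam, lbal."""
    return ((t["device"] & 0x0F) << 24) | (t["lbah"] << 16) | (t["lbam"] << 8) | t["lbal"]


def lba48(t):
    """48-bit LBA: the three high-order bytes sit above lbah/lbam/lbal."""
    return ((t["hob_lbah"] << 40) | (t["hob_lbam"] << 32) | (t["hob_lbal"] << 24)
            | (t["lbah"] << 16) | (t["lbam"] << 8) | t["lbal"])


cmd = parse_tf("c8/00:58:89:3d:07/00:00:00:00:00/e0")  # 0xc8 = READ DMA
res = parse_tf("51/40:58:8b:3d:07/00:00:00:00:00/e0")
print(lba28(cmd), cmd["nsect"] * 512)  # 474505 45056
print(lba28(res))                      # 474507
```

So the read started at sector 474505 and the drive reported the uncorrectable error two sectors in. For the LBA48 commands in the earlier timeout stanzas (opcode 0x35), `lba48` is the one to use.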

It should also show up in smartctl -a -data /dev/sda (near the bottom)
if SMART was enabled on this drive at boot.

You could try reading that specific sector again just to make sure.
One way is to figure out how to use dd for this.
Another way is to use the make_bad_sector utility that
is included in the source tarball for hdparm-7.7, as follows:

  make_bad_sector --readback /dev/sda 474507

(when invoked as above, it does *not* make a bad sector; no worries).

If it reports an I/O error consistently on that, then the sector is
indeed faulty, and its contents have long been lost.

You can repair the bad sector (but not the original contents) like this:

  make_bad_sector --rewrite /dev/sda 474507

Cheers



Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Zan Lynx wrote:
On Mon, 2008-01-28 at 11:50 -0500, Calvin Walton wrote:
 On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
  On Monday 28 January 2008, Mikael Pettersson wrote:
  Unfortunately we also see:
[   48.285456] nvidia: module license 'NVIDIA' taints kernel.
[   48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20
[   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
  
  We have no way of debugging that module, so please try 2.6.24 without
   it.
 
  Sorry, I can't do this and have a working machine.  The nv driver has
  suffered bit rot or something since the FC2 days when it COULD run a 19"
  CRT at 1600x1200, and will not drive this 20" wide screen LCD 1680x1050
  monitor at more than 800x600, which is absolutely butt ugly fuzzy,
  looking like a jpg compressed to 10%.  The system is not usable on a day
  to day basis without the nvidia driver.

 You should probably give the nouveau[1] driver a try, if only for
 testing purposes; if you are running an NV4x (G6x or G7x) card in
 particular, it works a lot better than the nv driver for 2d support.

 1. http://nouveau.freedesktop.org/wiki/InstallNouveau

But nouveau is much less stable than nv.  For testing purposes, go with
stable.

I believe at this point, its moot.  I captured quite a few instances of that 
error message while rebooting the last time, all of which occurred long 
before I logged in and did a startx (I boot to runlevel 3 here), so the 
kernel was NOT tainted at that point.  That dmesg has been posted and some 
questions asked.

As this has gone on for a while, it seems to me that with 14,800 google hits 
on this problem, Linus should call a halt until this is found and fixed.  But 
I'm not Linus.  I'm also locking up for 30 at a time,  probably ready for 
reboot #7 today.

I'm not sure why it won't run his screen though.  I can use nv to run a
1920x1200 laptop LCD.  It *is* dog slow (although nouveau was not any
better with a NV17 / 440-Go -- render support for AA fonts seems to be
missing), but it does work.



-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
There cannot be a crisis next week.  My schedule is already full.
-- Henry Kissinger


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Gene Heskett wrote:

 I believe at this point, its moot.  I captured quite a few instances of that 
 error message while rebooting the last time, all of which occurred long 
 before I logged in and did a startx (I boot to runlevel 3 here), so the 
 kernel was NOT tainted at that point.  That dmesg has been posted and some 
 questions asked.
 
 As this has gone on for a while, it seems to me that with 14,800 google hits 
 on this problem, Linus should call a halt until this is found and fixed.  But 
 I'm not Linus.  I'm also locking up for 30 at a time,  probably ready for 
 reboot #7 today.

Can you switch back to old IDE to get your work done (and to make sure 
it's not a hardware issue that's developed recently)? I believe libata is 
just a whole lot pickier about behavior than the IDE subsystem was, so 
it's more likely to complain about stuff, both for good reasons and when 
it shouldn't, and there are a slew of potential "we have to accept that 
old PATA hardware does this" bugs that all have the same symptom of "we go 
into error handling when nothing is actually wrong", hence the vast 
quantity of hits. I think it's not so much that it's a common problem as 
that it's a lot of problems that aren't very distinguishable.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Daniel Barkalow wrote:
On Mon, 28 Jan 2008, Richard Heck wrote:
 Daniel Barkalow wrote:
  Can you switch back to old IDE to get your work done (and to make sure
  it's not a hardware issue that's developed recently)?

 I think it'd be really, REALLY helpful to a lot of people if you, or
 someone, could explain in moderate detail how this might be done. I tried
 doing it myself, but I'm not sufficiently expert at configuring kernels
 that I was ever able to figure out how to do it.

As far as configuring the kernel, I can help:

Go to Device Drivers, ATA/ATAPI/MFM/RLL support, and turn on anything that
looks relevant; go to Device Drivers, Serial ATA and Parallel ATA drivers,
and turn off anything that's PATA and looks relevant.

Done.

(Whether a device uses IDE or PATA depends on which driver that supports
the device is present and finds it first, not on any sort of global
configuration, which is probably what tripped you up.)

Building this and installing it along with the appropriate initrd (which
might be handled by Fedora's install scripts)

Or mine, which I've been using for years.

will either get you back to 
old IDE or will make your kernel panic on boot, depending on whether you
got it right (so make sure you can still boot the kernel you're sure of or
something from a boot disk). This will also cause your hard drives to show
up as different device nodes, so if your boot process doesn't mount by
disk uuid but by some other feature (and I don't know what Fedora does),
you'll also need to change it to something either stable across access
methods or which works for the one you're now using.

It mounts by LABEL=.  All of it.

 Obviously, the short version is: switch back to Fedora 6. But this kind of
 problem with libata---and yes, you're almost surely right that it's not
 one problem but lots---is sufficiently widespread that a Mini HOWTO, say,
 would be really welcome and, I'm guessing, widely used.

Fedora really ought to provide documentation, because there's some
distro-specific stuff (like how you deal with the kernel's device node for
the root partition changing), and they're using code by default that's at
least somewhat documented as experimental (although it doesn't seem to be
actually marked as experimental in all cases).

Fedora is not the only one having trouble; name a distro, it's probably 
someplace in those 14,800 google hits.

   -Daniel
*This .sig left intentionally blank*

Thanks Daniel, try #1 is building now.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Those who do not understand Unix are condemned to reinvent it, poorly.
-- Henry Spencer


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Gene Heskett wrote:

 On Monday 28 January 2008, Daniel Barkalow wrote:
 Building this and installing it along with the appropriate initrd (which
 might be handled by Fedora's install scripts)
 
 Or mine, which I've been using for years.

You're ahead of a surprising number of people, including me, if you 
understand making initrds.

 will either get you back to 
 old IDE or will make your kernel panic on boot, depending on whether you
 got it right (so make sure you can still boot the kernel you're sure of or
 something from a boot disk). This will also cause your hard drives to show
 up as different device nodes, so if your boot process doesn't mount by
 disk uuid but by some other feature (and I don't know what Fedora does),
 you'll also need to change it to something either stable across access
 methods or which works for the one you're now using.
 
 It mounts by LABEL=.  All of it.

That'll save a huge amount of hassle. So long as you manage to get the 
right drivers included and the wrong drivers not included, you should be 
pretty much set.
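
For anyone not sure whether everything really is mounted by label, a quick check of /etc/fstab could look like the sketch below. The function name is made up, and it only looks at fstab; a `root=/dev/...` argument on the bootloader's kernel line would need the same attention.

```python
def device_node_mounts(fstab_text):
    """Return (spec, mountpoint) for entries that name /dev/hdX or /dev/sdX directly.

    Entries using LABEL= or UUID= survive a switch between the old IDE
    driver (hdX names) and libata (sdX names); these ones do not.
    """
    hits = []
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()   # drop comments, split on whitespace
        if len(fields) < 2:
            continue
        spec, mountpoint = fields[0], fields[1]
        if spec.startswith(("/dev/hd", "/dev/sd")):
            hits.append((spec, mountpoint))
    return hits


# Invented example file, for illustration only.
example = """\
LABEL=/      /      ext3  defaults  1 1
/dev/sda3    /home  ext3  defaults  1 2
# /dev/hdb1  /old   ext3  defaults  0 0
"""
print(device_node_mounts(example))  # [('/dev/sda3', '/home')]
```

An empty result from `device_node_mounts(open("/etc/fstab").read())` means the driver switch should not change what gets mounted where.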

 Fedora is not the only people having trouble,  name a distro, its probably 
 someplace in that 14,800 hit google returns.

Yeah, but they each may need different instructions, particularly if 
they're not mounting by label in general, or not mounting the root 
partition by label. That was the big hassle going the opposite direction. 
And the procedure is 4 lines to describe to somebody who knows how to 
build and install a new kernel for the distro, which is much shorter than 
the explanation of how you generally build and install a kernel. A real 
howto would have to explain where to get the distro's kernel sources and 
default configuration, for example.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Daniel Barkalow wrote:
On Mon, 28 Jan 2008, Gene Heskett wrote:
 On Monday 28 January 2008, Daniel Barkalow wrote:
 Building this and installing it along with the appropriate initrd (which
 might be handled by Fedora's install scripts)

 Or mine, which I've been using for years.

You're ahead of a surprising number of people, including me, if you
understand making initrds.

In my script, its one line:
mkinitrd -f initrd-$VER.img $VER  \

where $VER is the shell variable I edit to = the version number, located at 
the top of the script.

Unforch, its failing:
No module pata_amd found for kernel 2.6.24, aborting.

This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned 
on.  So something is still dependent on it.  I do have one sata drive, on an 
accessory card in the box, so I need the rest of the sata_sil and friends 
stuff.  Its my virtual tapes for amanda.  Also home built, the amanda 
security model cannot be successfully bent into the shape of an rpm.  They 
BTW are #2 on coverity's list of most secure software.

So I've rebuilt 2.6.24 as it originally was, and added the acpi timer line to 
the 2.6.24-rc8 stanza's kernel argument list.  It will boot one or the other 
when I next reboot.  Its been about 8 hours since the last error was logged, 
which is totally weirdsville to this old fart.  Phase of the moon maybe?  The 
visit to the sawbones to see about my heart?  They are going to fit me with a 
30 day recorder tomorrow, my skip a beat problem is getting worse.  The sort 
of stuff that goes with the 7nth decade I guess.  Officially, I'm wearing out 
me, too much sugar, too many times nearly electrocuted=shingles yadda 
yadda. :-)  Oh, and don't forget Arther, he moved in uninvited about 25 years 
ago too.  Those people that talk about the golden years?  They're full of 
excrement...

 will either get you back to
 old IDE or will make your kernel panic on boot, depending on whether you
 got it right (so make sure you can still boot the kernel you're sure of
  or something from a boot disk). This will also cause your hard drives to
  show up as different device nodes, so if your boot process doesn't mount
  by disk uuid but by some other feature (and I don't know what Fedora
  does), you'll also need to change it to something either stable across
  access methods or which works for the one you're now using.

 It mounts by LABEL=.  All of it.

That'll save a huge amount of hassle. So long as you manage to get the
right drivers included and the wrong drivers not included, you should be
pretty much set.

 Fedora is not the only people having trouble,  name a distro, its probably
 someplace in that 14,800 hit google returns.

Yeah, but they each may need different instructions, particularly if
they're not mounting by label in general, or not mounting the root
partition by label. That was the big hassle going the opposite direction.
And the procedure is 4 lines to describe to somebody who knows how to
build and install a new kernel for the distro, which is much shorter than
the explanation of how you generally build and install a kernel. A real
howto would have to explain where to get the distro's kernel sources and
default configuration, for example.

   -Daniel
*This .sig left intentionally blank*



-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
Never drink from your finger bowl -- it contains only water.


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Daniel Barkalow
On Mon, 28 Jan 2008, Gene Heskett wrote:

 On Monday 28 January 2008, Daniel Barkalow wrote:
 On Mon, 28 Jan 2008, Gene Heskett wrote:
  On Monday 28 January 2008, Daniel Barkalow wrote:
  Building this and installing it along with the appropriate initrd (which
  might be handled by Fedora's install scripts)
 
  Or mine, which I've been using for years.
 
 You're ahead of a surprising number of people, including me, if you
 understand making initrds.
 
 In my script, its one line:
 mkinitrd -f initrd-$VER.img $VER  \
 
 where $VER is the shell variable I edit to = the version number, located at 
 the top of the script.
 
 Unforch, its failing:
 No module pata_amd found for kernel 2.6.24, aborting.
 
 This is with pata_amd turned off and its counterpart under ATA/RLL/etc turned 
 on.  So something is still dependent on it. 

That looks like something in the guts of the initrd; it probably thinks 
you need pata_amd and it's unhappy that you don't have it.
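
One place such a leftover reference might live, and this is an assumption about Fedora's mkinitrd of that era rather than something established in this thread, is an `alias scsi_hostadapter... pata_amd` line in /etc/modprobe.conf. A small sketch to list those aliases (the function name is made up):

```python
def modules_aliased(conf_text, alias_prefix="scsi_hostadapter"):
    """Return the module names bound to aliases starting with `alias_prefix`."""
    modules = []
    for line in conf_text.splitlines():
        fields = line.split("#", 1)[0].split()   # ignore comments
        if len(fields) == 3 and fields[0] == "alias" and fields[1].startswith(alias_prefix):
            modules.append(fields[2])
    return modules


# Invented example contents, for illustration only.
example = """\
alias eth0 forcedeth
alias scsi_hostadapter pata_amd
alias scsi_hostadapter1 sata_sil
"""
print(modules_aliased(example))  # ['pata_amd', 'sata_sil']
```

If pata_amd shows up in the output for the real file, removing or commenting that alias before rerunning mkinitrd would be the thing to try.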

Actually, another thing to try is making the ATA/etc one be y and 
pata_amd be m. Most likely, this should lead to the ATA one claiming the 
drive before the module is loaded (but the module would be loaded later, 
to avoid upsetting the initrd); you should be able to tell from dmesg (or 
/dev, for that matter) which one got it, and I think built-in drivers will 
claim everything they can before an initrd gets loaded.
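
To double-check what a given .config actually ends up with, something like the sketch below would do. The choice of symbols is my assumption about which options are meant here: CONFIG_BLK_DEV_AMD74XX for the old IDE AMD/nVidia driver and CONFIG_PATA_AMD for the libata one. The function name is made up.

```python
def config_values(config_text, symbols):
    """Map each symbol to 'y', 'm', or 'n' (unset or '# ... is not set')."""
    values = {s: "n" for s in symbols}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if name in values:
            values[name] = value
    return values


# Invented fragment showing the built-in IDE driver plus modular pata_amd.
example = """\
CONFIG_IDE=y
CONFIG_BLK_DEV_AMD74XX=y
CONFIG_ATA=y
CONFIG_PATA_AMD=m
# CONFIG_PATA_VIA is not set
"""
wanted = ["CONFIG_BLK_DEV_AMD74XX", "CONFIG_PATA_AMD", "CONFIG_PATA_VIA"]
print(config_values(example, wanted))
```

After booting, whether the disks appear as /dev/hdX or /dev/sdX shows which driver actually won.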

 I do have one sata drive, on an accessory card in the box, so I need the 
 rest of the sata_sil and friends stuff. 

Assuming it isn't picking up your hard drive, which it isn't, that 
shouldn't matter.

-Daniel
*This .sig left intentionally blank*


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Mark Lord

Gene Heskett wrote:
..

That's ok, dd seemed to do the job also.

..

The two programs operate entirely differently from each other,
so it may still be worth trying the make_bad_sector utility there.

dd goes through the regular kernel I/O calls,
whereas make_bad_sector sends raw ATA commands
directly (more or less) to the drive.

-ml


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Kasper Sandberg
On Mon, 2008-01-28 at 11:35 -0500, Gene Heskett wrote:
 On Monday 28 January 2008, Mikael Pettersson wrote:
 Gene Heskett writes:
   On Monday 28 January 2008, Peter Zijlstra wrote:
   On Mon, 2008-01-28 at 09:17 +0100, Mikael Pettersson wrote:
1. Wrong mailing list; use linux-ide (@vger) instead.
   
   What, and keep all us other interested people in the dark?
  
   As a test, I tried rebooting to the latest fedora kernel and found it
   kills X, so I'm back to the second to last fedora version ATM, and the
   third 'smartctl -t long /dev/sda' in 24 hours is running now.  The first
   two completed with no errors.
  
   I've added the linux-ide list to refresh those people of the problem,
   the logs are being spammed by this message stanza:
  
   Jan 28 04:46:25 coyote kernel: [26550.290016] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
   Jan 28 04:46:25 coyote kernel: [26550.290028] ata1.00: cmd 35/00:58:c9:9c:0a/00:01:00:00:00/e0 tag 0 dma 176128 out
   Jan 28 04:46:25 coyote kernel: [26550.290029]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
   Jan 28 04:46:25 coyote kernel: [26550.290032] ata1.00: status: { DRDY }
   Jan 28 04:46:25 coyote kernel: [26550.290060] ata1: soft resetting link
   Jan 28 04:46:25 coyote kernel: [26550.452301] ata1.00: configured for UDMA/100
   Jan 28 04:46:25 coyote kernel: [26550.452318] ata1: EH complete
   Jan 28 04:46:25 coyote kernel: [26550.455898] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
   Jan 28 04:46:25 coyote kernel: [26550.456151] sd 0:0:0:0: [sda] Write Protect is off
   Jan 28 04:46:25 coyote kernel: [26550.456403] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
 
 It's not obvious from this incomplete dmesg log what HW or driver
 is behind ata1, but if the 2.6.24-rc7 kernel matches the 2.6.24 one,
 it should be pata_amd driving a WDC disk:
   [   30.702887] pata_amd :00:09.0: version 0.3.10
   [   30.703052] PCI: Setting latency timer of device :00:09.0 to 64
   [   30.703188] scsi0 : pata_amd
   [   30.709313] scsi1 : pata_amd
   [   30.710076] ata1: PATA max UDMA/133 cmd 0x1f0 ctl 0x3f6 bmdma 0xf000 irq 14
   [   30.710079] ata2: PATA max UDMA/133 cmd 0x170 ctl 0x376 bmdma 0xf008 irq 15
   [   30.864753] ata1.00: ATA-6: WDC WD2000JB-00EVA0, 15.05R15, max UDMA/100
   [   30.864756] ata1.00: 390721968 sectors, multi 16: LBA48
   [   30.871629] ata1.00: configured for UDMA/100
 
 Unfortunately we also see:
   [   48.285456] nvidia: module license 'NVIDIA' taints kernel.
   [   48.549725] ACPI: PCI Interrupt :02:00.0[A] - Link [APC4] - GSI 19 (level, high) - IRQ 20
   [   48.550149] NVRM: loading NVIDIA UNIX x86 Kernel Module  169.07  Thu Dec 13 18:42:56 PST 2007
 
 We have no way of debugging that module, so please try 2.6.24 without it.
 
 Sorry, I can't do this and have a working machine.  The nv driver has 
 suffered bit rot or something since the FC2 days when it COULD run a 19" 
 CRT at 1600x1200, and will not drive this 20" wide screen LCD 1680x1050 
 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking 
 like a jpg compressed to 10%.  The system is not usable on a day to day 
 basis without the nvidia driver.
 
 Fix the nv driver so it will run this screen at its native resolution and 
 I'll be glad to run it even if it won't run google earth, which I do use 
 from time to time.  Now, if in all the hits you can get from google on 
 this, currently 14,800 just for 'exception Emask', apparently caused by a 
 timeout, if 100% of the complainers are running nvidia drivers also, then 
 I see a legit 
I can invalidate this theory...
i helped a guy on irc debug this problem, and he had ati. I tried having
him stop using fglrx, and go to r300.. same problem, and same problem
even with vesa.. :)

also, i have this on my fileserver with .20, which doesn't even run X,
or have module support in the kernel :)

 complaint.  Again, fix the nv driver so it will run my screen and I'll be 
 glad to switch.  I can see the reason, sure, but the machine must be capable 
 of doing its common day to day stuff, while using that driver, like running 
 kde for kmail, and browsers that work.
 
 If the problems persist, please try to capture a complete log from the
 failing kernel -- the interesting bits are everything from initial boot
 up to and including the first few errors. You may need to increase the
 kernel's log buffer size if the log gets truncated (CONFIG_LOG_BUF_SHIFT).
 
 If by log you mean /var/log/messages, I have several megabytes of those.
 If you mean a live dmesg capture taken right now, its attached. It contains 
 several of these at the bottom.  I long ago made the kernel log buffer 
 bigger, cuz it couldn't even show the start immediately after the boot, and 
 even the dump to syslog was truncated.
 
 There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.
 
 That is what I was afraid of.  I've done 

Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Gene Heskett
On Monday 28 January 2008, Kasper Sandberg wrote:
 [...]
 We have no way of debugging that module, so please try 2.6.24 without it.

 Sorry, I can't do this and have a working machine.  The nv driver has
 suffered bit rot or something since the FC2 days when it COULD run a 19"
 CRT at 1600x1200, and will not drive this 20" wide screen LCD 1680x1050
 monitor at more than 800x600, which is absolutely butt ugly fuzzy, looking
 like a jpg compressed to 10%.  The system is not usable on a day to day
 basis without the nvidia driver.

 Fix the nv driver so it will run this screen at its native resolution and
 I'll be glad to run it even if it won't run google earth, which I do use
 from time to time.  Now, if in all the hits you can get from google on
 this, currently 14,800 just for 'exception Emask', apparently caused by a
 timeout, if 100% of the complainers are running nvidia drivers also, then
 I see a legit

I can invalidate this theory...
i helped a guy on irc debug this problem, and he had ati. I tried having
him stop using fglrx, and go to r300.. same problem, and same problem
even with vesa.. :)

No Kasper, you are validating it, that it is not nvidia related, which is what 
I was also saying.

also, i have this on my fileserver with .20, which doesn't even run X,
or have module support in the kernel :)

That far back?  Although ISTR I saw it happen once only when I was running 
2.6.18-somethingorother.

 complaint.  Again, fix the nv driver so it will run my screen and I'll be
 glad to switch.  I can see the reason, sure, but the machine must be
 capable of doing its common day to day stuff, while using that driver,
 like running kde for kmail, and browsers that work.

 If the problems persist, please try to capture a complete log from the
 failing kernel -- the interesting bits are everything from initial boot
 up to and including the first few errors. You may need to increase the
 kernel's log buffer size if the log gets truncated
  (CONFIG_LOG_BUF_SHIFT).

 If by log you mean /var/log/messages, I have several megabytes of those.
 If you mean a live dmesg capture taken right now, its attached. It
 contains several of these at the bottom.  I long ago made the kernel log
 buffer bigger, cuz it couldn't even show the start immediately after the
 boot, and even the dump to syslog was truncated.
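
For reference, the kernel log buffer is 2^CONFIG_LOG_BUF_SHIFT bytes, so each step up in the shift doubles it. A small sketch of the arithmetic (the function names are made up, and the 200000-byte figure is only an example):

```python
def log_buf_bytes(shift):
    """Size in bytes of the kernel log buffer for a given CONFIG_LOG_BUF_SHIFT."""
    return 1 << shift


def shift_for(nbytes):
    """Smallest shift whose buffer holds at least `nbytes`."""
    return max(nbytes - 1, 0).bit_length()


print(log_buf_bytes(17))   # 131072
print(shift_for(200000))   # 18
```

The log_buf_len= boot parameter is another way to enlarge the buffer without rebuilding the kernel.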

 There are no pata_amd changes from 2.6.24-rc7 to 2.6.24 final.

 That is what I was afraid of.  I've done some limited grepping in that
 branch of the kernel tree, and cannot seem to locate where this EH handler
 is being invoked from.

 There are 2 lines of interest in the dmesg:

 [0.00] Nvidia board detected. Ignoring ACPI timer override.
 [0.00] If you got timer trouble try acpi_use_timer_override

 But I have NDI what it means, kernel argument/xconfig option?

 I've also done some googling, and it appears this problem is fairly
 widespread since the switchover to libata was encouraged.  A stock fedora
 F8 kernel suffers the same freezes and eventually locks up, but does it
 without the error messages being logged, it just freezes, feeling
 identical to this in the minutes before the total freeze.  I've tried 2 of
 those too, but the newest one won't even run X.

-- 
Cheers, Gene
There are four boxes to be used in defense of liberty:
 soap, ballot, jury, and ammo. Please use in that order.
-Ed Howdershelt (Author)
bureaucrat, n:
A politician who has tenure.


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Kasper Sandberg
On Mon, 2008-01-28 at 23:49 -0500, Gene Heskett wrote:
 On Monday 28 January 2008, Kasper Sandberg wrote:
  [...]
snip
 
 I can invalidate this theory...
 i helped a guy on irc debug this problem, and he had ati. I tried having
 him stop using fglrx, and go to r300.. same problem, and same problem
 even with vesa.. :)
 
 No Kasper, you are validating it, that it is not nvidia related, which is 
 what I was also saying.
yeah thats what i mean - i can invalidate the theory that all the
affected boxes run nvidia.

 
 also, i have this on my fileserver with .20, which doesn't even run X,
 or have module support in the kernel :)
 
 That far back?  Although ISTR I saw it happen once only when I was running 
 2.6.18-somethingorother.

Yes im afraid so.. i will now provide some complete details, as i feel
they are relevant.

the thing is, i run 6x300gb disks, IDE, in raid5.

i have both an onboard via ide controller, and then i bought a promise
pdc 202 new thingie. i had problems, however..

after a bit of time, i would get a DMA reset error, and it all
kind of went NUTS. it was as if all data access was skewed, and as you
might imagine, this made everything fail badly.

i purchased an ITE based controller for the drives on the promise, but
exactly the same thing happened.

the errors i got were:
hdf: dma_intr: bad DMA status (dma_stat=75)
hdf: dma_intr: status=0x50 { DriveReady SeekComplete }
ide: failed opcode was: unknown
---

i then found new hope, when i heard that libata provided much better
error handling, so i upgraded to .20.

this made my box usable.

the error happens once or twice a day: the disk led stays lit and all
IO freezes for about half a minute, after which it returns PROPERLY
(thank you libata!). as far as i can tell, the only side effect is that
i get messages like those described here, the same ones google is
flooded with.

to put some timeline perspective into this.
i believe it was in 2005 i assembled the system, and when i realized it
was faulty, on the old ide driver, i stopped using it - that might have
been in the beginning of 2006. then for almost a year i wasn't using it,
hoping to somehow fix it, but in january 2007 i think it was, at least in
the very beginning of 2007, i hit upon the idea of trying libata, and ever
since the system has been running 24/7 - doing these errors around 2
times a day.

i have multiple times reported my problems to lkml, but nothing has
happened, i also tried to approach jgarzik directly, but he was not
interested.

i really hope this can be solved now, it's a huge problem

my fileserver has an asus k8v motherboard, with via chipset (k8t880 i
think it is, or something like it). currently using the promise
controller again (strangely enough all the timeouts seem to happen here,
and when the ITE was on, there, not the onboard one), in conjunction
with the onboard via.


  complaint.  Again, fix the nv driver so it will run my screen  I'll be
snip



Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Michal Jaegermann
On Mon, Jan 28, 2008 at 08:31:57PM -0500, Gene Heskett wrote:
 
 In my script, its one line:
 mkinitrd -f initrd-$VER.img $VER  \
 
 where $VER is the shell variable I edit to = the version number, located at 
 the top of the script.
 
 Unforch, its failing:
 No module pata_amd found for kernel 2.6.24, aborting.

mkinitrd is just a shell script.  Even if its options, and there are
quite a number of these, do not let you influence the choice of
modules the way you want, it is pretty trivial to make yourself a
custom version of it and just hardwire there a fixed list of modules
to use instead of relying on general mechanisms which are trying
hard to guess what you may need.

That way your regular 'mkinitrd' will build something to boot with
libata and 'mkinitrd.ide' will use IDE modules for that purpose using
the same core kernel.
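
As a rough sketch of the "fixed list of modules" idea, a small helper can
check that every module on such a list actually exists for the target
kernel before mkinitrd is run. The function name missing_modules and the
module names in the usage comment are only illustrative assumptions; this
is not part of mkinitrd itself.

```shell
#!/bin/sh
# Print the names of requested modules that have no .ko file under a
# given module tree (hypothetical helper, not part of mkinitrd).
# Usage: missing_modules MODULE_DIR MODULE...
missing_modules() {
    moddir=$1
    shift
    for mod in "$@"; do
        # Match "name.ko" as well as compressed variants such as "name.ko.gz".
        if [ -z "$(find "$moddir" -name "$mod.ko*" 2>/dev/null)" ]; then
            echo "$mod"
        fi
    done
}

# Example with assumed values: run the check before building the initrd.
# VER=2.6.24
# missing_modules "/lib/modules/$VER" pata_amd sd_mod
```

An empty result means every listed module was found; anything printed is a
module that would have to be built or dropped from the list first.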

If you are using distribution kernels, as opposed to your own
configuration, it is quite likely that you will need to install
the 'kernel-devel' package and recompile and add the required IDE modules
yourself, as those may not be provided.  This is done the same way
as for any other external module.

   Michal


Re: Problem with ata layer in 2.6.24

2008-01-28 Thread Florian Attenberger
On Mon, 28 Jan 2008 14:13:21 -0500
Gene Heskett [EMAIL PROTECTED] wrote:


  I had to reboot early this morning due to a freezeup, and I had a
  bunch of these in the messages log:
  ==
  Jan 27 19:42:11 coyote kernel: [42461.915961] ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x2 frozen
  Jan 27 19:42:11 coyote kernel: [42461.915973] ata1.00: cmd ca/00:08:b1:66:46/00:00:00:00:00/e8 tag 0 dma 4096 out
  Jan 27 19:42:11 coyote kernel: [42461.915974]  res 40/00:01:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
  Jan 27 19:42:11 coyote kernel: [42461.915978] ata1.00: status: { DRDY }
  Jan 27 19:42:11 coyote kernel: [42461.916005] ata1: soft resetting link
  Jan 27 19:42:12 coyote kernel: [42462.078216] ata1.00: configured for UDMA/100
  Jan 27 19:42:12 coyote kernel: [42462.078232] ata1: EH complete
  Jan 27 19:42:12 coyote kernel: [42462.090700] sd 0:0:0:0: [sda] 390721968 512-byte hardware sectors (200050 MB)
  Jan 27 19:42:12 coyote kernel: [42462.114230] sd 0:0:0:0: [sda] Write Protect is off
  Jan 27 19:42:12 coyote kernel: [42462.115079] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
  ===
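
To get a feel for how often these events occur, the timeout reports can be
counted per day from a saved log. This is only a sketch: it assumes a
syslog-style file with one entry per line, where the first two fields are
the month and day, and the function name count_ata_timeouts is made up for
illustration.

```shell
#!/bin/sh
# Count libata timeout reports per day in a syslog-style file.
# Usage: count_ata_timeouts LOGFILE
count_ata_timeouts() {
    # The "res ... Emask 0x4 (timeout)" line marks a command that timed out.
    # Fields 1 and 2 of each syslog line are the month and the day.
    grep -F 'Emask 0x4 (timeout)' "$1" | awk '{ print $1, $2 }' | sort | uniq -c
}

# Example with an assumed log location:
# count_ata_timeouts /var/log/messages
```

Each output line is a count followed by the date it applies to.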


I had this error too, or maybe only a similar one, and another one as
well. I no longer have the error output for either lying around, so I'm
posting both fixes that I found here on lkml:
1) disabling ncq like that:
echo 1 > /sys/block/sda/device/queue_depth
2) this patch: libata_drain_fifo_on_stuck_drq_hsm.patch 
( applies to 2.6.24 too )

Signed-off-by: Mark Lord [EMAIL PROTECTED]
---

--- old/drivers/ata/libata-sff.c	2007-09-28 09:29:22.0 -0400
+++ linux/drivers/ata/libata-sff.c	2007-09-28 09:39:44.0 -0400
@@ -420,6 +420,28 @@
 	ap->ops->irq_on(ap);
 }
 
+static void ata_drain_fifo(struct ata_port *ap, struct ata_queued_cmd *qc)
+{
+	u8 stat = ata_chk_status(ap);
+	/*
+	 * Try to clear stuck DRQ if necessary,
+	 * by reading/discarding up to two sectors worth of data.
+	 */
+	if ((stat & ATA_DRQ) && (!qc || qc->dma_dir != DMA_TO_DEVICE)) {
+		unsigned int i;
+		unsigned int limit = qc ? qc->sect_size : ATA_SECT_SIZE;
+
+		printk(KERN_WARNING "Draining up to %u words from data FIFO.\n",
+				limit);
+		for (i = 0; i < limit ; ++i) {
+			ioread16(ap->ioaddr.data_addr);
+			if (!(ata_chk_status(ap) & ATA_DRQ))
+				break;
+		}
+		printk(KERN_WARNING "Drained %u/%u words.\n", i, limit);
+	}
+}
+
 /**
  * ata_bmdma_drive_eh - Perform EH with given methods for BMDMA controller
  * @ap: port to handle error for
@@ -476,7 +498,7 @@
 	}
 
 	ata_altstatus(ap);
-	ata_chk_status(ap);
+	ata_drain_fifo(ap, qc);
 	ap->ops->irq_clear(ap);
 
 	spin_unlock_irqrestore(ap->lock, flags);
-
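
For fix 1), the queue_depth write can be wrapped so that the previous
value is shown before it is changed. This is a minimal sketch: the
function name disable_ncq and its optional second argument (an alternate
sysfs root, there only so it can be tried against a scratch directory)
are assumptions for illustration, not an existing tool.

```shell
#!/bin/sh
# Set a disk's queue depth to 1, which is how fix 1) above disables NCQ.
# Usage: disable_ncq DISK [SYSFS_ROOT]
# SYSFS_ROOT defaults to /sys (hypothetical parameter, for dry runs).
disable_ncq() {
    disk=$1
    root=${2:-/sys}
    file=$root/block/$disk/device/queue_depth
    # Bail out if the attribute is missing or not writable (e.g. not root).
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$disk: queue_depth was $(cat "$file")"
    echo 1 > "$file"
}

# Example, as root:
# disable_ncq sda
```

The setting does not survive a reboot, so it would have to be reapplied
from a boot script if it turns out to help.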





-- 
Florian Attenberger [EMAIL PROTECTED]

