Launchpad has imported 28 comments from the remote bug at https://bugzilla.redhat.com/show_bug.cgi?id=465838.
If you reply to an imported comment from within Launchpad, your comment will be sent to the remote bug automatically. Read more about Launchpad's inter-bugtracker facilities at https://help.launchpad.net/InterBugTracking.

------------------------------------------------------------------------
On 2008-10-06T17:22:32+00:00 Johan wrote:

Description of problem:
After a few minutes (varies, <1 to ~10 minutes) the IDE (PATA) drive stops responding completely (100%). dmesg shows timeouts and retries. Processes go into D state when doing anything that requires disk activity.

Version-Release number of selected component (if applicable):
2.6.27-0.392.rc8.git7.fc10.x86_64 bad
2.6.27-0.391.rc8.git7.fc10.x86_64 bad
2.6.27-0.382.rc8.git4.fc10.x86_64 bad
2.6.27-0.354.rc7.git3.fc10.x86_64 good

How reproducible:
100%

Steps to Reproduce:
Cannot find a pattern. It happens with or without GUI, disk activity, or CPU pressure, after a seemingly random number of minutes (<1 to ~10 minutes).

Actual results:
A non-working system.

Expected results:
A working system.
Additional info:

ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.01: qc timeout (cmd 0x27)
ata1.01: failed to read native max address (err_mask=0x4)
ata1.01: HPA support seems broken, skipping HPA handling
ata1.01: revalidation failed (errno=-5)
ata1: soft resetting link
ata1: nv_mode_filter: 0x1f01f&0x1f01f->0x1f01f, BIOS=0x1f000 (0xc5c60000) ACPI=0x1f01f (30:20:0x15)
ata1: nv_mode_filter: 0x3f01f&0x3f01f->0x3f01f, BIOS=0x3f000 (0xc5c60000) ACPI=0x3f01f (30:20:0x15)
ata1.00: configured for UDMA/66
ata1.01: configured for UDMA/100
ata1: EH complete
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1: nv_mode_filter: 0x1f01f&0x1f01f->0x1f01f, BIOS=0x1f000 (0xc5c60000) ACPI=0x1f01f (30:20:0x15)
ata1: nv_mode_filter: 0x3f01f&0x3f01f->0x3f01f, BIOS=0x3f000 (0xc5c60000) ACPI=0x3f01f (30:20:0x15)
ata1.00: configured for UDMA/66
ata1.01: configured for UDMA/100
ata1: EH complete

Repeats with UDMA/44, UDMA/33, PIO4, PIO3, PIO0. Again and again. See the attached file for a few more. It is a bit hard to capture after a while, as almost everything goes into D state, apparently because it does something that requires disk access. Output more or less identical to the above keeps repeating.

Some slightly different output after a while, copied by hand to another computer:

SR0: cdrom (IOCTL) ERROR, COMMAND: GET EVENT STATUS NOTIFICATION 4A 01 00 00 10 00 00 00 08 00
...
sr 0:0:0:0: ioctl_internal_command return code = 8000002
   : Sense Key : Aborted Command [current] [descriptor]
   : Add. Sense: No additional sense information
...
sd 0:0:1:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 0:0:1:0: [sda] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
        72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
        00 00 00 00
sd 0:0:1:0: [sda] Add. Sense: No additional sense information
end_request: I/O error, dev sda, sector 519537

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/0

------------------------------------------------------------------------
On 2008-10-06T17:23:25+00:00 Johan wrote:

Created attachment 319573
Boot dmesg of 2.6.27-0.392.rc8.git7.fc10.x86_64 (bad)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/1

------------------------------------------------------------------------
On 2008-10-06T17:24:15+00:00 Johan wrote:

Created attachment 319574
Boot dmesg of 2.6.27-0.354.rc7.git3.fc10.x86_64 (good)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/2

------------------------------------------------------------------------
On 2008-10-06T17:25:33+00:00 Johan wrote:

Created attachment 319575
Boot dmesg of 2.6.27-0.382.rc8.git4.fc10.x86_64 (bad)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/3

------------------------------------------------------------------------
On 2008-10-06T17:26:54+00:00 Johan wrote:

Created attachment 319576
Some dmesg ata errors

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/4

------------------------------------------------------------------------
On 2008-10-06T21:10:12+00:00 Johan wrote:

Tried a few more kernels in between, and unfortunately (?) it seems the difference between a working and a non-working kernel is whether it includes debug code or not (with debug code = no bug).

This includes the latest kernel-debug (2.6.27-0.392.rc8.git7.fc10.x86_64.debug), which seems to be working, whereas the non-debug version bugs out within minutes.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/5

------------------------------------------------------------------------
On 2008-10-08T18:41:15+00:00 Alan wrote:

Looks like another stuck DRQ case - good news if so, as I'm currently testing kernel changes to do DRQ data draining.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/6

------------------------------------------------------------------------
On 2008-10-08T20:20:54+00:00 Johan wrote:

Ok. Please advise if you need any further information or if there is anything that needs testing.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/7

------------------------------------------------------------------------
On 2008-10-18T10:00:39+00:00 Johan wrote:

2.6.27-3.fc10.x86_64 -- bad (3 min uptime)
2.6.27-3.fc10.x86_64.debug -- good
2.6.27.2-23.rc1.fc10.x86_64 -- bad (7 min uptime)
2.6.27.2-23.rc1.fc10.x86_64.debug -- good

Some older kernels:
2.6.26.6-79.fc9.x86_64 -- bad (4 min uptime)
2.6.25.14-108.fc9.x86_64 -- bad (2 min uptime)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/8

------------------------------------------------------------------------
On 2008-10-21T08:25:32+00:00 Alan wrote:

If it's predictably the case that only the debug kernels work after multiple tests (and I assume you've been running working debug kernels for a few days now?), that points outside the ATA layer. It could, I suppose, be timing, but it sounds almost like a compiler bug.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/9

------------------------------------------------------------------------
On 2008-10-27T12:30:54+00:00 Johan wrote:

(sorry for the delay)

I have run with debug kernels for a number of hours without problems (the machine otherwise runs Windows from a SATA drive).

I've tried a minimal-.config kernel, 2.6.28.rc2 latest git -- same result, though it survived bonnie++ and took all of 21 minutes before locking up. The same config with the debug options from Fedora enabled seems to be working (1h+), though I'll test it some more.

I'll look into trying a different compiler. The rc2 test was with gcc Red Hat 4.3.2-6 and Ubuntu 4.3.2-1ubuntu11 in a distcc setup.

Could it be hardware related? The HD in question is oldish -- the rest of the machine is new. Still, it seems 100% stable with those debug options turned on.

Is there anything I can do to find out what is happening here? I can patch the kernel easily enough, or look into using kgdb, but I really have no idea what to look for.

For the record:
2.6.27.4-47.rc3.fc10.x86_64 -- bad (3 min uptime)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/10

------------------------------------------------------------------------
On 2008-10-27T15:55:01+00:00 Johan wrote:

Non-debug 2.6.28.rc2 kernel compiled with gcc Red Hat 3.4.6-9 locked up after ~4 minutes uptime.

Debug 2.6.28.rc2 was still ok after 4h+ of hard testing.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/11

------------------------------------------------------------------------
On 2008-10-29T03:37:46+00:00 Johan wrote:

New lockup, with a complete break in pattern:

1. 2.6.27.4-51.fc10.x86_64.debug (all debug kernels so far had worked)
2. SATA (sata_nv) dmraid (raid-0 nvidia ntfs ro) instead of main PATA-IDE ext3 (which kept working)

The lockup lasted ~15-20 minutes(?), then the array worked again for ~8 minutes, then locked up again. I cannot seem to get the main drive to lock up like that with this kernel.

I cannot tell if this is new to this kernel; I only recently set this up. (The dmraid did not activate out of the box.) It locked up within minutes on this kernel after working for a few hours on a 28.rc2-git kernel.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/12

------------------------------------------------------------------------
On 2008-10-29T03:39:32+00:00 Johan wrote:

Created attachment 321742
dmesg of 2.6.27.4-51.fc10.x86_64.debug (bad-sata)

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/13

------------------------------------------------------------------------
On 2008-11-26T03:36:40+00:00 Bug wrote:

This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle. Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/14

------------------------------------------------------------------------
On 2008-11-29T01:41:35+00:00 Joe wrote:

I have had a similar problem since Fedora 8, as have a few others. Please see bug 440408 as well. These appear to be the same thing.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/15

------------------------------------------------------------------------
On 2008-12-30T11:33:19+00:00 Jeff wrote:

I have also experienced this problem since Fedora 8. I'm pleased it is getting some attention. The only way to get it working again is a reboot.

Since Fedora 10 and the DBus issue, I now get this message from the GUI:

Unable to mount location
Cannot invoke CheckForMedia on HAL: org.freedesktop.DBus.Error.NoReply: Did not receive a reply. Possible causes include: the remote application did not send a reply, the message bus security policy blocked the reply, the reply timeout expired, or the network connection was broken.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/21

------------------------------------------------------------------------
On 2009-01-22T18:56:21+00:00 scott wrote:

I am having the same problem.
I have built a machine around an Asus P5N-EM motherboard. I am running a 32-bit kernel, not 64-bit.

Originally, I had an IDE boot drive with Fedora 10 installed on it, and a spare SATA drive for server disk space. I had this install of F10 completely updated with the latest "yum update" but can't access that drive now to see what kernel version it had - it was whatever version was current this week (ca. Jan 20 - I worked on this problem off and on all week).

The machine would run for a few minutes, and then the disk light would come on and stay on. The machine still ran, but disk I/O quit working. Anything in memory (cached, I guess) would still work - I could open xterm, and read files like "messages" that were not yet written to disk. But new commands that were not in cache would not work (the shell said "Input/output error."), and I could not Ctrl-Alt-F8 to the text console and log in as root.

Suspecting a hardware problem, I spent a lot of time running diagnostics like smartctl and booting from Hiren's boot disk and running Seagate and Maxtor utilities. Every single disk diagnostic comes back clean. The problem is not in the IDE drive itself, or if it is, it's a problem that diagnostics can't find.

Today, I unplugged the IDE drive, and put a base install of F10 on the SATA drive, which has 2.6.27.5-117.fc10.i686. The machine no longer locks up while it is running.
But it prints this message every 10-20 seconds:

ata5.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata5.01: cmd a0/00:00:00:00:00/00:00:00:00:00/b0 tag 0
         cdb 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         res 40/00:03:00:00:00/00:00:00:00:00/b0 Emask 0x4 (timeout)
ata5.01: status: { DRDY }
ata5: soft resetting link
ata5: nv_mode_filter: 0x1&0x1f01f->0x1, BIOS=0x1f000 (0xc50000) ACPI=0x1f01f (600:30:0x1c)
ata5.01: configured for PIO0
ata5: EH complete

This machine was built back in November/December, and worked fine for a while - but I never had time to finish doing anything with it. This week, I booted and ran the "yum update" and it was the first time I noticed these problems. However, they could have been there all along, but it certainly didn't lock up as it was running.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/23

------------------------------------------------------------------------
On 2009-01-27T17:09:28+00:00 scott wrote:

I have installed CentOS 5.2 on this machine, and have no ATA errors like this.

What bothers me is this is the third SATA bug I've encountered where a working system breaks for no apparent reason because of an upgrade. One was fixed, and the other two are open. I have a Blu-Ray burner that is a brick because of one of these errors.

All three of these are situations where Fedora worked fine on the hardware, but an upgrade broke the existing system. The first one was a year or two ago, and was eventually fixed. But these two are open and inactive.

I can't live with this any more, so this is the end of the line with Fedora for me. I have to run Fedora for some IBM software I develop with, but will try to see if I can get it to run on SuSE, or CentOS. I've used Red Hat since 5.2, but can't deal with these broken systems any longer.

I could understand if Fedora had bleeding-edge new stuff that wasn't working. I don't have any problems with that.
But I have problems with existing, working code suddenly breaking to the point the systems aren't usable.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/24

------------------------------------------------------------------------
On 2009-04-05T18:49:49+00:00 Martin wrote:

Hi to all,

I'm having the same troubles as in comments 14, 16, 17, 18. Regarding note 12, I know for "sure" that the main disk never gets locked that way. In my experience, the disk which contains '/' never gets involved in this. I say for "sure" because I shuffle disks, file systems, and partitioning schemes a lot.

In my case, the "normal" disk set-up is:
sda, sdb, sdc = built-in SATA HDs
ata5 = DVD-RW (Aopen)
'/' is on sdb; /tmp, /var/tmp, /var/spool, /var/cache/yum, /usr, /usr/lib64, /usr/share, and some other subdirectories are each in their own partition, divided over the 3 disks. All of this is for flexibility, performance (by using parallelism), etc.

For now, since the update of 2009/02/23, disks sda and sdc are no longer locking as in message 17. But ata5 (DVD) does. Because, from time to time, I also use other disks and file systems, I'm "nearly sure" there is no hardware or file system issue.

Also noteworthy is that there is "something & somewhere" polling via D-Bus all of the time, which seriously slows down other system functions. It seems D-Bus is occupied by this problem, but I can't find a clue.

OS = F10 x86_64, all in ext4 except /boot, latest update yesterday. I also have a machine built around an Asus P5VDC-MX motherboard.

Is there a solution somehow?

Thanks a lot
martin

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/27

------------------------------------------------------------------------
On 2009-04-17T09:14:39+00:00 Stanislaw wrote:

The only ATA-related change between 2.6.27-0.392.rc8.git7.fc10 and 2.6.27-0.354.rc7.git3.fc10 is the sata_nv hardreset commit:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commitdiff;h=4c1eb90a0908c0c60db2169dce08fb672e7582f1

It is known that this commit causes problems, which were reported in two places:

http://bugzilla.kernel.org/show_bug.cgi?id=12176
http://bugzilla.kernel.org/show_bug.cgi?id=11195

And fixed in two further commits:

http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commit;h=2fd673ecf0378ddeeeb87b3605e50212e0c0ddc6
http://git.kernel.org/?p=linux/kernel/git/stable/linux-2.6.27.y.git;a=commitdiff;h=2da462eba7e5b585d54c17d76c6a662e4fbb3c32

So the bug should be fixed in the newest updates of the Fedora kernel (2.6.27.21 based). Johan, could you confirm that?

BTW: Johan, in your dmesg there are a lot of messages like this:

attempt to access beyond end of device
sdc: rw=0, want=625160072, limit=312581808
Buffer I/O error on device sdc1, logical block 78144752
attempt to access beyond end of device
sdc: rw=0, want=625160072, limit=312581808
Buffer I/O error on device sdc1, logical block 78144752
attempt to access beyond end of device

This is a serious problem which can cause data corruption. Something may be wrong in the software working on top of the ATA devices (filesystem, device mapper), or maybe in ATA itself, or it may be a hardware problem (memory corruption, chipsets, etc.).

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/28

------------------------------------------------------------------------
On 2009-07-12T19:05:03+00:00 Jeff wrote:

When will this problem get a resolution?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/29

------------------------------------------------------------------------
On 2009-07-14T17:52:46+00:00 Stanislaw wrote:

(In reply to comment #22)
> When will this problem get a resolution?

As the base kernel version for Fedora 10 and 11 is now 2.6.29, I believe this problem is already solved. Jeff, can you reproduce this issue on Fedora 10 or 11?

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/30

------------------------------------------------------------------------
On 2009-07-15T18:27:13+00:00 Jeff wrote:

I checked my version; it is:

uname -r
2.6.27.25-170.2.72.fc10.x86_64

I will upgrade the kernel and give an update.

Regards
Jeff

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/31

------------------------------------------------------------------------
On 2009-08-15T18:10:03+00:00 Jeff wrote:

I have upgraded to F11:

$ uname -r
2.6.29.6-217.2.3.fc11.x86_64

The CD-ROM and DVD drives appear to work for a longer period of time before locking up, but they still lock up.

In the past I would get a DBus error. Now I get no error at all on eject or rescan. I cannot eject the device manually from outside.

Tell me what logs or traces I can provide to help resolve this issue. I'm attempting to reboot to capture a screenshot of the working scenario.

regards
Jeff

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/32

------------------------------------------------------------------------
On 2009-08-16T04:20:12+00:00 Jeff wrote:

It appears that after a Firefox file download, when I perform an "open folder containing", any .ISOs that are present are automatically mounted. In my case, about 15 ISOs. Then mplayer also runs immediately.

This kills the CD-ROM/DVD devices.

Regards
Jeff

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/33

------------------------------------------------------------------------
On 2009-11-18T07:56:09+00:00 Bug wrote:

This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete.

The process we are following is described here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/34

------------------------------------------------------------------------
On 2009-12-18T06:31:07+00:00 Bug wrote:

Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Reply at: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/309901/comments/35


** Changed in: linux (Fedora)
   Importance: Unknown => High

** Bug watch added: Linux Kernel Bug Tracker #12176
   https://bugzilla.kernel.org/show_bug.cgi?id=12176

** Bug watch added: Linux Kernel Bug Tracker #11195
   https://bugzilla.kernel.org/show_bug.cgi?id=11195

https://bugs.launchpad.net/bugs/309901

Title:
  SATA timeout causing soft lockup during heavy disk activity

Status in linux package in Ubuntu:
  Won't Fix
Status in linux package in Fedora:
  Won't Fix

Bug description:
  Binary package hint: linux-source-2.6.27

  During heavy disk activity, the SATA drive will time out and the system will stop responding. The system rarely recovers and normally needs a cold boot.
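
Editorial aid for anyone triaging similar reports: the libata messages quoted in this thread follow a regular shape (an "exception" line naming the device, a "res ... (timeout)" line, then "configured for MODE" after each reset). The sketch below is not from any commenter; it is a minimal, hypothetical helper (the name `summarize` and the regular expressions are assumptions) that counts timeouts per device and records the sequence of transfer modes, which makes the UDMA-to-PIO speed-down described in the first comment easy to spot. It assumes one kernel message per line, optionally prefixed with a `[seconds.micros]` timestamp.

```python
import re
from collections import Counter

# Optional dmesg timestamp prefix such as "[  123.456789] ".
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")
# "ata1.00: exception Emask ..." opens an error-handling report for a device.
EXCEPTION_RE = re.compile(r"^(ata\d+(?:\.\d+)?): exception Emask")
# "res .../a0 Emask 0x4 (timeout)" marks the failed command as a timeout.
TIMEOUT_RE = re.compile(r"res \S+ Emask 0x[0-9a-f]+ \(timeout\)")
# "ata1.00: configured for UDMA/66" reports the mode chosen after a reset.
MODE_RE = re.compile(r"^(ata\d+\.\d+): configured for (\S+)")


def summarize(log_text):
    """Return (timeouts per device, list of configured modes per device)."""
    timeouts = Counter()
    modes = {}
    current = None  # device named by the most recent "exception" line
    for raw in log_text.splitlines():
        line = TIMESTAMP_RE.sub("", raw.strip())
        match = EXCEPTION_RE.match(line)
        if match:
            current = match.group(1)
            continue
        if current and TIMEOUT_RE.search(line):
            timeouts[current] += 1
            current = None
            continue
        match = MODE_RE.match(line)
        if match:
            modes.setdefault(match.group(1), []).append(match.group(2))
    return timeouts, modes


# Excerpt shaped like the log in the first comment of this report.
SAMPLE = """\
ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
ata1.00: cmd a0/00:00:00:00:00/00:00:00:00:00/a0 tag 0
         cdb 1e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
         res 40/00:02:00:08:00/00:00:00:00:00/a0 Emask 0x4 (timeout)
ata1.00: status: { DRDY }
ata1: soft resetting link
ata1.00: configured for UDMA/66
ata1.01: configured for UDMA/100
ata1: EH complete
"""

if __name__ == "__main__":
    counts, seen_modes = summarize(SAMPLE)
    for device, count in sorted(counts.items()):
        print(device, "timeouts:", count)
    for device, mode_list in sorted(seen_modes.items()):
        print(device, "modes:", " -> ".join(mode_list))
```

Fed a full capture (for example the output of `dmesg` saved to a file), a mode list that steps down from UDMA towards PIO0 for one device would match the repeated reset-and-slow-down pattern reported above. The helper only summarizes the log; it says nothing about the root cause.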