[Bug 1036366] Re: software RAID arrays fail to start on boot
Please ignore this bug report. I now believe this problem was caused by a configuration error in my RAID setup. In particular, two different arrays had the same 'name' (I think this is the 'name' recorded in the array superblocks; IMHO, 'name' has become an overloaded term in the context of Linux software RAID). Apparently having duplicate names is not a good idea.

I have no recollection of ever explicitly assigning these names; I think mdadm assigned them automatically, several versions ago (probably over a year ago). Since many bugs in mdadm have been fixed since then, we should probably assume that this issue has been fixed unless somebody reports similar symptoms again.

I have fixed my system by re-creating one of the arrays without the duplicate name. I have now rebooted several times, and the symptoms have not recurred.

If there is a bug here, it is that it is (or was?) possible to create two arrays with the same name, with no obvious warning given at the time.
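For anyone hitting the same thing, a rough sketch of the kind of commands involved (array names, levels, and device names below are placeholders; re-creating an array rewrites its superblocks, so triple-check the level, device list, and chunk size against the original and have backups first):

# See the name currently recorded for a running array.
sudo mdadm --detail /dev/md5 | grep Name

# The fix described above, in outline: re-create the array with an
# explicit, unique name.
sudo mdadm --create /dev/md/scratch --level=1 --raid-devices=2 \
    --name=scratch /dev/sdb6 /dev/sdc6

# A gentler alternative that may work for 1.x metadata (hedged; check the
# mdadm man page for --update=name before relying on it):
sudo mdadm --stop /dev/md5
sudo mdadm --assemble /dev/md5 --name=scratch --update=name /dev/sdb6 /dev/sdc6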
[Bug 1036366] Re: software RAID arrays fail to start on boot
You're welcome, Dmitrijs. Now that this system is finally behaving itself (for the first time in the better part of a year!) I can look at this properly functioning configuration and compare it with the previously broken one. It is becoming clearer what happened. (Note: all of the following is what I have deduced by reading a vast amount of material from different sources [and giving myself many headaches in the process], and some of it may not be an entirely accurate description of what is really happening.)

The superblock contains a field called 'name'. It is not something like /dev/md1 or /dev/md/1 or /dev/md1p1; on my system it is more like '5'. As it happened, I had two very different arrays (different RAID levels, sizes, etc.) that both had that name. When you run Disk Utility and select a RAID array, this is the Name displayed in the right pane; if it is empty, the pane shows "Name: -", but on my system two arrays showed "Name: 5". I didn't choose this name; I think mdadm assigned it because each array happened to be sitting at /dev/md5 at the time it was created, and the two arrays were created at different times (of course these device names change arbitrarily on every boot).

But it is more complicated than that. That is only part of the name; the superblock actually contains a fully qualified name of the form hostname:name, and Disk Utility only displays the last part of it. The hostname part is simply the hostname at the time the array is created.

My system has a long history. A year ago its hostname was different, and one of the arrays was created then. After the system became unstable (when I upgraded to Oneiric and gained a particularly buggy version of mdadm) I stopped using it and backed all the data off. When Precise became available, I did a fresh install onto a non-RAID partition and left all the existing RAID partitions in place for testing purposes. Because I was no longer going to use this system as a file server, but as a test machine, I gave it a different hostname (precisetest). A bit later I added another array and mdadm assigned it the name '5', presumably because it was sitting at /dev/md5 at the time. I did not even notice this at first. Of course, the fully qualified names stored in the superblocks were actually different, having different hostnames on the front, so even though Disk Utility showed the name '5' on both arrays, they really had different full names.

Although RAID was badly messed up on this system, that was only a problem at boot time. After booting, I could go into Disk Utility and manually start all affected arrays. Once this was done, the system worked great until the next reboot. RAID was working; I could access files on any array. I came to the (perhaps incorrect?) conclusion that these two arrays having the same name was not a problem. After all, I could look at mdadm.conf and see that the arrays really had different names (the fully qualified names are shown there).

Now I am thinking that it is not sufficient for the fully qualified names to be unique. I think the part of the name after the colon has to be unique too, otherwise problems happen at boot, at least on Ubuntu Precise. But I don't think mdadm upstream intended it to be that way. So: some part of the boot process is getting hung up on these (apparently) duplicate names, because it is looking at just the short names instead of the fully qualified names (in the udev scripts, perhaps?). If that code looked at hostname:name instead of just name, perhaps this problem would disappear.
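A quick way to see both forms of the name on a system like this (the config path is the Ubuntu default; the device name and the sample output are hypothetical):

# Fully qualified names (hostname:name), as recorded in mdadm.conf.
grep ^ARRAY /etc/mdadm/mdadm.conf

# The same field straight from a component's 1.x superblock; Disk Utility
# shows only the part after the colon.
sudo mdadm --examine /dev/sdb5 | grep Name

# Hypothetical output illustrating the situation described above:
#    Name : precisetest:5
#    Name : oldfileserver:5    (different full names, same short name '5')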
[Bug 1036366] [NEW] software RAID arrays fail to start on boot
Public bug reported:

Some software RAID arrays fail to start on boot. Exactly two of my arrays (but not always the same two!) fail to start, on every single boot, and I have done 24 boots since I started taking detailed notes.

I have been running Ubuntu 12.04 with the latest updates. Two days ago I selectively upgraded mdadm to 3.2.5 from -proposed, as suggested in bug #942106; that upgrade helped some other people, but not me. Over the last few months, various kernel and mdadm updates have greatly improved the symptoms, but there has been no complete cure so far. Note that the following symptoms once occurred regularly on this system, but have NOT occurred in the past few weeks:

- Having to wait for a degraded array to resync
- Having to manually re-attach a component (usually a spare) that had become detached
- Having to drop to the command line to zero a superblock before reattaching a component
- Having an array containing swap fail to start
- Having to use anything other than Disk Utility to get arrays running properly again

This system has six SATA drives on two controllers. It contains seven RAID arrays, including RAID 1, RAID 10, and RAID 6; all are listed in fstab. Some use 0.90.0 metadata and some use 1.2 metadata. The root filesystem is not on a RAID array (at least not any more; I got tired of that REAL fast) but everything else (including /boot and all swap) is on RAID. One array is used for /boot, two for swap, and the other four are just there for testing purposes. BOOT_DEGRADED is set. All partitions are GPT. Not using LUKS or LVM. All drives are 2TB and by various manufacturers; I suspect some have 512B physical sectors and some have 2KB sectors. This is an AMD64 system with 8GB RAM.

This system has had about four different versions of Ubuntu on it over the last few years, and has had multiple RAID arrays on it from the beginning. (This is why some of the arrays still use 0.90.0 metadata, and why there are so many arrays; some are old partitions containing root, home, and such from earlier incarnations.) RAID worked fine until the system was upgraded to Oneiric early in 2012 (no, the problem did not start with Precise). I have carefully tested the system every time an updated kernel or mdadm has appeared since the problem started. The behavior has gradually improved over the last several months. This latest proposed version of mdadm (3.2.5), thankfully, did not introduce regressions, but also did not produce significant improvement on this system; I have rebooted five times since then and the behavior is consistent.

When the problem first started, on Oneiric, I had the root filesystem on RAID. This was unpleasant. I stopped using the system for a while, as I had another one running Maverick, which was reliable. When I noticed some discussion of possibly related bugs on the linux-raid list (I've been lurking there for years) I decided to test the system some more. By then Precise was out, so I upgraded. That did not help. Eventually I backed up all data onto another system and did a clean install of Precise on a non-RAID partition, which made the system tolerable. I left /boot on a RAID1 array (on all six drives), but that does not prevent the system from booting even if /boot does not start during Ubuntu startup (I assume because GRUB can find /boot even if Ubuntu later can't).

I started taking detailed notes in May (seven cramped pages so far). I have rebooted 24 times since then. On every boot, exactly two arrays did not start. Which arrays they were varied from boot to boot; it could be any of the arrays (though recently the swap arrays have not been affected). There is no apparent correlation with metadata type or RAID level.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: mdadm 3.2.5-1ubuntu0.2
ProcVersionSignature: Ubuntu 3.2.0-29.46-generic 3.2.24
Uname: Linux 3.2.0-29-generic x86_64
ApportVersion: 2.0.1-0ubuntu12
Architecture: amd64
Date: Mon Aug 13 12:10:36 2012
InstallationMedia: Ubuntu 12.04 LTS Precise Pangolin - Release amd64 (20120425)
MDadmExamine.dev.sda: /dev/sda: MBR Magic : aa55, Partition[0] : 3907029167 sectors at 1 (type ee)
MDadmExamine.dev.sda1: Error: command ['/sbin/mdadm', '-E', '/dev/sda1'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda1.
MDadmExamine.dev.sda11: Error: command ['/sbin/mdadm', '-E', '/dev/sda11'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda11.
MDadmExamine.dev.sda4: Error: command ['/sbin/mdadm', '-E', '/dev/sda4'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda4.
MDadmExamine.dev.sda5: Error: command ['/sbin/mdadm', '-E', '/dev/sda5'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda5.
MDadmExamine.dev.sda6: Error: command ['/sbin/mdadm', '-E', '/dev/sda6'] failed with exit code 1: mdadm: No md superblock detected on /dev/sda6.
MDadmExamine.dev.sda7: Error: command
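For anyone trying to triage a report like this, a minimal sketch of the commands one might use to capture the state right after such a boot (nothing here changes the arrays; paths are the Ubuntu defaults):

# Which arrays are running, degraded, or absent right now.
cat /proc/mdstat

# Compare what was actually assembled against what mdadm.conf expects.
sudo mdadm --detail --scan
grep ^ARRAY /etc/mdadm/mdadm.conf

# Kernel messages from the md driver during this boot.
dmesg | grep -i 'md:'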
[Bug 942106] Re: software raid doesn't assemble before mount on boot
@Brian, Dmitrijs: Thanks. I have filed bug #1036366 to report the symptoms not resolved by this recent fix.
[Bug 995445] Re: package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127
@Atheg: (Note that I am totally guessing about this, and all of the following may be less than helpful.)

AFAIK, installing tcl8.4 doesn't roll back anything. I just checked my system, and I now have both 8.4 and 8.5 installed where I only had 8.5 before. I don't know if this presents any problems, as I never explicitly use Tcl myself and know little about it. However, I just tried this in a terminal:

me@precise:~$ tclsh
%
%
% exit
me@precise:~$ tclsh8.4
%
%
%
% exit
me@precise:~$ tclsh8.5
%
%
%
%
% exit
me@precise:~$

So I can explicitly call up either version of tclsh. They are both installed and working. I don't know how to tell which one comes up when I just type tclsh.

The idea of doing 'sudo apt-get install tcl8.4' to resolve the problem was just an attempt to make a single error message go away and see what happened after that. As it turns out, it appears to have completely fixed the problem. But it is just a workaround, and I don't know if it will negatively impact anything else you are doing. Somebody more expert in the use of Tcl should chime in on this.

The error message was coming from a script called gpsmanshp.postinst that explicitly refers to tclsh8.4, which comes from the tcl8.4 package and is not installed on a default Precise installation (and there is apparently no dependency listed for gpsmanshp, so it is not installed automatically when gpsmanshp is). Perhaps manually editing that script (replacing 8.4 with 8.5 in one line) would also fix the problem, but I haven't tested that and don't know what ripple effects it would have, if any.

It has been over a month since I tried that fix, and it doesn't seem to have hurt anything. It did make an extremely annoying recurring error message go away. And it did complete the installation (I think!) of gpsmanshp, a package that I know virtually nothing about and haven't even had time to look at since then.
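As a rough, untested sketch of the editing idea mentioned above (the maintainer-script path is the standard dpkg location, but treat both it and the sed edit as assumptions; back the file up first):

# Which binary does plain 'tclsh' actually run? It is usually a symlink
# or an alternatives-managed link.
readlink -f "$(command -v tclsh)"

# Hypothetical one-line workaround: make the gpsmanshp maintainer script
# call tclsh8.5 instead of tclsh8.4, then let dpkg finish configuring it.
sudo cp /var/lib/dpkg/info/gpsmanshp.postinst /var/lib/dpkg/info/gpsmanshp.postinst.bak
sudo sed -i 's/tclsh8\.4/tclsh8.5/' /var/lib/dpkg/info/gpsmanshp.postinst
sudo dpkg --configure -a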
[Bug 995445] Re: package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127
@Atheg: (continuing the previous comment)

I just had a look at the Tcl docs and learned about the 'info patchlevel' command. This is what it does on my system:

me@precise:~$ tclsh
% info patchlevel
8.5.11
% exit
me@precise:~$ tclsh8.4
% info patchlevel
8.4.19
% exit
me@precise:~$ tclsh8.5
% info patchlevel
8.5.11
% exit
me@precise:~$

So if I ask for tclsh without specifying a version, I get 8.5, the latest. But AFAIK the most recently installed version is 8.4. So apparently plain tclsh takes you to the highest version number, not the last one installed. I imagine this is the behavior you want. The gpsmanshp.postinst script will still use the 8.4 version it thinks it needs, because it explicitly calls that version. Hope this helps.
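(A non-interactive way to run the same check, purely for convenience; this just feeds one Tcl command to each interpreter on stdin:)

echo 'puts [info patchlevel]' | tclsh
echo 'puts [info patchlevel]' | tclsh8.4
echo 'puts [info patchlevel]' | tclsh8.5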
[Bug 942106] Re: software raid doesn't assemble before mount on boot
This does NOT fix the issue for me. My system still boots up with some RAID arrays not running, every single time.

This system has six SATA drives on two controllers. It contains seven RAID arrays, a mix of RAID 1, RAID 10, and RAID 6; all are listed in fstab. Some use 0.90.0 metadata and some use 1.2 metadata. The root filesystem is not on a RAID array (at least not any more; I got tired of that REAL fast) but everything else (including /boot and all swap) is on RAID. BOOT_DEGRADED is set. All partitions are GPT. Not using LUKS or LVM. All drives are 2TB and by various manufacturers; I suspect some have 512B physical sectors and some have 2KB sectors. This is an AMD64 system with 8GB RAM.

This system has had about four different versions of Ubuntu on it, and has had multiple RAID arrays on it from the beginning. (This is why some of the arrays still use 0.90.0 metadata.) RAID worked fine until the system was upgraded to Oneiric early in 2012 (no, it did not start with Precise). I have carefully tested the system every time an updated kernel or mdadm has appeared since the problem started with Oneiric. The behavior has gradually improved over the last several months. This latest version of mdadm (3.2.5) did not result in significant improvement; I have rebooted four times since then and the behavior is consistent.

When the problem first started, on Oneiric, I had the root filesystem on RAID. This was unpleasant. I stopped using the system for a while, as I had another one running Maverick. When I noticed some discussion of possibly related bugs on the linux-raid list (I've been lurking there for years) I decided to test the system some more. By then Precise was out, so I upgraded. That did not help. Eventually I backed up all data onto another system and did a clean install of Precise on a non-RAID partition, which made the system tolerable. I left /boot on a RAID1 array (on all six drives), but that does not prevent the system from booting even if /boot does not start during Ubuntu startup (I assume because GRUB can find /boot even if Ubuntu later can't).

I started taking detailed notes in May (seven cramped pages so far). I have rebooted 23 times since then. On every boot, exactly two arrays did not start. Which arrays they were varied from boot to boot; it could be any of the arrays. There is no apparent correlation with metadata type or RAID level.

This mdadm 3.2.5 is the first time I have resorted to a forced upgrade from -proposed; before, I always just waited for a regular update. The most significant improvements came with earlier regular updates. It has been a while since I had to wait for a degraded array to resync, manually re-attach a component (usually a spare) that had become detached, or drop to the command line to zero a superblock before reattaching a component. It has also been a while since an array containing swap failed to start.

This issue has now become little more than an annoyance. I can boot, wait for the first array to not start, hit S, wait for the second, hit S, wait for the login screen, log in, wait for the Unity desktop, start Disk Utility, manually start the two arrays that didn't start, then check all the other arrays to see if anything else has happened. It takes about five minutes. But I am still annoyed. If you want to replicate this behavior consistently, get yourself seven arrays.
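For reference, the same manual fix-up can usually be done from a terminal instead of Disk Utility; a rough sketch (the md device name is a placeholder, and on healthy superblocks nothing here should force a resync):

# See which arrays came up and which are missing or inactive.
cat /proc/mdstat

# Try to assemble anything defined in mdadm.conf that is not yet running.
sudo mdadm --assemble --scan

# Or start one specific array that was assembled but left inactive.
sudo mdadm --run /dev/md5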
[Bug 942106] Re: software raid doesn't assemble before mount on boot
@Dmitrijs: I must agree with your assessment. I cannot really tell which (of many) existing bug reports are most relevant to my system, so I've been chiming in on a bunch of them, just in case they provide clues helpful to others :-) This latest mdadm update does not have any negative impact on me, and clearly helps others, so +1 from me. Will do 'sudo apport mdadm'. And thanks for the link to that spec.
[Bug 942106] Re: software raid doesn't assemble before mount on boot
@Dmitrijs: 'sudo apport mdadm' does nothing. I know that apport is installed and the service is running. Am I doing this wrong?
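(For context: apport is not normally invoked directly by name; the usual front-ends on Precise are, if memory serves, the following, so the earlier suggestion probably meant one of these:)

# Attach fresh package and hardware info to an existing bug report.
apport-collect 942106

# Or file a new report against the package, collecting the same info.
ubuntu-bug mdadm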
[Bug 995445] Re: package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127
On my system, tcl8.5 is installed, but tcl8.4 is not. So:

sudo apt-get install tcl8.4

This resulted in:

Reading package lists... Done
Building dependency tree
Reading state information... Done
Suggested packages:
  tclreadline
The following NEW packages will be installed:
  tcl8.4
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
1 not fully installed or removed.
Need to get 870 kB of archives.
After this operation, 3,375 kB of additional disk space will be used.
Get:1 http://us.archive.ubuntu.com/ubuntu/ precise/main tcl8.4 amd64 8.4.19-4ubuntu3 [870 kB]
Fetched 870 kB in 6s (124 kB/s)
Selecting previously unselected package tcl8.4.
(Reading database ... 253204 files and directories currently installed.)
Unpacking tcl8.4 (from .../tcl8.4_8.4.19-4ubuntu3_amd64.deb) ...
Processing triggers for man-db ...
Setting up gpsmanshp (1.2.1-1) ...
warning: error while loading gpsmanshp.so: couldn't load file ./gpsmanshp.so: ./gpsmanshp.so: undefined symbol: DBFClose
Setting up tcl8.4 (8.4.19-4ubuntu3) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place

...which leads me to suspect that gpsmanshp is now installed. And now I don't get error messages whenever I try to install packages. So either this is a dependency problem, or perhaps gpsmanshp.postinst just needs to be edited to refer to tclsh8.5 instead of tclsh8.4.
[Bug 975839] Re: package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127 [title originally in French]
*** This bug is a duplicate of bug 995445 ***
    https://bugs.launchpad.net/bugs/995445

** This bug has been marked a duplicate of bug 995445
   package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127
[Bug 963536] Re: package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127 [title originally in German]
*** This bug is a duplicate of bug 995445 ***
    https://bugs.launchpad.net/bugs/995445

** This bug has been marked a duplicate of bug 995445
   package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127
[Bug 995445] Re: package gpsmanshp 1.2.1-1 failed to install/upgrade: subprocess installed post-installation script returned error exit status 127
Following this installation failure, an error message appears at the end of every upgrade of any software package. So affected users see error messages almost every day, even if they never even try to run gpsmanshp.
[Bug 990913] Re: RAID goes into degrade mode on every boot 12.04 LTS server
Precise is using a 3.2.0 kernel. There is a known MD bug that affects some 3.2.x and 3.3.x kernels and seems like it might be relevant to this problem. See http://www.spinics.net/lists/raid/msg39004.html and the rest of that thread; note the mention of possible races in scripts.

Unfortunately for us, the lead MD developer does not test with Ubuntu, or with any other Debian-based distro (he only uses SUSE). So if there are any complex race conditions or other problems created by Ubuntu's udev scripts or configs or whatever, he might not uncover them in his testing, and the level of assistance he can provide is limited. (He and the others on the linux-raid list are indeed helpful, but I'm not sure that very many of them use Ubuntu, and the discussion there is fairly technical, probably well beyond what most Ubuntu users could follow.)

Now that Canonical has announced the plan to eliminate the Alternate installer and merge all installer functionality (presumably including RAID) into the regular Desktop installer, it seems likely that the number of users setting up RAID arrays will increase. (I am using Desktop myself, not Server.) For some time now, it has been possible to set up and, to a limited degree, manage software RAID arrays on Ubuntu without any knowledge of the command line. So there are Desktop users running RAID arrays who think they are safeguarding their data. But when the complex creature known as Linux software RAID breaks down, as it has with this bug, they are quickly in over their heads. Given that RAID bugs can destroy the user's data, just about the worst thing that can happen, it would seem prudent to either (1) actively discourage non-expert users from using RAID, or (2) make Ubuntu's implementation of RAID far more reliable.
[Bug 990913] Re: RAID goes into degrade mode on every boot 12.04 LTS server
Since my last comment, an updated kernel arrived via Update Manager. Its changelog included the following:

* md: fix possible corruption of array metadata on shutdown.
  - LP: #992038

This seems possibly relevant. I updated, and have now rebooted several times. The RAID degradation is still happening on every reboot. As before, the system runs just fine after I finish fixing up RAID.

I am now keeping detailed notes on which partitions are being degraded. Since it takes me anywhere from fifteen minutes to several hours to accomplish each reboot and the ensuing repair, and I have other things to do as well, it will be a while before meaningful statistics accumulate.

A further detail I forgot to mention earlier: this is an AMD64 system with 8GB of ECC RAM.

I have attached the most recent dmesg.

** Attachment added: dmesg.txt
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/990913/+attachment/3157872/+files/dmesg.txt
[Bug 990913] Re: RAID goes into degrade mode on every boot 12.04 LTS server
I have now installed Precise on this system. (I had intended to install it as a multiboot alongside the existing Oneiric, but apparently the alternate installer could not recognize my existing /boot RAID1 partition, so now I can't boot Oneiric. But that's another story...) Note that the title of the original bug report refers to 12.04 Server, but I have a Desktop system, installed with the Alternate disc.

This time I installed / on a non-RAID partition. My pre-existing RAID partitions are now mounted as directories in /media, except for /boot, which is still on the same MD partition as before.

I have rebooted several times since installing 12.04. The previous behavior of hanging during shutdown has not recurred. Also, pleasantly, the previous behavior of hanging during boot (between the GRUB splash and the Ubuntu splash) has not recurred either.

I am getting error messages on the Ubuntu splash screen (under the crawling dots) about filesystems not being found. I have seen these occasionally for many years and have become quite accustomed to them. It says I can wait, or hit S to skip, or do something manually; I wait for a while, but soon give up and hit S, because waiting NEVER accomplishes anything. I'm not sure why that option is even mentioned. Fortunately, this has not been happening with /, so I can successfully log into Ubuntu.

Once there, I start up palimpsest (Disk Utility) and look at the RAID partitions. Generally, about half of them are degraded or simply not started. The ones that are not started are the ones mentioned in the error messages on the splash screen. I can start them from palimpsest; sometimes they start degraded, sometimes not. After about an hour of work, all of the degraded partitions are fully synchronized. I usually have to re-attach some components as well. I haven't lost any data yet. Sometimes I cannot re-attach a component using palimpsest and have to drop to the command line, zero the superblock, and then add the component. That has always worked so far. I have only noticed this particular behavior since installing Precise.

In short: on this system, RAID usually degrades upon reboot. It did this with Oneiric (but only starting a few weeks ago) and it does it with a freshly installed Precise. Around the time this behavior started with Oneiric, I did a lot of maintenance work on the hardware, including (1) swapping out one hard drive and (2) adding some 1.2-metadata RAID partitions, where previously all were 0.90 metadata. I have not noticed any correlation between metadata version and degradation. Any of the arrays can become degraded, in an apparently random fashion.

Between reboots, the system runs just fine. Hard drive SMART health appears stable. The newest hard drive is reported as healthy.
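The command-line sequence described above (zero the superblock, then re-add) is roughly the following; a sketch only, with placeholder device names. Zeroing a superblock permanently discards that component's metadata, so be certain the partition is the right one:

# Remove the detached component from the array if it is still listed.
sudo mdadm /dev/md3 --remove /dev/sdc6

# Wipe its stale md superblock...
sudo mdadm --zero-superblock /dev/sdc6

# ...and add it back; the array will then resync onto it.
sudo mdadm /dev/md3 --add /dev/sdc6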
[Bug 990913] Re: RAID goes into degrade mode on every boot 12.04 LTS server
I am having similar problems. I am running Oneiric. I am NOT using LUKS or LVM. The symptoms vary a lot in severity. Sometimes it simply drops a spare, which is then listed in palimpsest as not attached; one click of the button and it is reattached and shown as a spare. But sometimes it gets really hairy.

These nightmares usually start when I shut down the system and it appears to hang during shutdown. Now that this has been going on for a while, I *always* check the status of my drives and arrays immediately before shutting down. First I shut down all apps, then I start palimpsest. I check the SMART health of all drives (all are healthy, except for one that has one bad block, and that never changes). Then I check the arrays; all are running and idle. I also drill down and check the array components to make sure they are all attached. If I find one that isn't attached, I attach it. I don't shut down until everything looks good. Then I shut down, and cross all my fingers and toes.

About half the time, shutdown never completes. It hangs on the purple screen, with "Ubuntu" and five dots that don't crawl. I watch the drive activity light: nothing. No drive activity at all. Then I wait and wait and wait, wasting my valuable time (well, valuable to me anyway) until I get fed up. Then I do what my mommy always told me, and shut down with Alt-SysRq
[Bug 990913] Re: RAID goes into degrade mode on every boot 12.04 LTS server
(Oops, apparently hit the wrong key... continuing the previous comment.)

...shut down with Alt-SysRq REISUB. This has no effect whatsoever. The screen doesn't change; the drive activity light does nothing. Finally, after stewing for a while longer, I hold down the power switch until I hear all the fans powering down.

Then I boot up. I see no error messages. Everything seems to be working fine, except the part about having to boot it three or four times before it actually gets past the GRUB splash screen and arrives at the Ubuntu splash screen. After that, everything looks great... I log in and get to Unity, and I never see any error message go by. Then the first thing I do is start up palimpsest and check the drives and arrays. The drives are always fine, but generally about half of the arrays are degraded. Sometimes it will start re-syncing one of the arrays all by itself; usually it starts with an array that I don't care so much about, and I can't do anything about the ones with more important data until later, because apparently palimpsest can only change one RAID-related thing at a time. Which means that sometimes I have to wait many hours to start working on the next array.

The worst I've seen was the time it detached two drives from my RAID6 array. Very scary. I have one RAID6 array, one RAID10 array, and several RAID1 arrays. I think all of them have degraded at one time or another; this bug seems to be an equal-opportunity degrader. Usually I find two or three of the larger arrays degraded, plus several detached spares on other arrays.

This system has six 2TB drives. I think some of them have 512-byte sectors and some have 2048-byte sectors; how the heck do you tell, anyway? All use GPT partitions, and care has been taken to align all partitions on 1MB boundaries (palimpsest actually reports if it finds alignment issues). The system has two SATA controllers. I put four drives on one controller and two on the other, and for the RAID1 and RAID10 arrays I made sure there were no mirrors with both halves on the same controller, or both halves on drives made by the same company. Except that isn't really true any more; whenever something gets degraded and I have to re-attach and re-sync, the array members often get rearranged. I think most of my spares are now concentrated on a couple of drives, which isn't really what I had planned. I've given up on rearranging the drives to my liking, for the duration.

In fact, for the duration, I've given up on this system. I've been gradually moving data off it onto another system, which is running Maverick, and which will continue to run Maverick because it doesn't try to rearrange my data storage every time I look at it sideways. (Very gradually, since NFS has been broken for the better part of a year...) This nice expensive Oneiric system will be dedicated to the task of rebooting, re-attaching, and re-syncing until Oneiric starts to behave itself. I am planning to also install Precise (multiboot) so I can test that too. Attempting an OS install while partitions are borking themselves on every other reboot sounds like fun.

BTW, I watched the UDS Software RAID reliability session video from last Tuesday: https://www.youtube.com/watch?v=RpC-dkgN37M&list=UUWUDCz-Q0m4qK7lkK4CevQA&index=2&feature=plcp I was quite pleased to see that people are working on these problems. (But I was particularly surprised to learn how many people there were completely unaware that Ubuntu rearranges device names (i.e. /dev/sda etc.) at each reboot. I noticed that a really long time ago.)
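As an aside on the sector-size question raised above, a quick way to check (sysfs paths like these should exist on any reasonably recent kernel; hdparm is optional and the drive name is a placeholder):

# Logical vs physical sector size as the kernel sees them.
cat /sys/block/sda/queue/logical_block_size
cat /sys/block/sda/queue/physical_block_size

# Or ask the drive directly.
sudo hdparm -I /dev/sda | grep -i 'sector size'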
[Bug 848823] Re: nfs-kernel-server requires a real interface to be up
This is affecting me as well. It started when I upgraded from 11.04 to 11.10. I am identifying machines by IP address, not DNS. I set it up as per this tutorial: https://mostlylinux.wordpress.com/network/nfshowto/ Those instructions work fine for 10.04, 10.10, and 11.04. (However, the commands it describes for restarting services no longer apply in 10.10 and later; my workaround has been to simply reboot the machine instead, and that works fine.) I still have machines running those older versions, and they still talk to each other, but the one running 11.10 is incommunicado.

I did get the error message Tim mentioned about "/etc/exports.d: No such file or directory", but I simply created an empty directory at /etc/exports.d and that message no longer appears. NFS still doesn't work, though. I have tried the workaround Joseph Brown describes, but that doesn't seem to help.

I would like to try the workaround where one uses /etc/network/interfaces instead of network-manager, but I have no idea how to do that. If someone could spell that out, at the user-friendliness level of the tutorial I mentioned above, that would be great.

BTW, I know at least two other people who are using NFS on 10.04 LTS, using basically the same setup I have (from that same tutorial), and who would probably be upgrading to the upcoming LTS if they hadn't already been warned about NFS being borked. I wonder how many other LTS users are about to get a nasty surprise.
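For what it's worth, a minimal sketch of the static-interface approach asked about above (the interface name, addresses, and gateway are placeholders for this particular LAN; by default NetworkManager on Ubuntu is supposed to leave alone any interface configured here, but that is worth verifying on the affected release):

# /etc/network/interfaces
auto lo
iface lo inet loopback

auto eth0
iface eth0 inet static
    address 192.168.1.10
    netmask 255.255.255.0
    gateway 192.168.1.1

Then bring the interface up under ifupdown (or simply reboot, in keeping with the workaround above):

sudo ifdown eth0 && sudo ifup eth0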
[Bug 666038] Re: error while creating logical partition
I just tested this on the Maverick final release (AMD64 alternate installer). Same result.
[Bug 666038] [NEW] error while creating logical partition
Public bug reported:

Binary package hint: gnome-disk-utility

Running the Maverick release candidate, with all updates applied since the final release (which came out two weeks ago). Ran palimpsest to add partitions to an empty SATA drive, in preparation for a from-scratch installation of Maverick final.

To reproduce: Start with an empty 500GB drive. Add three empty primary partitions: 1GB, 189GB, 20GB. Then add an extended partition using all remaining space. Add one empty 20GB logical partition within the extended partition.

Error reported:

Error creating partition
An error occurred while performing an operation on 500 GB Hard Disk (ATA Hitachi HTS545050B9A300): The operation failed

Details:
Error creating partition: helper exited with exit code 1:
In part_add_partition: device_file=/dev/sde, start=210007848960, size=200, type=0x83
Entering MS-DOS parser (offset=0, size=500107862016)
MSDOS_MAGIC found
looking at part 0 (offset 32256, size 1003451904, type 0x83)
new part entry
looking at part 1 (offset 1003484160, size 189000483840, type 0x83)
new part entry
looking at part 2 (offset 190003968000, size 20003880960, type 0x83)
new part entry
looking at part 3 (offset 210007848960, size 290097400320, type 0x05)
Entering MS-DOS extended parser (offset=210007848960, size=290097400320)
readfrom = 210007848960
MSDOS_MAGIC found
Exiting MS-DOS extended parser
Exiting MS-DOS parser
MSDOS partition table detected
containing partition table scheme = 1
got it
got disk
new partition
added partition start=210007881216 size=20003848704
committed to disk
Error doing BLKPG ioctl with BLKPG_ADD_PARTITION for partition 5 of size 210007881216 at offset 20003848704 on /dev/sde: Device or resource busy

Note that I also tried this with a different 500GB drive, made by a different company, and got the same result. I see that there are other bugs filed against Lucid that seem to afflict 500GB drives but not other commonly used sizes. I was under the impression that those bugs had been fixed by the time the Maverick release candidate appeared, but perhaps not all of them were, or perhaps this is unrelated.

ProblemType: Bug
DistroRelease: Ubuntu 10.10
Package: gnome-disk-utility 2.30.1-2
ProcVersionSignature: Ubuntu 2.6.35-22.35-generic-pae 2.6.35.4
Uname: Linux 2.6.35-22-generic-pae i686
Architecture: i386
Date: Sun Oct 24 13:19:47 2010
ExecutablePath: /usr/bin/palimpsest
InstallationMedia: Ubuntu 10.10 Maverick Meerkat - Release Candidate i386 (20100928)
ProcEnviron:
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: gnome-disk-utility
XsessionErrors:
 (polkit-gnome-authentication-agent-1:1756): GLib-CRITICAL **: g_once_init_leave: assertion `initialization_value != 0' failed
 (nautilus:1751): GConf-CRITICAL **: gconf_value_free: assertion `value != NULL' failed

** Affects: gnome-disk-utility (Ubuntu)
   Importance: Undecided
   Status: New

** Tags: apport-bug i386 maverick
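Judging from the quoted output, the new logical partition was written to disk ("committed to disk") and only the final BLKPG ioctl that registers it with the kernel failed. A rough, read-only way to check that interpretation on an affected machine (device name is a placeholder):

# What the on-disk table says...
sudo parted /dev/sde print

# ...versus what the kernel currently knows about.
cat /proc/partitions

# Ask the kernel to re-read the table; if this succeeds, the logical
# partition should appear without a reboot.
sudo partprobe /dev/sde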
[Bug 569900] Re: mount: mounting /dev/md0 on /root/ failed: Invalid argument
This bit me too. Two 500GB drives, RAID1, using the 10.04.1 alternate i386 installer.

Reading through all these comments, and those on similar (possibly related) bugs, it looks like this is caused by an arithmetic error in some code that figures out where things ought to be on the disk. Suddenly I am reminded of another arithmetic error that cropped up in gparted recently, relating to the switchover from align-to-cylinder to align-to-megabyte. Didn't the default partition alignment method just change in Lucid? Very suspicious...
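One hedged way to probe the alignment hypothesis on an affected machine (requires a reasonably recent parted; the device and partition numbers are placeholders):

# Report whether each partition starts on an optimal boundary.
sudo parted /dev/sda align-check optimal 1
sudo parted /dev/sda align-check optimal 2

# Or compare the raw start sectors by hand.
sudo parted /dev/sda unit s print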
[Bug 199393] Re: servicemenu for amarok has an invalid menu entry addAsPodcast
I looked at the amarok_addaspodcast.desktop file and compared it with a number of other amarok*.desktop files in the same folder. This one differed in that it had no Exec line in it. I added one after the Icon line:

Exec=amarok -a %u

I have no idea if this is actually the correct line to use, as the Amarok docs I found were not very instructive on this point (it probably doesn't help that I know little about Amarok). But adding this line to the file eliminated the bad behavior.

In any case, it would be a mistake to consider this merely a bug in a .desktop file. The bigger bug is in Dolphin itself. An app shouldn't start behaving like a popup-mad web browser from the bad old days just because a config file is wrong. If it finds a bad .desktop file, it should skip it (and maybe log the error somewhere). I tried dragging a selection box around a list of files in a Dolphin window and it went crazy with endless identical dialog boxes that I just couldn't close fast enough. And the pane to the right with context-sensitive options started replicating itself down the window. I couldn't close Dolphin, and eventually had to restart the X server.

So this appears to be two bugs: one in Dolphin itself, and one in the file amarok_addaspodcast.desktop.
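A quick way to spot other service-menu entries with the same defect; the directory below is only a guess, so substitute whatever folder the amarok*.desktop files actually live in on the affected system:

# List .desktop files in that folder that contain no Exec= line at all.
grep -L '^Exec=' /usr/share/apps/konqueror/servicemenus/*.desktop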