Re: Frequent SATA errors / port timeouts in 2.6.18.3?
Patrik Jonsson wrote:
> Hi all, this may not be the best list for this question, but I figure
> that the number of disks connected to users here should be pretty big...
> I upgraded from 2.6.17-rc4 to 2.6.18.3 about a week ago, and I've since
> had 3 drives kicked out of my 10-drive RAID5 array. Previously, I had no
> kicks over almost a year. The kernel message is:
>
> ata7.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> ata7.00: (BMDMA stat 0x20)
> ata7.00: tag 0 cmd 0xc8 Emask 0x1 stat 0x41 err 0x4 (device error)
> ata7: EH complete
>
> Any ideas or thoughts would be appreciated.

SMART? Read the manpage and then try running:

    smartctl -d ata -S on /dev/...
    smartctl -d ata -s on /dev/...

Then look at your smartd timing and see if it's related; possibly just do a manual smartd poll. I've had smart/libata problems (well, no, glitches) for about two years now but, as the irq handler occasionally says, nobody cared ;) It may well not be your problem, but...

David
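For reference, a minimal checking sequence along those lines; the device path is a placeholder, and -d ata may not be needed at all depending on the driver:

    smartctl -d ata -s on /dev/sdX      # enable SMART on the drive
    smartctl -d ata -S on /dev/sdX      # enable attribute autosave
    smartctl -d ata -t short /dev/sdX   # start a short self-test
    smartctl -d ata -H /dev/sdX         # overall health verdict
    smartctl -d ata -A /dev/sdX         # attribute table (Reallocated_Sector_Ct, UDMA_CRC_Error_Count, etc.)
    smartctl -d ata -l error /dev/sdX   # the drive's own error log

If the drive logs nothing while libata keeps reporting timeouts, the cabling, the controller, or the driver becomes the more likely suspect.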
Re: disappointed with 3ware 9550sx
i want to say up front that i have several 3ware 7504 and 7508 cards which i am completely satisfied with. i use them as JBOD, and they make stellar PATA controllers (not RAID controllers). they're not perfect (they're slow), but they've been rock solid for years. not so the 9550sx.

i've been a software raid devotee for years now. i've never wanted to trust my data to hw raid, because i can't look under the covers and see what it's doing, and i'm at the mercy of the vendor when it comes to recovery situations. so why did i even consider hw raid? NVRAM. i wanted the write performance of NVRAM. i debated between areca and 3ware, but given that the areca driver wasn't in the kernel (it is now), the lack of smartmontools support for areca, and my experiences with the 7504/7508, i figured i'd stick with what i know.

sure, i am impressed with the hw raid i/o rates on the 9550sx, especially with the NVRAM. but i am unimpressed with several failures which have occurred and which the evidence suggests are 3ware's fault (or at the very least would not have resulted in problems with sw raid).

my configuration has 7 disks:

- 3x400GB WDC WD4000YR-01PLB0, firmware 01.06A01
- 4x250GB WDC WD2500YD-01NVB1, firmware 10.02E01

those disks and firmwares are on the 3ware drive compatibility list:
http://www.3ware.com/products/pdf/Drive_compatibility_list_9550SX_9590SE_2006_09.pdf

note that the compatibility list has a column NCQ, which i read as an indication of whether the drive supports NCQ or not. as supporting evidence for this i refer to footnote number 4, which is specifically used on some drives which MUST NOT have NCQ enabled. i had NCQ enabled on all 7 drives. perhaps this is the source of some of my troubles; i'll grant 3ware that.

initially i had the firmware from the 9.3.0.4 release on the 9550sx (3.04.00.005); it was the most recent at the time i installed the system (and the appropriate driver in the kernel -- i think i was using 2.6.16.x at the time).

my first disappointment came when i tried to create a 3-way raid1 on the 3x400GB disks. it doesn't support it at all. i had become so accustomed to using a 3-way raid1 with software raid (the mdadm equivalent is sketched after this post) that it didn't even occur to me to find out up front whether the 3ware could support this. apparently this is so revolutionary an idea that 3ware support was completely baffled when i opened a ticket regarding it: "why would you want that? it will fail over to a spare disk automatically." still lured by the NVRAM, i gave in and went with a 2-way mirror plus a spare. (i prefer the 3-way mirror so i'm never without a redundant copy and don't have to rush to the colo with a replacement when a disk fails.) the 4x250GB were turned into a raid-10. install went fine, testing went fine, system was put into production.

second disappointment: within a couple of weeks the 9550sx decided it didn't like one of the 400GB disks and knocked it out of the array. here's what the driver had to say about it:

Sep 6 23:47:30 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0009): Drive timeout detected:port=0.
Sep 6 23:47:31 kernel: 3w-9xxx: scsi0: AEN: ERROR (0x04:0x0002): Degraded unit:unit=0, port=0.
Sep 6 23:48:46 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.
Sep 7 00:02:12 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x003B): Rebuild paused:unit=0.
Sep 7 00:02:27 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x000B): Rebuild started:unit=0.
Sep 7 09:32:19 kernel: 3w-9xxx: scsi0: AEN: INFO (0x04:0x0005): Rebuild completed:unit=0.

the 9550sx could still communicate with the disk -- the SMART log had no indications of error.
i converted the drive to JBOD and read and overwrote the entire surface without a problem. i ended up just converting the drive to the spare disk... but remained worried about why it could have been knocked out of the array. maybe this is a WD bug, maybe it's a 3ware bug, who knows.

third disappointment: for a large data copy i inserted a disk into the remaining spare slot on the 3ware. now, i'm familiar with the 750[48], where i run everything as JBOD and never let 3ware raid touch it. when i inserted this 8th disk i found i had to ask tw_cli to create a JBOD. the disappointment comes here: it zeroed the MBR! fortunately the disk had a single full-sized partition and i could recreate the partition table, but there's no sane reason to zero the MBR just because i asked for the disk to be treated as JBOD (and don't tell me it'll reduce customer support cases because people might reuse a bad partition table from a previously RAIDed disk -- i think it'll create even more problems than that explanation might solve).

fourth disappointment: heavy write traffic on one unit can affect other units even though they have separate spindles. my educated guess is that the 3ware does not share its cache fairly and the write traffic starves everything else. i described this in a post here
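for comparison, the 3-way mirror described above is routine with md; a minimal sketch, with placeholder device names:

    # 3-way raid1: a single failure still leaves a fully redundant pair
    mdadm --create /dev/md0 --level=1 --raid-devices=3 /dev/sda1 /dev/sdb1 /dev/sdc1

    # the compromise actually deployed on the 9550sx: 2-way mirror plus a hot spare
    mdadm --create /dev/md0 --level=1 --raid-devices=2 --spare-devices=1 /dev/sda1 /dev/sdb1 /dev/sdc1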
Re: [RFC: 2.6 patch] simplify drivers/md/md.c:update_size()
On Fri, 2006-12-15 at 01:19 +0100, Adrian Bunk wrote:
> While looking at commit 8ddeeae51f2f197b4fafcba117ee8191b49d843e, I got the impression that this commit couldn't fix anything, since the size variable can't be changed before fit gets used.
>
> Is there any big thinko, or is the patch below that slightly simplifies update_size() semantically equivalent to the current code?

No, this patch is broken. Where it fails is specifically the case where you want to autofit the largest possible size, you have different size devices, and the first device is not the smallest. When you hit the first device, you will set size, then as you repeat the ITERATE_RDEV loop, when you hit the smaller device, size will be non-0 and you'll then trigger the later if and return -ENOSPC. In the case of autofit, you have to preserve the fit variable instead of looking at size so you know whether or not to modify the size when you hit a smaller device later in the list.

> Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
>
> ---
>  drivers/md/md.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
> --- linux-2.6.19-mm1/drivers/md/md.c.old	2006-12-15 00:57:05.000000000 +0100
> +++ linux-2.6.19-mm1/drivers/md/md.c	2006-12-15 00:57:42.000000000 +0100
> @@ -4039,57 +4039,56 @@
>  	 * Generate a 128 bit UUID
>  	 */
>  	get_random_bytes(mddev->uuid, 16);
>
>  	mddev->new_level = mddev->level;
>  	mddev->new_chunk = mddev->chunk_size;
>  	mddev->new_layout = mddev->layout;
>  	mddev->delta_disks = 0;
>
>  	mddev->dead = 0;
>  	return 0;
>  }
>
>  static int update_size(mddev_t *mddev, unsigned long size)
>  {
>  	mdk_rdev_t * rdev;
>  	int rv;
>  	struct list_head *tmp;
> -	int fit = (size == 0);
>
>  	if (mddev->pers->resize == NULL)
>  		return -EINVAL;
>  	/* The size is the amount of each device that is used.
>  	 * This can only make sense for arrays with redundancy.
>  	 * linear and raid0 always use whatever space is available
>  	 * We can only consider changing the size if no resync
>  	 * or reconstruction is happening, and if the new size
>  	 * is acceptable. It must fit before the sb_offset or,
>  	 * if that is < data_offset, it must fit before the
>  	 * size of each device.
>  	 * If size is zero, we find the largest size that fits.
>  	 */
>  	if (mddev->sync_thread)
>  		return -EBUSY;
>  	ITERATE_RDEV(mddev,rdev,tmp) {
>  		sector_t avail;
>  		avail = rdev->size * 2;
>
> -		if (fit && (size == 0 || size > avail/2))
> +		if (size == 0)
>  			size = avail/2;
>  		if (avail < ((sector_t)size << 1))
>  			return -ENOSPC;
>  	}
>  	rv = mddev->pers->resize(mddev, (sector_t)size *2);
>  	if (!rv) {
>  		struct block_device *bdev;
>
>  		bdev = bdget_disk(mddev->gendisk, 0);
>  		if (bdev) {
>  			mutex_lock(&bdev->bd_inode->i_mutex);
>  			i_size_write(bdev->bd_inode, (loff_t)mddev->array_size << 10);
>  			mutex_unlock(&bdev->bd_inode->i_mutex);
>  			bdput(bdev);
>  		}
>  	}
>  	return rv;
>  }

-- 
Doug Ledford [EMAIL PROTECTED]
GPG KeyID: CFBFF194
http://people.redhat.com/dledford
Infiniband specific RPMs available at http://people.redhat.com/dledford/Infiniband
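To make the failure mode concrete, here is a stand-alone sketch (plain user-space C, not kernel code, with made-up device sizes: a 500GB device listed before a 400GB one) that runs the loop both ways. The current logic settles on the smaller size; the simplified version hits -ENOSPC on the second device.

/*
 * Illustration only: device 0 is 500GB, device 1 is 400GB, expressed in
 * KB as md does (rdev->size), with avail in 512-byte sectors.
 */
#include <stdio.h>

#define NDEV 2

int main(void)
{
	unsigned long long dev_kb[NDEV] = { 500ULL << 20, 400ULL << 20 };
	unsigned long long size, avail;
	int i, fit;

	/* current code: remember that the caller asked for autofit (size == 0) */
	size = 0;
	fit = (size == 0);
	for (i = 0; i < NDEV; i++) {
		avail = dev_kb[i] * 2;			/* sectors */
		if (fit && (size == 0 || size > avail / 2))
			size = avail / 2;		/* shrink to the smaller device */
		if (avail < (size << 1)) {
			printf("current code: -ENOSPC at device %d\n", i);
			return 1;
		}
	}
	printf("current code: fitted size = %llu KB\n", size);

	/* simplified code from the RFC patch: size is only ever set once */
	size = 0;
	for (i = 0; i < NDEV; i++) {
		avail = dev_kb[i] * 2;
		if (size == 0)
			size = avail / 2;		/* sticks at device 0's size */
		if (avail < (size << 1)) {
			printf("patched code: -ENOSPC at device %d\n", i);
			return 0;
		}
	}
	printf("patched code: fitted size = %llu KB\n", size);
	return 0;
}

With the smallest device listed first the two versions happen to agree, which is presumably why the simplification looks plausible at first glance.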
Re: [RFC: 2.6 patch] simplify drivers/md/md.c:update_size()
On Thu, Dec 14, 2006 at 07:36:35PM -0500, Doug Ledford wrote:
> On Fri, 2006-12-15 at 01:19 +0100, Adrian Bunk wrote:
> > While looking at commit 8ddeeae51f2f197b4fafcba117ee8191b49d843e, I got the impression that this commit couldn't fix anything, since the size variable can't be changed before fit gets used.
> >
> > Is there any big thinko, or is the patch below that slightly simplifies update_size() semantically equivalent to the current code?
>
> No, this patch is broken. Where it fails is specifically the case where you want to autofit the largest possible size, you have different size devices, and the first device is not the smallest. When you hit the first device, you will set size, then as you repeat the ITERATE_RDEV loop, when you hit the smaller device, size will be non-0 and you'll then trigger the later if and return -ENOSPC. In the case of autofit, you have to preserve the fit variable instead of looking at size so you know whether or not to modify the size when you hit a smaller device later in the list.
...

OK, sorry, I've got my thinko: ITERATE_RDEV() is a loop. That's what I missed.

cu
Adrian

-- 
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said.
   Pearl S. Buck - Dragon Seed
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
> Nikolai Joukov wrote:
> > We have designed a new stackable file system that we called RAIF: Redundant Array of Independent Filesystems.
>
> Great!
>
> > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and U320 SCSI disks. Compared to the Linux RAID driver, RAIF has overheads of about 20-25% under the Postmark v1.5 benchmark in the case of striping and replication. In the case of RAID4- and RAID5-like configurations, RAIF performed about two times *better* than software RAID and even better than an Adaptec 2120S RAID5 controller.
>
> I am not surprised. RAID 4/5/6 performance is highly sensitive to the underlying hw, and thus needs a fair amount of fine tuning.
>
> > Nevertheless, performance is not the biggest advantage of RAIF. For read-biased workloads RAID is always slightly faster than RAIF. The biggest advantages of RAIF are flexible configurations (e.g., it can combine NFS and local file systems), per-file-type storage policies, and the fact that files are stored as files on the lower file systems (which is convenient). This is because RAIF is located above the file system caches and can cache parity as normal data when needed. We have more performance details in a technical report, if anyone is interested.
>
> Definitely interested. Can you give a link?

The main focus of the paper is on a general OS profiling method and not on RAIF. However, it has some details about the RAIF benchmarking with Postmark in Chapter 9:

  http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf

Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5 operation under the same Postmark workload.

Nikolai.
-------------------------------------
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
On Friday 15 December 2006 10:01, Nikolai Joukov wrote:
> > Nikolai Joukov wrote:
> > > We have designed a new stackable file system that we called RAIF: Redundant Array of Independent Filesystems.
> >
> > Great!

Yes, definitely... I see the major benefit being in the mobile, industrial and embedded systems arena. Perhaps this might come as a surprise to people, but a very large and ever-growing number (perhaps even most) of Linux devices don't use block devices for storage. Instead they use flash file systems or NFS, neither of which uses local block devices. It looks like RAIF gives a way to provide redundancy etc. on these devices.

> > > We have performed some benchmarking on a 3GHz PC with 2GB of RAM and U320 SCSI disks. Compared to the Linux RAID driver, RAIF has overheads of about 20-25% under the Postmark v1.5 benchmark in the case of striping and replication. In the case of RAID4- and RAID5-like configurations, RAIF performed about two times *better* than software RAID and even better than an Adaptec 2120S RAID5 controller.
> >
> > I am not surprised. RAID 4/5/6 performance is highly sensitive to the underlying hw, and thus needs a fair amount of fine tuning.
> >
> > > Nevertheless, performance is not the biggest advantage of RAIF. For read-biased workloads RAID is always slightly faster than RAIF. The biggest advantages of RAIF are flexible configurations (e.g., it can combine NFS and local file systems), per-file-type storage policies, and the fact that files are stored as files on the lower file systems (which is convenient). This is because RAIF is located above the file system caches and can cache parity as normal data when needed. We have more performance details in a technical report, if anyone is interested.
> >
> > Definitely interested. Can you give a link?
>
> The main focus of the paper is on a general OS profiling method and not on RAIF. However, it has some details about the RAIF benchmarking with Postmark in Chapter 9:
>
>   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5 operation under the same Postmark workload.
>
> Nikolai.
> -------------------------------------
> Nikolai Joukov, Ph.D.
> Filesystems and Storage Laboratory
> Stony Brook University
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
Nikolai Joukov wrote:
>   http://www.fsl.cs.sunysb.edu/docs/joukov-phdthesis/thesis.pdf
>
> Figures 9.7 and 9.8 also show profiles of the Linux RAID5 and RAIF5 operation under the same Postmark workload.
>
> Nikolai.
> -------------------------------------
> Nikolai Joukov, Ph.D.
> Filesystems and Storage Laboratory
> Stony Brook University

Well, Congratulations, Doctor!! [Must be nice to be exiled to Stony Brook!! Oh, well, not I.]

For some reason, I can not connect to the above link, but I may not need to. Does [should] it contain a link/pointer to the underlying source code? This concept sounds very interesting, and I am sure that many of us would like to look closer, and maybe even get a taste.

Here's hoping that source exists, and that it is available for us.

Thanks
b-
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
> > We started the project in April 2004. Right now I am using it as my /home/kolya file system at home. We believe that at this stage RAIF is mature enough for others to try it out. The code is available at:
> >
> >   ftp://ftp.fsl.cs.sunysb.edu/pub/raif/
> >
> > The code requires no kernel patches and compiles as a module for a wide range of kernels. The latest kernel we have used it with is 2.6.13, and we are in the process of porting it to 2.6.19. We will be happy to hear back from you.
>
> When removing a file from the underlying branch, the oops below happens. Wouldn't it be possible to just fail the branch instead of oopsing?

This is a known problem of all Linux stackable file systems. Users are not supposed to change the file systems below mounted stackable file systems (but they can read them). One of the ways to enforce this is to use overlay mounts. For example, mount the lower file systems at /raif/b0 ... /raif/bN and then mount RAIF at /raif. Stackable file systems have recently started getting into the kernel, and we hope that there will be a better solution to this problem in the future.

Having said that, you are right: failing the branch would be the right thing to do.

Nikolai.
-------------------------------------
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University
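As a concrete sketch of that overlay arrangement (branch devices and paths are illustrative, and the RAIF mount options are deliberately omitted -- see the documentation in the tarball for the real mount syntax):

    mkdir -p /raif/b0 /raif/b1
    mount /dev/sdb1 /raif/b0                  # lower branch 0: local file system
    mount -t nfs server:/export /raif/b1      # lower branch 1: NFS
    # finally mount RAIF on /raif itself; the branches are then only
    # reachable through RAIF and cannot be modified underneath it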
Re: [ANNOUNCE] RAIF: Redundant Array of Independent Filesystems
> Well, Congratulations, Doctor!! [Must be nice to be exiled to Stony Brook!! Oh, well, not I.]

Long Island is a very nice place with lots of wineries and perfect sandy beaches - don't envy :-)

> Here's hoping that source exists, and that it is available for us.

I guess you are subscribed only to the linux-raid list. Unfortunately, I didn't CC my post to that list, and one of the replies was CC'd there without the link. The original post is available here:

  http://marc.theaimsgroup.com/?l=linux-fsdevel&m=116603282106036&w=2

And the link to the sources is:

  ftp://ftp.fsl.cs.sunysb.edu/pub/raif/

Nikolai.
-------------------------------------
Nikolai Joukov, Ph.D.
Filesystems and Storage Laboratory
Stony Brook University
[RFC: 2.6 patch] simplify drivers/md/md.c:update_size()
While looking at commit 8ddeeae51f2f197b4fafcba117ee8191b49d843e, I got the impression that this commit couldn't fix anything, since the size variable can't be changed before fit gets used.

Is there any big thinko, or is the patch below that slightly simplifies update_size() semantically equivalent to the current code?

Signed-off-by: Adrian Bunk [EMAIL PROTECTED]

---
 drivers/md/md.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- linux-2.6.19-mm1/drivers/md/md.c.old	2006-12-15 00:57:05.000000000 +0100
+++ linux-2.6.19-mm1/drivers/md/md.c	2006-12-15 00:57:42.000000000 +0100
@@ -4039,57 +4039,56 @@
 	 * Generate a 128 bit UUID
 	 */
 	get_random_bytes(mddev->uuid, 16);

 	mddev->new_level = mddev->level;
 	mddev->new_chunk = mddev->chunk_size;
 	mddev->new_layout = mddev->layout;
 	mddev->delta_disks = 0;

 	mddev->dead = 0;
 	return 0;
 }

 static int update_size(mddev_t *mddev, unsigned long size)
 {
 	mdk_rdev_t * rdev;
 	int rv;
 	struct list_head *tmp;
-	int fit = (size == 0);

 	if (mddev->pers->resize == NULL)
 		return -EINVAL;
 	/* The size is the amount of each device that is used.
 	 * This can only make sense for arrays with redundancy.
 	 * linear and raid0 always use whatever space is available
 	 * We can only consider changing the size if no resync
 	 * or reconstruction is happening, and if the new size
 	 * is acceptable. It must fit before the sb_offset or,
 	 * if that is < data_offset, it must fit before the
 	 * size of each device.
 	 * If size is zero, we find the largest size that fits.
 	 */
 	if (mddev->sync_thread)
 		return -EBUSY;
 	ITERATE_RDEV(mddev,rdev,tmp) {
 		sector_t avail;
 		avail = rdev->size * 2;

-		if (fit && (size == 0 || size > avail/2))
+		if (size == 0)
 			size = avail/2;
 		if (avail < ((sector_t)size << 1))
 			return -ENOSPC;
 	}
 	rv = mddev->pers->resize(mddev, (sector_t)size *2);
 	if (!rv) {
 		struct block_device *bdev;

 		bdev = bdget_disk(mddev->gendisk, 0);
 		if (bdev) {
 			mutex_lock(&bdev->bd_inode->i_mutex);
 			i_size_write(bdev->bd_inode, (loff_t)mddev->array_size << 10);
 			mutex_unlock(&bdev->bd_inode->i_mutex);
 			bdput(bdev);
 		}
 	}
 	return rv;
 }
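For context on how this function is exercised from user space: update_size() is reached when an array's per-device size is changed, and the comment above spells out that size == 0 means "find the largest size that fits". With mdadm that is roughly the following (array name and size are placeholders):

    # set an explicit per-device size, in kilobytes
    mdadm --grow /dev/md0 --size=409600

    # newer mdadm releases also accept --size=max, which asks for the
    # same "largest size that fits" behaviour as size == 0 here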