[zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Hello list,

I'm having trouble with a server holding a lot of data. After a few months of uptime it locked up (reason unknown so far), and it is now taking hours to boot up again. The boot process is stuck at the stage where it says "mounting zfs filesystems (1/5)". The machine responds to pings and keystrokes, and I can see disk activity; the disk LEDs blink one after another.

The file system layout is a 40 GB mirror for the syspool, plus a raidz volume over four 2 TB disks which I use for taking backups (the purpose of this machine). I have deduplication enabled on the backups pool, which turned out to be pretty slow for file deletes, since there are a lot of files on the backups pool and I haven't installed an L2ARC yet. Main memory is 6 GB; it's an HP server running Nexenta Core Platform (kernel version 134f).

I assume the machine will boot up sooner or later, but I'm in a bit of a panic about how to solve this permanently - after all, the last thing I want is to be unable to restore data one day because it takes days to boot the machine. Does anyone have an idea how much longer it may take, and whether the problem may have anything to do with dedup?

--
Frank Van Damme
No part of this copyright message may be reproduced, read or seen, dead or alive or by any means, including but not limited to telepathy, without the benevolence of the author.
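While the box is grinding through the mount, it can help to watch what the pool is doing from another shell, if one is reachable. A minimal sketch, assuming the backup pool is called "backups" (the pool name is a placeholder, not taken from the post above):

    # overall pool state and the dedup ratio
    zpool list backups
    zpool status backups
    # per-disk I/O every 5 seconds; a steady stream of small scattered
    # reads usually points at metadata/DDT traversal rather than a hang
    iostat -xn 5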
Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Hi Frank,

you might be facing the problem of having lots of snapshots of your filesystems. For each snapshot a device is created during import of the pool, which can easily lead to an extended startup time. On my system it took about 15 minutes for 3500 snapshots.

2010/12/8 Frank Van Damme frank.vanda...@gmail.com:
> Hello list, I'm having trouble with a server holding a lot of data. [...]
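If a large snapshot count is the suspect, it is cheap to check once the pool is imported. A small sketch ("backups" is a placeholder pool name):

    # total number of snapshots on the system
    zfs list -H -t snapshot -o name | wc -l
    # or restricted to one pool
    zfs list -H -t snapshot -o name -r backups | wc -l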
Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Failed ZIL devices will also cause this...

Fred

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Wolfram Tomalla
Sent: Wednesday, December 08, 2010 10:40 PM
To: Frank Van Damme
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

Hi Frank, you might be facing the problem of having lots of snapshots of your filesystems. [...]
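Ruling the log-device theory in or out only takes a couple of commands once a shell is available. A sketch ("backups" is a placeholder pool name):

    # report anything that isn't healthy
    zpool status -x
    # the "logs" section of the full status output lists separate ZIL devices
    zpool status backups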
Re: [zfs-discuss] [OpenIndiana-discuss] iops...
I am totally aware of these differences, but it seems some people think raidz is nonsense unless you don't need speed at all. My testing shows (so far) that the speed is quite good, far better than single drives. Also, as Eric said, those speeds are for random I/O. I doubt there is very much out there that is truly random I/O except perhaps databases, but then, I would never use raid5/raidz for a DB unless at gunpoint.

Well, besides databases there are VM datastores, busy email servers, busy LDAP servers, busy web servers, and I'm sure the list goes on and on. I'm sure it is much harder to list servers that are truly sequential in I/O than random. This is especially true when you have thousands of users hitting it.

For busy web servers, I would guess most of the data can be cached, at least over time, and with good amounts of ARC/L2ARC this should remove most of that penalty. A spooling server is another thing, for which I don't think raidz would be suitable, although async I/O will streamline at least some of it. For VM datastores, I totally agree.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It is an elementary imperative for all pedagogues to avoid excessive use of idioms of foreign origin. In most cases, adequate and relevant synonyms exist in Norwegian.
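The rule-of-thumb arithmetic behind the random-I/O concern is worth spelling out (the numbers here are illustrative assumptions, not measurements from this thread): a 7200 rpm disk delivers on the order of 100-150 random IOPS. A raidz vdev serves small random reads at roughly the rate of a single member disk, because every data disk in the vdev participates in each block, so four 2 TB disks in one raidz vdev land around ~150 random-read IOPS. The same four disks as two mirror pairs can spread reads across all four spindles, so roughly ~600 random-read IOPS. Sequential throughput, by contrast, scales with the number of data disks in both layouts, which is why raidz looks fine in streaming tests.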
Re: [zfs-discuss] 3TB HDD in ZFS
On Tue, Dec 7, 2010 at 11:37 PM, Eugen Leitl eu...@leitl.org wrote:
> What about Hitachi HDS723030ALA640 (aka Deskstar 7K3000, claimed 24/7)?

The spec sheets claim 512-byte sectors, so hopefully it'll work. There's a lot more info to support that at http://www.hitachigst.com/internal-drives/desktop/deskstar/deskstar-7k3000 as well.

-B

--
Brandon High : bh...@freaks.com
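If in doubt after a pool has been built on one of these drives, the ashift the vdev was created with can be checked with zdb. A sketch ("tank" is a placeholder pool name, and the exact output layout differs between builds):

    # ashift=9 means the pool assumed 512-byte sectors, ashift=12 means 4 KiB
    zdb -C tank | grep ashift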
[zfs-discuss] Best choice - file system for system
Hi,

I wonder which is the better option when installing Solaris: the system on UFS and the sensitive data on ZFS, or is it best to put everything on ZFS? What are the pros and cons of each solution?

f...@ll
Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems
Dedup? Taking a long time to boot after a hard reboot following a lockup? I'll bet that it hard-locked whilst deleting some files or a dataset that was dedup'd. After the delete is started, it spends *ages* cleaning up the DDT (the table containing a list of dedup'd blocks). If you hard-lock in the middle of this clean-up, then the DDT isn't valid to anything. The next mount attempt on that pool will do this operation for you, which will take an inordinate amount of time.

My pool spent *eight days* (iirc) in limbo, waiting for the DDT clean-up to finish. Once it did, it wrote out a shedload of blocks and then everything was fine. This was for a zfs destroy of a 900 GB, 64 KiB block dataset, over 2x 8-wide raidz vdevs.

Unfortunately, raidz is of course slower for random reads than a set of mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat of a workaround for this, although I've not seen comprehensive figures for the gain it gives - http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913
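To get an idea of how large the DDT actually is before kicking off a big destroy, zdb can print its statistics. A sketch ("backups" is a placeholder pool name; gathering these numbers can itself take a long time on a big pool):

    # dedup ratio summary
    zdb -D backups
    # full DDT histogram, including on-disk and in-core size per entry
    zdb -DD backups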
Re: [zfs-discuss] [OpenIndiana-discuss] iops...
On Dec 7, 2010, at 9:49 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

> From: Ross Walker [mailto:rswwal...@gmail.com]
>> Well besides databases there are VM datastores, busy email servers, busy
>> ldap servers, busy web servers, and I'm sure the list goes on and on. [...]
>
> Depends on the purpose of your server. For example, I have a ZFS server whose sole purpose is to receive a backup data stream from another machine, and then write it to tape. This is a highly sequential operation, and I use raidz. Some people have video streaming servers. And http/ftp servers with large files. And a fileserver which is the destination for laptop whole-disk backups. And a repository that stores iso files and rpm's used for OS installs on other machines. And data capture from lab equipment. And packet sniffer / compliance email/data logger. And I'm sure the list goes on and on. ;-)

Ok, single-stream backup servers are one type, but as soon as you have multiple streams, even for large files, then IOPS trumps throughput to a degree - of course, if throughput is very bad then that's no good either. Knowing your workload is key, or have enough $$ to implement RAID10 everywhere.

-Ross
Re: [zfs-discuss] Best choice - file system for system
On Wed, 8 Dec 2010, Albert wrote:
> I wonder which is the better option when installing Solaris: the system on UFS and the sensitive data on ZFS, or is it best to put everything on ZFS? What are the pros and cons of each solution?

The best choice is usually to install with zfs root on a mirrored pair of disks. UFS is going away as a boot option.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/
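For a fresh install, turning a single-disk ZFS root into a mirrored one is a short job. A minimal sketch, assuming the root pool is called "rpool" and that the disk/slice names below are placeholders (x86; SPARC uses installboot instead of installgrub):

    # attach a second disk as a mirror of the existing root disk
    zpool attach rpool c0t0d0s0 c0t1d0s0
    # watch the resilver complete
    zpool status rpool
    # make the new disk bootable as well
    installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0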
Re: [zfs-discuss] Best choice - file system for system
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
>
> The best choice is usually to install with zfs root on a mirrored pair of
> disks. UFS is going away as a boot option.

UFS is already unavailable as a boot option. It's only still available if you're using something old, such as Solaris 10u9. (Which is the latest Solaris.) ;-)

Seriously though: UFS is dead. It has no advantage over ZFS that I'm aware of.
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
For anyone who cares:

I created an ESXi machine. Installed two guest (CentOS) machines and vmware-tools. Connected them to each other via only a virtual switch. Used rsh to transfer large quantities of data between the two guests, unencrypted, uncompressed. I have found that ESXi virtual switch performance peaks around 2.5 Gbit.

Also, if you have an NFS datastore which is not available at the time of ESX bootup, the NFS datastore doesn't come online, and there seems to be no way of telling ESXi to make it come online later. So you can't auto-boot any guest which is itself stored inside another guest.

So basically, if you want a layer of ZFS in between your ESX server and your physical storage, you have to have at least two separate servers. And if you want anything resembling actual disk speed, you need InfiniBand, fibre channel, or 10G ethernet. (Or some really slow disks.) ;-)
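For reference, the virtual-switch number above can be reproduced with nothing fancier than dd piped over rsh. A rough sketch, assuming rsh is already configured between the guests and that "guest2" is a placeholder hostname:

    # push 8 GiB across the virtual switch and time it;
    # bytes transferred divided by elapsed seconds gives the effective rate
    time dd if=/dev/zero bs=1M count=8192 | rsh guest2 'cat > /dev/null'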
Re: [zfs-discuss] [OpenIndiana-discuss] iops...
> From: Edward Ned Harvey [mailto:opensolarisisdeadlongliveopensola...@nedharvey.com]
>
> In order to test random reads, you have to configure iozone to use a data
> set which is much larger than physical ram. Since iozone will write a big
> file and then immediately afterward start reading it, the whole file will
> be in cache unless it is much larger than physical ram. You'll get false
> read results which are unnaturally high. For this reason, when I'm using
> an iozone benchmark, I remove as much ram from the system as possible.

Sorry. There's a better way. This is straight from the mouth of Don Capps, author of iozone:

If you use the -w option, then the test file will be left behind. Then reboot, or umount and mount. If you then use the read test, without the write test, and again use -w, then you will achieve what you are describing.

Example:

    iozone -i 0 -w -r $recsize -s $filesize

Umount, then remount:

    iozone -i 1 -w -r $recsize -s $filesize
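Put together, the cold-cache read measurement looks roughly like the sketch below (record size, file size and the dataset name are placeholders, and whether an unmount/mount is enough or a reboot is needed depends on the setup):

    recsize=128k
    filesize=32g
    cd /tank/testfs
    iozone -i 0 -w -r $recsize -s $filesize    # write phase; -w keeps the file
    cd /
    zfs umount tank/testfs                     # drop the cached copy (or reboot)
    zfs mount tank/testfs
    cd /tank/testfs
    iozone -i 1 -w -r $recsize -s $filesize    # read phase against a cold cache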
Re: [zfs-discuss] Best choice - file system for system
The only situation I can think of where UFS would be advantageous over ZFS might be a low-memory situation. ZFS loves memory. But to answer the original question, ZFS is where you want to be.

Jerry

On 12/08/10 20:56, Edward Ned Harvey wrote:
> UFS is already unavailable as a boot option. It's only still available if
> you're using something old, such as Solaris 10u9. (Which is the latest
> Solaris.) ;-)
>
> Seriously though: UFS is dead. It has no advantage over ZFS that I'm aware of.
Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS
On Dec 8, 2010, at 11:41 PM, Edward Ned Harvey opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

> For anyone who cares: I created an ESXi machine. Installed two guest (CentOS) machines and vmware-tools. Connected them to each other via only a virtual switch. Used rsh to transfer large quantities of data between the two guests, unencrypted, uncompressed. I have found that ESXi virtual switch performance peaks around 2.5 Gbit. [...]

Besides the chicken-and-egg scenario that Ed mentions, there is also the CPU cost of running the storage virtualized. You might find that as you put more machines on the storage, performance decreases a lot faster than it otherwise would if it were standalone, since the storage competes with the very machines it is supposed to be serving.

-Ross
Re: [zfs-discuss] snaps lost in space?
usedsnap (the usedbysnapshots property) is the amount of space consumed by all snapshots, i.e. the amount of space that would be recovered if all snapshots were deleted. The space used by any one snapshot is the space that would be recovered if that snapshot alone were deleted, i.e. the amount of space that is unique to that snapshot. Snapshot space that is shared by multiple snapshots will not show up in any individual snapshot's used value. Therefore, deleting a snapshot can increase the adjacent snapshots' used space. So in general, usedbysnapshots >= sum(used by each snapshot). You can read more about the used property in the zfs(1m) manpage.

The bug mentioned below (6792701) is not related to this phenomenon; it manifests as a discrepancy between the filesystem's (or a snapshot's) referenced space and the amount of space accessible through POSIX interfaces (e.g. du(1)).

--matt

On Mon, Dec 6, 2010 at 4:09 AM, Joost Mulders joost...@gmail.com wrote:

> Hi,
>
> I have output of space allocation which I can't explain. I hope someone can point me in the right direction.
>
> The allocation of my home filesystem looks like this:
>
> jo...@onix$ zfs list -o space p0/home
> NAME     AVAIL  USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> p0/home  31.0G  156G     86.7G   69.7G              0          0
>
> This tells me that *86.7G* is used by *snapshots* of this filesystem. However, when I look at the space allocation of the snapshots, I don't see the 86.7G back!
>
> jo...@onix$ zfs list -t snapshot -o space | egrep 'NAME|^p0\/home'
> NAME         AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
> p0/h...@s1       -  62.7M         -       -              -          -
> p0/h...@s2       -  53.1M         -       -              -          -
> p0/h...@s3       -  34.1M         -       -              -          -
> p0/h...@s4       -   277M         -       -              -          -
> p0/h...@s5       -  2.21G         -       -              -          -
> p0/h...@s6       -   175M         -       -              -          -
> p0/h...@s7       -  46.1M         -       -              -          -
> p0/h...@s8       -  47.6M         -       -              -          -
> p0/h...@s9       -  43.0M         -       -              -          -
> p0/h...@s10      -  64.1M         -       -              -          -
> p0/h...@s11      -   563M         -       -              -          -
> p0/h...@s12      -  76.6M         -       -              -          -
>
> The sum of the USED column is only some 3.6G, so the question is: what is the 86.7G of USEDSNAP allocated to? Ghost snapshots?
>
> This is with zpool version 22. This zpool was used for a year or so on onnv-129. I upgraded the host recently to build 151a, but I didn't upgrade the pool yet.
>
> Any pointers are appreciated!
>
> Joost
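The same accounting can also be pulled straight from the properties, which sometimes makes the relationship clearer. A sketch using the dataset name from the post above:

    # space charged to all snapshots together vs. the live data
    zfs get usedbysnapshots,usedbydataset p0/home
    # unique space per snapshot and what each snapshot references
    zfs list -r -t snapshot -o name,used,referenced p0/home

The gap between usedbysnapshots and the sum of the per-snapshot used values is exactly the space that is shared by two or more snapshots.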