[zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Frank Van Damme
Hello list,

I'm having trouble with a server holding a lot of data. After a few
months of uptime it locked up (reason unknown so far); it is currently
rebooting, but the boot is taking hours. The boot process is stuck at
the stage where it says:
mounting zfs filesystems (1/5)
The machine responds to pings and keystrokes, and I can see disk
activity; the disk LEDs blink one after another.

The file system layout is: a 40 GB mirror for the syspool, and a raidz
pool over four 2 TB disks which I use for taking backups (= the purpose
of this machine). I have deduplication enabled on the backups pool,
which turned out to be pretty slow for file deletes, since there are a
lot of files on the backups pool and I haven't installed an L2ARC yet.
Main memory is 6 GB; it's an HP server running Nexenta Core Platform
(kernel build 134f).

I assume the machine will boot up sooner or later, but I'm in a bit of
a panic about how to solve this permanently - after all, the last thing
I want is to be unable to restore data one day because the machine
takes days to boot.

Does anyone have an idea how much longer it may take and if the
problem may have anything to do with dedup?
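
For reference, the state of the mount pass can be checked from a shell
with something like the following; as far as I understand, that boot
stage is essentially working through the equivalent of "zfs mount -a":

  # Which datasets have already been mounted, and which are still pending.
  zfs list -o name,mounted,canmount,mountpoint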

-- 
Frank Van Damme
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.


Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Wolfram Tomalla
Hi Frank,

you might be facing the problem of having lots of snapshots of your
filesystems. For each snapshot a device node is created during import
of the pool, which can easily lead to an extended startup time.
On my system it took about 15 minutes for 3500 snapshots.
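
A quick way to check whether that applies is simply to count the
snapshots, overall and per pool, for example:

  # Total snapshot count, then a breakdown per pool.
  zfs list -H -t snapshot -o name | wc -l
  zfs list -H -t snapshot -o name | cut -d@ -f1 | cut -d/ -f1 | sort | uniq -c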




Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Fred Liu
A failed ZIL (log) device can also cause this...
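
If that's a suspicion, a quick check once the pool is reachable (or from
a rescue boot) would be something like:

  # Only pools with problems are listed; a failed separate log (ZIL)
  # device shows up under the "logs" section of the detailed output.
  zpool status -x
  zpool status -v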

Fred



Re: [zfs-discuss] [OpenIndiana-discuss] iops...

2010-12-08 Thread Roy Sigurd Karlsbakk
  I am totally aware of these differences, but it seems some people
  think RAIDz is nonsense unless you don't need speed at all. My
  testing shows (so far) that the speed is quite good, far better than
  single drives. Also, as Eric said, those speeds are for random i/o.
  I doubt there is very much out there that is truly random i/o
  except perhaps databases, but then, I would never use raid5/raidz
  for a DB unless at gunpoint.
 
 Well besides databases there are VM datastores, busy email servers,
 busy ldap servers, busy web servers, and I'm sure the list goes on and
 on.
 
 I'm sure it is much harder to list servers that are truly sequential
 in IO then random. This is especially true when you have thousands of
 users hitting it.

For busy web servers, I would guess most of the data can be cached, at least
over time, and with good amounts of ARC/L2ARC this should remove most of that
penalty. A spooling server is another thing, for which I don't think raidz
would be suitable, although async i/o will streamline at least some of it.
For VM datastores, I totally agree.
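
To see how well the cache is actually absorbing a given read load, a rough
ARC hit rate can be pulled from the standard kstat counters - nothing
pool-specific, just an illustration:

  # Overall ARC hit ratio since boot (hits vs. misses).
  kstat -p zfs:0:arcstats:hits zfs:0:arcstats:misses | awk '
    /:hits/   {h=$2}
    /:misses/ {m=$2}
    END {printf "ARC hit rate: %.1f%%\n", 100*h/(h+m)}'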

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
In all pedagogy it is essential that the curriculum be presented intelligibly. It
is an elementary imperative for all pedagogues to avoid excessive use of idioms
of foreign origin. In most cases adequate and relevant synonyms exist in
Norwegian.


Re: [zfs-discuss] 3TB HDD in ZFS

2010-12-08 Thread Brandon High
On Tue, Dec 7, 2010 at 11:37 PM, Eugen Leitl eu...@leitl.org wrote:
 What about Hitachi HDS723030ALA640 (aka Deskstar 7K3000, claimed
 24/7)?

The spec sheets claim 512-byte sectors, so hopefully it'll work.

There's a lot more info to support that at
http://www.hitachigst.com/internal-drives/desktop/deskstar/deskstar-7k3000
as well.
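
Once such a drive is in a pool, one way to confirm what sector size ZFS
actually assumed is to look at the vdev's ashift ("tank" is just a
placeholder pool name):

  # ashift=9 means 512-byte sectors, ashift=12 means 4 KiB sectors.
  zdb -C tank | grep ashift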

-B

-- 
Brandon High : bh...@freaks.com


[zfs-discuss] Best choice - file system for system

2010-12-08 Thread Albert

Hi,
I wonder which is the better option: installing the Solaris system on UFS
and keeping the sensitive data on ZFS, or is it best to put everything on
ZFS?

What are the pros and cons of each solution?

f...@ll



Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread taemun
Dedup? Taking a long time to boot after a hard reboot after a lockup?

I'll bet that it hard locked whilst deleting some files or a dataset that
was dedup'd. After the delete is started, it spends *ages* cleaning up the
DDT (the table containing the list of dedup'd blocks). If you hard lock in
the middle of this clean-up, the DDT isn't valid to anything, and the next
mount attempt on that pool will finish the operation for you - which can
take an inordinate amount of time. My pool spent *eight days* (iirc) in
limbo, waiting for the DDT cleanup to finish. Once it did, it wrote out a
shedload of blocks and then everything was fine. That was for a zfs destroy
of a 900 GB, 64 KiB-block dataset, over 2x 8-wide raidz vdevs.
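
While it grinds away there isn't much to do but wait; a rough way to
confirm the pool is making progress rather than being hung is to watch the
disks, and once it finally imports, to see how big the DDT is ("backups"
here is just the pool name from the original post):

  # Steady streams of small reads/writes are the expected pattern while
  # the DDT cleanup is replayed during import/mount.
  iostat -xn 5

  # After import, summarize the dedup table and its entry counts.
  zdb -DD backups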

Unfortunately, raidz is of course slower for random reads than a set of
mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat
of a workaround for this, although I've not seen comprehensive figures for
the gain it gives -
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913


Re: [zfs-discuss] [OpenIndiana-discuss] iops...

2010-12-08 Thread Ross Walker
On Dec 7, 2010, at 9:49 PM, Edward Ned Harvey 
opensolarisisdeadlongliveopensola...@nedharvey.com wrote:

 From: Ross Walker [mailto:rswwal...@gmail.com]
 
 Well besides databases there are VM datastores, busy email servers, busy
 ldap servers, busy web servers, and I'm sure the list goes on and on.
 
 I'm sure it is much harder to list servers that are truly sequential in IO
 then
 random. This is especially true when you have thousands of users hitting
 it.
 
 Depends on the purpose of your server.  For example, I have a ZFS server
 whose sole purpose is to receive a backup data stream from another machine,
 and then write it to tape.  This is a highly sequential operation, and I use
 raidz.
 
 Some people have video streaming servers.  And http/ftp servers with large
 files.  And a fileserver which is the destination for laptop whole-disk
 backups.  And a repository that stores iso files and rpm's used for OS
 installs on other machines.  And data capture from lab equipment.  And
 packet sniffer / compliance email/data logger.
 
 and I'm sure the list goes on and on.  ;-)

OK, single-stream backup servers are one type, but as soon as you have
multiple streams, even for large files, IOPS trumps throughput to a degree;
of course, if throughput is very bad then that's no good either.

Knowing your workload is key - or having enough $$ to implement RAID10
everywhere.

-Ross



Re: [zfs-discuss] Best choice - file system for system

2010-12-08 Thread Bob Friesenhahn

On Wed, 8 Dec 2010, Albert wrote:

I wonder which is the better option: installing the Solaris system on UFS and
keeping the sensitive data on ZFS, or is it best to put everything on ZFS?

What are the pros and cons of each solution?


The best choice is usually to install with zfs root on a mirrored pair 
of disks.  UFS is going away as a boot option.
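
As an illustration only - the device names are made up and the slice layout
depends on the installer - attaching a second disk to an existing ZFS root
pool looks roughly like this:

  # Mirror the root pool onto a second disk.
  zpool attach rpool c0t0d0s0 c0t1d0s0

  # Make the second disk bootable too (x86; SPARC uses installboot).
  installgrub /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0t1d0s0

  # Wait for the resilver to finish before relying on the mirror.
  zpool status rpool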


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Best choice - file system for system

2010-12-08 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
 
 The best choice is usually to install with zfs root on a mirrored pair
 of disks.  UFS is going away as a boot option.

UFS is already unavailable as a boot option.  It's only still available if
you're using something old, such as Solaris 10u9.  (Which is the latest
Solaris.)   ;-)

Seriously though.  UFS is dead.  It has no advantage over ZFS that I'm aware
of.



Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-12-08 Thread Edward Ned Harvey
For anyone who cares:

I created an ESXi machine.  Installed two guest (centos) machines and
vmware-tools.  Connected them to each other via only a virtual switch.  Used
rsh to transfer large quantities of data between the two guests,
unencrypted, uncompressed.  Have found that ESXi virtual switch performance
peaks around 2.5Gbit.

Also, if you have an NFS datastore which is not available at the time of ESX
bootup, the datastore doesn't come online, and there seems to be no way of
telling ESXi to bring it online later.  So you can't auto-boot any guest that
is itself stored inside another guest.

So basically, if you want a layer of ZFS in between your ESX server and your
physical storage, then you have to have at least two separate servers.  And
if you want anything resembling actual disk speed, you need infiniband,
fibre channel, or 10G ethernet.  (Or some really slow disks.)   ;-)
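
For the two-server layout, the ZFS side of the datastore is just an NFS
export; a minimal sketch, with made-up pool/dataset and host names, would be:

  # Create a dataset for VM images and export it over NFS
  # (options follow share_nfs(1M) syntax).
  zfs create tank/vmstore
  zfs set sharenfs='rw,root=esx-host' tank/vmstore
  zfs get sharenfs tank/vmstore

ESX issues mostly synchronous writes over NFS, so a fast dedicated log
device on the ZFS side helps noticeably.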



Re: [zfs-discuss] [OpenIndiana-discuss] iops...

2010-12-08 Thread Edward Ned Harvey
 From: Edward Ned Harvey
 [mailto:opensolarisisdeadlongliveopensola...@nedharvey.com]
 
 In order to test random reads, you have to configure iozone to use a data
 set which is much larger than physical ram.  Since iozone will write a big
 file and then immediately afterward, start reading it ...  It means that
 whole file will be in cache unless that whole file is much larger than
 physical ram.  You'll get false read results which are unnaturally high.

 For this reason, when I'm using an iozone benchmark, I remove as much ram
 from the system as possible.

Sorry.  There's a better way.  This is straight from the mouth of Don Capps, 
author of iozone:

If you use the -w option, then the test file will be left behind.

Then reboot, or umount and mount…

If you then use the read test, without the write test, and again use -w,
then you will achieve what you are describing.

Example:

iozone -i 0 -w -r $recsize -s $filesize

Umount, then remount

iozone -i 1 -w  -r $recsize -s $filesize
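
Put together, a minimal sketch of the whole sequence - the dataset name,
mountpoint, record size and file size are all placeholders - would look like:

  # Write phase: -i 0 = write test, -w = keep the test file afterwards.
  cd /tank/bench && iozone -i 0 -w -r 128k -s 32g

  # Unmount and remount the filesystem (or reboot) so the cached copy is
  # discarded, per the advice quoted above.
  cd / && zfs umount tank/bench && zfs mount tank/bench

  # Read phase against the no-longer-cached file: -i 1 = read test.
  cd /tank/bench && iozone -i 1 -w -r 128k -s 32g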




Re: [zfs-discuss] Best choice - file system for system

2010-12-08 Thread Jerry Kemp
The only situation I can think of where UFS would be advantageous over 
ZFS might be in a low memory situation.  ZFS loves memory.
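
If memory really is that tight, a common mitigation - rather than falling
back to UFS - is to cap the ARC. A sketch with an arbitrary 1 GB limit:

  # Limit the ZFS ARC to ~1 GB (value in bytes); takes effect on reboot.
  echo 'set zfs:zfs_arc_max = 0x40000000' >> /etc/system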


But to answer the original question, ZFS is where you want to be.

Jerry






Re: [zfs-discuss] Faster than 1G Ether... ESX to ZFS

2010-12-08 Thread Ross Walker

Besides the chicken-and-egg scenario that Ed mentions, there is also the CPU
usage incurred by running the storage virtualized. You might find that as you
put more machines on the storage, performance decreases a lot faster than it
otherwise would if it were standalone, since the storage competes for CPU with
the very machines it is supposed to be serving.

-Ross



Re: [zfs-discuss] snaps lost in space?

2010-12-08 Thread Matthew Ahrens
usedsnap is the amount of space consumed by all snapshots, i.e. the
amount of space that would be recovered if all snapshots were to be
deleted.

The space used by any one snapshot is the space that would be
recovered if that snapshot were deleted, i.e. the amount of space that
is unique to that snapshot.  Any snapshot space that is shared by
multiple snapshots will not show up in any individual snapshot's used.
Therefore, deleting a snapshot can increase the adjacent snapshots'
used space.

So in general, usedbysnapshots >= sum(used by each snapshot).  You can
read more about the used property in the zfs(1m) manpage.
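
A tiny worked example of why the sum can fall short (dataset names made up):
a file written before two snapshots are taken is shared by both, so it shows
up in usedbysnapshots but in neither snapshot's own used.

  # The file B is captured by both snapshots; deleting either snapshot
  # alone would not free it, so it is not "unique" to either one.
  zfs create tank/demo
  dd if=/dev/urandom of=/tank/demo/B bs=1024k count=100
  zfs snapshot tank/demo@s1
  zfs snapshot tank/demo@s2
  rm /tank/demo/B
  zfs list -o name,used,usedbysnapshots tank/demo     # USEDSNAP ~100M
  zfs list -r -t snapshot -o name,used tank/demo      # each snapshot ~0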

The bug mentioned below (6792701) is not related to this phenomenon;
it manifests as a discrepancy between the filesystem's (or a
snapshot's) referenced space and the amount of space accessible
through POSIX interfaces (e.g., du(1)).

--matt

On Mon, Dec 6, 2010 at 4:09 AM, Joost Mulders joost...@gmail.com wrote:
 Hi,

 I've output of space allocation which I can't explain. I hope someone can
 point me at the right direction.

 The allocation of my home filesystem looks like this:

 jo...@onix$ zfs list -o space p0/home
 NAME     AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
 p0/home  31.0G   156G     86.7G   69.7G              0          0

 This tells me that *86.7G* is used by *snapshots* of this filesystem.
 However, when I look at the space allocation of the snapshots, I don't see
 the 86.7G back!

 jo...@onix$ zfs list -t snapshot -o space | egrep 'NAME|^p0\/home'
 NAME          AVAIL   USED  USEDSNAP  USEDDS  USEDREFRESERV  USEDCHILD
 p0/h...@s1    -       62.7M -         -       -              -
 p0/h...@s2    -       53.1M -         -       -              -
 p0/h...@s3    -       34.1M -         -       -              -
 p0/h...@s4    -        277M -         -       -              -
 p0/h...@s5    -       2.21G -         -       -              -
 p0/h...@s6    -        175M -         -       -              -
 p0/h...@s7    -       46.1M -         -       -              -
 p0/h...@s8    -       47.6M -         -       -              -
 p0/h...@s9    -       43.0M -         -       -              -
 p0/h...@s10   -       64.1M -         -       -              -
 p0/h...@s11   -        563M -         -       -              -
 p0/h...@s12   -       76.6M -         -       -              -

 The sum of the USED column is only some 3.6G, so the question is: what is
 the 86.7G of USEDSNAP allocated to? Ghost snapshots?

 This is with zpool version 22. This zpool was used a year or so in onnv-129.
 I upgraded the host recently to build 151a but I didn't upgrade the pool
 yet.

 Any pointers are appreciated!

 Joost