Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-09 Thread Frank Van Damme
2010/12/8 taemun tae...@gmail.com:
 Dedup? Taking a long time to boot after a hard reboot after a lockup?

 I'll bet that it hard locked whilst deleting some files or a dataset that
 was dedup'd. After the delete is started, it spends *ages* cleaning up the
 DDT (the table containing the list of dedup'd blocks). If you hard lock in
 the middle of this cleanup, the DDT is left in a state that isn't valid to
 anything. The next mount attempt on that pool will finish the operation
 for you, which will take an inordinate amount of time. My pool spent eight
 days (iirc) in limbo, waiting for the DDT cleanup to finish. Once it did,
 it wrote out a shedload of blocks and then everything was fine. This was
 for a zfs destroy of a 900GB, 64KiB-block dataset, over 2x 8-wide raidz
 vdevs.

Eight days is just... scary.
Ok, so basically it seems you can't have all the advantages of zfs at
once: no more fsck, but if you have a deduplicated pool the kernel
will still treat it as unclean after a crash or unclean shutdown?

I am indeed almost continuously deleting older files, because each day a
mass of files gets written to the machine (and backups rotated). Is it
in some way possible to do the cleanup in smaller increments, so that the
amount of cleanup work to do after a (hard) reboot is smaller?
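One user-land approximation would be to delete in bounded batches with a sync in between, so no single transaction group carries a huge pile of DDT updates. This is only a sketch - whether each batch really lands in its own txg depends on the kernel's txg timing, and the batch size here is an arbitrary assumption:

```shell
#!/bin/sh
# Sketch: delete files in bounded batches, syncing between batches so
# that the dedup-table updates are spread over many transaction groups.
# Assumes filenames without whitespace; batch size is illustrative.
batch_delete() {
    dir=$1
    batch=${2:-100}
    while :; do
        files=$(find "$dir" -type f | head -n "$batch")
        [ -z "$files" ] && break
        printf '%s\n' "$files" | xargs rm -f
        sync    # push the pending deletes out before queueing more
    done
}

# Example (hypothetical path): batch_delete /backups/old-daily 200
```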

 Unfortunately, raidz is of course slower for random reads than a set of
 mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat
 of a workaround for this, although I've not seen comprehensive figures for
 the gain it gives -
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913



-- 
Frank Van Damme
No part of this copyright message may be reproduced, read or seen,
dead or alive or by any means, including but not limited to telepathy
without the benevolence of the author.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-09 Thread Frank Van Damme
2010/12/8  gon...@comcast.net:
 To explain the slow-delete problem further:

 It is absolutely critical for zfs to manage the incoming data rate.
 This is done reasonably well for write transactions.

 Delete transactions, prior to dedup, were very light-weight (nearly free),
 so these are not managed.

 Because of dedup, deletes become rather expensive: they introduce a
 substantial seek penalty, mostly due to the need to update the dedup
 metadata (reference counts and such).

 The mechanism of the problem:
 1) Too many delete transactions are accepted into the
 open transaction group.

 2) When this txg comes up to be synced to disk, the sync takes a very long
 time (instead of a healthy 1-2 seconds: minutes, hours, or days).

Ok, had to look that one up, but the fog is starting to clear.
I reckon that in zfs land, a command like sync has no effect at all?

 3) Because the open txg cannot be closed while the sync of a previous txg
 is in progress, we eventually run out of buffer space in the open txg, and
 all input is severely throttled.

 4) Because of (3), other bad things happen: the arc tries to shrink,
 causing memory shortage and making things worse.

Yes... I see... Speaking of which: the arc size on my system would be
1685483656 bytes - that's 1.6 GB on a system with 6 GB, with 3942 MB
allocated to the kernel (dixit mdb's ::memstat dcmd). So can I
assume that the better part of the rest is allocated to buffers that
needlessly fill up over time? I'd much rather have the memory used for
the ARC :)
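For watching this over time, the ARC size is also exported as a kstat, which avoids poking around in mdb. A sketch - the kstat path is the usual Solaris/illumos one and may differ on other builds:

```shell
# Sketch: print the current ARC size in bytes via the zfs:0:arcstats
# kstat (Solaris/illumos only; fails gracefully elsewhere).
arc_size_bytes() {
    if command -v kstat >/dev/null 2>&1; then
        kstat -p zfs:0:arcstats:size | awk '{print $2}'
    else
        echo "kstat not available on this system" >&2
        return 1
    fi
}

# Example: arc_size_bytes | awk '{printf "%.0f MB\n", $1 / 1048576}'
```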

 5) Because deletes persist across reboots, you are unable to mount your
 pool.

 One solution is booting into maintenance mode and renaming the zfs cache
 file (look in /etc/zfs, I forget the name at the moment).
 You can then boot up and import your pool. The import will take a long time,
 but meanwhile you are up and can do other things.
 At that point you have the option of getting rid of the pool and starting
 over (possibly after installing a better kernel).
 After the update and import, upgrade your pool to the current pool version
 and life will be much better.
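For the record, that sequence looks roughly like this. The pool name 'backups' is an example, and the cache file is normally /etc/zfs/zpool.cache on these systems - verify on your build; the steps span reboots, so the function is illustrative rather than something to run in one go:

```shell
# Sketch of the recovery sequence described above. Pool name and
# cache-file name are assumptions; step 1 runs from maintenance mode.
recover_pool() {
    pool=${1:-backups}
    # 1) Move the cache file aside so the next boot does not try to
    #    import (and replay the DDT cleanup for) the pool.
    mv /etc/zfs/zpool.cache /etc/zfs/zpool.cache.saved
    # 2) After booting normally, import by hand. The deferred cleanup
    #    still runs here, but the rest of the system stays usable.
    zpool import "$pool"
    # 3) Once a fixed kernel is installed, bring the pool up to the
    #    current on-disk version.
    zpool upgrade "$pool"
}
```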

By now, the system has booted up. It has taken quite a few hours, though.
This system is actually running Nexenta, but I'll see if I can upgrade
the kernel.

 I hope this helps, good luck

It clarified a few things. Thank you very much. There are one or two
things I still have to change on this system it seems...

 In addition, there was a virtual memory related bug (one of the zfs memory
 caches was allocated with the wrong object size) that would cause other
 components to hang, waiting for memory allocations.

 This was so bad in earlier kernels that systems would become unresponsive
 for a potentially very long time (a phenomenon known as bricking).

 As I recall, a lot of fixes came in with the 140-series kernels to address
 this.

 Anything 145 and above should be OK.

I'm on 134f. No wonder.



[zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Frank Van Damme
Hello list,

I'm having trouble with a server holding a lot of data. After a few
months of uptime, it is currently rebooting from a lockup (reason
unknown so far), but it is taking hours to boot up again. The boot
process is stuck at the stage where it says:
mounting zfs filesystems (1/5)
The machine responds to pings and keystrokes, and I can see disk activity;
the disk LEDs blink one after another.

The file system layout is: a 40 GB mirror for the syspool, and a raidz
volume over four 2 TB disks which I use for taking backups (the purpose
of this machine). I have deduplication enabled on the backups pool
(which turned out to be pretty slow for file deletes, since there are a
lot of files on the backups pool and I haven't installed an L2ARC
yet). The main memory is 6 GB; it's an HP server running Nexenta Core
Platform (kernel version 134f).

I assume sooner or later the machine will boot up, but I'm in a bit of
a panic about how to solve this permanently - after all, the last thing
I want is to be unable to restore data one day because it takes days
to boot the machine.

Does anyone have an idea how much longer it may take and if the
problem may have anything to do with dedup?
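For what it's worth, you can get a feel for how big the dedup table is (and hence how painful deletes and this cleanup will be) from zdb's dedup statistics. A sketch, with the pool name 'backups' assumed:

```shell
# Sketch: inspect dedup state for a pool (name is an assumption).
# zdb -DD prints a DDT histogram; the commonly quoted figure of roughly
# 320 bytes of RAM per DDT entry gives a rough in-core table size to
# compare against the machine's 6 GB.
ddt_report() {
    pool=${1:-backups}
    zpool get dedupratio "$pool"    # overall dedup ratio
    zdb -DD "$pool"                 # DDT entry counts and histogram
}
```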



Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Wolfram Tomalla
Hi Frank,

you might be facing the problem of having lots of snapshots of your
filesystems. For each snapshot, a device is created during import of the
pool. This can easily lead to an extended startup time.
On my system it took about 15 minutes for 3500 snapshots.
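A quick way to see whether this applies - once the pool is imported, counting snapshots is cheap. A sketch:

```shell
# Sketch: count snapshots pool-wide. A count in the thousands can add
# many minutes to import, since each snapshot gets a device created.
snapshot_count() {
    zfs list -H -t snapshot -o name | wc -l
}
```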




Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread Fred Liu
Failed ZIL (log) devices will also cause this...
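A failed log device usually shows up in the pool status; a quick check might look like this (a sketch):

```shell
# Sketch: report pools that are not healthy. A failed or missing ZIL
# (log) device shows up as a degraded or faulted vdev under "logs".
check_pools() {
    zpool status -x    # prints "all pools are healthy" when nothing is wrong
}
```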

Fred

From: zfs-discuss-boun...@opensolaris.org 
[mailto:zfs-discuss-boun...@opensolaris.org] On Behalf Of Wolfram Tomalla
Sent: Wednesday, December 08, 2010 10:40 PM
To: Frank Van Damme
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems



Re: [zfs-discuss] very slow boot: stuck at mounting zfs filesystems

2010-12-08 Thread taemun
Dedup? Taking a long time to boot after a hard reboot after a lockup?

I'll bet that it hard locked whilst deleting some files or a dataset that
was dedup'd. After the delete is started, it spends *ages* cleaning up the
DDT (the table containing the list of dedup'd blocks). If you hard lock in
the middle of this cleanup, the DDT is left in a state that isn't valid to
anything. The next mount attempt on that pool will finish the operation
for you, which will take an inordinate amount of time. My pool spent
*eight days* (iirc) in limbo, waiting for the DDT cleanup to finish. Once
it did, it wrote out a shedload of blocks and then everything was fine.
This was for a zfs destroy of a 900GB, 64KiB-block dataset, over 2x
8-wide raidz vdevs.

Unfortunately, raidz is of course slower for random reads than a set of
mirrors. The raidz/mirror hybrid allocator available in snv_148+ is somewhat
of a workaround for this, although I've not seen comprehensive figures for
the gain it gives -
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6977913