Paul B. Henson wrote:
On Wed, 13 May 2009, Richard Elling wrote:

I didn't find that exact part number, but I notice that manufacturing part
   371-4196 32GB Solid State Drive, SATA Interface
is showing up in a number of systems.  IIRC, this would be an Intel X25-E.

Hmm, the part number I provided was off an official quote from our
authorized reseller, googling it comes up with one sun.com link:

http://www.sun.com/executives/iforce/mysun/docs/Support2a_ReleaseContentInfo.html

and a bunch of Japanese sites. List price was $1500, if it is actually an
OEM'd Intel X25-E that's quite a markup, street price on that has dropped
below $500.  If it's not, it sure would be nice to see some specs.

Generally, Sun doesn't qualify new devices with EOLed systems.

Understood, it just sucks to have bought a system on its deathbed without
prior knowledge thereof.

Since it costs real $$ to do such things, given the current state of
the economy, I don't think you'll find anyone in the computer business
not trying to sell new product.

Today, you can remove a cache device, but not a log device. You can
replace a log device.
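For example, with hypothetical pool and device names:

   zpool remove tank c1t5d0             # remove a cache device
   zpool replace tank c1t5d0 c1t6d0     # replace one log device with another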

I guess if we ended up going this way, replacing the log device with a
standard hard drive in case of support issues would be the only way to go.
Does log device replacement also require the replacement device to be of
equal or greater size?

Yes, standard mirror rules apply.  This is why I try to make it known
that you don't generally need much size for the log device. They are
solving a latency problem, not a space or bandwidth problem.

 If I wanted to swap between a 32GB SSD and a 1TB
SATA drive, I guess I would need to make a partition/slice on the TB drive
of exactly the size of the SSD?

Yes, but note that an SMI label hangs onto the outdated notion of
cylinders and you can't make a slice except on cylinder boundaries.
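As a rough sketch, with hypothetical device names -- size the slice in
format(1M), rounding up to the next cylinder boundary, then swap it in:

   format c2t0d0      # select the 1TB disk
     # partition -> 0 -> make slice 0 at least the SSD's size,
     # rounded up to whole cylinders -> label
   zpool replace tank c1t6d0 c2t0d0s0   # swap the slice in for the SSD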

Before you start down this path, you should take a look at the workload
using zilstat, which will show you the kind of work the ZIL is doing. If
you don't see any ZIL activity, no need to worry about a separate log.
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
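Invocation is simple; something like the following (the interval and count
here are just an illustration):

   ./zilstat 10 6     # six 10-second samples of ZIL activity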

Would a dramatic increase in performance when disabling the ZIL also be
sufficient evidence? Even with me as the only person using our test
x4500, disabling the ZIL provides markedly better performance, as
originally described, for certain use cases.

Yes.  If the latency through the data path to write to the log was zero,
then it would perform the same as disabling the ZIL.
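For testing only (never in production), the ZIL can be disabled with the
zil_disable tunable, e.g. in /etc/system:

   set zfs:zil_disable = 1    # testing only; takes effect after a reboot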

Usually, the log device does not need to be very big.  A good strategy
would be to create a small partition or slice, say 1 GByte, on an idle disk.

If the log device was too small, you potentially could end up bottlenecked
waiting for transactions to be committed to free up log device blocks?

zilstat can give you an idea of how much data is being written to
the log, so you can make that decision.  Of course you can always
grow the log, or add another.  But I think you will find that if a
txg commits in 30 seconds or less (less as it becomes more busy),
then the amount of data sent to the log will be substantially less
than 1 GByte per txg commit.  Once the txg commits, then the
log space is freed.
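So a small slice is usually plenty; for example, with hypothetical names:

   zpool add tank log c0t1d0s3   # ~1 GByte slice on an idle disk as a log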

Intel claims > 3,300 4kByte random write iops.

Is that before or after the device gets full and starts needing to erase
whole pages to write new blocks 8-/?

Buy two, if you add two log devices, then the data is striped
across them (add != attach)
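For example, with hypothetical device names:

   zpool add tank log c1t6d0 c1t7d0         # two logs, writes striped
   zpool add tank log mirror c1t6d0 c1t7d0  # a mirrored log instead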

My rule of thumb is to have a hot spare.  Having lots of hot
spares only makes a big difference for sites where you cannot
service the systems within a few days, such as remote locations.

Eh, they're just downstairs, and we have 7x24 gold on them. Plus I have 5,
each with 2 hot spares. I wouldn't have an issue trading a hot spare for a
log device other than potential issues with the log device failing if not
mirrored.

Yes, and this is what would happen in the case where the log device
completely failed while the pool was operational -- the ZIL will revert
to using the main pool.

But would it then go belly up if the system ever rebooted? You said
currently you cannot remove a log device; if the pool reverts to an
embedded log upon slog failure, and continues to work after a reboot,
you've effectively removed the slog, other than, I guess, that it might
keep complaining and showing a dead slog device.


In that case, the pool knows the log device is failed.

This is the case where the log device fails completely while
the pool is not operational.  Upon import, the pool will look
for an operational log device and will not find it.  This means
that any committed transactions that would have been in the
log device are not recoverable *and* the pool won't know
the extent of this missing information.

So is there simply no recovery available for such a pool? Presumably the
majority of the data in the pool would probably be fine.

Just as in the disabled ZIL case, the on-disk format is still correct. It is
the client applications that may be inconsistent.  There may be a way to
recover the pool; Sun Service will have a more definitive stance.

OTOH, if you are paranoid and feel very strongly about CYA, then by all
means, mirror the log :-).
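If a single log device is already in the pool, you can attach a second
device to turn it into a mirror (hypothetical names):

   zpool attach tank c1t6d0 c1t7d0   # existing log becomes a mirrored log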

That all depends on the outcome in that rare-as-it-might-be case where the
log device fails and the pool is inaccessible. If it's just a matter of some
manual intervention to reset the pool to a happy state and the potential
loss of any uncommitted transactions (which, according to the evil zfs
tuning guide, don't result in a corrupted zfs filesystem, only in
potentially unhappy nfs clients), I could live with that. If all of the
data in the pool is trashed and must be restored from backup, that would be
problematic.

You are still much more likely to lose disks in the main pool.
Pedantically, ZFS does not limit the number of mirrors, so you
could do a 47-way mirror for the log device and use 1 disk for
the pool :-)
-- richard
