Paul B. Henson wrote:
I see Sun has recently released part number XRA-ST1CH-32G2SSD, a 32GB SATA
SSD for the x4540 server.

I didn't find that exact part number, but I notice that manufacturing part
  371-4196 32GB Solid State Drive, SATA Interface
is showing up in a number of systems.  IIRC, this would be an Intel X25-E.
(shock rated at 1,000 Gs @ 0.5ms, so it should still work if I fall off my horse ;-)

We have five x4500s we purchased last year that we are deploying to
provide file and web services to our users. One issue that we have had is
horrible performance in the "single-threaded process creating lots of
small files over NFS" scenario. The bottleneck in that case is fairly
clear, and to verify it we temporarily disabled the ZIL on one of the
servers. Extraction time for a large tarball into an NFSv4 mounted
filesystem dropped from 20 minutes to 2 minutes.
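For anyone who wants to reproduce the test: one way to disable the ZIL
temporarily on OpenSolaris/Solaris builds of that vintage is the
zil_disable tunable. A rough sketch (details may differ on your build):

    # live, on a running kernel (writes 1 to the zil_disable variable)
    echo zil_disable/W0t1 | mdb -kw
    # or persistently, by adding this line to /etc/system:
    #   set zfs:zil_disable = 1
    # remount the filesystem (or re-import the pool) for the change to
    # take effect, and remember to set it back to 0 after the test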

Obviously, it is strongly recommended not to run with the ZIL disabled, and
we don't particularly want to do so in production. However, for some of our
users, performance is simply unacceptable for various use cases
(including not only tar extracts, but other common software development
processes such as svn checkouts).

Yep.  Same sort of workload.

As such, we have been investigating the possibility of improving
performance via a slog, preferably on some type of NVRAM or SSD. We haven't
really found anything appropriate, and now we see Sun has officially
released something that looks very much like what we have been looking for.

My sales rep tells me the drive is only qualified for use in an x4540.
However, as a standard SATA-interface SSD, there is theoretically no
reason why it would not work in an x4500; the two systems even share the
exact same drive sleds. I was told Sun just didn't want to spend the
time/effort to qualify
it for the older hardware (kind of sucks that servers we bought less than a
year ago are being abandoned). We are considering using them anyway; in
the worst case, if Sun support complains that they are installed and
refuses to continue any diagnostic efforts, presumably we can simply swap
them out for standard hard drives. Slog devices can be replaced like any
other ZFS vdev, correct? Or alternatively, what is the state of removing
a slog device and reverting back to the pool-embedded log?

Generally, Sun doesn't qualify new devices with EOLed systems.

Today, you can remove a cache device, but not a log device.  You can
replace a log device.
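For example, with a hypothetical pool "tank" and made-up device names:

    # removing a cache (L2ARC) device is supported
    zpool remove tank c4t0d0
    # removing a log device is not supported today; this will fail
    zpool remove tank c4t1d0
    # but replacing a log device with another device works
    zpool replace tank c4t1d0 c4t2d0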

Before you start down this path, you should take a look at the workload
using zilstat, which will show you the kind of work the ZIL is doing.
If you don't see any ZIL activity, no need to worry about a separate log.
http://www.richardelling.com/Home/scripts-and-programs-1/zilstat
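For example, assuming the script is saved as zilstat.ksh and made
executable (the exact options depend on the version you download):

    # sample ZIL activity every 10 seconds, 6 samples
    ./zilstat.ksh 10 6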

If you decide you need a log device... read on.
Usually, the log device does not need to be very big.  A good strategy
would be to create a small partition or slice, say 1 GByte, on an idle disk.
Add this as a log device to the pool.  If this device is an HDD, then you
might not see much of a performance boost. But now that you have a
log device set up, you can experiment with replacing the log device
with another.  You won't be able to remove the log device, but you
can relocate or grow it on the fly.
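A sketch of that workflow, again with made-up pool and device names:

    # add a 1 GByte slice on an idle disk as a separate log device
    zpool add tank log c3t0d0s0
    # later, relocate or grow the log by replacing the slice with a
    # faster or larger device (an SSD, say)
    zpool replace tank c3t0d0s0 c5t0d0
    # check the result
    zpool status tank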

So, has anyone played with this new SSD in an x4500 and can comment on
whether or not it seemed to work okay? I can't imagine that no one inside
Sun, regardless of official support level, has tried it :). Feel free to
post anonymously or reply off list if you don't want anything on the record
;).

The Sun hybrid storage documentation describes two
different flash devices, the "Logzilla", optimized for blindingly fast
writes and intended as a ZIL slog, and the "Cachezilla", optimized for fast
reads and intended for use as L2ARC. Is this one of those, or some other
device? If the latter, what are its technical read/write performance
characteristics?

Intel claims > 3,300 4 kByte random write IOPS.  A really fast HDD
may reach 300 4 kByte random write IOPS, but there are no really
fast SATA HDDs.
http://www.intel.com/design/flash/nand/extreme/index.htm

We currently have all 48 drives allocated, 23 mirror pairs and two hot
spares. Is there any timeline on the availability of removing an active
vdev from a pool, which would allow us to swap out a couple of devices
without having to destroy and rebuild our pool?

My rule of thumb is to have a hot spare.  Having lots of hot
spares only makes a big difference for sites where you cannot
service the systems within a few days, such as remote locations.
But you can remove a hot spare, so that could be a source of
your experimental 1 GByte log.
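Something along these lines, with made-up device names again:

    # remove one of the hot spares to free up a disk
    zpool remove tank c5t8d0
    # carve a small slice on it, then add that slice as the log device
    zpool add tank log c5t8d0s0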

What is the current state of behavior in the face of slog failure?

It depends on both the failure and event tree...

Theoretically, if a dedicated slog device failed, the pool could simply
revert to logging embedded in the pool.

Yes, and this is what would happen in the case where the log
device completely failed while the pool was operational --
the ZIL would revert to using the main pool.

However, the last I heard, slog
device failure rendered a pool completely unusable and inaccessible. If
that is still the case and is not expected to be resolved anytime soon, we
would presumably need two of the devices to mirror the slog?

This is the case where the log device fails completely while
the pool is not operational.  Upon import, the pool will look
for an operational log device and will not find it.  This means
that any committed transactions that would have been in the
log device are not recoverable *and* the pool won't know
the extent of this missing information.

We could build a model of such a system for an availability
or data retention analysis, but we would be hard pressed to
agree upon a probability for the events (system down and
log device fails) that would be interestingly large.  In large
part this is because the failure rate of SSDs is so much lower
than the failure rate for HDDs.  In other words, the HDD
failure modes would dominate the analysis by a significant
margin and the SSD-failing-while-system-down case would
be way down in the noise.

OTOH, if you are paranoid and feel very strongly about CYA,
then by all means, mirror the log :-).
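For example (hypothetical devices again):

    # add the log as a two-way mirror
    zpool add tank log mirror c4t3d0 c4t4d0
    # or attach a second device to an existing single log device
    zpool attach tank c4t3d0 c4t4d0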

Thanks for any info you might be able to provide.

[editorial comment: it would be to Sun's benefit if Sun people
would respond to Sun product questions.  Harrrummppff.]
-- richard

_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
