Hey all -

Just saw something really weird.

I have been playing with by box for a little while now, and just noticed 
something whilst checking how fast / slow my IDE ports were on a newish 
motherboard...

I had been copying around an image. Not a particularly large one - 500M 
ISO...

I had been observing the read speed off disk, and write speed to disk.

When reading from one disk and writing to another, I was seeing about 
60MB/s and all was as expected.

But, then, I thought I'd do one more run, and copied the *same* image as 
my last run...

Of course, the image was in memory, so I expected there would be no 
reads and lots of writes. What I saw was lots of not very impressive 
speed (cmdk1 is the target):

                  extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk1     0.0   35.3    0.0 4514.1 32.4  2.0  975.3 100 100

                  extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk1     0.0   36.7    0.0 4697.6 32.4  2.0  936.8 100 100

                  extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk1     0.0   36.8    0.0 4650.6 32.4  2.0  935.1 100 100

                  extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk0     0.0    0.0    0.0    0.0  0.1  0.0    0.0   0   0
cmdk1     0.0   37.5    0.0 4424.9 32.3  2.0  913.8 100 100

So my target disk, which is owned exclusively by ZFS, was apparently 
flat out writing at 4.4MB/s.

At the time it goes bad, the svc_t jumps from about 125ms to 950ms.

Ouch!!

On closer inspection, I see that -
  - The cp returns almost immediately. (somewhat expected)
  - ZFS starts writing at about 60MB/s, but only for about 2 seconds
    (This is changable. Sometimes, it writes the whole image at the 
slower rate.)
  - the write rate drops back to 4 - 5MB/s
  - CPU usage is only 8%
  - I still have 1.5GB of 4GB free memory (Though I *am* running Xen at 
the moment. Not sure if that matters)
  - If I kick off a second copy to a different filename whilst the first 
is running it does not get any faster.
  - If I kick off a write to a raw zvol on the same pool, the write rate 
to the disk jumps back up to the expected 60MB/s, but drops again as 
soon as it's completed the write to the raw zvol... So, it seems it's 
not the disk itself.
  - The zpool *has* been exported and imported this boot. Not sure if 
that matters either.
  - I had a hunch that memory availability might be playing a part, so I 
forced a whole heap to be freed up with a honking big malloc and walk of 
the pages. I freed up 3GB (box has 4GB total) and it seems that I start 
to see the problem much more frequently as I get to about 1.5GB free.
  - It's not entirely predictable. Sometimes, it'll write at 50-60MB/s 
for up to 8 or so seconds, and others, it'll only write fast for a burst 
right at the start, then take quite some time to write out the rest.

It's almost as if we are being throttled on the rate at which we can 
push data through the ZFS in-memory cache when writing previously read 
and written data. Or something equally bogus like me expecting that ZFS 
would write as fast as it can all the time, which I guess might be an 
invalid assumption?

Now: This is running NV_65 with the Xen bits from back then. Not sure if 
that really matters.

Does not seem that the disk is having problem -
beaker:/disk2/crap # zpool status
   pool: disk2
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         disk2       ONLINE       0     0     0
           c3d0      ONLINE       0     0     0

errors: No known data errors

   pool: zfs
  state: ONLINE
  scrub: none requested
config:

         NAME        STATE     READ WRITE CKSUM
         zfs         ONLINE       0     0     0
           c1d0s7    ONLINE       0     0     0

errors: No known data errors

c1d0             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3320620AS     Revision:  Serial No:             6QF Size: 
320.07GB <320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c3d0             Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Model: ST3320620AS     Revision:  Serial No:             6QF Size: 
320.07GB <320070352896 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0
c0t1d0           Soft Errors: 4 Hard Errors: 302 Transport Errors: 0

If anyone within SWAN on the ZFS team wanted to take a look at this box 
and see if it's a new bug or just me being a bonehead and not 
understanding what I'm seeing, please respond to me directly, and I can 
provide access. (I'll make an effort not to reboot the box just in case 
it's only this boot that sees the problems.)

Nathan. :)


_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

Reply via email to