On 25/09/2009, at 11:49 AM, Bob Friesenhahn wrote:

The commentary says that normally the COMMIT operations occur during the close(2) or fsync(2) system calls, or when the client encounters memory pressure. If the problem is slow copying of many small files, this COMMIT approach does not help very much, since very little data is sent per file and most of the time is spent creating directories and files.
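To make that concrete, here is a minimal C sketch (my own illustration, with a placeholder path on an NFSv3 mount) of the client-side sequence being described: the plain write() calls go out as WRITE (UNSTABLE) RPCs, and the only COMMIT should be the one triggered at fsync() or close(), or under memory pressure.

/*
 * Minimal sketch of the client-side write pattern described above.
 * The path is a placeholder for any file on an NFSv3 mount.
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        char buf[64 * 1024];
        memset(buf, 'x', sizeof (buf));

        /* placeholder path on an NFS-mounted filesystem */
        int fd = open("/net/server/export/testfile",
            O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
                perror("open");
                return (1);
        }

        /* ~8.4MB of data, sent to the server as WRITE (UNSTABLE) RPCs */
        for (int i = 0; i < 134; i++) {
                if (write(fd, buf, sizeof (buf)) < 0) {
                        perror("write");
                        return (1);
                }
        }

        /*
         * COMMIT point: the server must push the data to stable storage
         * here, which on the ZFS side means a zil_commit() and hence,
         * with a slog configured, a burst of slog writes.
         */
        if (fsync(fd) < 0)
                perror("fsync");

        /* close() is the other usual COMMIT point */
        (void) close(fd);
        return (0);
}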

The problem appears to be slog bandwidth exhaustion: all of the data is being sent via the slog, creating contention for all subsequent NFS or locally synchronous writes. The NFS writes do not appear to be synchronous in nature - only a single COMMIT is issued at the very end - yet all of that data appears to be going via the slog, and it appears to be inflated to roughly twice its original size.

As a test, I copied a relatively small file (8.4MB). Looking at a tcpdump capture in Wireshark, there is a SETATTR at the start, the transfer ends with a single V3 COMMIT, and there are no COMMIT messages during the transfer itself.

iostat output that matches looks like this:

First, the slog write of the data (about 17MB appears to hit the slog):

Friday, 25 September 2009  1:01:00 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  135.0    0.0  17154.5  0.0  0.8    0.0    6.0   0   3   0   0   0   0 c7t2d0

Then, a few seconds later, the transaction group gets flushed to primary storage, writing nearly 11.4MB, which is in line with RAID-Z2 (expect around 10.5MB; 8.4/8*10):

Friday, 25 September 2009  1:01:13 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s   kr/s     kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   91.0    0.0   1170.4  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t0d0
    0.0   84.0    0.0   1171.4  0.0  0.1    0.0    1.2   0   2   0   0   0   0 c11t1d0
    0.0   92.0    0.0   1172.4  0.0  0.1    0.0    1.2   0   2   0   0   0   0 c11t2d0
    0.0   84.0    0.0   1172.4  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t3d0
    0.0   81.0    0.0   1176.4  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t4d0
    0.0   86.0    0.0   1176.4  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t5d0
    0.0   89.0    0.0   1175.4  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t6d0
    0.0   84.0    0.0   1175.4  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t7d0
    0.0   91.0    0.0   1168.9  0.0  0.1    0.0    1.3   0   2   0   0   0   0 c11t8d0
    0.0   89.0    0.0   1170.9  0.0  0.1    0.0    1.4   0   2   0   0   0   0 c11t9d0

So I performed the same test with a much larger file (533MB), larger than the NVRAM cache in front of the SSD, to see what it would do. Note that after the second one-second interval of activity the NVRAM is full and only accepts data at about the sequential write speed of the SSD (~70MB/s).

Friday, 25 September 2009  1:13:14 PM EST
                    extended device statistics              ---- errors ---
    r/s     w/s   kr/s      kw/s wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0   640.9    0.0   81782.9  0.0   4.2    0.0    6.5   1  14   0   0   0   0 c7t2d0
    0.0  1065.7    0.0  136408.1  0.0  18.6    0.0   17.5   1  78   0   0   0   0 c7t2d0
    0.0   579.0    0.0   74113.3  0.0  30.7    0.0   53.1   1 100   0   0   0   0 c7t2d0
    0.0   588.7    0.0   75357.0  0.0  33.2    0.0   56.3   1 100   0   0   0   0 c7t2d0
    0.0   532.0    0.0   68096.3  0.0  31.5    0.0   59.1   1 100   0   0   0   0 c7t2d0
    0.0   559.0    0.0   71428.0  0.0  32.5    0.0   58.1   1 100   0   0   0   0 c7t2d0
    0.0   542.0    0.0   68755.9  0.0  25.1    0.0   46.4   1 100   0   0   0   0 c7t2d0
    0.0   542.0    0.0   69376.4  0.0  35.0    0.0   64.6   1 100   0   0   0   0 c7t2d0
    0.0   581.0    0.0   74368.0  0.0  30.6    0.0   52.6   1 100   0   0   0   0 c7t2d0
    0.0   567.0    0.0   72574.1  0.0  33.2    0.0   58.6   1 100   0   0   0   0 c7t2d0
    0.0   564.0    0.0   72194.1  0.0  31.1    0.0   55.2   1 100   0   0   0   0 c7t2d0
    0.0   573.0    0.0   73343.5  0.0  33.2    0.0   57.9   1 100   0   0   0   0 c7t2d0
    0.0   536.3    0.0   68640.5  0.0  33.1    0.0   61.7   1 100   0   0   0   0 c7t2d0
    0.0   121.9    0.0   15608.9  0.0   2.7    0.0   22.1   0  22   0   0   0   0 c7t2d0

Again, the slog wrote about double the file size (1022.6MB), and a few seconds later the data was pushed to primary storage (684.9MB, against an expectation of 666MB = 533MB/8*10), so again roughly the right amount hit the spinning platters.
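As a rough sanity check on those numbers, here is a quick back-of-the-envelope sketch in C (assuming a single 10-wide RAID-Z2 vdev with 8 data + 2 parity disks, and taking the summed kw/s figures from the iostat output above as the observed totals):

/*
 * Back-of-the-envelope check of the figures above.  Assumptions: a
 * single 10-wide RAID-Z2 vdev (8 data + 2 parity disks); "observed"
 * totals are the kw/s columns from the iostat samples summed up.
 */
#include <stdio.h>

static void
check(const char *name, double file_mb, double slog_mb, double pool_mb)
{
        /* RAID-Z2 parity expansion: 10 disks carry 8 disks' worth of data */
        double expect_pool_mb = file_mb / 8.0 * 10.0;

        printf("%s:\n", name);
        printf("  slog wrote %7.1f MB (%.1fx the file size)\n",
            slog_mb, slog_mb / file_mb);
        printf("  pool wrote %7.1f MB (expected ~%.1f MB for RAID-Z2)\n",
            pool_mb, expect_pool_mb);
}

int
main(void)
{
        /* 8.4MB copy: slog ~17154.5 KB, pool ~11730 KB summed over 10 disks */
        check("small file (8.4MB)", 8.4, 17154.5 / 1024.0, 11730.0 / 1024.0);

        /* 533MB copy: slog ~1022.6 MB, pool ~684.9 MB */
        check("large file (533MB)", 533.0, 1022.6, 684.9);

        return (0);
}

In both cases the pool writes come out close to the RAID-Z2 expectation, while the slog writes come out at roughly 1.9-2.0x the file size.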

Friday, 25 September 2009  1:13:43 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s   kr/s     kw/s wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  338.3    0.0  32794.4  0.0  13.7    0.0   40.6   1  47   0   0   0   0 c11t0d0
    0.0  325.3    0.0  31399.8  0.0  13.7    0.0   42.0   1  47   0   0   0   0 c11t1d0
    0.0  339.3    0.0  33273.3  0.0  13.7    0.0   40.3   1  47   0   0   0   0 c11t2d0
    0.0  332.3    0.0  32009.0  0.0  13.7    0.0   41.4   0  47   0   0   0   0 c11t3d0
    0.0  352.3    0.0  34364.0  0.0  13.7    0.0   39.0   1  47   0   0   0   0 c11t4d0
    0.0  355.2    0.0  33788.7  0.0  13.7    0.0   38.6   1  47   0   0   0   0 c11t5d0
    0.0  352.3    0.0  33452.3  0.0  13.8    0.0   39.3   1  47   0   0   0   0 c11t6d0
    0.0  339.3    0.0  32873.5  0.0  13.7    0.0   40.4   1  47   0   0   0   0 c11t7d0
    0.0  337.3    0.0  32889.0  0.0  13.5    0.0   40.0   1  47   0   0   0   0 c11t8d0
    0.0  336.3    0.0  32441.9  0.0  13.7    0.0   40.9   1  47   0   0   0   0 c11t9d0

Friday, 25 September 2009  1:13:44 PM EST
                    extended device statistics              ---- errors ---
    r/s    w/s   kr/s     kw/s wait  actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.0  349.7    0.0  35677.0  0.0  16.1    0.0   45.9   0  48   0   0   0   0 c11t0d0
    0.0  367.7    0.0  37078.3  0.0  16.1    0.0   43.8   0  49   0   0   0   0 c11t1d0
    0.0  348.7    0.0  35197.1  0.0  16.3    0.0   46.9   0  49   0   0   0   0 c11t2d0
    0.0  360.7    0.0  36467.7  0.0  15.9    0.0   44.1   0  48   0   0   0   0 c11t3d0
    0.0  342.7    0.0  34103.9  0.0  16.2    0.0   47.2   0  48   0   0   0   0 c11t4d0
    0.0  347.7    0.0  34682.1  0.0  16.0    0.0   46.0   0  48   0   0   0   0 c11t5d0
    0.0  349.7    0.0  35018.3  0.0  16.3    0.0   46.7   0  49   0   0   0   0 c11t6d0
    0.0  353.7    0.0  35600.5  0.0  16.1    0.0   45.6   0  49   0   0   0   0 c11t7d0
    0.0  350.7    0.0  35580.5  0.0  16.2    0.0   46.1   0  49   0   0   0   0 c11t8d0
    0.0  356.7    0.0  36031.0  0.0  15.9    0.0   44.4   0  48   0   0   0   0 c11t9d0

Can anybody explain what is going on with the slog device - why all the data is being shunted via it, and why about double the data size is being written to it per transaction?

cheers,
James

