On Sat, 2009-08-08 at 15:05, Mike Gerdts wrote:
> On Sat, Aug 8, 2009 at 12:51 PM, Ed Spencer <ed_spen...@umanitoba.ca> wrote:
> >
> > On Sat, 2009-08-08 at 09:17, Bob Friesenhahn wrote:
> >> Many of us here already tested our own systems and found that under
> >> some conditions ZFS was offering up only 30MB/second for bulk data
> >> reads regardless of how exotic our storage pool and hardware was.
> >
> > Just so we are using the same units of measurement: backup/copy
> > throughput on our development mail server is 8.5 MB/sec. The people
> > running our backups would be overjoyed with that performance.
> >
> > However, backup/copy throughput on our production mail server is only
> > 2.25 MB/sec.
> >
> > The underlying disks are 15000 RPM 146GB FC drives.
> > Our performance may be hampered somewhat because the LUNs live on a
> > Network Appliance filer accessed via iSCSI, but not to the extent that
> > we are seeing, and it does not account for the throughput difference
> > between the development and production pools.
>
> NetApp filers run WAFL - Write Anywhere File Layout. Even if ZFS
> arranged everything perfectly (however that is defined), WAFL would
> undo its hard work.
>
> Since you are using iSCSI, I assume that you have disabled the Nagle
> algorithm and increased tcp_xmit_hiwat and tcp_recv_hiwat. If not,
> go do that now.
We've tried many different iSCSI parameter changes on our development server:
  - jumbo frames
  - disabling the Nagle algorithm
I'll double-check next week on tcp_xmit_hiwat and tcp_recv_hiwat.
Nothing has made any real difference.
We are only using about 5% of the bandwidth on our IP SAN.
We use two Cisco ethernet switches on the IP SAN. The iSCSI initiators
use MPxIO in a round-robin configuration.
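For reference, this is roughly what I plan to check and bump (a sketch
only; the 1 MB values are illustrative, not tested recommendations, and
ndd settings don't persist across reboots):

    # current values on the initiator (Solaris 10 style ndd)
    ndd -get /dev/tcp tcp_xmit_hiwat
    ndd -get /dev/tcp tcp_recv_hiwat
    ndd -get /dev/tcp tcp_naglim_def      # 1 means Nagle is effectively off

    # raise the TCP send/receive buffers to 1 MB (illustrative value)
    ndd -set /dev/tcp tcp_xmit_hiwat 1048576
    ndd -set /dev/tcp tcp_recv_hiwat 1048576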
> > When I talk about fragmentation it's not in the normal sense. I'm not
> > talking about blocks in a file not being sequential. I'm talking about
> > files in a single directory that end up spread across the entire
> > filesystem/pool.
>
> It's tempting to think that if the files were in roughly the same area
> of the block device that ZFS sees, reading them sequentially would at
> least trigger read-ahead at the filer. I suspect that even a moderate
> amount of file creation and deletion would make the I/O pattern random
> enough (not purely sequential) that the back-end storage would have no
> reasonable chance of recognizing it as a good time for read-ahead.
> Further, the backup application is probably in a loop like:
>
>     while there are more files in the directory
>         if next file mtime > last backup time
>             open file
>             read file contents, send to backup stream
>             close file
>         end if
>     end while
>
> In other words, other I/O operations are interspersed between the
> sequential data reads, some files are likely to be skipped, and there
> is latency introduced by writing to the data stream. I would be
> surprised to see any file system do intelligent read-ahead here. In
> short, lots of small file operations make backups, and especially
> restores, go slowly. More backup and restore streams will almost
> certainly help. Multiplex the streams so that you can keep your tapes
> moving at a constant speed.
We back up to disk first and then copy to tape later.
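Running more concurrent streams against that disk staging area would be
easy enough to prototype. A rough sketch, assuming ksh and a hypothetical
/space/imap mailbox layout with a /backup/staging area (both paths and the
stream count are placeholders, not how Networker actually does it):

    #!/bin/ksh
    # Back up mailbox directories four streams at a time instead of one.
    # /space/imap and /backup/staging are hypothetical paths.
    i=0
    for d in /space/imap/*; do
        tar cf /backup/staging/$(basename "$d").tar "$d" &
        i=$((i + 1))
        [ $((i % 4)) -eq 0 ] && wait    # let each batch of four finish
    done
    wait

Whether four streams (or forty) actually helps depends on how much of the
per-file latency can be overlapped.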
> Do you have statistics on network utilization to ensure that you
> aren't stressing it?
>
> Have you looked at iostat data to be sure that you are seeing asvc_t +
> wsvc_t values that support the number of operations you need to
> perform? That is, if asvc_t + wsvc_t for a device adds up to 10 ms, a
> workload that waits for the completion of one I/O before issuing the
> next will max out at 100 IOPS. Presumably ZFS should hide some of
> this from you[1], but it does suggest that each backup stream would be
> limited to about 100 files per second[2], because the read request for
> one file does not happen before the close of the previous file[3].
> Since Cyrus stores each message as a separate file, this suggests that
> 2.5 MB/s corresponds to an average mail message size of about 25 KB.
>
> 1. via metadata caching, read-ahead on file data reads, etc.
> 2. Assuming wsvc_t + asvc_t = 10 ms
> 3. Assuming that networker is about as smart as tar, zip, cpio, etc.
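That back-of-the-envelope math is easy to sanity-check. A quick awk sketch,
assuming roughly 10 ms of combined service time per file and a 25 KB average
message (both numbers are assumptions, not measurements):

    # files/sec ~= 1000 / (wsvc_t + asvc_t in ms);  MB/s ~= files/sec * KB / 1024
    echo "10 25" | awk '{ fps = 1000 / $1; printf("~%d files/s, ~%.2f MB/s\n", fps, fps * $2 / 1024) }'
    # prints: ~100 files/s, ~2.44 MB/s -- right in the neighbourhood of what we see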
There is a backup of a single filesystem in the pool going on right now:
# zpool iostat 5 5
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
space       1.05T   965G     97     69  5.24M  2.71M
space       1.05T   965G    113     10  6.41M   996K
space       1.05T   965G    100    112  2.87M  1.81M
space       1.05T   965G    112      8  2.35M  35.9K
space       1.05T   965G    106      3  1.76M  55.1K
Here are examples:
iostat -xpn 5 5
                             extended device statistics
   r/s    w/s    kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
  17.1   29.2   746.7   317.1   0.0   0.6    0.0   12.5   0  27 c4t60A98000433469764E4A2D456A644A74d0
  25.0   11.9   991.9   277.0   0.0   0.6    0.0   16.1   0  36 c4t60A98000433469764E4A2D456A696579d0
  14.9   17.9   423.0   406.4   0.0   0.3    0.0   10.2   0  21 c4t60A98000433469764E4A476D2F664E4Fd0
  20.8   17.4   588.9   361.2   0.0   0.4    0.0   11.5   0  30 c4t60A98000433469764E4A476D2F6B385Ad0
and:
   r/s    w/s    kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
  11.9   43.0   528.9  1972.8   0.0   2.1    0.0   38.9   0  31 c4t60A98000433469764E4A2D456A644A74d0
  17.0   19.6   496.9  1499.0   0.0   1.4    0.0   38.8   0  39 c4t60A98000433469764E4A2D456A696579d0
  14.0   30.0   670.2  1971.3   0.0   1.7    0.0   38.0   0  34 c4t60A98000433469764E4A476D2F664E4Fd0
  19.7   28.7   985.2  1647.6   0.0   1.6    0.0   32.5   0  37 c4t60A98000433469764E4A476D2F6B385Ad0
and:
   r/s    w/s    kr/s    kw/s  wait  actv wsvc_t asvc_t  %w  %b device
  22.7   41.3   973.7   423.5   0.0   0.8    0.0   11.8   0  34 c4t60A98000433469764E4A2D456A644A74d0
  27.9   20.0  1474.7   344.0   0.0   0.8    0.0   16.7   0  42 c4t60A98000433469764E4A2D456A696579d0
  15.1   17.9  1318.7   463.7   0.0   0.6    0.0   17.7   0  19 c4t60A98000433469764E4A476D2F664E4Fd0
  22.3   19.5  1801.7   406.7   0.0   0.8    0.0   20.0   0  29 c4t60A98000433469764E4A476D2F6B385Ad0
> > My problem right now is diagnosing the performance issues. I can't
> > address them without understanding the underlying cause. There is a
> > lack of tools to help in this area. There is also a lack of acceptance
> > that I'm actually having a problem with ZFS. It's frustrating.
>
> This is a prime example of why Sun needs to sell Analytics[4][5] as an
> add-on to Solaris in general. This problem is just as hard to figure
> out on Solaris as it is on Linux, Windows, etc. If Analytics were
> bundled with Gold and above support contracts, it would be a very
> compelling reason to shell out a few extra bucks for a better support
> contract.
>
> 4. http://blogs.sun.com/bmc/resource/cec_analytics.pdf
> 5. http://blogs.sun.com/brendan/category/Fishworks
>
Oh definitely!
It will also give me the opportunity to yell at my drives!
Might help to relieve some stress.
http://sunbeltblog.blogspot.com/2009/01/yelling-at-your-hard-drive.html
> > Anyone know how to significantly increase the performance of a ZFS
> > filesystem without causing any downtime to an enterprise email system
> > used by 30,000 intolerant people, when you don't really know what is
> > causing the performance issues in the first place? (Yeah, it sucks to
> > be me!)
>
> Hopefully I've helped find a couple places to look...
Thanx
--
Ed
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss