Re: [zfs-discuss] NFS asynchronous writes being written to ZIL

2012-06-14 Thread Phil Harman
On 14 Jun 2012, at 23:15, Timothy Coalson tsc...@mst.edu wrote: The client is using async writes that include commits. Sync writes do not need commits. Are you saying NFS commit operations sent by the client aren't always reported by that script? They are not reported in your case because
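
For anyone wanting to watch the NFS commit traffic themselves, a minimal DTrace sketch along these lines should work on a Solaris/OpenIndiana NFS server. The probe names assume the nfsv3 provider; confirm them first with dtrace -l -P nfsv3.

    # Count NFSv3 write and commit operations per second on the server
    dtrace -n '
    nfsv3:::op-write-start  { @ops["write"]  = count(); }
    nfsv3:::op-commit-start { @ops["commit"] = count(); }
    tick-1s { printa(@ops); trunc(@ops); }'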

Re: [zfs-discuss] webserver zfs root lock contention under heavy load

2012-03-27 Thread Phil Harman
One of the glories of Solaris is that it is so very observable. Then there are the many excellent blog posts, wiki entries, and books - some of which are authored by contributors to this very thread - explaining how Solaris works. But these virtues are also a snare to some, and it is not

Re: [zfs-discuss] L2ARC and poor read performance

2011-06-08 Thread Phil Harman
On 08/06/2011 14:35, Marty Scholes wrote: Are some of the reads sequential? Sequential reads don't go to L2ARC. That'll be it. I assume the L2ARC is just taking metadata. In situations such as mine, I would quite like the option of routing sequential read data to the L2ARC also. The good
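
A quick way to see whether the L2ARC is earning its keep is to watch the l2 counters in the arcstats kstat. If you do want prefetched (sequential) reads to be eligible for the L2ARC, there is an l2arc_noprefetch tunable; the mdb poke below is a hedged sketch only - it is unsupported, takes effect live, and the symbol name should be checked on your build.

    # How much the L2ARC is actually being hit
    kstat -p zfs:0:arcstats:l2_size zfs:0:arcstats:l2_hits zfs:0:arcstats:l2_misses

    # Allow prefetched/sequential reads into the L2ARC (unsupported live tweak)
    echo "l2arc_noprefetch/W0t0" | mdb -kw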

[zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Phil Harman
Ok here's the thing ... A customer has some big tier 1 storage, and has presented 24 LUNs (from four RAID6 groups) to an OI148 box which is acting as a kind of iSCSI/FC bridge (using some of the cool features of ZFS along the way). The OI box currently has 32GB configured for the ARC, and 4x

Re: [zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Phil Harman
On 07/06/2011 20:34, Marty Scholes wrote: I'll throw out some (possibly bad) ideas. Thanks for taking the time. Is ARC satisfying the caching needs? 32 GB for ARC should almost cover the 40GB of total reads, suggesting that the L2ARC doesn't add any value for this test. Are the SSD

Re: [zfs-discuss] L2ARC and poor read performance

2011-06-07 Thread Phil Harman
On 07/06/2011 22:57, LaoTsao wrote: You have an unbalanced setup: FC 4Gbps vs 10Gbps NIC. It's actually 2x 4Gbps (using MPxIO) vs 1x 10Gbps. After 8b/10b encoding it is even worse, but this does not yet impact your benchmark. Sent from my iPad Hung-Sheng Tsao (LaoTsao) Ph.D On Jun 7, 2011, at

Re: [zfs-discuss] Surprise Thread Preemptions

2011-01-18 Thread Phil Harman
Big subject! You haven't said what your 32 threads are doing, or how you gave them the same priority, or what scheduler class they are running in. However, you only have 24 VCPUs, and (I assume) 32 active threads, so Solaris will try to share resources evenly, and yes, it will preempt one
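
If it helps, the scheduling class and priority of each thread are easy to confirm from the shell; something like the following (the process name is just a placeholder):

    # Per-LWP scheduling class and priority
    ps -eLo pid,lwp,class,pri,comm | grep myworkload

    # Per-thread microstate accounting: the LAT column shows time spent waiting for a CPU
    prstat -Lm 5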

Re: [zfs-discuss] SAS/short stroking vs. SSDs for ZIL

2010-12-24 Thread Phil Harman
On 24/12/2010 18:21, Richard Elling wrote: Latency is what matters most. While there is a loose relationship between IOPS and latency, you really want low latency. For 15krpm drives, the average latency is 2ms for zero seeks. A decent SSD will beat that by an order of magnitude. And the

Re: [zfs-discuss] SAS/short stroking vs. SSDs for ZIL

2010-12-23 Thread Phil Harman
Great question. In good enough computing, beauty is in the eye of the beholder. My home NAS appliance uses IDE and SATA drives without a dedicated ZIL http://dtrace.org/blogs/ahl/2010/11/15/zil-analysis-from-chris-george/ if HDDs and commodity SSDs continue to be target ZIL devices, ZFS could

Re: [zfs-discuss] SAS/short stroking vs. SSDs for ZIL

2010-12-23 Thread Phil Harman
Sent from my iPhone (which has a lousy user interface which makes it all too easy for a clumsy oaf like me to touch Send before I'm done)... On 23 Dec 2010, at 11:07, Phil Harman phil.har...@gmail.com wrote: Great question. In good enough computing, beauty is in the eye of the beholder. My

Re: [zfs-discuss] SAS/short stroking vs. SSDs for ZIL

2010-12-23 Thread Phil Harman
On 23 Dec 2010, at 11:53, Stephan Budach stephan.bud...@jvm.de wrote: On 23.12.10 12:18, Phil Harman wrote: Sent from my iPhone (which has a lousy user interface which makes it all too easy for a clumsy oaf like me to touch Send before I'm done)... On 23 Dec 2010, at 11:07, Phil Harman

Re: [zfs-discuss] relationship between ARC and page cache

2010-12-22 Thread Phil Harman
On 21/12/2010 21:53, Jeff Bacon wrote: So, to Phil's email - read()/write() on a ZFS-backed vnode somehow completely bypass the page cache and depend only on the ARC? How the heck does that happen - I thought all files were represented as vm objects? For most other filesystems (and

Re: [zfs-discuss] A few questions

2010-12-21 Thread Phil Harman
On 21/12/2010 05:44, Richard Elling wrote: On Dec 20, 2010, at 7:31 AM, Phil Harman phil.har...@gmail.com wrote: On 20/12/2010 13:59, Richard Elling wrote: On Dec 20, 2010, at 2:42 AM, Phil Harman phil.har...@gmail.com wrote: Why does

Re: [zfs-discuss] A few questions

2010-12-21 Thread Phil Harman
On 21/12/2010 13:05, Deano wrote: On Dec 20, 2010, at 7:31 AM, Phil Harman phil.har...@gmail.com wrote: If you only have a few slow drives, you don't have performance. Like trying to win the Indianapolis 500 with a tricycle... Actually, I didn't say

Re: [zfs-discuss] relationship between ARC and page cache

2010-12-21 Thread Phil Harman
Hi Jeff, ZFS support for mmap() was something of an afterthought. The current Solaris virtual memory infrastructure didn't have the features or performance required, which is why ZFS ended up with the ARC. Yes, you've got it. When we mmap() a ZFS file, there are two main caches involved:
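
A rough way to see the two caches side by side on a live system (both commands need privileges; this is only a sketch):

    # Current ARC size
    kstat -p zfs:0:arcstats:size

    # Kernel view of physical memory usage, including anon and page cache pages
    echo ::memstat | mdb -k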

Re: [zfs-discuss] A few questions

2010-12-20 Thread Phil Harman
Why does resilvering take so long in raidz anyway? Because it's broken. There were some changes a while back that made it more broken. There has been a lot of discussion, anecdotes and some data on this list. The resilver doesn't do a single pass of the drives, but uses a smarter temporal

Re: [zfs-discuss] A few questions

2010-12-20 Thread Phil Harman
On 20/12/2010 11:03, Deano wrote: Hi, Which brings up an interesting question... IF it were fixed in for example illumos or freebsd is there a plan for how to handle possible incompatible zfs implementations? Currently the basic version numbering only works as it implies only one stream of
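
The on-disk compatibility story is at least visible from the version numbers each implementation advertises, e.g. (the pool name is a placeholder):

    zpool upgrade -v        # pool versions this ZFS implementation supports
    zfs upgrade -v          # filesystem versions it supports
    zpool get version tank  # what a given pool is currently at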

Re: [zfs-discuss] A few questions

2010-12-20 Thread Phil Harman
On 20/12/2010 11:29, Lanky Doodle wrote: I believe Oracle is aware of the problem, but most of the core ZFS team has left. And of course, a fix for Oracle Solaris no longer means a fix for the rest of us. OK, that is a bit concerning then. As good as ZFS may be, I'm not sure I want to commit

Re: [zfs-discuss] A few questions

2010-12-20 Thread Phil Harman
On 20/12/2010 13:59, Richard Elling wrote: On Dec 20, 2010, at 2:42 AM, Phil Harman phil.har...@gmail.com wrote: Why does resilvering take so long in raidz anyway? Because it's broken. There were some changes a while back that made it more broken. broken

Re: [zfs-discuss] zpool import is this safe to use -f option in this case ?

2010-11-17 Thread Phil Harman
+1 When I did my stuff (with a major bank) two years ago, my reasoning was that we (Sun, remember them?) had made huge capital out of the "always consistent on disk" claim, and that we could be expected to stand by and honour that promise. But because this was a big bank, I felt that due

Re: [zfs-discuss] Changing GUID

2010-11-16 Thread Phil Harman
Actually, I did this very thing a couple of years ago with M9000s and EMC DMX4s ... with the exception of the same host requirement you have (i.e. the thing that requires the GUID change). If you want to import the pool back into the host where the cloned pool is also imported, it's not just

Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Phil Harman
What more info could you provide? Quite a lot more, actually, like: how many streams of SQL and copy are you running? how are the filesystems/zvols configured (recordsize, etc)? some CPU, VM and network stats would also be nice. Based on the nexenta iostats you've provided (a tiny window on
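
For reference, the sort of config detail being asked for can be gathered with commands like these (dataset names are placeholders):

    zpool status -v tank
    zfs get volblocksize,compression tank/vol01      # zvol backing an iSCSI LUN
    zfs get recordsize,compression,atime tank/fs01   # an ordinary filesystem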

Re: [zfs-discuss] Myth? 21 disk raidz3: Don't put more than ___ disks in a vdev

2010-10-20 Thread Phil Harman
On 20/10/2010 14:48, Darren J Moffat wrote: On 20/10/2010 14:03, Edward Ned Harvey wrote: In a discussion a few weeks back, it was mentioned that the Best Practices Guide says something like Don't put more than ___ disks into a single vdev. At first, I challenged this idea, because I see no

Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Phil Harman
Ian, It would help to have some config detail (e.g. what options are you using? zpool status output; property lists for specific filesystems and zvols; etc) Some basic Solaris stats can be very helpful too (e.g. peak flow samples of vmstat 1, mpstat 1, iostat -xnz 1, etc) It would also be

Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-15 Thread Phil Harman
As I have mentioned already, it would be useful to know more about the config, how the tests are being done, and to see some basic system performance stats. On 15/10/2010 15:58, Ian D wrote: As I have mentioned already, we have the same performance issues whether we READ or we WRITE to the

Re: [zfs-discuss] TLER and ZFS

2010-10-06 Thread Phil Harman
www.solarisinternals.com has always been a community. It never was hosted by Sun, and it's not hosted by Oracle. True, many of the contributors were Sun employees, but not so many remain at Oracle. If it's out of date, I suspect that's because the original contributors are too busy doing other

Re: [zfs-discuss] send/recv reads a lot from destination zpool

2010-08-15 Thread Phil Harman
I saw this the other day when doing an initial auto sync from one Nexenta 3.0.3 node to another (using the ZFS/SSH method). I later tried it again with a fresh destination pool and the read traffic was minimal. Sadly I didn't have an opportunity to do an investigation, but it doesn't fit my

Re: [zfs-discuss] RAID Z stripes

2010-08-10 Thread Phil Harman
On 10 Aug 2010, at 08:49, Ian Collins i...@ianshome.com wrote: On 08/10/10 06:21 PM, Terry Hull wrote: I am wanting to build a server with 16 - 1TB drives with 2 – 8 drive RAID Z2 arrays striped together. However, I would like the capability of adding additional stripes of 2TB drives in the
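
For what it's worth, the layout Terry describes is straightforward to express with zpool; the device names below are placeholders:

    # Two 8-drive RAID-Z2 vdevs striped together in one pool
    zpool create tank \
        raidz2 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0 c0t5d0 c0t6d0 c0t7d0 \
        raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 c1t7d0

    # Later, grow the pool by adding a further RAID-Z2 vdev of 2TB drives
    zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 c2t7d0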

Re: [zfs-discuss] RAID Z stripes

2010-08-10 Thread Phil Harman
On 10 Aug 2010, at 10:22, Ian Collins i...@ianshome.com wrote: On 08/10/10 09:12 PM, Andrew Gabriel wrote: Phil Harman wrote: On 10 Aug 2010, at 08:49, Ian Collins i...@ianshome.com wrote: On 08/10/10 06:21 PM, Terry Hull wrote: I am wanting to build a server with 16 - 1TB drives with 2 – 8

Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman
That's because NFS adds synchronous writes to the mix (e.g. the client needs to know certain transactions made it to nonvolatile storage in case the server restarts etc). The simplest safe solution, although not cheap, is to add an SSD log device to the pool. On 23 Jul 2010, at 08:11, Sigbjorn
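
Adding the log device itself is a one-liner; a mirrored pair is the cautious choice (device names are placeholders):

    zpool add tank log mirror c3t0d0 c3t1d0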

Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman
On 23 Jul 2010, at 09:18, Andrew Gabriel andrew.gabr...@oracle.com wrote: Thomas Burgess wrote: On Fri, Jul 23, 2010 at 3:11 AM, Sigbjorn Lie sigbj...@nixtra.com mailto:sigbj...@nixtra.com wrote: Hi, I've been searching around on the Internet to fine some help with this, but

Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman
Sent from my iPhone On 23 Jul 2010, at 09:42, tomwaters tomwat...@chadmail.com wrote: I agree, I get appalling NFS speeds compared to CIFS/Samba, i.e. CIFS/Samba of 95-105MB/s and NFS of 5-20MB/s. Not to hijack the thread, but I assume a SSD ZIL will similarly improve an iSCSI target...as I am

Re: [zfs-discuss] NFS performance?

2010-07-23 Thread Phil Harman
On 23/07/2010 10:02, Sigbjorn Lie wrote: On Fri, July 23, 2010 10:42, tomwaters wrote: I agree, I get appalling NFS speeds compared to CIFS/Samba, i.e. CIFS/Samba of 95-105MB/s and NFS of 5-20MB/s. Not to hijack the thread, but I assume a SSD ZIL will similarly improve an iSCSI target...as I am

Re: [zfs-discuss] swap - where is it coming from?

2010-06-11 Thread Phil Harman
On 10 Jun 2010, at 19:20, Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote: On Thu, 10 Jun 2010, casper@sun.com wrote: Swap is perhaps the wrong name; it is really virtual memory; virtual memory consists of real memory and swap on disk. In Solaris, a page either exists on the
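
The distinction shows up clearly in the two views swap(1M) gives: the first reports virtual swap (memory plus disk) reservations, the second only the physical swap devices.

    swap -s   # summary of virtual swap: allocated, reserved, used, available
    swap -l   # physical swap devices/files and their free blocks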

Re: [zfs-discuss] Compellant announces zNAS

2010-04-29 Thread Phil Harman
That screen shot looks very much like Nexenta 3.0 with a different branding. Elsewhere, The Register confirms it's OpenSolaris. On 29 Apr 2010, at 07:35, Thommy M. Malmström thommy.m.malmst...@gmail.com wrote: What operating system does it run? -- This message posted from opensolaris.org

Re: [zfs-discuss] Filebench Performance is weird

2010-03-02 Thread Phil Harman
I see at least two differences: 1. duration 30s vs 100s (so not SAME) 2. your manual test doesn't empty the cache Of course, it is the latter that makes all the difference. Hope this helps, Phil Sent from my iPhone On 2 Mar 2010, at 08:38, Abdullah Al-Dahlawi dahl...@ieee.org wrote:

Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-19 Thread Phil Harman
On 19/02/2010 21:57, Ragnar Sundblad wrote: On 18 feb 2010, at 13.55, Phil Harman wrote: Whilst the latest bug fixes put the world to rights again with respect to correctness, it may be that some of our performance workarounds are still unsafe (i.e. if my iSCSI client assumes all writes

Re: [zfs-discuss] Abysmal ISCSI / ZFS Performance

2010-02-18 Thread Phil Harman
client assumes all writes are synchronised to nonvolatile storage, I'd better be pretty sure of the failure modes before I work around that). Right now, it seems like an SSD Logzilla is needed if you want correctness and performance. Phil Harman Harman Holistix - focusing on the detail

Re: [zfs-discuss] x4500...need input and clarity on striped/mirrored configuration

2010-01-21 Thread Phil Harman
Can ASM match ZFS for checksum and self healing? The reason I ask is that the x45x0 uses inexpensive (less reliable) SATA drives. Even the J4xxx paper you cite uses SAS for production data (only using SATA for Oracle Flash, although I gave my concerns about that too). The thing is, ZFS and

Re: [zfs-discuss] Recordsize...

2010-01-18 Thread Phil Harman
Richard Elling wrote: Tristan Ball wrote: Also - Am I right in thinking that if a 4K write is made to a filesystem block with a recordsize of 8K, then the original block is read (assuming it's not in the ARC), before the new block is written elsewhere (the copy, from copy on write)? This
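
The usual mitigation is to set the dataset recordsize to match the application's I/O size before the data is written (it only affects newly written files); for example, with a placeholder dataset name:

    zfs set recordsize=4k tank/db
    zfs get recordsize tank/db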

Re: [zfs-discuss] zfs send/receive as backup - reliability?

2010-01-18 Thread Phil Harman
YMMV. At a recent LOSUG meeting we were told of a case where rsync was faster than an incremental zfs send/recv. But I think that was for a mail server with many tiny files (i.e. changed blocks are very easy to find in files with very few blocks). However, I don't see why further ZFS
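
For comparison with rsync, an incremental send/recv between two snapshots looks like this (dataset, snapshot, and host names are placeholders):

    zfs snapshot tank/mail@tuesday
    zfs send -i tank/mail@monday tank/mail@tuesday | \
        ssh backuphost zfs receive -F backup/mail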

Re: [zfs-discuss] I/O Read starvation

2010-01-11 Thread Phil Harman
Hi Banks, Some basic stats might shed some light, e.g. vmstat 5, mpstat 5, iostat -xnz 5, prstat -Lmc 5 ... all running from just before you start the tests until things are normal again. Memory starvation is certainly a possibility. The ARC can be greedy and slow to release memory under
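
A simple way to capture all four suggested streams for the duration of a test run (sample counts and output file names are placeholders):

    # 60 samples at 5 second intervals, i.e. five minutes of data
    vmstat 5 60      > vmstat.out &
    mpstat 5 60      > mpstat.out &
    iostat -xnz 5 60 > iostat.out &
    prstat -Lmc 5 60 > prstat.out &
    wait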

Re: [zfs-discuss] I/O Read starvation

2010-01-10 Thread Phil Harman
What version of Solaris / OpenSolaris are you using? Older versions use mmap(2) for reads in cp(1). Sadly, mmap(2) does not jive well with ZFS. To be sure, you could check how your cp(1) is implemented using truss(1) (i.e. does it do mmap/write or read/write?) As an aside, I find it interesting
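
Checking which path cp(1) takes is quick with truss; the file names here are placeholders:

    # Does this cp(1) mmap() the source file, or read() it?
    truss -t open,mmap,read,write cp bigfile /tmp/bigfile.copy 2>&1 | egrep 'mmap|read'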

Re: [zfs-discuss] zfs send from solaris 10/08 to zfs receive on solaris 10/09

2009-11-14 Thread Phil Harman
of this list. On 14 Nov 2009, at 17:58, Miles Nordin car...@ivy.net wrote: ph == Phil Harman phil.har...@gmail.com writes: The format of the stream is committed. You will be able to receive your streams on future versions of ZFS. What Erik said is stronger than the man page in an important

Re: [zfs-discuss] zfs send from solaris 10/08 to zfs receive on solaris 10/09

2009-11-12 Thread Phil Harman
On 12 Nov 2009, at 19:54, David Dyer-Bennet d...@dd-b.net wrote: On Thu, November 12, 2009 13:36, Edward Ned Harvey wrote: I built a fileserver on solaris 10u6 (10/08) intending to back it up to another server via zfs send | ssh othermachine 'zfs receive' However, the new server is too new

Re: [zfs-discuss] sparc + zfs + nfs + mac osX = fail ?

2009-11-05 Thread Phil Harman
Ok, since we're doing weird, here's my experience ... MacOS X 10.5, amd64 snv_82, ZFS via NFS v3, iTunes 7-ish One ZFS filesystem with about 8000 mp3 files. One empty iTunes library. Drag and drop about 250 directories (containing the 8000 files) into iTunes from NFS mounted volume. Select

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Phil Harman
ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead of the Solaris page cache. But mmap() uses the latter. So if anyone maps a file, ZFS has to keep the two caches in sync. cp(1) uses mmap(2). When you use cp(1) it brings pages of the files it copies into the

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Phil Harman
Joerg Schilling wrote: Phil Harman phil.har...@sun.com wrote: ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead of the Solaris page cache. But mmap() uses the latter. So if anyone maps a file, ZFS has to keep the two caches in sync. cp(1) uses mmap(2). When

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Phil Harman
Bob Friesenhahn wrote: On Sat, 4 Jul 2009, Phil Harman wrote: If you reboot, your cpio(1) tests will probably go fast again, until someone uses mmap(2) on the files again. I think tar(1) uses read(2), but from my iPod I can't be sure. It would be interesting to see how tar(1) performs

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Phil Harman
Gary Mills wrote: On Sat, Jul 04, 2009 at 08:48:33AM +0100, Phil Harman wrote: ZFS doesn't mix well with mmap(2). This is because ZFS uses the ARC instead of the Solaris page cache. But mmap() uses the latter. So if anyone maps a file, ZFS has to keep the two caches in sync. That's

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Phil Harman
Bob Friesenhahn wrote: On Sat, 4 Jul 2009, Phil Harman wrote: However, this is only part of the problem. The fundamental issue is that ZFS has its own ARC apart from the Solaris page cache, so whenever mmap() is used, all I/O to that file has to make sure that the two caches are in sync

Re: [zfs-discuss] Why is Solaris 10 ZFS performance so terrible?

2009-07-04 Thread Phil Harman
Bob Friesenhahn wrote: On Sat, 4 Jul 2009, Phil Harman wrote: However, it seems that memory mapping is not responsible for the problem I am seeing here. Memory mapping may make the problem seem worse, but it is clearly not the cause. mmap(2) is what brings ZFS files into the page cache. I

Re: [zfs-discuss] An Academic Sysadmin's Lament for ZFS ?

2007-09-10 Thread Phil Harman
On 10 Sep 2007, at 16:41, Brian H. Nelson wrote: Stephen Usher wrote: Brian H. Nelson: I'm sure it would be interesting for those on the list if you could outline the gotchas so that the rest of us don't have to re-invent the wheel... or at least not fall down the pitfalls. Also, here's