Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Andrey Kuzmin
On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling richard.ell...@gmail.com wrote: On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: Andrey Kuzmin wrote: Well, I'm more accustomed to sequential vs. random, but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread sensille
Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling richard.ell...@gmail.com wrote: On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: Andrey Kuzmin wrote: Well, I'm more accustomed to sequential vs. random, but YMMV. As

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Robert Milkowski
On 10/06/2010 20:43, Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it could scale 100x. Fantastic. But it actually can do over 100k. Also several thousand IOPS on a single FC

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Robert Milkowski
On 11/06/2010 09:22, sensille wrote: Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling richard.ell...@gmail.com wrote: On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: Andrey Kuzmin wrote: Well, I'm more accustomed to

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Andrey Kuzmin
On Fri, Jun 11, 2010 at 1:26 PM, Robert Milkowski mi...@task.gda.pl wrote: On 11/06/2010 09:22, sensille wrote: Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling richard.ell...@gmail.com wrote: On Jun 10, 2010, at 1:24 PM, Arne

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Robert Milkowski
On 11/06/2010 10:58, Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:26 PM, Robert Milkowski mi...@task.gda.pl wrote: On 11/06/2010 09:22, sensille wrote: Andrey Kuzmin wrote: On Fri, Jun 11, 2010 at 1:54 AM, Richard Elling

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-11 Thread Garrett D'Amore
On Fri, 2010-06-11 at 13:58 +0400, Andrey Kuzmin wrote: # dd if=/dev/zero of=/dev/rdsk/cXtYdZs0 bs=512 I did a test on my workstation a moment ago and got about 21k IOPS from my SATA drive (iostat). The trick here of course is that this is sequential
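
A minimal sketch of the kind of raw-device test described above, assuming a Solaris-style disk path (the device name is a placeholder, and writing to it destroys whatever is on that disk):

    # sequential 512-byte writes straight to the raw device
    dd if=/dev/zero of=/dev/rdsk/cXtYdZs0 bs=512 &
    # in a second terminal, watch per-device writes per second
    iostat -xnz 1

Because the writes are tiny and strictly sequential, the drive can coalesce them, which is why the IOPS figure looks far higher than a random-write test would show.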

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Robert Milkowski
On 21/10/2009 03:54, Bob Friesenhahn wrote: I would be interested to know how many IOPS an OS like Solaris is able to push through a single device interface. The normal driver stack is likely limited as to how many IOPS it can sustain for a given LUN since the driver stack is optimized for

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski mi...@task.gda.pl wrote: On 21/10/2009 03:54, Bob Friesenhahn wrote: I would be interested to know how many IOPS an OS like Solaris is able to push through a single device interface. The normal driver stack is likely limited as to how many

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Robert Milkowski
On 10/06/2010 15:39, Andrey Kuzmin wrote: On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski mi...@task.gda.pl wrote: On 21/10/2009 03:54, Bob Friesenhahn wrote: I would be interested to know how many IOPS an OS like Solaris is able to push through

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
Sorry, my bad. _Reading_ from /dev/null may be an issue, but not writing to it, of course. Regards, Andrey On Thu, Jun 10, 2010 at 6:46 PM, Robert Milkowski mi...@task.gda.pl wrote: On 10/06/2010 15:39, Andrey Kuzmin wrote: On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Mike Gerdts
On Thu, Jun 10, 2010 at 9:39 AM, Andrey Kuzmin andrey.v.kuz...@gmail.com wrote: On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski mi...@task.gda.pl wrote: On 21/10/2009 03:54, Bob Friesenhahn wrote: I would be interested to know how many IOPS an OS like Solaris is able to push through a

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Ross Walker
On Jun 10, 2010, at 5:54 PM, Richard Elling richard.ell...@gmail.com wrote: On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: Andrey Kuzmin wrote: Well, I'm more accustomed to sequential vs. random, but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it could scale 100x. Fantastic. Regards, Andrey On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski mi...@task.gda.pl wrote: On 21/10/2009 03:54, Bob

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Garrett D'Amore
For the record, with my driver (which is not the same as the one shipped by the vendor), I was getting over 150K IOPS with a single DDRdrive X1. It is possible to get very high IOPS with Solaris. However, it might be difficult to get such high numbers with systems based on SCSI/SCSA.

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen sensi...@gmx.net wrote: Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it could scale 100x. Fantastic. Hundreds of IOPS is not

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen
Andrey Kuzmin wrote: On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen sensi...@gmx.net wrote: Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen
Andrey Kuzmin wrote: Well, I'm more accustomed to sequential vs. random, but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into cache), did you have write-back enabled? It's a sustained number, so it shouldn't matter. Regards, Andrey On Fri, Jun

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen
Andrey Kuzmin wrote: As to your results, it sounds almost too good to be true. As Bob has pointed out, h/w design targeted hundreds of IOPS, and it was hard to believe it could scale 100x. Fantastic. Hundreds of IOPS is not quite true, even with hard drives. I just tested a Hitachi 15k drive and it

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
Well, I'm more accustomed to sequential vs. random, but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into cache), did you have write-back enabled? Regards, Andrey On Fri, Jun 11, 2010 at 12:03 AM, Arne Jansen sensi...@gmx.net wrote: Andrey Kuzmin wrote:

Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Richard Elling
On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote: Andrey Kuzmin wrote: Well, I'm more accustomed to sequential vs. random, but YMMV. As to 67000 512 byte writes (this sounds suspiciously close to 32MB fitting into cache), did you have write-back enabled? It's a sustained number, so it

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Jeroen Roodhart
Hi list, If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running OSOL nv130. Powered off the machine, removed the F20 and powered back on. Machine boots OK and comes up normally with the
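
For reference, a mirrored slog of the kind being recommended can be attached roughly as below; the pool and device names are placeholders, not the actual F20 vmod paths:

    zpool add tank log mirror c3t0d0 c3t1d0
    zpool status tank    # the slog appears as a mirrored entry under "logs"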

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jeroen Roodhart If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't be until late next week though. Running

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad
On 7 apr 2010, at 14.28, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Jeroen Roodhart If you're running solaris proper, you better mirror your ZIL log device. ... I plan to get to test this as well, won't

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski
On 07/04/2010 13:58, Ragnar Sundblad wrote: Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. For a file server, mail server, etc etc, where things are stored and supposed to be available later, you

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool <19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >=19 would be ... don't mirror your log device. If you have more than one, just add them
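
Since the advice hinges on the pool version, a quick way to check it before deciding (the pool name is a placeholder):

    zpool get version tank
    zpool upgrade -v    # lists what each on-disk version adds, including log device removal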

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski
On 07/04/2010 15:35, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Ragnar Sundblad wrote: So the recommendation for zpool <19 would be *strongly* recommended. Mirror your log device if you care about using your pool. And the recommendation for zpool >=19 would be ... don't mirror your log

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there is uncommitted data on it - during normal reboots zfs won't read data from slog. How does zfs know if there is uncommitted data on the slog device without reading it? The minimal read would be quite small, but it

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin
On 04/07/10 09:19, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Robert Milkowski wrote: it is only read at boot if there is uncommitted data on it - during normal reboots zfs won't read data from slog. How does zfs know if there is uncommitted data on the slog device without reading it? The

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device at the same time, this is an important

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time. The writes are assumed to work if the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin
On 04/07/10 10:18, Edward Ned Harvey wrote: From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss- boun...@opensolaris.org] On Behalf Of Bob Friesenhahn It is also worth pointing out that in normal operation the slog is essentially a write-only device which is only read at boot time.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Mark J Musante
On Wed, 7 Apr 2010, Neil Perrin wrote: There have previously been suggestions to read slogs periodically. I don't know if there's a CR raised for this though. Roch wrote up CR 6938883 "Need to exercise read from slog dynamically" Regards, markm

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed log device

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn
On Wed, 7 Apr 2010, Edward Ned Harvey wrote: BTW, does the system *ever* read from the log device during normal operation? Such as perhaps during a scrub? It really would be nice to detect failure of log devices in advance, that are claiming to write correctly, but which are really

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Richard Elling
On Apr 7, 2010, at 10:19 AM, Bob Friesenhahn wrote: On Wed, 7 Apr 2010, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Miles Nordin
"jr" == Jeroen Roodhart j.r.roodh...@uva.nl writes: jr> Running OSOL nv130. Powered off the machine, removed the F20 and jr> powered back on. Machine boots OK and comes up normally with jr> the following message in 'zpool status': yeah, but try it again and this time put rpool on the F20 as

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad
On 7 apr 2010, at 18.13, Edward Ned Harvey wrote: From: Ragnar Sundblad [mailto:ra...@csc.kth.se] Rather: ... >=19 would be ... if you don't mind losing data written in the ~30 seconds before the crash, you don't have to mirror your log device. If you have a system crash, *and* a failed

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Jeroen Roodhart
Hi Roch, Can you try 4 concurrent tar to four different ZFS filesystems (same pool). Hmmm, you're on to something here: http://www.science.uva.nl/~jeroen/zil_compared_e1000_iostat_iops_svc_t_10sec_interval.pdf In short: when using two exported file systems total time goes down to around

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-06 Thread Edward Ned Harvey
We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact same version of firmware and recreated the volumes on new drives arriving

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Kyle McDonald
On 4/4/2010 11:04 PM, Edward Ned Harvey wrote: Actually, it's my experience that Sun (and other vendors) do exactly that for you when you buy their parts - at least for rotating drives, I have no experience with SSD's. The Sun disk label shipped on all the drives is set up to make the drive

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-05 Thread Edward Ned Harvey
From: Kyle McDonald [mailto:kmcdon...@egenera.com] So does your HBA have newer firmware now than it did when the first disk was connected? Maybe it's the HBA that is handling the new disks differently now, than it did when the first one was plugged in? Can you down rev the HBA FW? Do you

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Ragnar Sundblad
On 4 apr 2010, at 06.01, Richard Elling wrote: Thank you for your reply! Just wanted to make sure. Do not assume that power outages are the only cause of unclean shutdowns. -- richard Thanks, I have seen that mistake several times with other (file)systems, and hope I'll never ever make it

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Edward Ned Harvey
Hmm, when you did the write-back test was the ZIL SSD included in the write-back? What I was proposing was write-back only on the disks, and ZIL SSD with no write-back. The tests I did were: All disks write-through All disks write-back With/without SSD for ZIL All the permutations of the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-04 Thread Edward Ned Harvey
Actually, it's my experience that Sun (and other vendors) do exactly that for you when you buy their parts - at least for rotating drives, I have no experience with SSD's. The Sun disk label shipped on all the drives is set up to make the drive the standard size for that Sun part number.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Casper . Dik
The only way to guarantee consistency in the snapshot is to always (regardless of ZIL enabled/disabled) give priority for sync writes to get into the TXG before async writes. If the OS does give priority for sync writes going into TXG's before async writes (even with ZIL disabled), then after

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Neil Perrin
On 04/02/10 08:24, Edward Ned Harvey wrote: The purpose of the ZIL is to act like a fast log for synchronous writes. It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly know a lot here. But I'm

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Jeroen Roodhart
Hi Al, Have you tried the DDRdrive from Christopher George cgeo...@ddrdrive.com? Looks to me like a much better fit for your application than the F20? It would not hurt to check it out. Looks to me like you need a product with low *latency* - and a RAM based cache would be a much better

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Christopher George
Well, I did look at it but at that time there was no Solaris support yet. Right now it seems there is only a beta driver? Correct, we just completed functional validation of the OpenSolaris driver. Our focus has now turned to performance tuning and benchmarking. We expect to formally

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Ragnar Sundblad
On 1 apr 2010, at 06.15, Stuart Anderson wrote: Assuming you are also using a PCI LSI HBA from Sun that is managed with a utility called /opt/StorMan/arcconf and reports itself as the amazingly informative model number Sun STK RAID INT what worked for me was to run, arcconf delete (to delete

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Ragnar Sundblad
On 2 apr 2010, at 22.47, Neil Perrin wrote: Suppose there is an application which sometimes does sync writes, and sometimes async writes. In fact, to make it easier, suppose two processes open two files, one of which always writes asynchronously, and one of which always writes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-03 Thread Richard Elling
On Apr 3, 2010, at 5:47 PM, Ragnar Sundblad wrote: On 2 apr 2010, at 22.47, Neil Perrin wrote: Suppose there is an application which sometimes does sync writes, and sometimes async writes. In fact, to make it easier, suppose two processes open two files, one of which always writes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
On 01/04/2010 20:58, Jeroen Roodhart wrote: I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris an Linux are chosen such that one

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Roch
Robert Milkowski writes: On 01/04/2010 20:58, Jeroen Roodhart wrote: I'm happy to see that it is now the default and I hope this will cause the Linux NFS client implementation to be faster for conforming NFS servers. Interesting thing is that apparently defaults on Solaris

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Seriously, all disks configured WriteThrough (spindle and SSD disks alike) using the dedicated ZIL SSD device, very noticeably faster than enabling the WriteBack. What do you get with both SSD ZIL and WriteBack disks enabled? I mean if you have both why not use both? Then both

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit. It seems like it should be unnecessary. It seems like

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Roch
When we use one vmod, both machines are finished in about 6min45, zilstat maxes out at about 4200 IOPS. Using four vmods it takes about 6min55, zilstat maxes out at 2200 IOPS. Can you try 4 concurrent tar to four different ZFS filesystems (same pool). -r
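
A rough sketch of such a test, assuming four filesystems already mounted under the same pool and a tar archive at hand (all paths are placeholders):

    for fs in fs1 fs2 fs3 fs4; do
        ( cd /tank/$fs && tar xf /var/tmp/test.tar ) &
    done
    wait    # the four extractions run concurrently against one pool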

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: We were talking about ZFS, and under what circumstances data is flushed to disk, in what way sync and async writes are handled by the OS, and what happens if you disable ZIL and lose power to your system. We were

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
I am envisioning a database, which issues a small sync write, followed by a larger async write. Since the sync write is small, the OS would prefer to defer the write and aggregate into a larger block. So the possibility of the later async write being committed to disk before the older

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Hello, I have had this problem this week. Our ZIL SSD died (apt SLC SSD 16GB). Because we had no spare drive in stock, we ignored it. Then we decided to update our Nexenta 3 alpha to beta, exported the pool and made a fresh install to have a clean system and tried to import the pool. We

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
ZFS recovers to a crash-consistent state, even without the slog, meaning it recovers to some state through which the filesystem passed in the seconds leading up to the crash. This isn't what UFS or XFS do. The on-disk log (slog or otherwise), if I understand right, can actually make the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
If you have zpool less than version 19 (when ability to remove log device was introduced) and you have a non-mirrored log device that failed, you had better treat the situation as an emergency. Instead, do man zpool and look for zpool remove. If it says supports removing log devices
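
On a pool whose zpool version supports it, removing a log device is a one-liner; the pool and device names below are placeholders:

    zpool remove tank c2t0d0    # fails with an error on older pools that cannot remove logs
    zpool status tank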

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: We were talking about ZFS, and under what circumstances data is flushed to disk, in what way sync and async writes are handled by the OS, and what happens if you disable ZIL and lose power to your system. We were

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
So you're saying that while the OS is building txg's to write to disk, the OS will never reorder the sequence in which individual write operations get ordered into the txg's. That is, an application performing a small sync write, followed by a large async write, will never have the second

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. You may say that, but then you post this: Acknowledged. I read something arrogant, and I replied even more arrogant. That was dumb of me.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
Only a broken application uses sync writes sometimes, and async writes at other times. Suppose there is a virtual machine, with virtual processes inside it. Some virtual process issues a sync write to the virtual OS, meanwhile another virtual process issues an async write. Then the virtual OS

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Edward Ned Harvey
The purpose of the ZIL is to act like a fast log for synchronous writes. It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly know a lot here. But I'm hearing conflicting information, and don't know what

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Casper . Dik
Questions to answer would be: Is a ZIL log device used only by sync() and fsync() system calls? Is it ever used to accelerate async writes? There are quite a few sync writes, specifically when you mix in the NFS server. Suppose there is an application which sometimes does sync writes,

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Kyle McDonald
On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit. It

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Mattias Pantzare
On Fri, Apr 2, 2010 at 16:24, Edward Ned Harvey solar...@nedharvey.com wrote: The purpose of the ZIL is to act like a fast log for synchronous writes.  It allows the system to quickly confirm a synchronous write request with the minimum amount of work. Bob and Casper and some others clearly

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Bob Friesenhahn
On Fri, 2 Apr 2010, Edward Ned Harvey wrote: So you're saying that while the OS is building txg's to write to disk, the OS will never reorder the sequence in which individual write operations get ordered into the txg's. That is, an application performing a small sync write, followed by a large

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Bob Friesenhahn
On Fri, 2 Apr 2010, Edward Ned Harvey wrote: were taking place at the same time. That is, if two processes both complete a write operation at the same time, one in sync mode and the other in async mode, then it is guaranteed the data on disk will never have the async data committed before the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Stuart Anderson
On Apr 2, 2010, at 5:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary). Then if you ever get a drive a little smaller it still should fit.

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Ross Walker
On Fri, Apr 2, 2010 at 8:03 AM, Edward Ned Harvey solar...@nedharvey.com wrote: Seriously, all disks configured WriteThrough (spindle and SSD disks alike) using the dedicated ZIL SSD device, very noticeably faster than enabling the WriteBack. What do you get with both SSD ZIL and

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Robert Milkowski
On 02/04/2010 16:04, casper@sun.com wrote: sync() is actually *async* and returning from sync() says nothing about to clarify - in case of ZFS sync() is actually synchronous. -- Robert Milkowski http://milek.blogspot.com ___ zfs-discuss

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Tirso Alonso
If my new replacement SSD with identical part number and firmware is 0.001 GB smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? There is a standard for sizes that many manufacturers use (IDEMA LBA1-02):
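
If I recall the LBA1-02 convention correctly, the sector count for a given advertised capacity (50 GB and above, 512-byte sectors) works out as below; treat the constants as a recollection of the spec rather than an authoritative quote:

    gb=1000                                      # advertised capacity in GB
    echo $(( 97696368 + 1953504 * (gb - 50) ))   # prints 1953525168, the familiar "1 TB" LBA count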

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Miles Nordin
"enh" == Edward Ned Harvey solar...@nedharvey.com writes: enh> If you have zpool less than version 19 (when ability to remove enh> log device was introduced) and you have a non-mirrored log enh> device that failed, you had better treat the situation as an enh> emergency. Ed, the log device

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Tim Cook
On Fri, Apr 2, 2010 at 10:08 AM, Kyle McDonald kmcdon...@egenera.com wrote: On 4/2/2010 8:08 AM, Edward Ned Harvey wrote: I know it is way after the fact, but I find it best to coerce each drive down to the whole GB boundary using format (create Solaris partition just up to the boundary).

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Eric D. Mudama
On Fri, Apr 2 at 11:14, Tirso Alonso wrote: If my new replacement SSD with identical part number and firmware is 0.001 GB smaller than the original and hence unable to mirror, what's to prevent the same thing from happening to one of my 1TB spindle disk mirrors? There is a standard for sizes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-02 Thread Al Hopper
Hi Jeroen, Have you tried the DDRdrive from Christopher George cgeo...@ddrdrive.com? Looks to me like a much better fit for your application than the F20? It would not hurt to check it out. Looks to me like you need a product with low *latency* - and a RAM based cache would be a much better

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
If you disable the ZIL, the filesystem still stays correct in RAM, and the only way you lose any data such as you've described, is to have an ungraceful power down or reboot. The advice I would give is: Do zfs autosnapshots frequently (say ... every 5 minutes, keeping the most recent 2 hours of

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
If you disable the ZIL, the filesystem still stays correct in RAM, and the only way you lose any data such as you've described, is to have an ungraceful power down or reboot. The advice I would give is: Do zfs autosnapshots frequently (say ... every 5 minutes, keeping the most recent 2
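
The frequent-snapshot advice could be driven from cron or the automatic snapshot service; a bare-bones sketch, with the dataset name a placeholder and pruning of old snapshots left out:

    zfs snapshot tank/data@auto-`date +%Y%m%d-%H%M`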

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
Can you elaborate? Just today, we got the replacement drive that has precisely the right version of firmware and everything. Still, when we plugged in that drive and created a simple volume in the StorageTek RAID utility, the new drive is 0.001 GB smaller than the old drive. I'm still

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
If you have an ungraceful shutdown in the middle of writing stuff, while the ZIL is disabled, then you have corrupt data. Could be files that are partially written. Could be wrong permissions or attributes on files. Could be missing files or directories. Or some other problem. Some changes

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
If you have an ungraceful shutdown in the middle of writing stuff, while the ZIL is disabled, then you have corrupt data. Could be files that are partially written. Could be wrong permissions or attributes on files. Could be missing files or directories. Or some other problem. Some

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
This approach does not solve the problem. When you do a snapshot, the txg is committed. If you wish to reduce the exposure to loss of sync data and run with ZIL disabled, then you can change the txg commit interval -- however changing the txg commit interval will not eliminate the
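
For completeness, the txg commit interval mentioned here is a kernel tunable. The sketch below assumes a build where it is exposed as zfs_txg_timeout and set from /etc/system; it is an unsupported knob whose name and default have varied between builds:

    * /etc/system entry, takes effect after a reboot
    set zfs:zfs_txg_timeout = 5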

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Edward Ned Harvey
Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its own discretion. Meaning the async write function

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. You may say that, but then you post this: Why do you think that a Snapshot has a better quality than the last snapshot available? If you rollback to a

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its own discretion. Meaning the async write function

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
This approach does not solve the problem. When you do a snapshot, the txg is committed. If you wish to reduce the exposure to loss of sync data and run with ZIL disabled, then you can change the txg commit interval -- however changing the txg commit interval will not eliminate the

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Mar 31, 2010, at 11:51 PM, Edward Ned Harvey solar...@nedharvey.com wrote: A MegaRAID card with write-back cache? It should also be cheaper than the F20. I haven't posted results yet, but I just finished a few weeks of extensive benchmarking various configurations. I can say this:

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Mar 31, 2010, at 11:58 PM, Edward Ned Harvey solar...@nedharvey.com wrote: We ran into something similar with these drives in an X4170 that turned out to be an issue of the preconfigured logical volumes on the drives. Once we made sure all of our Sun PCI HBAs were running the exact

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Apr 1, 2010, at 8:42 AM, casper@sun.com wrote: Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Darren J Moffat
On 01/04/2010 14:49, Ross Walker wrote: We're talking about the sync for NFS exports in Linux; what do they mean with sync NFS exports? See section A1 in the FAQ: http://nfs.sourceforge.net/ I think B4 is the answer to Casper's question: BEGIN QUOTE Linux servers (although not
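
For context, the Linux-side behaviour under discussion is controlled per export in /etc/exports: sync makes the server commit data before replying to the client, async lets it reply first. The export paths below are placeholders:

    /export/data     *(rw,sync,no_subtree_check)
    /export/scratch  *(rw,async,no_subtree_check)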

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Ross Walker
On Thu, Apr 1, 2010 at 10:03 AM, Darren J Moffat darr...@opensolaris.org wrote: On 01/04/2010 14:49, Ross Walker wrote: We're talking about the sync for NFS exports in Linux; what do they mean with sync NFS exports? See section A1 in the FAQ: http://nfs.sourceforge.net/ I think B4 is

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, Edward Ned Harvey wrote: If I'm wrong about this, please explain. I am envisioning a database, which issues a small sync write, followed by a larger async write. Since the sync write is small, the OS would prefer to defer the write and aggregate into a larger block. So

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Robert Milkowski
On 01/04/2010 13:01, Edward Ned Harvey wrote: Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at its

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Casper . Dik
On 01/04/2010 13:01, Edward Ned Harvey wrote: Is that what sync means in Linux? A sync write is one in which the application blocks until the OS acks that the write has been committed to disk. An async write is given to the OS, and the OS is permitted to buffer the write to disk at

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-01 Thread Bob Friesenhahn
On Thu, 1 Apr 2010, Edward Ned Harvey wrote: Dude, don't be so arrogant. Acting like you know what I'm talking about better than I do. Face it that you have something to learn here. Geez! Yes, all the transactions in a transaction group are either committed entirely to disk, or not at
