Re: [zfs-discuss] Sharing root and cache on same SSD?

2010-06-10 Thread Brandon High
On Thu, Jun 10, 2010 at 7:31 AM, Peter Eriksson  wrote:
> Are there any potential problems that one should be aware of if you would 
> like to make dual-use of a pair of SSD MLC units and use parts of them as 
> mirrored (ZFS) boot disks, and then use the rest of them as ZFS L2ARC cache 
> devices (for another zpool)?
>
> The one thing I can think of is potential wear of the SSD devices due to 
> writing of cache data to them, making them potentially fail earlier than they
> otherwise would have.

It's not possible to do this with the Caiman installer.

There are ways to make it work, but they require tinkering after the install.
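
Roughly, the tinkering looks like this (a sketch only -- the pool, controller
and slice names are hypothetical, and you have to leave the spare space
unallocated when you size the root slices):

# after installing the root pool to slice 0 of each SSD, carve the leftover
# space into another slice (e.g. s7) with format(1M), then:
zpool add datapool cache c1t0d0s7 c1t1d0s7   # leftover slices become L2ARC
zpool status datapool                        # they show up under "cache"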

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Help with slow zfs send | receive performance within the same box.

2010-06-10 Thread Brandon High
On Thu, Jun 10, 2010 at 10:22 PM, valrh...@gmail.com  wrote:
> System (brand new today): Dell Poweredge T410. Intel Xeon E5504 5.0 GHz (Core
> i7-based) with 4 GB of RAM. I have one zpool of four 2-TB Hitachi Deskstar 
> SATA drives. I used the SATA mode on the motherboard (not the RAID mode, 
> because I don't want the motherboard's RAID controller to do something funny 
> to the drives). Everything gets recognized, and the EON storage "install" was 
> just fine.

Check that the system is using the AHCI driver. There is usually an
option in the BIOS for AHCI, SATA, or RAID.

You can check with 'prtconf -D'.

If you're using the pci-ide driver, performance is going to be poor.
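
For example, a quick check (assuming a typical x86 install; the grep patterns
are just illustrative):

prtconf -D | grep -i ahci      # is the ahci driver bound to the controller?
prtconf -D | grep -i pci-ide   # or is the legacy pci-ide driver in use?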

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] swap - where is it coming from?

2010-06-10 Thread Phil Harman
On 10 Jun 2010, at 19:20, Bob Friesenhahn wrote:

> On Thu, 10 Jun 2010, casper@sun.com wrote:
>>
>> Swap is perhaps the wrong name; it is really "virtual memory"; virtual
>> memory consists of real memory and swap on disk. In Solaris, a page
>> either exists on the physical swap device or in memory.  Of course, not
>> all memory is available as the kernel and other caches use a large part
>> of the memory.
>
> Don't forget that virtual memory pages may also come from memory
> mapped files from the filesystem.  However, it seems that zfs is
> effectively diminishing this.

Processes are largely filesystem agnostic. For example, both ZFS and
UFS provide implementations for open(2), close(2), stat(2), read(2),
write(2) and mmap(2). Every process is a consumer of mmap(2) because
dynamic linking is implemented using memory mapped files.

However, ZFS does diminish the usefulness of memory mapped files for
I/O because the current implementation performs so poorly. ZFS is the
first major filesystem for Solaris that majors on a cache (the ARC)
distinct from the Solaris page cache.

Today, memory mapped files use the Solaris page cache, even for the
ZFS implementation of mmap(2). A mapped ZFS file will be cached twice
- which is one performance hit. Another is that the caches have to be
kept in sync, so that consumers of read(2), write(2) and mmap(2) see a
consistent view of the same file.

Linux already had a head start on mapped file performance compared to
UFS on Solaris. With ZFS that gap got a lot bigger. Applications like
kdb+ that are heavily dependent on high-performance mapped file I/O
are currently not viable with ZFS on Solaris (and considerably slower
with UFS than on Linux).



Re: [zfs-discuss] Please trim posts

2010-06-10 Thread Brandon High
On Thu, Jun 10, 2010 at 11:31 AM, Bob Friesenhahn
 wrote:
> I think that you may notice that most of the perpetrators are from Gmail.
>  It seems that Gmail is very good at hiding existing text in its user
> interface so people think nothing of including most/all of the email they
> are replying to.

Yeah, it's a major pet peeve of mine with gmail. Both work and
freaks.com use Google Apps for mail, and I've caught myself failing to
trim replies a few times.

When you know everyone reading it has the same quoted text hiding,
it's easy to get lazy.

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] Please trim posts

2010-06-10 Thread Dave Koelmeyer
I trimmed, and then a mailing list user complained that the context of what I
was replying to was missing. Can't win :P


Re: [zfs-discuss] Migrating to ZFS

2010-06-10 Thread valrh...@gmail.com
Are you going to use this machine as a fileserver, at least the OpenSolaris 
part? You might consider trying EON storage (http://eonstorage.blogspot.com/), 
which just runs on a CD. If that's all you need, then you don't have to worry 
about partitioning around Windows, since Windows won't be able to read your ZFS 
array anyway.


[zfs-discuss] Help with slow zfs send | receive performance within the same box.

2010-06-10 Thread valrh...@gmail.com
I've set up a new fileserver today using EON 0.600 (based on SNV130). I'm now
copying files between mirrors, and the performance is slower than I had hoped. 
I am trying to figure out what to do to make things a bit faster in terms of 
performance. Thanks in advance for reading, and sharing any thoughts you might 
have.

System (brand new today): Dell Poweredge T410. Intel Xeon E5504 5.0 GHz (Core
i7-based) with 4 GB of RAM. I have one zpool of four 2-TB Hitachi Deskstar SATA 
drives. I used the SATA mode on the motherboard (not the RAID mode, because I 
don't want the motherboard's RAID controller to do something funny to the 
drives). Everything gets recognized, and the EON storage "install" was just 
fine. 

I then configured the drives into an array of two mirrors, made with zpool 
create mirror (drives 1 and 2), then zpool add mirror (drives 3 and 4). 
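
In other words, roughly these commands (paraphrasing; the pool and device
names are the ones that show up in the status output below):

zpool create hextb_data mirror c1d0 c1d1   # first mirrored pair
zpool add hextb_data mirror c2d0 c2d1      # second pair, striped alongside
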
The output from zpool status is:
 state: ONLINE
 scrub: none requested
config:

NAMESTATE READ WRITE CKSUM
hextb_data  ONLINE   0 0 0
  mirror-0  ONLINE   0 0 0
c1d0ONLINE   0 0 0
c1d1ONLINE   0 0 0
  mirror-1  ONLINE   0 0 0
c2d0ONLINE   0 0 0
c2d1ONLINE   0 0 0

This is a 4TB array, initially empty, that I want to copy data TO.

I then added two more 2 TB drives that were an existing pool on an older 
machine. I want to move about 625 GB of deduped data from the old pool (the 
simple mirror of two 2 TB drives that I physically moved over) to the new pool. 
The case can accommodate all six drives. 

I snapshotted the old data on the 2 TB array, and made a new filesystem on the 
4 TB array. I then moved the data over with:

zfs send -RD data_on_old_p...@snapshot | zfs recv -dF data_on_new_pool

Here's the problem. When I run "iostat -xn", I get:

                    extended device statistics
    r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   70.0    0.0 6859.4    0.3  0.2  0.2    2.1    2.4   5  10 c3d0
   69.8    0.0 6867.0    0.3  0.2  0.2    2.2    2.4   5  10 c4d0
   20.0   68.0  675.1 6490.6  0.9  0.6   10.0    6.6  22  32 c1d0
   19.5   68.0  675.4 6490.6  0.9  0.6   10.1    6.7  22  33 c1d1
   19.0   67.2  669.2 6492.5  1.2  0.7   13.8    7.8  28  36 c2d0
   20.2   67.1  676.8 6492.5  1.2  0.7   13.9    7.8  28  37 c2d1

The OLD pool is the mirror of c3d0 and c4d0. The NEW pool is the striped set of 
mirrors involving c1d0, c1d1, c2d0 and c2d1.

The transfer started out a few hours ago at about 3 MB/sec. Now it's nearly 7 
MB/sec. But why is this so low? Everything is deduped and compressed. And it's 
an internal transfer, within the same machine, from one set of hard drives to 
another, via the SATA controller. Yet the net effect is very slow. I'm trying 
to figure out what this is, since it's much slower than I would have hoped.

Any and all advice on what to do to troubleshoot and fix the problem would be 
quite welcome. Thanks!


Re: [zfs-discuss] Native ZFS for Linux

2010-06-10 Thread Jason King
On Thu, Jun 10, 2010 at 11:32 PM, Erik Trimble  wrote:
> On 6/10/2010 9:04 PM, Rodrigo E. De León Plicet wrote:
>>
>> On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwal
>>  wrote:
>>
>>>
>>> We at KQInfotech initially started on an independent port of ZFS to Linux.
>>> When we posted our progress on the port last year, we came to know about
>>> the work on the LLNL port. Since then we have been working to re-base our
>>> changes on top of Brian's changes.
>>>
>>> We are working on porting the ZPL on top of that code. Our current status
>>> is that mount/unmount is working. Most of the directory operations and
>>> read/write are also working. There is still a lot more development and
>>> testing work that needs to go into this. But we are committed to making
>>> this happen, so please stay tuned.
>>>
>>
>> Good times ahead!
>>
>
> I don't mean to be a PITA, but I'm assuming that someone lawyerly has had
> the appropriate discussions with the porting team about how linking against
> the GPL'd Linux kernel means your kernel module has to be GPL-compatible.
>  It doesn't matter if you distribute it outside the general kernel source
> tarball, what matters is that you're linking against a GPL program, and the
> old GPL v2 doesn't allow for a non-GPL-compatibly-licensed module to do
> that.
>
> As a workaround, take a look at what nVidia did for their X driver - it uses
> a GPL'd kernel module as a shim, which their codebase can then call from
> userland. Which is essentially what the ZFS FUSE folks have been reduced to
> doing.

How does EMC get away with it with powerpath, or Symantec with VxVM
and VxFS? -- I don't recall any shim modules with either product on
Linux when I used them at a previous job, yet they're still there.


> If the new work is a whole new implementation of the ZFS *design* intended
> for the linux kernel, then Yea! Great!  (fortunately, it does sound like
> this is what's going on)  Otherwise, OpenSolaris CDDL'd code can't go into a
> Linux kernel, module or otherwise.

Well technically they could start with the GRUB zfs code, which is GPL
licensed, but I don't think that's the case.


Re: [zfs-discuss] Native ZFS for Linux

2010-06-10 Thread zfsnoob4
I'm very excited. Throw some code up on GitHub as soon as you are able. I'm
sure there are plenty of people (like me) who would like to help test it out.
I've already been playing around with ZFS using zvol on Fedora 12. I would
love to have a ZPL, no matter how experimental.


Re: [zfs-discuss] Native ZFS for Linux

2010-06-10 Thread Erik Trimble

On 6/10/2010 9:04 PM, Rodrigo E. De León Plicet wrote:
> On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwal wrote:
>
>> We at KQInfotech initially started on an independent port of ZFS to Linux.
>> When we posted our progress on the port last year, we came to know about
>> the work on the LLNL port. Since then we have been working to re-base our
>> changes on top of Brian's changes.
>>
>> We are working on porting the ZPL on top of that code. Our current status
>> is that mount/unmount is working. Most of the directory operations and
>> read/write are also working. There is still a lot more development and
>> testing work that needs to go into this. But we are committed to making
>> this happen, so please stay tuned.
>
> Good times ahead!


I don't mean to be a PITA, but I'm assuming that someone lawyerly has had the 
appropriate discussions with the porting team about how linking against the 
GPL'd Linux kernel means your kernel module has to be GPL-compatible.  It 
doesn't matter if you distribute it outside the general kernel source tarball, 
what matters is that you're linking against a GPL program, and the old GPL v2 
doesn't allow for a non-GPL-compatibly-licensed module to do that.

As a workaround, take a look at what nVidia did for their X driver - it uses a 
GPL'd kernel module as a shim, which their codebase can then call from 
userland. Which is essentially what the ZFS FUSE folks have been reduced to 
doing.


If the new work is a whole new implementation of the ZFS *design* intended for 
the linux kernel, then Yea! Great!  (fortunately, it does sound like this is 
what's going on)  Otherwise, OpenSolaris CDDL'd code can't go into a Linux 
kernel, module or otherwise.



--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



Re: [zfs-discuss] Native ZFS for Linux

2010-06-10 Thread Rodrigo E . De León Plicet
On Tue, Jun 8, 2010 at 7:14 PM, Anurag Agarwal  wrote:
> We at KQInfotech initially started on an independent port of ZFS to Linux.
> When we posted our progress on the port last year, we came to know about
> the work on the LLNL port. Since then we have been working to re-base our
> changes on top of Brian's changes.
>
> We are working on porting the ZPL on top of that code. Our current status
> is that mount/unmount is working. Most of the directory operations and
> read/write are also working. There is still a lot more development and
> testing work that needs to go into this. But we are committed to making
> this happen, so please stay tuned.


Good times ahead!


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Richard Elling
On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote:

> Andrey Kuzmin wrote:
>> Well, I'm more accustomed to "sequential vs. random", but YMMV.
>> As to 67000 512 byte writes (this sounds suspiciously close to 32Mb fitting 
>> into cache), did you have write-back enabled?
> 
> It's a sustained number, so it shouldn't matter.

That is only 34 MB/sec (67,000 x 512 bytes).  The disk can do better for sequential writes.

Note: in ZFS, such writes will be coalesced into 128KB chunks.
 -- richard

-- 
ZFS and NexentaStor training, Rotterdam, July 13-15, 2010
http://nexenta-rotterdam.eventbrite.com/








Re: [zfs-discuss] ZFS Replication hint req.

2010-06-10 Thread Tom Erickson

Jakob Tewes wrote:
> Hey folks,
>
> I'm trying my luck with script-based ZFS replication and have run out of
> ideas, so here comes my layout.
>
> I've got a small machine with two ZFS pools, one protected via raidz2 and
> one consisting of just one disk. I wanted to use ZFS's nice
> snapshot/replication features to ship data from the "unprotected" to the
> "protected" pool. I made snapshots of the volumes involved and shipped the
> snaps via "zfs send | zfs receive" - everything worked as expected. The
> "shipped" volume was copied including all filesystem properties held in
> that pool, including the mountpoint. Now both ZFS volumes have the same
> mountpoint property. Because of the running processes accessing the
> originating volume (and, after replication, even the target volume - it
> also gets ZFS-mounted), I'm not able to change the mountpoint on the
> target volume nor unmount it.
>
> I'd be very thankful if somebody could help me with an idea about how to
> prevent the target volume from being mounted / receiving the mountpoint
> property without interfering with the source volume.



What you really want is
6883722 want 'zfs recv -o prop=value' to set initial property values of 
received dataset


Until that's available, you could use the receive -u option to avoid 
mounting the received dataset, or you could set canmount=noauto on the 
source dataset before sending. If you set the mountpoint property 
locally on the received dataset, subsequent incremental receives should 
leave the mountpoint alone (after build 128).
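
For example, something along these lines (a sketch only -- the dataset names
are hypothetical):

# option 1: keep the source from being auto-mounted before you send it
zfs set canmount=noauto unprotected/data

# option 2: receive without mounting, then set the mountpoint locally
zfs send unprotected/data@snap | zfs receive -u protected/data
zfs set mountpoint=/backup/data protected/data

A locally set mountpoint should then survive subsequent incremental receives
(after build 128, as noted above).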


Tom


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
Well, I'm more accustomed to "sequential vs. random", but YMMV.

As to 67000 512 byte writes (this sounds suspiciously close to 32Mb fitting
into cache), did you have write-back enabled?

Regards,
Andrey



On Fri, Jun 11, 2010 at 12:03 AM, Arne Jansen wrote:

> Andrey Kuzmin wrote:
>
>> On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen wrote:
>>
>>    Andrey Kuzmin wrote:
>>
>>    As to your results, it sounds almost too good to be true. As Bob
>>    has pointed out, h/w design targeted hundreds IOPS, and it was
>>    hard to believe it can scale 100x. Fantastic.
>>
>>    Hundreds IOPS is not quite true, even with hard drives. I just tested
>>    a Hitachi 15k drive and it handles 67000 512 byte linear write/s, cache
>>
>> Linear? May be sequential?
>
> Aren't these synonyms? Linear as opposed to random.


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen

Andrey Kuzmin wrote:
As to your results, it sounds almost too good to be true. As Bob has 
pointed out, h/w design targeted hundreds IOPS, and it was hard to 
believe it can scale 100x. Fantastic.


Hundreds IOPS is not quite true, even with hard drives. I just tested
a Hitachi 15k drive and it handles 67000 512 byte linear write/s, cache
enabled.

--Arne



Regards,
Andrey



On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote:


On 21/10/2009 03:54, Bob Friesenhahn wrote:


I would be interested to know how many IOPS an OS like Solaris
is able to push through a single device interface.  The normal
driver stack is likely limited as to how many IOPS it can
sustain for a given LUN since the driver stack is optimized for
high latency devices like disk drives.  If you are creating a
driver stack, the design decisions you make when requests will
be satisfied in about 12ms would be much different than if
requests are satisfied in 50us.  Limitations of existing
software stacks are likely reasons why Sun is designing hardware
with more device interfaces and more independent devices.





[zfs-discuss] ZFS Replication hint req.

2010-06-10 Thread Jakob Tewes
Hey folks,

I'm trying my luck with script-based ZFS replication and have run out of
ideas, so here comes my layout.

I've got a small machine with two ZFS pools, one protected via raidz2 and one
consisting of just one disk. I wanted to use ZFS's nice snapshot/replication
features to ship data from the "unprotected" to the "protected" pool. I made
snapshots of the volumes involved and shipped the snaps via "zfs send | zfs
receive" - everything worked as expected. The "shipped" volume was copied
including all filesystem properties held in that pool, including the
mountpoint. Now both ZFS volumes have the same mountpoint property. Because of
the running processes accessing the originating volume (and, after
replication, even the target volume - it also gets ZFS-mounted), I'm not able
to change the mountpoint on the target volume nor unmount it.

I'd be very thankful if somebody could help me with an idea about how to
prevent the target volume from being mounted / receiving the mountpoint
property without interfering with the source volume.

thanks and kind regards,

fuh


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen

Andrey Kuzmin wrote:

Well, I'm more accustomed to "sequential vs. random", but YMMV.

As to 67000 512 byte writes (this sounds suspiciously close to 32Mb 
fitting into cache), did you have write-back enabled?




It's a sustained number, so it shouldn't matter.


Regards,
Andrey



On Fri, Jun 11, 2010 at 12:03 AM, Arne Jansen wrote:


Andrey Kuzmin wrote:

On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen wrote:

   Andrey Kuzmin wrote:

   As to your results, it sounds almost too good to be true.
As Bob
   has pointed out, h/w design targeted hundreds IOPS, and
it was
   hard to believe it can scale 100x. Fantastic.


   Hundreds IOPS is not quite true, even with hard drives. I
just tested
   a Hitachi 15k drive and it handles 67000 512 byte linear
write/s, cache


Linear? May be sequential?


Aren't these synonyms? linear as opposed to random.







Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Arne Jansen

Andrey Kuzmin wrote:
On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen wrote:


Andrey Kuzmin wrote:

As to your results, it sounds almost too good to be true. As Bob
has pointed out, h/w design targeted hundreds IOPS, and it was
hard to believe it can scale 100x. Fantastic.


Hundreds IOPS is not quite true, even with hard drives. I just tested
a Hitachi 15k drive and it handles 67000 512 byte linear write/s, cache


Linear? May be sequential?


Aren't these synonyms? linear as opposed to random.




Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen  wrote:

> Andrey Kuzmin wrote:
>
>> As to your results, it sounds almost too good to be true. As Bob has
>> pointed out, h/w design targeted hundreds IOPS, and it was hard to believe
>> it can scale 100x. Fantastic.
>>
>
> Hundreds IOPS is not quite true, even with hard drives. I just tested
> a Hitachi 15k drive and it handles 67000 512 byte linear write/s, cache
>

Linear? May be sequential?

Regards,
Andrey


> enabled.
>
> --Arne
>
>
>> Regards,
>> Andrey
>>
>>
>>
>>
>> On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote:
>>
>>On 21/10/2009 03:54, Bob Friesenhahn wrote:
>>
>>
>>I would be interested to know how many IOPS an OS like Solaris
>>is able to push through a single device interface.  The normal
>>driver stack is likely limited as to how many IOPS it can
>>sustain for a given LUN since the driver stack is optimized for
>>high latency devices like disk drives.  If you are creating a
>>driver stack, the design decisions you make when requests will
>>be satisfied in about 12ms would be much different than if
>>requests are satisfied in 50us.  Limitations of existing
>>software stacks are likely reasons why Sun is designing hardware
>>with more device interfaces and more independent devices.
>>
>>
>>


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Garrett D'Amore


For the record, with my driver (which is not the same as the one shipped 
by the vendor), I was getting over 150K IOPS with a single DDRdrive X1.  
It is possible to get very high IOPS with Solaris.  However, it might be 
difficult to get such high numbers with systems based on SCSI/SCSA.  
(SCSA does have assumptions which make it "overweight" for typical 
simple flash based devices.)


My solution was based around the "blkdev" device driver that I 
integrated into ON a couple of builds ago.


-- Garrett

On 06/10/10 12:57, Andrey Kuzmin wrote:
On Thu, Jun 10, 2010 at 11:51 PM, Arne Jansen wrote:


Andrey Kuzmin wrote:

As to your results, it sounds almost too good to be true. As
Bob has pointed out, h/w design targeted hundreds IOPS, and it
was hard to believe it can scale 100x. Fantastic.


Hundreds IOPS is not quite true, even with hard drives. I just tested
a Hitachi 15k drive and it handles 67000 512 byte linear write/s,
cache


Linear? May be sequential?

Regards,
Andrey

enabled.

--Arne


Regards,
Andrey




On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote:

   On 21/10/2009 03:54, Bob Friesenhahn wrote:


   I would be interested to know how many IOPS an OS like
Solaris
   is able to push through a single device interface.  The
normal
   driver stack is likely limited as to how many IOPS it can
   sustain for a given LUN since the driver stack is
optimized for
   high latency devices like disk drives.  If you are
creating a
   driver stack, the design decisions you make when
requests will
   be satisfied in about 12ms would be much different than if
   requests are satisfied in 50us.  Limitations of existing
   software stacks are likely reasons why Sun is designing
hardware
   with more device interfaces and more independent devices.




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   




Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
As to your results, it sounds almost too good to be true. As Bob has pointed
out, h/w design targeted hundreds IOPS, and it was hard to believe it can
scale 100x. Fantastic.

Regards,
Andrey



On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski  wrote:

> On 21/10/2009 03:54, Bob Friesenhahn wrote:
>
>>
>> I would be interested to know how many IOPS an OS like Solaris is able to
>> push through a single device interface.  The normal driver stack is likely
>> limited as to how many IOPS it can sustain for a given LUN since the driver
>> stack is optimized for high latency devices like disk drives.  If you are
>> creating a driver stack, the design decisions you make when requests will be
>> satisfied in about 12ms would be much different than if requests are
>> satisfied in 50us.  Limitations of existing software stacks are likely
>> reasons why Sun is designing hardware with more device interfaces and more
>> independent devices.
>>
>
>
> Open Solaris 2009.06, 1KB READ I/O:
>
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
> # iostat -xnzCM 1|egrep "device|c[0123]$"
> [...]
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17497.3    0.0   17.1    0.0  0.0  0.8    0.0    0.0   0  82 c0
>                  extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17498.8    0.0   17.1    0.0  0.0  0.8    0.0    0.0   0  82 c0
>                  extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17277.6    0.0   16.9    0.0  0.0  0.8    0.0    0.0   0  82 c0
>                  extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17441.3    0.0   17.0    0.0  0.0  0.8    0.0    0.0   0  82 c0
>                  extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17333.9    0.0   16.9    0.0  0.0  0.8    0.0    0.0   0  82 c0
>
>
> Now let's see what it looks like for a single SAS connection, but with dd to 11x
> SSDs:
>
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t1d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t2d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t4d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t5d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t6d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t7d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t8d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t9d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t10d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t11d0p0&
>
> # iostat -xnzCM 1|egrep "device|c[0123]$"
> [...]
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104243.3    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
>                  extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104249.2    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
>                  extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104208.1    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 967 c0
>                  extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104245.8    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 966 c0
>                  extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104221.9    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
>                  extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104212.2    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 967 c0
>
>
> It looks like a single CPU core still hasn't been saturated and the
> bottleneck is in the device rather than OS/CPU. So the MPT driver in Solaris
> 2009.06 can do at least 100,000 IOPS to a single SAS port.
>
> It also scales well - I did run above dd's over 4x SAS ports at the same
> time and it scaled linearly by achieving well over 400k IOPS.
>
>
> hw used: x4270, 2x Intel X5570 2.93GHz, 4x SAS SG-PCIE8SAS-E-Z (fw.
> 1.27.3.0), connected to F5100.
>
>
> --
> Robert Milkowski
> http://milek.blogspot.com
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>


Re: [zfs-discuss] swap - where is it coming from?

2010-06-10 Thread Casper . Dik

>On Thu, 10 Jun 2010, casper@sun.com wrote:
>>
>> Swap is perhaps the wrong name; it is really "virtual memory"; virtual
>> memory consists of real memory and swap on disk. In Solaris, a page
>> either exists on the physical swap device or in memory.  Of course, not
>> all memory is available as the kernel and other caches use a large part
>> of the memory.
>
>Don't forget that virtual memory pages may also come from memory 
>mapped files from the filesystem.  However, it seems that zfs is 
>effectively diminishing this.


I should have said "anonymous" virtual memory.

Casper



Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Ross Walker
On Jun 10, 2010, at 5:54 PM, Richard Elling wrote:



On Jun 10, 2010, at 1:24 PM, Arne Jansen wrote:


Andrey Kuzmin wrote:

Well, I'm more accustomed to  "sequential vs. random", but YMMW.
As to 67000 512 byte writes (this sounds suspiciously close to  
32Mb fitting into cache), did you have write-back enabled?


It's a sustained number, so it shouldn't matter.


That is only 34 MB/sec.  The disk can do better for sequential writes.


It's not doing sector-sized IO.

Besides, this was a max IOPS number, not a max throughput number. If it
were, the OP might have used a 1M bs or larger instead.


-Ross



Re: [zfs-discuss] Please trim posts

2010-06-10 Thread Bob Friesenhahn

On Thu, 10 Jun 2010, Roy Sigurd Karlsbakk wrote:


The problem is all the top-posts and similar bottom-posts where 
everything in the thread is kept. This is not good netiquette, even 
in 2010.


I think that you may notice that most of the perpetrators are from 
Gmail.  It seems that Gmail is very good at hiding existing text in 
its user interface so people think nothing of including most/all of 
the email they are replying to.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] swap - where is it coming from?

2010-06-10 Thread Bob Friesenhahn

On Thu, 10 Jun 2010, casper@sun.com wrote:


Swap is perhaps the wrong name; it is really "virtual memory"; virtual
memory consists of real memory and swap on disk. In Solaris, a page
either exists on the physical swap device or in memory.  Of course, not
all memory is available as the kernel and other caches use a large part
of the memory.


Don't forget that virtual memory pages may also come from memory 
mapped files from the filesystem.  However, it seems that zfs is 
effectively diminishing this.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-10 Thread Bob Friesenhahn

On Wed, 9 Jun 2010, Edward Ned Harvey wrote:

disks.  That is, specifically:
 o If you do a large sequential read, with 3 mirrors (6 disks) then you get
6x performance of a single disk.


Should say "up to 6x".  Which disk in the pair will be read from is 
random so you are unlikely to get the full 6x.



 o If you do a large sequential read, with 7-disk raidz (capacity of 6
disks) then you get 6x performance of a single disk.


Probably should say "up to 6x" as well.  This configuration is more 
sensitive to latency and available disk IOPS becomes more critical.



 o If you do a large sequential write, with 3 mirrors (6 disks) then you
get 3x performance of a single disk.


Also an "up to" type value.  Perhaps you will only get 1.5X because of 
some I/O bottleneck between the CPU and the mirrored disks (i.e. two 
writes at once may cause I/O contention).


These rules of thumb are not terribly accurate.  If performance is 
important, then there is no substitute for actual testing.
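
For example, a quick and dirty sequential test looks something like this (a
sketch only -- the dataset name is hypothetical, and a real benchmark tool
such as filebench gives far more useful numbers):

zfs set compression=off tank/bench            # otherwise the zeros compress away
dd if=/dev/zero of=/tank/bench/big bs=1M count=8192   # ~8 GB sequential write
dd if=/tank/bench/big of=/dev/null bs=1M              # sequential read back
# note: the read may be served from the ARC unless the file is much larger than RAM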


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Please trim posts

2010-06-10 Thread Roy Sigurd Karlsbakk
> > It's getting downright ridiculous. The digest people will kiss you.
> 
> But those reading via individual message email quite possibly will
> not. Quoting at least what you're actually responding to is crucial to
> making sense out here.

The problem is all the top-posts and similar bottom-posts where everything in 
the thread is kept. This is not good netiquette, even in 2010.

Vennlige hilsener / Best regards

roy
--
Roy Sigurd Karlsbakk
(+47) 97542685
r...@karlsbakk.net
http://blogg.karlsbakk.net/
--
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk.


Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-10 Thread Bob Friesenhahn

On Wed, 9 Jun 2010, Travis Tabbal wrote:

NFS writes on ZFS blows chunks performance wise. The only way to 
increase the write speed is by using an slog


The above statement is not quite true. RAID-style adaptor cards which 
contain battery backed RAM or RAID arrays which include battery backed 
RAM also help immensely.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] swap - where is it coming from?

2010-06-10 Thread Casper . Dik



Swap is perhaps the wrong name; it is really "virtual memory"; virtual 
memory consists of real memory and swap on disk. In Solaris, a page
either exists on the physical swap device or in memory.  Of course, not
all memory is available as the kernel and other caches use a large part
of the memory.

When no disk-based swap is in use, there is sufficient free memory;
"reserved" counts pages that are reserved, e.g., by fork() (pages to copy
when copy-on-write happens), or memory that is allocated but not yet written to.
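
For example, the two views can be compared directly (standard commands, shown
here without output since that is system-specific):

swap -l    # physical swap devices/files and their usage
swap -s    # virtual swap summary, which also counts memory acting as swap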

Casper



Re: [zfs-discuss] Please trim posts

2010-06-10 Thread David Dyer-Bennet

On Thu, June 10, 2010 12:26, patto...@yahoo.com wrote:
> It's getting downright ridiculous. The digest people will kiss you.

But those reading via individual message email quite possibly will not. 
Quoting at least what you're actually responding to is crucial to making
sense out here.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



[zfs-discuss] Please trim posts

2010-06-10 Thread pattonme
It's getting downright ridiculous. The digest people will kiss you. 
Sent via BlackBerry from T-Mobile



Re: [zfs-discuss] ZFS host to host replication with AVS?

2010-06-10 Thread Maurice Volaski
Maybe there is another way to read those, but it looks to me like David says
you can trivially swap the roles of the nodes using the '-r' switch (and he
provides a link to the documentation), and you say that you can't trivially
swap the roles of the nodes.


The -r switch temporarily reverses the direction of data flow from 
the secondary to the primary to sync up an outdated primary. After 
that, the data flow reverts back to primary to secondary. Reversing 
the roles requires many more steps (and time)... 
http://docs.sun.com/source/819-6148-10/chap4.html#pgfId-1009132

--

Maurice Volaski, maurice.vola...@einstein.yu.edu
Computing Support, Rose F. Kennedy Center
Albert Einstein College of Medicine of Yeshiva University


Re: [zfs-discuss] Reconfiguring a RAID-Z dataset

2010-06-10 Thread Roy Sigurd Karlsbakk




> Hello,
>
> My understanding is that people are pretty much SOL if they want to
> reconfigure a RAID-Z or RAID-Z2 dataset to, say, a mirror+stripe? That is,
> there is no way to do this via a couple of simple commands?
>
> Just say, for the purpose of my general enlightenment and filing away for if
> I decide to change my config (as has been recommended), what would I have to
> do? Are we talking about copying the data off of the Solaris box, destroying
> the dataset and recreating it? You can't have the same disks set up to do
> both so that you can sort of plan for a switch between the two, can you?
> What other strategies might exist if I wanted to do this? What sort of pain
> would be in store for me if I were to go this route?

Currently I know of one strategy - copy the data out, destroy the zpool,
create a new zpool, add the flags you might want (compression, dedup etc.,
although I don't recommend dedup as of build 134), and restore the data.
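
In rough outline (a sketch only -- "tank" and "scratch" are hypothetical pool
names, scratch being a separate pool big enough to hold everything):

zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -dF scratch    # copy everything out
zpool destroy tank
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0
zfs set compression=on tank                           # plus any other flags
zfs send -R scratch@migrate | zfs receive -dF tank    # restore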

Best regards 

roy 
-- 
Roy Sigurd Karlsbakk 
(+47) 97542685 
r...@karlsbakk.net 
http://blogg.karlsbakk.net/ 
-- 
I all pedagogikk er det essensielt at pensum presenteres intelligibelt. Det er 
et elementært imperativ for alle pedagoger å unngå eksessiv anvendelse av 
idiomer med fremmed opprinnelse. I de fleste tilfeller eksisterer adekvate og 
relevante synonymer på norsk. 


[zfs-discuss] Reconfiguring a RAID-Z dataset

2010-06-10 Thread Joe Auty




Hello,

My understanding is that people are pretty much SOL if they want to
reconfigure a RAID-Z or RAID-Z2 dataset to, say, a mirror+stripe? That
is, there is no way to do this via a couple of simple commands?

Just say, for the purpose of my general enlightenment and filing away
for if I decide to change my config (as has been recommended), what
would I have to do? Are we talking about copying the data off of the
Solaris box, destroying the dataset and recreating it? You can't have
the same disks set up to do both so that you can sort of plan for a
switch between the two, can you? What other strategies might exist if I
wanted to do this? What sort of pain would be in store for me if I were
to go this route?

Thanks in advance for your help, I've learned a lot from you guys!


-- 

Joe Auty, NetMusician
NetMusician helps musicians, bands and artists create beautiful,
professional, custom designed, career-essential websites that are easy
to maintain and to integrate with popular social networks.
www.netmusician.org
j...@netmusician.org






Re: [zfs-discuss] Drive showing as "removed"

2010-06-10 Thread Joe Auty




Cindy Swearingen wrote:
Hi Joe,

I have no clue why this drive was removed, particularly for a one time
failure. I would reconnect/reseat this disk and see if the system
recognizes it. If it resilvers, then you're back in business, but I
would use zpool status and fmdump to monitor this pool and its devices
more often.

A current Solaris system also has the ability to retire a device that
is faulty. You can check this process with fmadm faulty. But I don't
think a one time device failure (May 31), would remove this disk from
service. I'm no device removal expert so maybe someone else will
comment.


Thanks again for all of your help Cindy and others!

I removed the drive and reinserted it, no change... So, I exported it
and imported it, and sure enough it was recognized and started to
resilver immediately. If this happens next time I'll know what to do!

Still no clue why this happened, there were no error messages, and
aside from having to add the -f flag with the export the whole task was
quite uneventful.


Thanks,
Cindy
On 06/08/10 23:56, Joe Auty wrote:
Cindy Swearingen wrote:
According to this report, I/O to this device caused a probe failure
because the device isn't available on May 31.

I was curious if this device had any previous issues over a longer
period of time.

Failing or faulted drives can also kill your pool's performance.

Any idea what happened here? Some weird one time fluky thing? Something
I ought to be concerned with?


Thanks,
Cindy

On 06/08/10 11:39, Joe Auty wrote:
Cindy Swearingen wrote:
Joe,

Yes, the device should resilver when its back online.

You can use the fmdump -eV command to discover when this device was
removed and other hardware-related events to help determine when this
device was removed.

I would recommend exporting (not importing) the pool before physically
changing the hardware. After the device is back online and the pool is
imported, you might need to use zpool clear to clear the pool status.

Here is the output of that command, does this reveal anything useful?
c0t7d0 is the drive that is marked as removed... I'll look into the
import and export functions to learn more about them. Thanks!


# fmdump -eV
TIME                           CLASS
May 31 2010 05:33:36.363381880 ereport.fs.zfs.probe_failure
nvlist version: 0
        class = ereport.fs.zfs.probe_failure
        ena = 0x5d2206865ac00401
        detector = (embedded nvlist)
        nvlist version: 0
                version = 0x0
                scheme = zfs
                pool = 0x28ebd14a56dfe4df
                vdev = 0xdbdc49ecb5479c40
        (end detector)

        pool = nm
        pool_guid = 0x28ebd14a56dfe4df
        pool_context = 0
        pool_failmode = wait
        vdev_guid = 0xdbdc49ecb5479c40
        vdev_type = disk
        vdev_path = /dev/dsk/c0t7d0s0
        vdev_devid = id1,s...@n5000c5001e7cf7a7/a
        parent_guid = 0x16cbb2c1f07c5f51
        parent_type = raidz
        prev_state = 0x0
        __ttl = 0x1
        __tod = 0x4c038270 0x15a8c478



Thanks,
Cindy

On 06/08/10 11:11, Joe Auty wrote:
Cindy Swearingen wrote:
Hi Joe,

The REMOVED status generally means that a device was physically removed
from the system.

If necessary, physically reconnect c0t7d0 or if connected, check
cabling, power, and so on.

If the device is physically connected, see what cfgadm says about this
device. For example, a device that was unconfigured from the system
would look like this:

# cfgadm -al | grep c4t2d0
c4::dsk/c4t2d0                 disk         connected    unconfigured unknown

(Finding the right cfgadm format for your h/w is another challenge.)

I'm very cautious about other people's data so consider this issue:

If possible, you might import the pool while you are physically
inspecting the device or changing it physically. Depending on your
hardware, I've heard of device paths changing if another device is
reseated or changes.

Thanks Cindy

Re: [zfs-discuss] swap - where is it coming from?

2010-06-10 Thread devsk
Erik,

That doesn't explain anything. It's more of the same that I found in the man
page. What is swap allocated in physical memory? I have a hard time wrapping
my arms around that. Is it something like the swap cache in Linux? If it's
disk-backed, where is the actual location of the backing store?

And the numbers? 473164k is not the same as 256MB as per the table on that
page. If you can explain the individual numbers and how they add up across
'swap -s', 'swap -l' and 'top -b', that would be great!

-devsk






From: Erik Trimble 
Cc: devsk ; zfs-discuss@opensolaris.org
Sent: Wed, June 9, 2010 7:41:22 PM
Subject: Re: [zfs-discuss] swap - where is it coming from?

On 6/9/2010 7:20 PM, Greg Eanes wrote:
> On Wed, Jun 9, 2010 at 8:17 PM, devsk  wrote:
>
>> $ swap -s
>> total: 473164k bytes allocated + 388916k reserved = 862080k used, 6062060k 
>> available
>>
>> $ swap -l
>> swapfile             dev    swaplo   blocks     free
>> /dev/dsk/c6t0d0s1   215,1        8  12594952 12594952
>>
>> Can someone please do the math for me here? I am not able to figure the 
>> total.
>>
>> What is "473164k bytes allocated"? Where is it allocated? In some hidden zfs 
>> swap FS in my root pool?
>> What's the magic behind the number 473164k?
>> What is "388916k reserved"?
>> 862080k+6062060k != 12594952/2 - So, where did the rest of it come from? I 
>> just configured one device in /etc/vfstab.
>> --
>>  
>
> man swap
>
> "These numbers include swap  space  from  all  configured
>   swap  areas  as  listed  by  the -l option, as well swap
>   space in the form of physical memory."
>

This is also a reasonable explanation of what the output of 'swap -s' 
actually means.

http://www.softpanorama.org/Solaris/Processes_and_memory/swap_space_management.shtml

Look about half-way down the page, under "Monitoring Swap Resources".  
The whole page is worth a read, though.

-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA




Re: [zfs-discuss] General help with understanding ZFS performance bottlenecks

2010-06-10 Thread Joe Auty




Garrett D'Amore wrote:

You can hardly have too much.  At least 8 GB, maybe 16 would be good.

The benefit will depend on your workload, but zfs and buffer cache will use it all if you have a big enough read working set.


Could lack of RAM be contributing to some of my problems, do you think?


 -- Garrett

Joe Auty wrote:

I'm also noticing that I'm a little short on RAM. I have 6 320 gig
drives and 4 gig of RAM. If the formula is POOL_SIZE/250, this would
mean that I need at least 6.4 gig of RAM.

What role does RAM play with queuing and caching and other things which
might impact overall disk performance? How much more RAM should I get?


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


  



-- 

Joe Auty, NetMusician
NetMusician helps musicians, bands and artists create beautiful,
professional, custom designed, career-essential websites that are easy
to maintain and to integrate with popular social networks.
www.netmusician.org
j...@netmusician.org






Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2010-06-10 Thread Eugen Leitl
On Thu, Jun 10, 2010 at 04:04:42PM +0300, Pasi Kärkkäinen wrote:

> > Intel X25-M G1 firmware 8820 (80GB MLC)
> > Intel X25-M G2 firmware 02HD (160GB MLC)
> > 
> 
> What problems did you have with the X25-M models?

I'm not the OP, but I've had two X25-M G2's (80 and 160 GByte)
suddenly die on me, out of a sample size of maybe 20.

-- 
Eugen* Leitl  http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Mike Gerdts
On Thu, Jun 10, 2010 at 9:39 AM, Andrey Kuzmin
 wrote:
> On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski  wrote:
>>
>> On 21/10/2009 03:54, Bob Friesenhahn wrote:
>>>
>>> I would be interested to know how many IOPS an OS like Solaris is able to
>>> push through a single device interface.  The normal driver stack is likely
>>> limited as to how many IOPS it can sustain for a given LUN since the driver
>>> stack is optimized for high latency devices like disk drives.  If you are
>>> creating a driver stack, the design decisions you make when requests will be
>>> satisfied in about 12ms would be much different than if requests are
>>> satisfied in 50us.  Limitations of existing software stacks are likely
>>> reasons why Sun is designing hardware with more device interfaces and more
>>> independent devices.
>>
>>
>> Open Solaris 2009.06, 1KB READ I/O:
>>
>> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
>
> /dev/null is usually a poor choice for a test like this. Just to be on the
> safe side, I'd rerun it with /dev/random.
> Regards,
> Andrey

(aside from other replies about read vs. write and /dev/random...)

Testing performance of disk by reading from /dev/random and writing to
disk is misguided.  From random(7d):

   Applications retrieve random bytes by reading /dev/random
   or /dev/urandom. The /dev/random interface returns random
   bytes only when sufficient amount of entropy has been collected.

In other words, when the kernel doesn't think that it can give high
quality random numbers, it stops providing them until it has gathered
enough entropy.  It will pause your reads.

If instead you use /dev/urandom, the above problem doesn't exist, but
the generation of random numbers is CPU-intensive.  There is a
reasonable chance (particularly with slow CPU's and fast disk) that
you will be testing the speed of /dev/urandom rather than the speed of
the disk or other I/O components.

If your goal is to provide data that is not all 0's to prevent ZFS
compression from making the file sparse or want to be sure that
compression doesn't otherwise make the actual writes smaller, you
could try something like:

# create a file just over 100 MB
dd if=/dev/random of=/tmp/randomdata bs=513 count=204401
# repeatedly feed that file to dd
while true ; do cat /tmp/randomdata ; done | dd of=/my/test/file
bs=... count=...

The above should make it so that it will take a while before there are
two blocks that are identical, thus confounding deduplication as well.

-- 
Mike Gerdts
http://mgerdts.blogspot.com/


Re: [zfs-discuss] swap - where is it coming from?

2010-06-10 Thread Dennis Clarke

> Re-read the section on"Swap Space and Virtual Memory" for particulars on
> how Solaris does virtual memory mapping, and the concept of Virtual Swap
> Space, which is what 'swap -s' is really reporting on.

The Solaris Internals book is awesome for this sort of thing. A bit over
the top in detail but awesome regardless.

-- 
Dennis



Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
Sorry, my bad. _Reading_ from /dev/null may be an issue, but not writing to
it, of course.

Regards,
Andrey



On Thu, Jun 10, 2010 at 6:46 PM, Robert Milkowski  wrote:

>  On 10/06/2010 15:39, Andrey Kuzmin wrote:
>
> On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote:
>
>> On 21/10/2009 03:54, Bob Friesenhahn wrote:
>>
>>>
>>> I would be interested to know how many IOPS an OS like Solaris is able to
>>> push through a single device interface.  The normal driver stack is likely
>>> limited as to how many IOPS it can sustain for a given LUN since the driver
>>> stack is optimized for high latency devices like disk drives.  If you are
>>> creating a driver stack, the design decisions you make when requests will be
>>> satisfied in about 12ms would be much different than if requests are
>>> satisfied in 50us.  Limitations of existing software stacks are likely
>>> reasons why Sun is designing hardware with more device interfaces and more
>>> independent devices.
>>>
>>
>>
>> Open Solaris 2009.06, 1KB READ I/O:
>>
>> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
>>
>
>  /dev/null is usually a poor choice for a test like this. Just to be on the
> safe side, I'd rerun it with /dev/random.
>
>
> That wouldn't work, would it?
> Please notice that I'm reading *from* an ssd and writing *to* /dev/null
>
>
> --
> Robert Milkowski
> http://milek.blogspot.com
>
>


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Robert Milkowski

On 10/06/2010 15:39, Andrey Kuzmin wrote:
On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski wrote:


On 21/10/2009 03:54, Bob Friesenhahn wrote:


I would be interested to know how many IOPS an OS like Solaris
is able to push through a single device interface.  The normal
driver stack is likely limited as to how many IOPS it can
sustain for a given LUN since the driver stack is optimized
for high latency devices like disk drives.  If you are
creating a driver stack, the design decisions you make when
requests will be satisfied in about 12ms would be much
different than if requests are satisfied in 50us.  Limitations
of existing software stacks are likely reasons why Sun is
designing hardware with more device interfaces and more
independent devices.



Open Solaris 2009.06, 1KB READ I/O:

# dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&


/dev/null is usually a poor choice for a test like this. Just to be on
the safe side, I'd rerun it with /dev/random.




That wouldn't work, would it?
Please notice that I'm reading *from* an ssd and writing *to* /dev/null

--
Robert Milkowski
http://milek.blogspot.com



Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Andrey Kuzmin
On Thu, Jun 10, 2010 at 6:06 PM, Robert Milkowski  wrote:

> On 21/10/2009 03:54, Bob Friesenhahn wrote:
>
>>
>> I would be interested to know how many IOPS an OS like Solaris is able to
>> push through a single device interface.  The normal driver stack is likely
>> limited as to how many IOPS it can sustain for a given LUN since the driver
>> stack is optimized for high latency devices like disk drives.  If you are
>> creating a driver stack, the design decisions you make when requests will be
>> satisfied in about 12ms would be much different than if requests are
>> satisfied in 50us.  Limitations of existing software stacks are likely
>> reasons why Sun is designing hardware with more device interfaces and more
>> independent devices.
>>
>
>
> Open Solaris 2009.06, 1KB READ I/O:
>
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
>

/dev/null is usually a poor choice for a test like this. Just to be on the
safe side, I'd rerun it with /dev/random.

Regards,
Andrey


> # iostat -xnzCM 1|egrep "device|c[0123]$"
> [...]
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17497.3    0.0   17.1    0.0  0.0  0.8    0.0    0.0   0  82 c0
>     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17498.8    0.0   17.1    0.0  0.0  0.8    0.0    0.0   0  82 c0
>     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17277.6    0.0   16.9    0.0  0.0  0.8    0.0    0.0   0  82 c0
>     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17441.3    0.0   17.0    0.0  0.0  0.8    0.0    0.0   0  82 c0
>     extended device statistics
>     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 17333.9    0.0   16.9    0.0  0.0  0.8    0.0    0.0   0  82 c0
>
>
> Now let's see how it looks for a single SAS connection but with dd to 11x
> SSDs:
>
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t1d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t2d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t4d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t5d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t6d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t7d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t8d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t9d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t10d0p0&
> # dd of=/dev/null bs=1k if=/dev/rdsk/c0t11d0p0&
>
> # iostat -xnzCM 1|egrep "device|c[0123]$"
> [...]
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104243.3    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
>     extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104249.2    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
>     extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104208.1    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 967 c0
>     extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104245.8    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 966 c0
>     extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104221.9    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
>     extended device statistics
>      r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
> 104212.2    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 967 c0
>
>
> It looks like a single CPU core still hasn't been saturated and the
> bottleneck is in the device rather than the OS/CPU. So the MPT driver in Solaris
> 2009.06 can do at least 100,000 IOPS to a single SAS port.
>
> It also scales well - I ran the above dd's over 4x SAS ports at the same
> time and it scaled linearly, achieving well over 400k IOPS.
>
>
> hw used: x4270, 2x Intel X5570 2.93GHz, 4x SAS SG-PCIE8SAS-E-Z (fw.
> 1.27.3.0), connected to F5100.
>
>
> --
> Robert Milkowski
> http://milek.blogspot.com
>
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
>
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sharing root and cache on same SSD?

2010-06-10 Thread Peter Eriksson
Are there any potential problems that one should be aware of if you would like 
to make dual-use of a pair of SSD MLC units and use parts of them as mirrored 
(ZFS) boot disks, and then use the rest of them as ZFS L2ARC cache devices (for 
another zpool)?

The one thing I can think of is potential wear of the SSD devices due to 
writing of cache data to them, making them potentially fail earlier than they 
otherwise would have.

- Peter
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2010-06-10 Thread Robert Milkowski

On 21/10/2009 03:54, Bob Friesenhahn wrote:


I would be interested to know how many IOPS an OS like Solaris is able 
to push through a single device interface.  The normal driver stack is 
likely limited as to how many IOPS it can sustain for a given LUN 
since the driver stack is optimized for high latency devices like disk 
drives.  If you are creating a driver stack, the design decisions you 
make when requests will be satisfied in about 12ms would be much 
different than if requests are satisfied in 50us.  Limitations of 
existing software stacks are likely reasons why Sun is designing 
hardware with more device interfaces and more independent devices.



Open Solaris 2009.06, 1KB READ I/O:

# dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
# iostat -xnzCM 1|egrep "device|c[0123]$"
[...]
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
17497.3    0.0   17.1    0.0  0.0  0.8    0.0    0.0   0  82 c0
    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
17498.8    0.0   17.1    0.0  0.0  0.8    0.0    0.0   0  82 c0
    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
17277.6    0.0   16.9    0.0  0.0  0.8    0.0    0.0   0  82 c0
    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
17441.3    0.0   17.0    0.0  0.0  0.8    0.0    0.0   0  82 c0
    extended device statistics
    r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
17333.9    0.0   16.9    0.0  0.0  0.8    0.0    0.0   0  82 c0


Now let's see how it looks for a single SAS connection but with dd to 11x
SSDs (an equivalent shell loop is sketched right after this list of commands):


# dd of=/dev/null bs=1k if=/dev/rdsk/c0t0d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t1d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t2d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t4d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t5d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t6d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t7d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t8d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t9d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t10d0p0&
# dd of=/dev/null bs=1k if=/dev/rdsk/c0t11d0p0&
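
The same eleven readers can also be started with a small loop - a convenience
sketch only, assuming the c0tNd0p0 device names above (t3 is skipped, as in
the list):

for t in 0 1 2 4 5 6 7 8 9 10 11; do
    dd of=/dev/null bs=1k if=/dev/rdsk/c0t${t}d0p0 &
done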

# iostat -xnzCM 1|egrep "device|c[0123]$"
[...]
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
104243.3    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
    extended device statistics
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
104249.2    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
    extended device statistics
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
104208.1    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 967 c0
    extended device statistics
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
104245.8    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 966 c0
    extended device statistics
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
104221.9    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 968 c0
    extended device statistics
     r/s    w/s   Mr/s   Mw/s wait actv wsvc_t asvc_t  %w  %b device
104212.2    0.0  101.8    0.0  0.2  9.7    0.0    0.1   0 967 c0


It looks like a single CPU core still hasn't been saturated and the 
bottleneck is in the device rather than the OS/CPU. So the MPT driver in 
Solaris 2009.06 can do at least 100,000 IOPS to a single SAS port.
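
A back-of-the-envelope cross-check using Little's law (outstanding I/Os =
IOPS x average service time), based on the iostat numbers above:

  1 dd:    actv ~0.8 at  ~17,500 IOPS  ->  ~46 us per I/O
  11 dd's: actv ~9.7 at ~104,000 IOPS  ->  ~93 us per I/O, i.e. roughly
           9,500 IOPS and less than one outstanding request per device

which is consistent with the per-device service time, not the host, being
the limit.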


It also scales well - I ran the above dd's over 4x SAS ports at the same 
time and it scaled linearly, achieving well over 400k IOPS.



hw used: x4270, 2x Intel X5570 2.93GHz, 4x SAS SG-PCIE8SAS-E-Z (fw. 
1.27.3.0), connected to F5100.



--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2010-06-10 Thread Pasi Kärkkäinen
On Thu, Jun 10, 2010 at 05:46:19AM -0700, Peter Eriksson wrote:
> Just a quick followup that the same issue still seems to be there on our 
> X4500s with the latest Solaris 10 with all the latest patches and the 
> following SSD disks:
> 
> Intel X25-M G1 firmware 8820 (80GB MLC)
> Intel X25-M G2 firmware 02HD (160GB MLC)
> 

What problems did you have with the X25-M models?

-- Pasi

> However - things seem to work smoothly with:
> 
> Intel X25-E G1 firmware 8850 (32GB SLC)
> OCZ Vertex 2 firmware 1.00 and 1.02 (100GB MLC)
> 
> I'm currently testing a setup with dual OCZ Vertex 2 100GB SSD units that 
> will be used both as mirrored boot/root (32GB of the 100GB) and, using the 
> rest of those disks, as L2ARC cache devices for the big data zpool, with 
> two mirrored X25-Es as slog devices:
> 
> zpool create DATA raidz2 c0t0d0 c0t1d0 c1t0d0 c1t1d0 c2t0d0 c2t1d0 c3t1d0 \
>   raidz2 c4t0d0 c4t1d0 c5t0d0 c5t1d0 c0t2d0 c0t3d0 c3t2d0 \
>   raidz2 c1t2d0 c1t3d0 c2t2d0 c2t3d0 c4t2d0 c4t3d0 c3t3d0 \
>   raidz2 c5t2d0 c5t3d0 c0t4d0 c0t5d0 c1t4d0 c1t5d0 c3t5d0 \
>   raidz2 c2t4d0 c2t5d0 c4t4d0 c4t5d0 c5t4d0 c5t5d0 c3t6d0 \
>   raidz2 c0t6d0 c0t7d0 c1t6d0 c1t7d0 c2t6d0 c2t7d0 c3t7d0 \
>   spare c4t6d0 c5t6d0 \
>   cache c3t0d0s3 c3t4d0s3 \
>   log mirror c4t7d0 c5t7d0
> -- 
> This message posted from opensolaris.org
> ___
> zfs-discuss mailing list
> zfs-discuss@opensolaris.org
> http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Intel X25-E SSD in x4500 followup

2010-06-10 Thread Peter Eriksson
Just a quick followup that the same issue still seems to be there on our X4500s 
with the latest Solaris 10 with all the latest patches and the following SSD 
disks:

Intel X25-M G1 firmware 8820 (80GB MLC)
Intel X25-M G2 firmware 02HD (160GB MLC)

However - things seem to work smoothly with:

Intel X25-E G1 firmware 8850 (32GB SLC)
OCZ Vertex 2 firmware 1.00 and 1.02 (100GB MLC)

I'm currently testing a setup with dual OCZ Vertex 2 100GB SSD units that will 
be used both as mirrored boot/root (32GB of the 100GB) and, using the rest of 
those disks, as L2ARC cache devices for the big data zpool, with two 
mirrored X25-Es as slog devices:

zpool create DATA raidz2 c0t0d0 c0t1d0 c1t0d0 c1t1d0 c2t0d0 c2t1d0 c3t1d0 \
  raidz2 c4t0d0 c4t1d0 c5t0d0 c5t1d0 c0t2d0 c0t3d0 c3t2d0 \
  raidz2 c1t2d0 c1t3d0 c2t2d0 c2t3d0 c4t2d0 c4t3d0 c3t3d0 \
  raidz2 c5t2d0 c5t3d0 c0t4d0 c0t5d0 c1t4d0 c1t5d0 c3t5d0 \
  raidz2 c2t4d0 c2t5d0 c4t4d0 c4t5d0 c5t4d0 c5t5d0 c3t6d0 \
  raidz2 c0t6d0 c0t7d0 c1t6d0 c1t7d0 c2t6d0 c2t7d0 c3t7d0 \
  spare c4t6d0 c5t6d0 \
  cache c3t0d0s3 c3t4d0s3 \
  log mirror c4t7d0 c5t7d0
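
Once it's created, a quick sanity check that the slices really ended up as
cache devices and the X25-Es as the mirrored log (standard zpool commands;
DATA is the pool name above):

# zpool status DATA
# zpool iostat -v DATA 5

The first shows the cache and logs sections of the layout, the second gives
per-vdev I/O and is handy for watching the L2ARC warm up.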
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS host to host replication with AVS?

2010-06-10 Thread David Magda

On Jun 10, 2010, at 03:50, Fredrich Maney wrote:


David Magda wrote:


Either the primary node OR the secondary node can have active writes
to a volume, but NOT BOTH at the same time. Once the secondary
becomes active, and has made changes, you have to replicate the
changes back to the primary. Here's a good (though dated) demo of
the basic functionality:

 http://hub.opensolaris.org/bin/view/Project+avs/Demos


Maurice, watch the two parts of the demo. It will show how things work  
(at least with ZFS).


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS host to host replication with AVS?

2010-06-10 Thread Fredrich Maney
On Wed, Jun 9, 2010 at 5:06 PM, Maurice Volaski wrote:
>> Are you sure of that? This directly contradicts what David Magda said
>> yesterday.
>
> Yes. Just how is what he said contradictory?

To quote from his message:

> Either the primary node OR the secondary node can have active writes
> to a volume, but NOT BOTH at the same time. Once the secondary
> becomes active, and has made changes, you have to replicate the
> changes back to the primary. Here's a good (though dated) demo of
> the basic functionality:
>
>   http://hub.opensolaris.org/bin/view/Project+avs/Demos
>
> The reverse replication is in Part 2, but I recommend watching them in
> order for proper context. For making the secondary send data to the primary:
>
>>-r
>>
>>Reverses the direction of the synchronization so the primary volume is
>>synchronized from the secondary volume. [...]
>>
>>   http://docs.sun.com/app/docs/doc/819-2240/sndradm-1m

From your message:

> However, a critical difference is that after the primary fails and the 
> secondary
> takes over, you won't have a mirror until you bring the primary completely 
> back
> online as the primary. You can't make it the secondary temporarily. DRBD can
> trivially reverse the roles on the fly, so you can run the secondary as a 
> primary
> and primary as the secondary and the mirroring works in reverse automatically.

Maybe there is another way to read those, but it looks to me like David says
you can trivially swap the roles of the nodes using the '-r' switch (and he
provides a link to the documentation), and you say that you can't trivially
swap the roles of the nodes.
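
For what it's worth, the reverse direction comes down to a couple of sndradm
invocations once the failed node is reachable again - a rough sketch only, to
be checked against sndradm(1M) for the set names and for which host to run
them from:

# sndradm -n -r -m
# sndradm -n -r -u

The first does a full reverse sync (the secondary's data is copied back to
the primary); the second is the update (incremental) variant, usable if the
bitmap volumes are intact. '-n' just suppresses the confirmation prompt.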

[...]

>>> Unfortunately, there are no simple, easy to implement heartbeat
>>> mechanisms for Solaris.
>>
>> Not so. Sun/Solaris Cluster is (fairly) simple and (relatively) easy
>> to implement and it will handle all of the requirements that Moazam
>
> To be fair, I didn't actually try it, but, for one thing, though I might be
> wrong, it must be compiled manually to work with developer builds. Rather,
> ironically, perhaps, I cobbled together some bash scripts that perform basic
> heartbeat functionality.

I pretty much stick with the production release of Solaris, not developer builds.
No compilation is necessary for Cluster on Solaris. It's a fairly
straightforward pkg install and a few configuration commands.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss