Re: [zfs-discuss] Understanding SAS/SATA Backplanes and Connectivity

2009-07-17 Thread Will Murnane
On Thu, Jul 16, 2009 at 21:30, Rob Logan r...@logan.com wrote:
 c4                             scsi-bus     connected    configured   unknown
 c4::dsk/c4t15d0                disk         connected    configured   unknown
   :
 c4::dsk/c4t33d0                disk         connected    configured   unknown
 c4::es/ses0                    ESI          connected    configured   unknown

 thanks! So SATA disks show up as JBOD in IT mode. Is there some magic that
 load-balances the 4 SAS ports, since this shows up as one scsi-bus?
Hypothetically, yes.  In practical terms, though, I've seen more than
300 MB/s of I/O over it:
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
data1       1.06T  1.21T      1  1.61K  2.49K   200M
  mirror     460G   236G      0    522  1.15K  63.8M
    c4t18d0      -      -      0    518  6.38K  63.6M
    c4t21d0      -      -      0    518  12.8K  63.8M
  mirror     467G   229G      0    533    306  64.8M
    c4t23d0      -      -      0    523  6.38K  64.3M
    c4t25d0      -      -      0    529      0  65.0M
  mirror     153G   775G      0    597  1.05K  71.8M
    c4t20d0      -      -      0    589  12.8K  72.5M
    c4t22d0      -      -      0    584      0  71.8M
----------  -----  -----  -----  -----  -----  -----

Note that the pool is only doing 200 MB/s, but the individual devices
are doing a total of about 400 MB/s.  It's not possible to put more than
300 MB/s (one 3 Gbps lane's worth) into or out of a single device, so
there's no link aggregation to worry about.
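
(For reference, a per-device view like the one above can be produced with
something along these lines; the 5-second interval is arbitrary:)

  zpool iostat -v data1 5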

Will


[zfs-discuss] assertion failure

2009-07-17 Thread Thomas Maier-Komor
Hi,

I am having trouble with my OpenSolaris installation in VirtualBox. It
refuses to boot, panicking with the following crash dump:

panic[cpu0]/thread=d5a3edc0: assertion failed: 0 ==
dmu_buf_hold_array(os, object, offset, size, FALSE, FTAG, &numbufs,
&dbp), file: ../../common/fs/zfs/dmu.c, line: 614

d5a3eb08 genunix:assfail+5a (f9ce09da4, f9ce0a9c)
d5a3eb68 zfs:dmu_write+1a0 (d55af620, 57, 0, ba)
d5a3ec08 zfs:space_map_sync+304 (d5f13ed4, 1, d5f13c)
d5a3ec7b zfs:metaslab_sync+284 (d5f1ecc0, 122f3, 0,)
d5a3ecb8 zfs:vdev_sync+c6 (d579d940, 122f3,0)
d5a3ed28 zfs:spa_sync+3d0 (d579c980, 122f3,0,)
d5a3eda8 zfs:txg_sync_thread+308 (d55045c0, 0)
d5a3edb8 unix:thread_start+8 ()

This is on snv_117, 32-bit.

Is this a known issue? Any workarounds?

- Thomas


[zfs-discuss] ZFS as a native cluster file system

2009-07-17 Thread Michael Sichler
I understand that ZFS is not a native cluster file system that permits
concurrent access from multiple hosts, as noted in the ZFS FAQ. The FAQ
states that this will be investigated in the long term.  ZFS is an excellent
file system even without a clustering feature, but adding one would put it
way ahead of any other file system.

Does anyone have an idea of when work may start on this?


[zfs-discuss] triple-parity: RAID-Z3

2009-07-17 Thread David Magda
You don't hear about triple-parity RAID that often:

 Author: Adam Leventhal
 Repository: /hg/onnv/onnv-gate
 Latest revision: 17811c723fb4f9fce50616cb740a92c8f6f97651
 Total changesets: 1
 Log message:
 6854612 triple-parity RAID-Z

http://mail.opensolaris.org/pipermail/onnv-notify/2009-July/009872.html
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6854612

(Via Blog O' Matty.)

I'd be curious to see the performance characteristics.
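
For what it's worth, once these bits show up in a build, creating a
triple-parity vdev should just be a new keyword.  A hypothetical sketch,
with made-up disk names:

  # one raidz3 (triple-parity) top-level vdev across five disks
  zpool create tank raidz3 c0t0d0 c0t1d0 c0t2d0 c0t3d0 c0t4d0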






Re: [zfs-discuss] Understanding SAS/SATA Backplanes and Connectivity

2009-07-17 Thread Adam Sherman

On 17-Jul-09, at 1:45, Will Murnane wrote:
  I'm looking at the LSI SAS3801X because it seems to be what Sun OEMs
  for my X4100s:

 If you're given the choice (i.e., you have the M2 revision), PCI
 Express is probably the bus to go with.  It's basically the same card,
 but on a faster bus.  But there's nothing wrong with the PCI-X
 version.

I have a stack of the original X4100s.

  $280 or so, looks like. Might be overkill for me though.

 The 3442X-R is a little cheaper: $205 from Provantage.
 http://www.provantage.com/lsi-logic-lsi00164~7LSIG06K.htm

I don't get it, why is that one cheaper than:

http://www.provantage.com/lsi-logic-lsi00124~7LSIG03W.htm

Just newer?

A.

--
Adam Sherman
CTO, Versature Corp.
Tel: +1.877.498.3772 x113





Re: [zfs-discuss] Can't offline a RAID-Z2 device: no valid replica

2009-07-17 Thread Cindy Swearingen

Hi Laurent,

Yes, you should be able to offline a faulty device in a redundant
configuration as long as enough devices remain available to keep
the pool redundant.

On my Solaris Nevada system (latest bits), injecting a fault
into a disk in a RAID-Z configuration and then offlining a disk
works as expected.
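
A rough sketch of that kind of test, with hypothetical pool and device names
(zinject is the fault-injection utility from the ON bits; exact flags may
vary by build):

  zinject -d c1t1d0 -A fault tank    # mark one raidz member as faulted
  zpool status tank                  # pool reports DEGRADED, disk FAULTED
  zpool offline tank c1t1d0          # succeeds on recent Nevada bits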

On my Solaris 10 system, I'm unable to offline a faulted disk in
a RAID-Z configuration so I will get back to you with a bug ID
or some other plausible explanation.

Thanks for reporting this problem.

Cindy




Laurent Blume wrote:

You could offline the disk if *this* disk (not
the pool) had a replica. Nothing wrong with the
documentation. Hmm, maybe it is a little misleading
here. I walked into the same trap.



I apologize for being daft here, but I don't find any ambiguity in the 
documentation.
This is explicitly stated as being possible.

This scenario is possible assuming that the systems in question see the storage 
once it is attached to the new switches, possibly through different controllers than 
before, and your pools are set up as RAID-Z or mirrored configurations.

And further down, it even says that it's not possible to offline two devices in a
RAID-Z, with that exact error as an example:

You cannot take a pool offline to the point where it becomes faulted. For 
example, you cannot take offline two devices out of a RAID-Z configuration, nor can 
you take offline a top-level virtual device.

# zpool offline tank c1t0d0
cannot offline c1t0d0: no valid replicas


http://docs.sun.com/app/docs/doc/819-5461/gazgm?l=en&a=view

I don't understand what you mean by this disk not having a replica. It's 
RAID-Z2: by definition, all the data it contains is replicated on two other 
disks in the pool. That's why the pool is still working fine.



The pool is not using the disk anymore anyway, so
(from the zfs point of view) there is no need to
offline the disk. If you want to stop the io-system
from trying to access the disk, pull it out or wait
until it gives up...



Yes, there is. I don't want the disk to come back online if the system reboots, 
because what actually happens is that it *never* gives up (well, at least not 
within 24 hours), and all I/O to the zpool stops as long as those errors 
persist. Yes, I know it should keep working; in practice, it does not 
(though it used to be much worse in previous versions of S10, with all I/O 
stopping on all disks and volumes, both ZFS and UFS, and usually ending in a 
panic).
And the zpool command hangs and never finishes. The only way to get out of it 
is to use cfgadm to send multiple hardware resets to the SATA device, then 
disconnect it. At that point, zpool completes and shows the disk as faulted.
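
(Roughly this sequence, with a made-up attachment point ID; see
cfgadm_sata(1M) for the exact -x operations available on a given controller:)

  cfgadm -x sata_reset_device sata1/3::dsk/c4t3d0   # repeated several times
  cfgadm -c disconnect sata1/3                      # then detach the device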


Laurent



Re: [zfs-discuss] Understanding SAS/SATA Backplanes and Connectivity

2009-07-17 Thread Will Murnane
On Fri, Jul 17, 2009 at 11:43, Adam Sherman asher...@versature.com wrote:
 On 17-Jul-09, at 1:45 , Will Murnane wrote:

 I'm looking at the LSI SAS3801X because it seems to be what Sun OEMs for
 my
 X4100s:

 If you're given the choice (i.e., you have the M2 revision), PCI
 Express is probably the bus to go with.  It's basically the same card,
 but on a faster bus.  But there's nothing wrong with the PCI-X
 version.

 I have a stack of the original X4100s.
Ah, okay.  PCI-X it is.

 $280 or so, looks like. Might be overkill for me though.

 The 3442X-R is a little cheaper: $205 from Provantage.
 http://www.provantage.com/lsi-logic-lsi00164~7LSIG06K.htm


 I don't get it, why is that one cheaper than:

 http://www.provantage.com/lsi-logic-lsi00124~7LSIG03W.htm
When I understand LSI's pricing, I will let you know.  It's got a
different connector (8470 instead of 8088) on the outside, and one
internal port.

On the other hand, that card only gets you one external port.  If you
plan to have more than one JBOD per host, the 3801 makes more sense.

Will


Re: [zfs-discuss] ZFS pegging the system

2009-07-17 Thread Scott Laird
Have each node record results locally, and then merge pair-wise until
a single node is left with the final results?  If you can do merges
that way while reducing the size of the result set, then that's
probably going to be the most scalable way to generate overall
results.
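
As a concrete, made-up illustration, assuming each node leaves behind a
sorted local results file:

  # merge pre-sorted per-node files pair-wise, then merge the merges
  sort -m results.node01 results.node02 > merged.01-02
  sort -m results.node03 results.node04 > merged.03-04
  sort -m merged.01-02 merged.03-04     > merged.01-04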

On Thu, Jul 16, 2009 at 10:51 AM, Jeff Haferman j...@haferman.com wrote:

 We have a SGE array task that we wish to run with elements 1-7.
 Each task generates output and takes roughly 20 seconds to 4 minutes
 of CPU time.  We're doing them on a machine with about 144 8-core nodes,
 and we've divvied the job up to do about 500 at a time.

 So, we have 500 jobs at a time writing to the same ZFS partition.

 What is the best way to collect the results of the task? Currently we
 are having each task write to STDOUT and then are combining the
 results. This nails our ZFS partition to the wall and kills
 performance for other users of the system.  We tried setting up a
 MySQL server to receive the results, but it couldn't take 1000
 simultaneous inbound connections.

 Jeff



Re: [zfs-discuss] Understanding SAS/SATA Backplanes and Connectivity

2009-07-17 Thread Miles Nordin
 rl == Rob Logan r...@logan.com writes:

rl Is there some magic that load balances the 4 SAS ports as this
rl shows up as one scsi-bus?

The LSI card is not using the SATA framework.  I have the impression drive
enumeration and topology are handled by the proprietary firmware on the
card, so it's likely there isn't any explicit support for SAS
expanders inside Solaris's binary mpt driver at all.  If you have x86,
I think you can explore the topology using the boot-up Blue Screens of
Setup, but I don't have anything with a SAS expander to test it.

I think the SAS standard itself has a concept of ``wide ports'' like
infiniband or PCIe, so I would speculate the 4 pairs are treated as
lanes rather than ports.




Re: [zfs-discuss] deduplication

2009-07-17 Thread Brandon High
The keynote was given on Wednesday. Any more willingness to discuss
dedup on the list now?

-B

-- 
Brandon High : bh...@freaks.com


Re: [zfs-discuss] ZFS pegging the system

2009-07-17 Thread Louis-Frédéric Feuillette
On Thu, 2009-07-16 at 10:51 -0700, Jeff Haferman wrote:
 We have a SGE array task that we wish to run with elements 1-7.  
 Each task generates output and takes roughly 20 seconds to 4 minutes  
 of CPU time.  We're doing them on a machine with about 144 8-core nodes,
 and we've divvied the job up to do about 500 at a time.
 
 So, we have 500 jobs at a time writing to the same ZFS partition.

Sorry, no answers, just some questions that first came to mind.

Where is your bottleneck?  Is it drive I/O or the network?

Are all nodes accessing/writing via NFS?  Is this an NFS sync issue?
Might an SSD ZIL help?
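
(If it does turn out to be synchronous NFS writes, adding a separate log
device is a one-liner, along these lines; pool and device names are made up:)

  zpool add tank log c5t0d0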
-- 
Louis-Frédéric Feuillette jeb...@gmail.com
