[zfs-discuss] how to upgrade

2010-10-22 Thread sridhar surampudi
Hi,

zfs upgrade shows version as 4 and zpool upgrade shows version as 15.

and /etc/release shows Solaris 10 10/09 s10s_u8wos_08a SPARC.

And my zpool doesn't have support for split.

Could you please suggest how to upgrade my Solaris box to the latest zfs and 
zpool versions so that I get split support?


Thanks & Regards,
sridhar.


Re: [zfs-discuss] how to upgrade

2010-10-22 Thread Erik Trimble

 On 10/22/2010 1:51 AM, sridhar surampudi wrote:

Hi,

zfs upgrade shows version as 4 and zpool upgrade shows version as 15.

and /etc/release shows Solaris 10 10/09 s10s_u8wos_08a SPARC.

And my zpool doesn't have support for split.

Could you please suggest how to upgrade my Solaris box to the latest zfs and
zpool versions so that I get split support?


Thanks & Regards,
sridhar.
Solaris 10 Update 9 (09/10) supports the 'zpool split' command.  You'll 
need to perform an upgrade install to get that version installed first.


You'll need to upgrade any pool you plan to use the command on by doing 
this:


# zpool upgrade poolname
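
If it helps, you can first compare what the (upgraded) software supports with 
what your pool and filesystems are currently at.  A quick sketch, using the 
same poolname placeholder as above:

# zpool upgrade -v
# zfs upgrade -v
# zpool get version poolname
# zfs get version poolname

'zpool upgrade' with no arguments also lists any pools that are below the 
version the installed software supports.  And note that, as far as I recall, 
'zpool split' only works on pools made of mirrors.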






--
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA



[zfs-discuss] Shared LUN's and ZFS

2010-10-22 Thread Tony MacDoodle
Is it possible to have a shared LUN between 2 servers using zfs?  The server
can see both LUNs but when I do an import I get:


bash-3.00# zpool import
pool: logs
id: 3700399958960377217
state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
see: http://www.sun.com/msg/ZFS-8000-EY
config:

logs ONLINE
c1t600144F0E849ECE34CC17CAB000Dd0 ONLINE
bash-3.00# zpool import logs
cannot import 'logs': pool may be in use from other system, it was last
accessed by pbmaster1 (hostid: 0x84fabfb5) on Fri Oct 22 07:57:53 2010
use '-f' to import anyway

When I do import it using the -f flag, I can't see the files created on the
other node.


Thanks


Re: [zfs-discuss] Shared LUN's and ZFS

2010-10-22 Thread Stephan Budach

Hi Tony,

Am 22.10.10 14:07, schrieb Tony MacDoodle:
Is it possible to have a shared LUN between 2 servers using zfs?  The 
server can see both LUNs but when I do an import I get:



bash-3.00# zpool import
pool: logs
id: 3700399958960377217
state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
the '-f' flag.
see: http://www.sun.com/msg/ZFS-8000-EY
config:

logs ONLINE
c1t600144F0E849ECE34CC17CAB000Dd0 ONLINE
bash-3.00# zpool import logs
cannot import 'logs': pool may be in use from other system, it was 
last accessed by pbmaster1 (hostid: 0x84fabfb5) on Fri Oct 22 07:57:53 
2010

use '-f' to import anyway

When I do import it using the -f flag, I can't see the files created on 
the other node.



ZFS is not a clustered file system that can be mounted on several 
computers at the same time. Mounting a non-clustered file system on 
multiple nodes almost guarantees that you will damage your file system 
immediately.


So, don't do that!

Cheers,
budy


Re: [zfs-discuss] Shared LUN's and ZFS

2010-10-22 Thread David Magda
On Fri, October 22, 2010 08:07, Tony MacDoodle wrote:
 Is it possible to have a shared LUN between 2 servers using zfs?  The
 server can see both LUNs but when I do an import I get:
[...]
 When I do import it using the -f flag, I can't see the files created on
 the other node.

No, it is not possible. ZFS is not a clustered/shared file system. If you
want that functionality on Solaris you'll need to get something like QFS:

http://en.wikipedia.org/wiki/QFS

Under Linux a good example would be:

   http://en.wikipedia.org/wiki/Global_File_System

Many machines can see the LUNs, but a ZFS pool can only be mounted by one
system at a time. Having multiple machines seeing the pools is handy for
high-availability failover:

   http://en.wikipedia.org/wiki/Solaris_Cluster
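
If what you actually want is to move the pool back and forth between the two
hosts (rather than mount it on both at once), the safe hand-off is an
export/import cycle.  A minimal sketch, using the pool name from your output:

on the node that currently owns the pool:
# zpool export logs

then, on the other node:
# zpool import logs

Only reach for 'zpool import -f' when you are certain the previous owner no
longer has the pool imported (e.g. it crashed and is still down).  Forcing an
import while the pool is live on the other machine is exactly how pools get
corrupted.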




Re: [zfs-discuss] Does a zvol use the zil?

2010-10-22 Thread Miles Nordin
 re == Richard Elling richard.ell...@gmail.com writes:

re The risk here is not really different than that faced by
re normal disk drives which have nonvolatile buffers (eg
re virtually all HDDs and some SSDs).  This is why applications
re can send cache flush commands when they need to ensure the
re data is on the media.

It's probably different because of the iSCSI target reboot problem
I've written about before:

iSCSI initiator           iSCSI target          nonvolatile medium

write A        -------->
               <--------   ack A
write B        -------->
               <--------   ack B
                                       -------->  [A]
                           [REBOOT]
write C        -------->
[timeout!]
reconnect      -------->
               <--------   ack Connected
write C        -------->
               <--------   ack C
flush          -------->
                                       -------->  [C]
               <--------   ack Flush

in the above time chart, the initiator thinks A, B, and C are written,
but in fact only A and C are written.  I regard this as a failing of
imagination in the SCSI protocol, but with a better understanding of
the details than I have, the initiator could probably be made to
provably work around the problem.  My guess has always been that no
current initiators actually do, though.

I think it could happen also with a directly-attached SATA disk if you
remove power from the disk without rebooting the host, so as Richard
said it is not really different, except that in the real world it's
much more common for an iSCSI target to lose power without the
initiator's also losing power than it is for a disk to lose power
without its host adapter losing power.  The ancient practice of unix
filesystem design always considers cord-yanking as something happening
to the entire machine, and failing disks are not the filesystem's
responsibility to work around, because how could it?  This assumption
should have been changed, and wasn't, when we entered the era of RAID
and removable disks, where the connections to disks and the disks
themselves are both allowed to fail.  However, when NFS was designed,
the assumption *WAS* changed: NFSv2 and earlier always operated with
the write cache OFF to be safe from this, just as COMSTAR does in its
(default?) abysmal-performance mode.  That's why campuses bought
prestoserve cards (equivalent to a DDRDrive except much less silly
because they have onboard batteries), or Auspex servers with included
NVRAM, which are analogous, outside the NFS world, to netapp/hitachi/emc
FC/iSCSI targets that always have big NVRAMs so they can leave the
write cache off.  NFSv3, on the other hand, has a commit protocol that
is smart enough to replay the 'write B', which makes the nonvolatile
caches less necessary (so long as you're not closing files frequently,
I guess?).

I think it would be smart to design more storage systems so NFS can
replace the role of iSCSI for disk access.  In Isilon or Lustre
clusters this trick is common when a node can settle for unshared
access to a subtree: create an image file on the NFS/Lustre back-end
and fill it with an ext3 or XFS filesystem, and writes to that inner
filesystem become much faster because this Rube Goldberg arrangement
discards the close-to-open consistency guarantee.  We might use it in
the ZFS world for actual physical disk access instead of iSCSI; for
example, it should be possible to NFS-export a zvol and see a share
with a single file in it named 'theTarget' or something, but this file
would be without read-ahead.  Better yet, to accommodate VMWare
limitations, would be to export a single fake /zvol share containing
all NFS-shared zvols, so that as you export zvols their files appear
within this share.  Also it should be possible to mount vdev elements
over NFS without deadlocks---I know that is difficult, but VMWare does
it.  Perhaps it cannot be done through the existing NFS client, but
obviously it can be done somehow, and it would both solve the iSCSI
target reboot problem and also allow using more kinds of proprietary
storage backend---the same reasons VMWare wants to give admins a choice
apply to ZFS.  When NFS is used in this way the disk image file is
never closed, so the NFS server will not need a slog to give good
performance: the same job is accomplished by double-caching the
uncommitted data on the client so it can be replayed if the time
diagram above happens.




Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Ian D
Some numbers...

zpool status

  pool: Pool_sas
 state: ONLINE
 scan: none requested
config:

NAME STATE READ WRITE CKSUM
Pool_sas ONLINE   0 0 0
  c4t5000C506A6D3d0  ONLINE   0 0 0
  c4t5000C506A777d0  ONLINE   0 0 0
  c4t5000C506AA43d0  ONLINE   0 0 0
  c4t5000C506AC4Fd0  ONLINE   0 0 0
  c4t5000C506AEF7d0  ONLINE   0 0 0
  c4t5000C506B27Fd0  ONLINE   0 0 0
  c4t5000C506B28Bd0  ONLINE   0 0 0
  c4t5000C506B46Bd0  ONLINE   0 0 0
  c4t5000C506B563d0  ONLINE   0 0 0
  c4t5000C506B643d0  ONLINE   0 0 0
  c4t5000C506B6D3d0  ONLINE   0 0 0
  c4t5000C506BBE7d0  ONLINE   0 0 0
  c4t5000C506C407d0  ONLINE   0 0 0
  c4t5000C506C657d0  ONLINE   0 0 0

errors: No known data errors

  pool: Pool_test
 state: ONLINE
 scan: none requested
config:

NAME STATE READ WRITE CKSUM
Pool_testONLINE   0 0 0
  c4t5000C5002103F093d0  ONLINE   0 0 0
  c4t5000C50021101683d0  ONLINE   0 0 0
  c4t5000C50021102AA7d0  ONLINE   0 0 0
  c4t5000C500211034D3d0  ONLINE   0 0 0
  c4t5000C500211035DFd0  ONLINE   0 0 0
  c4t5000C5002110480Fd0  ONLINE   0 0 0
  c4t5000C50021104F0Fd0  ONLINE   0 0 0
  c4t5000C50021119A43d0  ONLINE   0 0 0
  c4t5000C5002112392Fd0  ONLINE   0 0 0

errors: No known data errors

  pool: syspool
 state: ONLINE
 scan: none requested
config:

NAMESTATE READ WRITE CKSUM
syspool ONLINE   0 0 0
  c0t0d0s0  ONLINE   0 0 0

errors: No known data errors
=

Pool_sas is made of 14x 146G 15K SAS Drives in a big stripe.  For this test 
there is no log device or cache.  Connected to it is a RedHat box using iSCSI 
through an Intel X520 10GbE NIC. It runs several large MySQL queries at once- 
each taking minutes to compute.

Pool_test is a stripe of 2TB SATA drives and a terabyte of files is being 
copied to it for another box during this test.

Here's the pastebin of iostat -xdn 10 on the Linux box:
http://pastebin.com/431ESYaz

Here's the pastebin of iostat -xdn 10 on the Nexenta box:
http://pastebin.com/9g7KD3Ku

Here's the pastebin zpool iostat -v 10 on the Nexenta box:
http://pastebin.com/05fJL5sw

From these numbers it looks like the Linux box is waiting for data all the 
time while the Nexenta box isn't pulling nearly as much throughput and IOPS as 
it could.  Where is the bottleneck?

One thing suspicious is that we notice a slow down of one pool when the other 
is under load.  How can that be?

Ian


Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Eric D. Mudama

On Wed, Oct 13 at 15:44, Edward Ned Harvey wrote:

From: Henrik Johansen [mailto:hen...@scannet.dk]

The 10g models are stable - especially the R905's are real workhorses.


You would generally consider all your machines stable now?
Can you easily pdsh to all those machines?

kstat | grep current_cstate ; kstat | grep supported_max_cstates


Dell T610, machine has been stable since we got it (relative to the
failure modes you've mentioned)

current_cstate  1
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
current_cstate  1
current_cstate  0
current_cstate  1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1
supported_max_cstates   1


--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org



[zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Peter Taps
Folks,

Let's say I have a volume being shared over iSCSI. The dedup has been turned on.

Let's say I copy the same file twice under different names at the initiator 
end. Let's say each file ends up taking 5 blocks.

For dedupe to work, each block of a file must match the corresponding block 
from the other file. Essentially, each pair of blocks being compared must have 
the same start location into the actual data.

For a shared filesystem, ZFS may internally ensure that the block starts match. 
However, over iSCSI, the initiator does not even know about the whole block 
mechanism that zfs has. It is just sending raw bytes to the target. This makes 
me wonder if dedup actually works over iSCSI. 

Can someone please enlighten me on what I am missing?

Thank you in advance for your help.

Regards,
Peter


Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Neil Perrin

On 10/22/10 15:34, Peter Taps wrote:

Folks,

Let's say I have a volume being shared over iSCSI. The dedup has been turned on.

Let's say I copy the same file twice under different names at the initiator 
end. Let's say each file ends up taking 5 blocks.

For dedupe to work, each block for a file must match the corresponding block 
from the other file. Essentially, each pair of block being compared must have 
the same start location into the actual data.
  


No, ZFS doesn't care about the file offset, just that the checksums of 
the blocks match.


For a shared filesystem, ZFS may internally ensure that the block starts 
match. However, over iSCSI, the initiator does not even know about the whole 
block mechanism that zfs has. It is just sending raw bytes to the target. 
This makes me wonder if dedup actually works over iSCSI.


Can someone please enlighten me on what I am missing?

Thank you in advance for your help.

Regards,
Peter
  




Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Peter Taps
Hi Neil,

if the file offset does not match, the chance that the checksums would match, 
especially with sha256, is almost 0.

Maybe I am missing something. Let's say I have a file that contains 11 letters 
- ABCDEFGHIJK. Let's say the block size is 5.

For the first file, the block contents are ABCDE, FGHIJ, and K.

For the second file, let's say the blocks are ABCD, EFGHI, and JK.

The chance that any checksum would match is very small. The chance that any 
checksum+verify would match is even smaller.

Regards,
Peter


Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Kyle McDonald

Hi All,

I'm currently considering purchasing 1 or 2 Dell R515's.

With up to 14 drives, and up to 64GB of RAM, it seems like it's well
suited
for a low-end ZFS server.

I know this box is new, but I wonder if anyone out there has any
experience with it?

How about the H700 SAS controller?

Anyone know where to find the Dell 3.5" sleds that take 2.5" drives? I
want to put some SSD's in a box like this, but there's no way I'm
going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
they kidding?

  -Kyle




Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Neil Perrin

On 10/22/10 17:28, Peter Taps wrote:

Hi Neil,

if the file offset does not match, the chance that the checksums would match, 
especially with sha256, is almost 0.

Maybe I am missing something. Let's say I have a file that contains 11 letters 
- ABCDEFGHIJK. Let's say the block size is 5.

For the first file, the block contents are ABCDE, FGHIJ, and K.

For the second file, let's say the blocks are ABCD, EFGHI, and JK.

The chance that any checksum would match is very small. The chance that any 
checksum+verify would match is even smaller.

Regards,
Peter


The block size and contents have to match for ZFS dedup.
See http://blogs.sun.com/bonwick/entry/zfs_dedup
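
In other words, as long as the initiator writes identical data at the same
zvol block boundaries (the zvol's volblocksize), it will dedupe over iSCSI
just as it would locally.  A quick way to sanity-check this on a test pool,
as a sketch (the pool and zvol names here are made up):

# zfs create -V 10g -o dedup=on tank/iscsivol
... copy the same data to the LUN twice from the initiator ...
# zpool get dedupratio tank

The read-only dedupratio pool property shows how much space dedup is actually
saving you.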

Neil.


[zfs-discuss] Changing vdev controller

2010-10-22 Thread Dave
I have a 14 drive pool, in a 2x 7 drive raidz2, with l2arc and slog devices 
attached. 
I had a port go bad on one of my controllers (both are sat2-mv8), so I need to 
replace it (I have no spare ports on either card). My spare controller is an 
LSI 1068-based 8 port card. 

My plan is to remove the l2arc and slog from the pool (to try and minimize any 
glitches), export the pool, change the controller, re-import and then add the 
l2arc and slog back. Is that basically the correct process, or are there any 
tips for avoiding potential issues?

Thanks.


Re: [zfs-discuss] How does dedup work over iSCSI?

2010-10-22 Thread Haudy Kazemi

Neil Perrin wrote:

On 10/22/10 15:34, Peter Taps wrote:

Folks,

Let's say I have a volume being shared over iSCSI. The dedup has been 
turned on.


Let's say I copy the same file twice under different names at the 
initiator end. Let's say each file ends up taking 5 blocks.


For dedupe to work, each block for a file must match the 
corresponding block from the other file. Essentially, each pair of 
block being compared must have the same start location into the 
actual data.
  


No,  ZFS doesn't care about the file offset, just that the checksum of 
the blocks matches.




One conclusion is that one should be careful not to mess up file 
alignments when working with large files (like you might have in 
virtualization scenarios).  I.e. if you have a bunch of virtual machine 
image clones, they'll dedupe quite well initially.  However, if you then 
make seemingly minor changes inside some of those clones (like changing 
their partition offsets to do 1mb alignment), you'll lose most or all of 
the dedupe benefits.
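
If you want to see how much damage such a change does in practice, the dedup
table statistics give a rough before/after picture.  A sketch (the pool name
is made up):

# zpool get dedupratio tank
# zdb -DD tank

'zdb -DD' prints a DDT histogram showing how many blocks are referenced once,
twice, and so on, so it becomes fairly obvious when formerly shared blocks
have stopped deduping after a re-alignment.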


General purpose compression tends to be less susceptible to changes in 
data offsets but also has its limits based on algorithm and dictionary 
size.  I think dedupe can be viewed as a special-case of compression 
that happens to work quite well for certain workloads when given ample 
hardware resources (compared to what would be needed to run without dedupe).




Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Haudy Kazemi




One thing suspicious is that we notice a slow down of one pool when the other 
is under load.  How can that be?

Ian
  
A network switch that is being maxed out?  Some switches cannot switch 
at rated line speed on all their ports all at the same time.  Their 
internal buses simply don't have the bandwidth needed for that.  Maybe 
you are running into that limit?  (I know you mentioned bypassing the 
switch completely in some other tests and not noticing any difference.)


Any other hardware in common?



Re: [zfs-discuss] Newbie ZFS Question: RAM for Dedup

2010-10-22 Thread Haudy Kazemi

Never Best wrote:

Sorry I couldn't find this anywhere yet.  For deduping it is best to have the 
lookup table in RAM, but I wasn't too sure how much RAM is suggested?

::Assuming 128KB Block Sizes, and 100% unique data:
1TB*1024*1024*1024/128 = 8388608 Blocks
::Each Block needs 8 byte pointer?
8388608*8 = 67108864 bytes
::Ram suggest per TB
67108864/1024/1024 = 64MB

So if I understand correctly we should have a min of 64MB RAM per TB for 
deduping? *hopes my math wasn't way off*, or is there significant extra 
overhead stored per block for the lookup table?  For example is there some kind 
of redundancy on the lookup table (relation to RAM space requirments) to 
counter corruption?

I read some articles and they all mention that there is significant performance 
loss if the table isn't in RAM, but none really mentioned how much RAM one 
should have per TB of deduped data.

Thanks, hope someone can confirm *or give me the real numbers* for me.  I know 
blocksize is variable; I'm most interested in the default zfs setup right now.
  
There were several detailed discussions about this over the past 6 
months that should be in the archives.  I believe most of the info came 
from Richard Elling.
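
Going from memory of those threads, the 8-byte-per-block estimate is far too
small: each dedup table (DDT) entry is on the order of a few hundred bytes,
with ~320 bytes/entry being the figure I've seen quoted most often.  Redoing
the arithmetic above with that assumption:

::Assuming 128KB blocks, 100% unique data, ~320 bytes per DDT entry:
1TB*1024*1024*1024/128 = 8388608 blocks
8388608*320 = 2684354560 bytes ~= 2.5GB of DDT per TB

So a rough rule of thumb would be a couple of GB of RAM (or L2ARC) per TB of
unique 128K-block data, rather than 64MB - but do check the archive threads
for Richard's exact numbers.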



Re: [zfs-discuss] vdev failure - pool loss ?

2010-10-22 Thread Haudy Kazemi

Bob Friesenhahn wrote:

On Tue, 19 Oct 2010, Cindy Swearingen wrote:


unless you use copies=2 or 3, in which case your data is still safe
for those datasets that have this option set.


This advice is a little too optimistic. Increasing the copies property
value on datasets might help in some failure scenarios, but probably not
in more catastrophic failures, such as multiple device or hardware
failures.


It is 100% too optimistic.  The copies option only duplicates the user 
data.  While zfs already duplicates the metadata (regardless of copies 
setting), it is not designed to function if a vdev fails.


Bob


Some future filesystem (not zfs as currently implemented) could be 
designed to handle certain vdev failures where multiple vdevs were used 
without redundancy at the vdev level.  In this scenario, the redundant 
metadata and user data with copies=2+ would still be accessible by 
virtue of it having been spread across the vdevs, with at least one copy 
surviving.  Expanding upon this design would allow raw space to be 
added, with redundancy being set by a 'copies' parameter.


I understand the copies parameter to currently be designed and intended 
as an extra assurance against failures that affect single blocks but not 
whole devices.  I.e. run ZFS on a laptop with a single hard drive, and 
use copies=2 to protect against bad sectors but not complete drive 
failures.  I have not tested this, however I imagine that performance is 
the reason to use copies=2 instead of partitioning/slicing the drive 
into two halves and mirroring the two halves back together.  I also 
recall seeing something about the copies parameter attempting to spread 
the copies across different devices, as much as possible.
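
For reference, the knob being discussed is just a per-dataset property.  A
minimal sketch (the dataset name is made up):

# zfs set copies=2 tank/laptop-data
# zfs get copies tank/laptop-data

Keep in mind that it only applies to blocks written after the property is
set; data already on disk keeps however many copies it was written with.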



Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Kyle McDonald
 
 I'm currently considering purchasing 1 or 2 Dell R515's.
 
 With up to 14 drives, and up to 64GB of RAM, it seems like it's well
 suited
 for a low-end ZFS server.
 
 I know this box is new, but I wonder if anyone out there has any
 experience with it?
 
 How about the H700 SAS controller?
 
 Anyone know where to find the Dell 3.5 sleds that take 2.5 drives? I
 want to put some SSD's in a box like this, but there's no way I'm
 going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
 they kidding?

You are asking for a world of hurt.  You may luck out, and it may work
great, thus saving you money.  Take my case as an example ... I took the
safe approach (as far as any non-sun hardware is concerned.)  I bought an
officially supported dell server, with all dell blessed and solaris
supported components, with support contracts on both the hardware and
software, fully patched and updated on all fronts, and I am getting system
failures approx once per week.  I have support tickets open with both dell
and oracle right now ... Have no idea how it's all going to turn out.  But
if you have a problem like mine, using unsupported hardware, you have no
alternative.  You're up a tree full of bees, naked, with a hunter on the
ground trying to shoot you.  And IMHO, I think the probability of having a
problem like mine is higher when you use the unsupported hardware.  But of
course there's no definable way to quantify that belief.

My advice to you is:  buy the supported hardware, and the support contracts
for both the hardware and software.  But of course, that's all just a
calculated risk, and I doubt you're going to take my advice.  ;-)



Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Tim Cook
On Fri, Oct 22, 2010 at 10:40 PM, Haudy Kazemi kaze0...@umn.edu wrote:



 One thing suspicious is that we notice a slow down of one pool when the
 other is under load.  How can that be?

 Ian


 A network switch that is being maxed out?  Some switches cannot switch at
 rated line speed on all their ports all at the same time.  Their internal
 buses simply don't have the bandwidth needed for that.  Maybe you are
 running into that limit?  (I know you mentioned bypassing the switch
 completely in some other tests and not noticing any difference.)

 Any other hardware in common?




There's almost 0 chance a switch is being overrun by a single gigE
connection.  The worst switch I've seen is roughly 8:1 oversubscribed.
 You'd have to be maxing out many, many ports for a switch to be a problem.

Likely you don't have enough ram or CPU in the box.

--Tim


Re: [zfs-discuss] Running on Dell hardware?

2010-10-22 Thread Tim Cook
On Fri, Oct 22, 2010 at 10:53 PM, Edward Ned Harvey sh...@nedharvey.com wrote:

  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of Kyle McDonald
 
  I'm currently considering purchasing 1 or 2 Dell R515's.
 
  With up to 14 drives, and up to 64GB of RAM, it seems like it's well
  suited
  for a low-end ZFS server.
 
  I know this box is new, but I wonder if anyone out there has any
  experience with it?
 
  How about the H700 SAS controller?
 
  Anyone know where to find the Dell 3.5 sleds that take 2.5 drives? I
  want to put some SSD's in a box like this, but there's no way I'm
  going to pay Dell's SSD prices. $1300 for a 50GB 'mainstream' SSD? Are
  they kidding?

 You are asking for a world of hurt.  You may luck out, and it may work
 great, thus saving you money.  Take my case as an example ... I took the
 safe approach (as far as any non-sun hardware is concerned.)  I bought an
 officially supported dell server, with all dell blessed and solaris
 supported components, with support contracts on both the hardware and
 software, fully patched and updated on all fronts, and I am getting system
 failures approx once per week.  I have support tickets open with both dell
 and oracle right now ... Have no idea how it's all going to turn out.  But
 if you have a problem like mine, using unsupported hardware, you have no
 alternative.  You're up a tree full of bees, naked, with a hunter on the
 ground trying to shoot you.  And IMHO, I think the probability of having a
 problem like mine is higher when you use the unsupported hardware.  But of
 course there's no definable way to quantify that belief.

 My advice to you is:  buy the supported hardware, and the support contracts
 for both the hardware and software.  But of course, that's all just a
 calculated risk, and I doubt you're going to take my advice.  ;-)




Dell requires Dell branded drives as of roughly 8 months ago.  I don't think
there was ever an H700 firmware released that didn't require this.  I'd bet
you're going to waste a lot of money to get a drive the system refuses to
recognize.

--Tim


Re: [zfs-discuss] Changing vdev controller

2010-10-22 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Dave
 
 I have a 14 drive pool, in a 2x 7 drive raidz2, with l2arc and slog
 devices attached.
 I had a port go bad on one of my controllers (both are sat2-mv8), so I
 need to replace it (I have no spare ports on either card). My spare
 controller is a LSI 1068 based 8 port card.
 
 My plan is to remove the l2arc and slog from the pool (to try and
 minimize any glitches), export the pool, change the controller, re-
 import and the add the l2arc and slog. Is that basically the correct
 process, or are there any tips for avoiding potential issues?

You really don't need to do that.  You can just export (or shutdown)... swap
controllers, and bring it up again.  No need to remove the l2arc or slog.
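
In other words, something like this (the pool name is made up):

# zpool export tank
(power down, swap the controller, boot)
# zpool import tank
# zpool status tank

ZFS finds the pool members by the labels on the disks rather than by
controller/device path, so the import should pick everything up even though
the device names change; the 'zpool status' afterwards is just to confirm the
slog and l2arc came back along with the data vdevs.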



Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Phil Harman
What more info could you provide? Quite a lot more, actually, like: how many 
streams of SQL and copy are you running? how are the filesystems/zvols 
configured (recordsize, etc)? some CPU, VM and network stats would also be nice.
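
For instance, on the Nexenta box, something along the lines of:

# vmstat 5
# mpstat 5
# iostat -xnz 10

captured while both the MySQL queries and the big copy are running, plus
whatever you have handy for NIC statistics (nicstat, or dladm show-link -s),
would show whether you're running out of CPU, memory or disk before we start
blaming the network.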

Based on the nexenta iostats you've provided (a tiny window on what's 
happening), it appears that you have an 8k recordsize for SQL.

If you add up all the IOPS for the SQL pool, it's roughly 2000 reads at around 
3ms each, which might indicate at least 6 reads outstanding at any time. So how 
many queries do you have running in parallel? If you add more, I'd expect the 
service times to increase.

3ms isn't much for spinning rust, but isn't this why you are planning to use 
lots of L2ARC?

Could be a similar story on writes. How many parallel streams? How many files? 
What's the average file size? What's the client filesystem? How much does it 
sync to the server? Could it be that your client apps are always waiting for 
the spinning rust? Does an SSD log make any difference on this pool?

Sent from my iPhone

On 22 Oct 2010, at 19:57, Ian D rewar...@hotmail.com wrote:

[...]


Re: [zfs-discuss] Performance issues with iSCSI under Linux

2010-10-22 Thread Haudy Kazemi

Tim Cook wrote:



On Fri, Oct 22, 2010 at 10:40 PM, Haudy Kazemi kaze0...@umn.edu wrote:




One thing suspicious is that we notice a slow down of one pool
when the other is under load.  How can that be?

Ian
 


A network switch that is being maxed out?  Some switches cannot
switch at rated line speed on all their ports all at the same
time.  Their internal buses simply don't have the bandwidth needed
for that.  Maybe you are running into that limit?  (I know you
mentioned bypassing the switch completely in some other tests and
not noticing any difference.)

Any other hardware in common?




There's almost 0 chance a switch is being overrun by a single gigE 
connection.  The worst switch I've seen is roughly 8:1 oversubscribed. 
 You'd have to be maxing out many, many ports for a switch to be a 
problem.


Likely you don't have enough ram or CPU in the box.

--Tim



I agree, but also trying not to assume anything.  Looking back, Ian's 
first email said '10GbE on a dedicated switch'.  I don't think the 
switch model was ever identified...perhaps it is a 1 GbE switch with a 
few 10 GbE ports?  (Grasping at straws.)



What happens when Windows is the iSCSI initiator connecting to an iSCSI 
target on ZFS?  If that is also slow, the issue is likely not in Windows 
or in Linux.


Do CIFS shares (connected to from Linux and from Windows) show the same 
performance problems as iSCSI and NFS?  If yes, this would suggest a 
common cause item on the ZFS side.
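
For a quick apples-to-apples test of that, the shares can be turned on from
the ZFS side with something like (the dataset name is made up):

# zfs set sharenfs=on tank/testfs
# zfs set sharesmb=on tank/testfs

then mount them from the Linux and Windows clients and rerun the same copy
and query workloads against them.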


