Re: [zfs-discuss] zfs and iscsi performance help

2012-01-27 Thread Hung-Sheng Tsao (laoTsao)
hi
IMHO, upgrade to S11 if possible and use the COMSTAR-based iSCSI stack.
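Before deciding on an upgrade you can check which release and which
iSCSI services the box is currently running, roughly like this (service
names differ a bit between releases):

  # show the installed Solaris release
  cat /etc/release

  # list the iSCSI-related SMF services present on the system
  svcs -a | grep -i iscsi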

Sent from my iPad

On Jan 26, 2012, at 23:25, Ivan Rodriguez ivan...@gmail.com wrote:

 Dear fellows,
 
 We have a backup server with a 20 TB zpool, and we transfer data to it
 every day using zfs snapshots (there are around 300 filesystems on
 that pool). The storage is a Dell MD3000i connected over iSCSI, and
 the pool is currently at version 10. The same storage is also
 connected to another server with a smaller 3 TB pool (also zpool
 version 10); that server works fine and throughput between the storage
 and the server is good. On the server with the 20 TB pool, however,
 performance is an issue: right after a reboot performance is good, but
 over time, say a week, it keeps dropping until we have to bounce the
 server again (same behavior with the newer version of Solaris, except
 there the performance drops within 2 days). There are no errors in the
 logs, on the storage, or in zpool status -v.
 
 We suspect that the pool has some issues, probably corruption
 somewhere. We tested Solaris 10 8/11 with zpool version 29 (although
 we haven't upgraded the pool itself); with the new Solaris the
 performance is even worse, and every time we restart the server we get
 output like this:
 
 SOURCE: zfs-diagnosis, REV: 1.0
 EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
 DESC: All faults associated with an event id have been addressed.
 Refer to http://sun.com/msg/FMD-8000-4M for more information.
 AUTO-RESPONSE: Some system components offlined because of the
 original fault may have been brought back online.
 IMPACT: Performance degradation of the system due to the original
 fault may have been recovered.
 REC-ACTION: Use fmdump -v -u EVENT-ID to identify the repaired components.
 [ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved,
 VER: 1, SEVERITY: Minor
 
 And we need to export and import the pool in order to be able to access it.
 
 Now my question is: do you know whether upgrading the pool would fix
 issues in the pool's metadata? We've been holding back on the upgrade
 because we know that after the upgrade there is no way to return to
 version 10.
 
 Has anybody experienced corruption in a pool without a hardware
 failure?
 Are there any tools or procedures to find corruption in the pool or in
 the filesystems inside it, besides scrub?
 
 So far we have gone through the connections, cables, ports, and
 controllers between the storage and the server, and everything seems
 fine; we've swapped network interfaces, cables, switch ports, etc.
 
 
 Any ideas would be really appreciated.
 
 Cheers
 Ivan
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and iscsi performance help

2012-01-27 Thread Gary Mills
On Fri, Jan 27, 2012 at 03:25:39PM +1100, Ivan Rodriguez wrote:
 
 We have a backup server with a 20 TB zpool, and we transfer data to it
 every day using zfs snapshots (there are around 300 filesystems on
 that pool). The storage is a Dell MD3000i connected over iSCSI, and
 the pool is currently at version 10. The same storage is also
 connected to another server with a smaller 3 TB pool (also zpool
 version 10); that server works fine and throughput between the storage
 and the server is good. On the server with the 20 TB pool, however,
 performance is an issue: right after a reboot performance is good, but
 over time, say a week, it keeps dropping until we have to bounce the
 server again (same behavior with the newer version of Solaris, except
 there the performance drops within 2 days). There are no errors in the
 logs, on the storage, or in zpool status -v.

This sounds like a ZFS cache problem on the server.  You might check
on how cache statistics change over time.  Some tuning may eliminate
this degradation.  More memory may also help.  Does a scrub show any
errors?  Does the performance drop affect reads or writes or both?
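As a rough sketch (the pool name tank and the intervals are only
examples), you could sample the ARC counters and per-vdev I/O while the
slowdown develops:

  # print the ZFS ARC statistics (size, hits, misses) every 10 seconds
  kstat -p zfs:0:arcstats 10

  # or dump an ARC summary from the live kernel
  echo ::arc | mdb -k

  # watch per-vdev I/O on the pool during the slow period
  zpool iostat -v tank 10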

 We suspect that the pool has some issues, probably corruption
 somewhere. We tested Solaris 10 8/11 with zpool version 29 (although
 we haven't upgraded the pool itself); with the new Solaris the
 performance is even worse, and every time we restart the server we get
 output like this:
 
  SOURCE: zfs-diagnosis, REV: 1.0
  EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
 DESC: All faults associated with an event id have been addressed.
  Refer to http://sun.com/msg/FMD-8000-4M for more information.
  AUTO-RESPONSE: Some system components offlined because of the
 original fault may have been brought back online.
  IMPACT: Performance degradation of the system due to the original
 fault may have been recovered.
  REC-ACTION: Use fmdump -v -u EVENT-ID to identify the repaired components.
 [ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved,
 VER: 1, SEVERITY: Minor
 
 And we need to export and import the pool in order to be able to access it.

This is a separate problem, introduced with an upgrade to the iSCSI
service.  The new one has a dependency on the name service (typically
DNS), which means that it isn't available when the zpool import is
done during boot.  Check with Oracle support to see if they have
found a solution.
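One way to confirm that is to look at the SMF dependencies of the
initiator service on the affected box; the FMRI below is a guess and
varies between releases, so find the right one first:

  # locate the iSCSI initiator service on this release
  svcs -a | grep -i iscsi

  # list the services it depends on (the name service should show up here)
  svcs -d svc:/network/iscsi/initiator:default

  # full detail, including dependencies and current state
  svcs -l svc:/network/iscsi/initiator:default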

-- 
-Gary Mills--refurb--Winnipeg, Manitoba, Canada-
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and iscsi performance help

2012-01-27 Thread Richard Elling
Hi Ivan,

On Jan 26, 2012, at 8:25 PM, Ivan Rodriguez wrote:

 Dear fellows,
 
 We have a backup server with a 20 TB zpool, and we transfer data to it
 every day using zfs snapshots (there are around 300 filesystems on
 that pool). The storage is a Dell MD3000i connected over iSCSI, and
 the pool is currently at version 10. The same storage is also
 connected to another server with a smaller 3 TB pool (also zpool
 version 10); that server works fine and throughput between the storage
 and the server is good. On the server with the 20 TB pool, however,
 performance is an issue: right after a reboot performance is good, but
 over time, say a week, it keeps dropping until we have to bounce the
 server again (same behavior with the newer version of Solaris, except
 there the performance drops within 2 days). There are no errors in the
 logs, on the storage, or in zpool status -v.
 
 We suspect that the pool has some issues, probably corruption
 somewhere. We tested Solaris 10 8/11 with zpool version 29 (although
 we haven't upgraded the pool itself); with the new Solaris the
 performance is even worse, and every time

If you upgrade to zpool version 29 or later, then you will be tied to the
lawnmower (Oracle) forever. Several changes related to snapshot performance
were introduced in version 28 and earlier.

 that we restart the server we get stuff like this:
 
 SOURCE: zfs-diagnosis, REV: 1.0
 EVENT-ID: 0168621d-3f61-c1fc-bc73-c50efaa836f4
 DESC: All faults associated with an event id have been addressed.
 Refer to http://sun.com/msg/FMD-8000-4M for more information.
 AUTO-RESPONSE: Some system components offlined because of the
 original fault may have been brought back online.
 IMPACT: Performance degradation of the system due to the original
 fault may have been recovered.
 REC-ACTION: Use fmdump -v -u EVENT-ID to identify the repaired components.
 [ID 377184 daemon.notice] SUNW-MSG-ID: FMD-8000-6U, TYPE: Resolved,
 VER: 1, SEVERITY: Minor
 
 And we need to export and import the pool in order to be able to access it.

The MD3000i systems that I have used have an irritating behavior when
the LUNs are scanned (e.g. during zpool import). There is an
out-of-band systems management LUN that takes up to a minute to respond
to a SCSI inquiry. During a zpool import, Solaris sends an inquiry to
each of the LUNs to see if they contain pool components. Depending on
the various timeout values set in the iSCSI client stack, this can be
painful. I am not aware of a workaround or bug fix on the Dell side,
and the Dell docs just say not to use that LUN.
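If you want to confirm that it is the management LUN that stalls the
scan rather than a data LUN, something along these lines should show
what the initiator sees and how long device discovery takes (the exact
label Dell uses for that LUN is from memory):

  # list targets with their LUNs and OS device names
  iscsiadm list target -S

  # time a simple device scan; a slow management LUN shows up here
  time format < /dev/null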

 
 Now my question is: do you know whether upgrading the pool would fix
 issues in the pool's metadata? We've been holding back on the upgrade
 because we know that after the upgrade there is no way to return to
 version 10.

To remain more flexible, avoid zpool version 29 or later.
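Before touching the version at all, it is worth checking where the pool
and the filesystems are today; roughly (tank is just a placeholder):

  # list the zpool versions this release supports
  zpool upgrade -v

  # show the current on-disk version of the pool
  zpool get version tank

  # the zfs (filesystem) version is tracked separately
  zfs upgrade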

 
 Has anybody experienced corruption in a pool without a hardware
 failure?

Yes, but I don't think that is your current problem.

 Are there any tools or procedures to find corruption in the pool or in
 the filesystems inside it, besides scrub?

scrub is the method.
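For example (pool name assumed):

  # start a scrub of the pool
  zpool scrub tank

  # check progress and any checksum errors found so far
  zpool status -v tank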

 
 So far we have gone through the connections, cables, ports, and
 controllers between the storage and the server, and everything seems
 fine; we've swapped network interfaces, cables, switch ports, etc.
 
 
 Any ideas would be really appreciated.

HTH,
 -- richard

--
ZFS Performance and Training
richard.ell...@richardelling.com
+1-760-896-4422



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss