Re: [zfs-discuss] Promise Ultra133TX2?

2007-03-05 Thread James Blackburn

I have one working under OpenSolaris x86.
See:
http://jimmery.blogspot.com/2007/01/promise-ide-ultra133-tx2-and.html
and from someone else:
http://wiki.complexfission.com/twiki/bin/view/Main/OpenSolarisOS

Cheers,

James

On 5 Mar 2007, at 03:45, Luke Scharf wrote:

Has anyone made the Promise Ultra133TX2 2-port PCI-IDE card work  
with Solaris x86 11/06?


I've seen some references to the Ultra100TX2, but they don't seem  
to refer to the version that I'm using.


Thanks,
-Luke


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] How to interrupt a zpool scrub?

2007-03-05 Thread Peter Dennis - Solaris Sustaining Engineering

Hi Thomas,

The man page for zpool has:

zpool scrub [-s] pool ...

 Begins a scrub. The  scrub  examines  all  data  in  the
 specified  pools  to verify that it checksums correctly.
 For replicated (mirror or raidz) devices, ZFS  automati-
 cally  repairs  any  damage discovered during the scrub.
 The zpool status command reports the progress  of  the
 scrub  and summarizes the results of the scrub upon com-
 pletion.


Because  scrubbing  and  resilvering  are  I/O-intensive
 operations, ZFS only allows one at a time. If a scrub is
 already in progress,  the  zpool  scrub  command  ter-
 minates  it  and starts a new scrub. If a resilver is in
 progress, ZFS does not allow a scrub to be started until
 the resilver completes.

 -s  Stop scrubbing.
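
For example, with a hypothetical pool named tank:

     # zpool scrub tank        # start a scrub
     # zpool scrub -s tank     # stop the scrub currently in progress
     # zpool status tank       # the scrub: line reports the outcome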

When a scrub is stopped this way, the status of the pool shows:

 scrub: scrub stopped with 0 errors on Mon Mar  5 09:51:52 2007

as opposed to:

 scrub: scrub completed with 0 errors on Mon Mar  5 09:51:16 2007

Hope that helps,

pete

Thomas Werschlein wrote:

Dear all

Is there a way to stop a running scrub on a zfs pool? Same question applies to 
a running resilver.
Both render our fileserver unusable due to massive CPU load so we'd like to 
postpone them.

In the docs it says that resilvering and scrubbing survive a reboot, so I am 
not even sure if a reboot would stop scrubbing or resilvering.

Any help greatly appreciated!

Cheers, Thomas
 
 
This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS stalling problem

2007-03-05 Thread Selim Daoud

One question:
is there a way to change the default txg push behaviour (push at a regular
interval -- default is 5 sec) and instead push them on the fly? I would
imagine this would be better in the case of an application doing big
sequential writes (video streaming...).

s.

On 3/5/07, Jeff Bonwick [EMAIL PROTECTED] wrote:

Jesse,

This isn't a stall -- it's just the natural rhythm of pushing out
transaction groups.  ZFS collects work (transactions) until either
the transaction group is full (measured in terms of how much memory
the system has), or five seconds elapse -- whichever comes first.

Your data would seem to suggest that the read side isn't delivering
data as fast as ZFS can write it.  However, it's possible that
there's some sort of 'breathing' effect that's hurting performance.
One simple experiment you could try: patch txg_time to 1.  That
will cause ZFS to push transaction groups every second instead of
the default of every 5 seconds.  If this helps (or if it doesn't),
please let us know.
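
(For reference, a hedged sketch of one way to do that -- assuming the
tunable on these bits is still the zfs module variable txg_time:

     # echo 'txg_time/W 1' | mdb -kw    # patch the live kernel
     # echo 'txg_time/D' | mdb -k       # verify the new value
or persistently, in /etc/system:
     set zfs:txg_time = 1
)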

Thanks,

Jeff

Jesse DeFer wrote:
 Hello,

 I am having problems with ZFS stalling when writing; any help in 
troubleshooting would be appreciated.  Every 5 seconds or so the write bandwidth 
drops to zero, then picks up a few seconds later (see the zpool iostat at the 
bottom of this message).  I am running SXDE, snv_55b.

 My test consists of copying a 1 GB file (with cp) between two drives, one 80GB 
PATA, one 500GB SATA.  The first drive is the system drive (UFS), the second is 
for data.  I have also configured the data drive with UFS: it does not exhibit the 
stalling problem and the copy runs in almost half the time.  I have tried many 
different ZFS settings as well -- atime=off, compression=off, checksums=off, 
zil_disable=1 -- all to no effect.  CPU jumps to about 25% system time during the 
stalls, and hovers around 5% when data is being transferred.
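
(For reference, a sketch of how those settings are typically applied -- the
pool name tank is hypothetical, and zil_disable is a kernel tunable rather
than a dataset property:

     # zfs set atime=off tank
     # zfs set compression=off tank
     # zfs set checksum=off tank
and in /etc/system:
     set zfs:zil_disable = 1
)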

 # zpool iostat 1
capacity operationsbandwidth
 pool used  avail   read  write   read  write
 --  -  -  -  -  -  -
 tank 183M   464G  0 17  1.12K  1.93M
 tank 183M   464G  0457  0  57.2M
 tank 183M   464G  0445  0  55.7M
 tank 183M   464G  0405  0  50.7M
 tank 366M   464G  0226  0  4.97M
 tank 366M   464G  0  0  0  0
 tank 366M   464G  0  0  0  0
 tank 366M   464G  0  0  0  0
 tank 366M   464G  0200  0  25.0M
 tank 366M   464G  0431  0  54.0M
 tank 366M   464G  0445  0  55.7M
 tank 366M   464G  0423  0  53.0M
 tank 574M   463G  0270  0  18.1M
 tank 574M   463G  0  0  0  0
 tank 574M   463G  0  0  0  0
 tank 574M   463G  0  0  0  0
 tank 574M   463G  0164  0  20.5M
 tank 574M   463G  0504  0  63.1M
 tank 574M   463G  0405  0  50.7M
 tank 753M   463G  0404  0  42.6M
 tank 753M   463G  0  0  0  0
 tank 753M   463G  0  0  0  0
 tank 753M   463G  0  0  0  0
 tank 753M   463G  0343  0  42.9M
 tank 753M   463G  0476  0  59.5M
 tank 753M   463G  0465  0  50.4M
 tank 907M   463G  0 68  0   390K
 tank 907M   463G  0  0  0  0
 tank 907M   463G  0 11  0  1.40M
 tank 907M   463G  0451  0  56.4M
 tank 907M   463G  0492  0  61.5M
 tank1.01G   463G  0139  0  7.94M
 tank1.01G   463G  0  0  0  0

 Thanks,
 Jesse DeFer


 This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS party - PANIC collection

2007-03-05 Thread Gino Ruopolo
Hi All,

yesterday we did some tests with ZFS using a new server and a new JBOD that go 
into production this week.

Here is what we found:


1) Solaris seems unable to recognize as a disk any FC disk already labeled by a 
storage processor; cfgadm reports them as unknown.
We had to boot Linux and clean the partition table to get Solaris to recognize 
the disks ... :(
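
(For what it's worth, a hedged sketch of doing the same cleanup from Solaris
itself, if the device node does get created -- c2t1d0 is a hypothetical name:

     # format -e      # select the disk, then 'label' to write a fresh SMI/EFI label
or zero the foreign partition table on the whole-disk device:
     # dd if=/dev/zero of=/dev/rdsk/c2t1d0p0 bs=512 count=1
)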


2) Our test server was connected to the JBOD through dual FC adapters and dual FC 
switches, with MPxIO enabled.
We had MANY PANICS doing the following while the pool was loaded with a dd:

-disconnecting and reconnecting one of the FC links a few times.
-enabling/disabling an FC link port on one FC switch.
-powering off one of the two FC switches.


Sometimes we get a panic and nothing in the logs!
Just a few examples:

Mar  3 18:38:54 TESTSVR offlining lun=0 (trace=0), target=cd (trace=284)
Mar  3 18:38:55 TESTSVR unix: [ID 836849 kern.notice] 
Mar  3 18:38:55 TESTSVR ^Mpanic[cpu0]/thread=fe8000d1cc80: 
Mar  3 18:38:55 TESTSVR genunix: [ID 809409 kern.notice] ZFS: I/O failure 
(write on unknown off 0: zio fe8322055280 [L0 unallocated] 2L/2P 
DVA[0]=1:575a0
000:2 fletcher2 uncompressed LE contiguous birth=9 fill=0 cksum=0:0:0:0): 
error 14
Mar  3 18:38:55 TESTSVR unix: [ID 10 kern.notice] 
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cac0 
zfs:zfsctl_ops_root+2f9c8b42 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cad0 
zfs:zio_next_stage+72 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cb00 
zfs:zio_wait_for_children+49 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cb10 
zfs:zio_wait_children_done+15 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cb20 
zfs:zio_next_stage+72 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cb60 
zfs:zio_vdev_io_assess+82 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cb70 
zfs:zio_next_stage+72 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cbd0 
zfs:vdev_mirror_io_done+c1 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cbe0 
zfs:zio_vdev_io_done+14 ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cc60 
genunix:taskq_thread+bc ()
Mar  3 18:38:55 TESTSVR genunix: [ID 655072 kern.notice] fe8000d1cc70 
unix:thread_start+8 ()
Mar  3 18:38:55 TESTSVR unix: [ID 10 kern.notice] 
Mar  3 18:38:55 TESTSVR genunix: [ID 672855 kern.notice] syncing file systems...

Mar  3 18:51:52 TESTSVR savecore: [ID 570001 auth.error] reboot after panic: 
ZFS: I/O failure (write on unknown off 0: zio fe8322055280 [L0 
unallocated] 2L/20
000P DVA[0]=1:575a:2 fletcher2 uncompressed LE contiguous birth=9 
fill=0 cksum=0:0:0:0): error 14


PANIC
Nothing in the log!
Mar  4 19:08:21 TESTSVR savecore: [ID 570001 auth.error] reboot after panic: 
ZFS: I/O failure (write on unknown off 0: zio fe8322055280 [L0 
unallocated] 2L/20
000P DVA[0]=1:575a:2 fletcher2 uncompressed LE contiguous birth=9 
fill=0 cksum=0:0:0:0): error 14


PANIC
Nothing in the log!
Mar  4 19:11:20 TESTSVR savecore: [ID 570001 auth.error] reboot after panic: 
ZFS: I/O failure (write on unknown off 0: zio fe8322055280 [L0 
unallocated] 2L/20
000P DVA[0]=1:575a:2 fletcher2 uncompressed LE contiguous birth=9 
fill=0 cksum=0:0:0:0): error 14




Mar  4 19:25:37 TESTSVR genunix: [ID 834635 kern.info] /scsi_vhci/[EMAIL 
PROTECTED] (sd13) multipath status: degraded, path /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci1011,[EMAIL PROTECTED]/pc
i1077,[EMAIL PROTECTED]/[EMAIL PROTECTED],0 (fp2) to target address: 
w2204cfd87b7b,0 is offline Load balancing: round-robin
Mar  4 19:25:37 TESTSVR unix: [ID 836849 kern.notice] 
Mar  4 19:25:37 TESTSVR ^Mpanic[cpu3]/thread=fe80002e1c80: 
Mar  4 19:25:37 TESTSVR genunix: [ID 809409 kern.notice] ZFS: I/O failure 
(write on unknown off 0: zio fe811bdb7800 [L0 unallocated] 2L/2P 
DVA[0]=3:56260
000:2 fletcher2 uncompressed LE contiguous birth=22 fill=0 cksum=0:0:0:0): 
error 14
Mar  4 19:25:37 TESTSVR unix: [ID 10 kern.notice] 
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1ac0 
zfs:zfsctl_ops_root+2f9c8b42 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1ad0 
zfs:zio_next_stage+72 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1b00 
zfs:zio_wait_for_children+49 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1b10 
zfs:zio_wait_children_done+15 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1b20 
zfs:zio_next_stage+72 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1b60 
zfs:zio_vdev_io_assess+82 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] fe80002e1b70 
zfs:zio_next_stage+72 ()
Mar  4 19:25:37 TESTSVR genunix: [ID 655072 kern.notice] 
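
(Where nothing reaches the logs, the saved crash dump can still be inspected
after reboot -- a sketch, assuming savecore wrote unix.0/vmcore.0 under
/var/crash/TESTSVR:

     # cd /var/crash/TESTSVR
     # mdb unix.0 vmcore.0
     > ::status       # panic string and dump summary
     > ::stack        # stack of the panicking thread
     > ::msgbuf       # kernel message buffer, including lines that never hit syslog
     > $q
)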

[zfs-discuss] Re: How to interrupt a zpool scrub?

2007-03-05 Thread Thomas Werschlein
How embarrassing is that? Pete kindly pointed me to the man page, where it 
clearly states that I should use 'zpool scrub [-s] pool', with -s for Stop 
scrubbing. Sorry folks, I just looked in the Administration Guide, where I 
couldn't find it. But I am sure it's in there, too.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-03-05 Thread Roch - PAE

Leon Koll writes:
  On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote:
  
  
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6467988
  
   NFSD  threads are created  on a  demand  spike (all of  them
    waiting  on I/O) but then tend to stick around  servicing
   moderate loads.
  
   -r
  
  Hello Roch,
  It's not my case. NFS stops to service after some point. And the
  reason is in ZFS. It never happens with NFS/UFS.
  Shortly, my scenario:
  1st SFS run, 2000 requested IOPS. NFS is fine, low number of threads.
  2nd SFS run, 4000 requested IOPS. NFS cannot serve all requests, no of
  threads jumps to max
  3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests, no of
  threads jumps to max.
  System cannot get back to the same results under equal load (1st and 3rd).
  Reboot between 2nd and 3rd doesn't help. The only persistent thing is
  a directory structure that was created during the 2nd run (in SFS
  higher requested load - more directories/files created).
  I am sure it's a bug. I need help. I don't care that ZFS works N times
  worse than UFS. I really care that after heavy load everything is
  totally screwed.
  
  Thanks,
  -- Leon

Hi Leon,

How much is the slowdown between 1st and 3rd ? How filled is 
the pool at each stage ? What does 'NFS stops to service'
mean ?
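
One way to capture the fill level between runs (pool name 'tank' is hypothetical):

     # zpool list tank      # capacity and percent used
     # zfs list -r tank     # per-dataset used/available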

-r

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: How to interrupt a zpool scrub?

2007-03-05 Thread Wade . Stuart





[EMAIL PROTECTED] wrote on 03/05/2007 04:18:44 AM:

 How embarrassing is that? Pete kindly pointed me to the man page
 where it clearly states that I should use zpool scrub [-s] pool. -
 s for Stop scrubbing. Sorry folks, I just looked in the
 Administration guide where I couldn't find it. But I am sure it's in
 there, too.

Don't feel too bad, I missed it too... Robert Milkowski thankfully directed
me to the man page.  =/




 This message posted from opensolaris.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS stalling problem

2007-03-05 Thread Wade . Stuart





[EMAIL PROTECTED] wrote on 03/05/2007 03:56:28 AM:

 one question,
 is there a way to stop the default txg push behaviour (push at regular
 timestep-- default is 5sec) but instead push them on the fly...I
 would imagine this is better in the case of an application doing big
 sequential write (video streaming... )

 s.


I do not believe you would want to do that under any workload -- txgs allow
for optimized writes.  I am wondering whether this stall behavior (is it really
stalling, or just a visual stat issue?) is more related to the txg max size
(calculated from memory/ARC size) than to txg_time.  Adjusting txg_time may
cloud the real issue if it is due to a bottleneck while evacing a txg, or if
the txg max size is miscalculated so that people are hitting a state where a
txg is _almost_ hitting max size in 5 seconds (the txg_time default) and
blocking the next txg while evacing -- in which case the core issue is the
txg evac / max size.
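
One hedged way to check whether the txg evac itself is the bottleneck
(spa_sync is the routine that pushes a txg; fbt probe availability assumed):

     # dtrace -n 'fbt::spa_sync:entry { self->t = timestamp }
        fbt::spa_sync:return /self->t/ {
          @["spa_sync time (ms)"] = quantize((timestamp - self->t) / 1000000);
          self->t = 0; }'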

Any thoughts?

-Wade


 On 3/5/07, Jeff Bonwick [EMAIL PROTECTED] wrote:
  Jesse,
 
  This isn't a stall -- it's just the natural rhythm of pushing out
  transaction groups.  ZFS collects work (transactions) until either
  the transaction group is full (measured in terms of how much memory
  the system has), or five seconds elapse -- whichever comes first.
 
  Your data would seem to suggest that the read side isn't delivering
  data as fast as ZFS can write it.  However, it's possible that
  there's some sort of 'breathing' effect that's hurting performance.
  One simple experiment you could try: patch txg_time to 1.  That
  will cause ZFS to push transaction groups every second instead of
  the default of every 5 seconds.  If this helps (or if it doesn't),
  please let us know.
 
  Thanks,
 
  Jeff
 
  Jesse DeFer wrote:
   Hello,
  
   I am having problems with ZFS stalling when writing, any help in
 troubleshooting would be appreciated.  Every 5 seconds or so the
 write bandwidth drops to zero, then picks up a few seconds later
 (see the zpool iostat at the bottom of this message).  I am running
 SXDE, snv_55b.
  
   My test consists of copying a 1gb file (with cp) between two
 drives, one 80GB PATA, one 500GB SATA.  The first drive is the
 system drive (UFS), the second is for data.  I have configured the
 data drive with UFS and it does not exhibit the stalling problem and
 it runs in almost half the time.  I have tried many different ZFS
 settings as well: atime=off, compression=off, checksums=off,
 zil_disable=1 all to no effect.  CPU jumps to about 25% system time
 during the stalls, and hovers around 5% when data is being transferred.
  
   # zpool iostat 1
  capacity operationsbandwidth
   pool used  avail   read  write   read  write
   --  -  -  -  -  -  -
   tank 183M   464G  0 17  1.12K  1.93M
   tank 183M   464G  0457  0  57.2M
   tank 183M   464G  0445  0  55.7M
   tank 183M   464G  0405  0  50.7M
   tank 366M   464G  0226  0  4.97M
   tank 366M   464G  0  0  0  0
   tank 366M   464G  0  0  0  0
   tank 366M   464G  0  0  0  0
   tank 366M   464G  0200  0  25.0M
   tank 366M   464G  0431  0  54.0M
   tank 366M   464G  0445  0  55.7M
   tank 366M   464G  0423  0  53.0M
   tank 574M   463G  0270  0  18.1M
   tank 574M   463G  0  0  0  0
   tank 574M   463G  0  0  0  0
   tank 574M   463G  0  0  0  0
   tank 574M   463G  0164  0  20.5M
   tank 574M   463G  0504  0  63.1M
   tank 574M   463G  0405  0  50.7M
   tank 753M   463G  0404  0  42.6M
   tank 753M   463G  0  0  0  0
   tank 753M   463G  0  0  0  0
   tank 753M   463G  0  0  0  0
   tank 753M   463G  0343  0  42.9M
   tank 753M   463G  0476  0  59.5M
   tank 753M   463G  0465  0  50.4M
   tank 907M   463G  0 68  0   390K
   tank 907M   463G  0  0  0  0
   tank 907M   463G  0 11  0  1.40M
   tank 907M   463G  0451  0  56.4M
   tank 907M   463G  0492  0  61.5M
   tank1.01G   463G  0139  0  7.94M
   tank1.01G   463G  0  0  0  0
  
   Thanks,
   Jesse DeFer
  
  
   This message posted from opensolaris.org

Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-03-05 Thread Leon Koll

On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:


Leon Koll writes:
  On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote:
  
  
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6467988
  
   NFSD  threads are created  on a  demand  spike (all of  them
   waiting  on I/O) but thentend to stick around  servicing
   moderate loads.
  
   -r
 
  Hello Roch,
  It's not my case. NFS stops to service after some point. And the
  reason is in ZFS. It never happens with NFS/UFS.
  Shortly, my scenario:
  1st SFS run, 2000 requested IOPS. NFS is fine, ;low number of threads.
  2st SFS run, 4000 requested IOPS. NFS cannot serve all requests, no of
  threads jumps to max
  3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests, no of
  threads jumps to max.
  System cannot get back to the same results under equal load (1st and 3rd).
  Reboot between 2nd and 3rd doesn't help. The only persistent thing is
  a directory structure that was created during the 2nd run (in SFS
  higher requested load - more directories/files created).
  I am sure it's a bug. I need help. I don't care that ZFS works N times
  worse than UFS. I really care that after heavy load everything is
  totally screwed.
 
  Thanks,
  -- Leon

Hi Leon,

How much is the slowdown between 1st and 3rd ? How filled is


Typical case is:
1st: 1996 IOPS, latency  2.7
3rd: 1375 IOPS, latency 37.9


the pool at each stage ? What does 'NFS stops to service'
mean ?


There are a lot of error messages on the NFS (SFS) client:
sfs352: too many failed RPC calls - 416 good 27 bad
sfs3132: too many failed RPC calls - 302 good 27 bad
sfs3109: too many failed RPC calls - 533 good 31 bad
sfs353: too many failed RPC calls - 301 good 28 bad
sfs3144: too many failed RPC calls - 305 good 25 bad
sfs3121: too many failed RPC calls - 311 good 30 bad
sfs370: too many failed RPC calls - 315 good 27 bad

Thanks,
-- Leon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-03-05 Thread Roch - PAE

Leon Koll writes:

  On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:
  
   Leon Koll writes:
 On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote:
 
 
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6467988
 
  NFSD  threads are created  on a  demand  spike (all of  them
  waiting  on I/O) but thentend to stick around  servicing
  moderate loads.
 
  -r

 Hello Roch,
 It's not my case. NFS stops to service after some point. And the
 reason is in ZFS. It never happens with NFS/UFS.
 Shortly, my scenario:
 1st SFS run, 2000 requested IOPS. NFS is fine, ;low number of threads.
 2st SFS run, 4000 requested IOPS. NFS cannot serve all requests, no of
 threads jumps to max
 3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests, no of
 threads jumps to max.
 System cannot get back to the same results under equal load (1st and 
   3rd).
 Reboot between 2nd and 3rd doesn't help. The only persistent thing is
 a directory structure that was created during the 2nd run (in SFS
 higher requested load - more directories/files created).
 I am sure it's a bug. I need help. I don't care that ZFS works N times
 worse than UFS. I really care that after heavy load everything is
 totally screwed.

 Thanks,
 -- Leon
  
   Hi Leon,
  
   How much is the slowdown between 1st and 3rd ? How filled is
  
  Typical case is:
  1st: 1996 IOPS, latency  2.7
  3rd: 1375 IOPS, latency 37.9
  

The large latency increase is the side effect of requesting
more than what can be delivered. The queue builds up and latency
follows. So it should not be the primary focus IMO. The
decrease in IOPS is the primary problem.

One hypothesis is that over the life of the FS we're moving
toward spreading access across the full disk platter. We can
imagine some fragmentation hitting as well. I'm not sure
how I'd test either hypothesis.
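
A hedged sketch for the access-spread part, using the DTrace io provider
(the first sample per device is bogus since last[] starts at zero):

     # dtrace -n 'io:::start {
         this->d = args[1]->dev_statname;
         @[this->d] = quantize(args[0]->b_blkno - last[this->d]);
         last[this->d] = args[0]->b_blkno; }'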

   the pool at each stage ? What does 'NFS stops to service'
   mean ?
  
  There is a lot of error messages on the NFS(SFS) client :
  sfs352: too many failed RPC calls - 416 good 27 bad
  sfs3132: too many failed RPC calls - 302 good 27 bad
  sfs3109: too many failed RPC calls - 533 good 31 bad
  sfs353: too many failed RPC calls - 301 good 28 bad
  sfs3144: too many failed RPC calls - 305 good 25 bad
  sfs3121: too many failed RPC calls - 311 good 30 bad
  sfs370: too many failed RPC calls - 315 good 27 bad
 

Can this be timing out or queue full drops ? Might be a side 
effect of SFS requesting more than what can be delivered.

  Thanks,
  -- Leon

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-03-05 Thread Leon Koll

On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:


Leon Koll writes:

  On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:
  
   Leon Koll writes:
 On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote:
 
 
  http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6467988
 
  NFSD  threads are created  on a  demand  spike (all of  them
  waiting  on I/O) but thentend to stick around  servicing
  moderate loads.
 
  -r

 Hello Roch,
 It's not my case. NFS stops to service after some point. And the
 reason is in ZFS. It never happens with NFS/UFS.
 Shortly, my scenario:
 1st SFS run, 2000 requested IOPS. NFS is fine, ;low number of threads.
 2st SFS run, 4000 requested IOPS. NFS cannot serve all requests, no of
 threads jumps to max
 3rd SFS run, 2000 requested IOPS. NFS cannot serve all requests, no of
 threads jumps to max.
 System cannot get back to the same results under equal load (1st and 
3rd).
 Reboot between 2nd and 3rd doesn't help. The only persistent thing is
 a directory structure that was created during the 2nd run (in SFS
 higher requested load - more directories/files created).
 I am sure it's a bug. I need help. I don't care that ZFS works N times
 worse than UFS. I really care that after heavy load everything is
 totally screwed.

 Thanks,
 -- Leon
  
   Hi Leon,
  
   How much is the slowdown between 1st and 3rd ? How filled is
 
  Typical case is:
  1st: 1996 IOPS, latency  2.7
  3rd: 1375 IOPS, latency 37.9
 

The large latency increase is the  side effect of requesting
more than what can be delivered. Queue builds up and latency
follow. So  it  should  not be  the  primary  focus IMO. The
Decrease in IOPS is the primary problem.

One hypothesis is that over the life of the FS we're moving
toward spreading access to the full disk platter. We can
imagine some fragmentation hitting as well. I'm not sure
how I'd test both hypothesis.

   the pool at each stage ? What does 'NFS stops to service'
   mean ?
 
  There is a lot of error messages on the NFS(SFS) client :
  sfs352: too many failed RPC calls - 416 good 27 bad
  sfs3132: too many failed RPC calls - 302 good 27 bad
  sfs3109: too many failed RPC calls - 533 good 31 bad
  sfs353: too many failed RPC calls - 301 good 28 bad
  sfs3144: too many failed RPC calls - 305 good 25 bad
  sfs3121: too many failed RPC calls - 311 good 30 bad
  sfs370: too many failed RPC calls - 315 good 27 bad
 

Can this be timing out or queue full drops ? Might be a side
effect of SFS requesting more than what can be delivered.


I don't know whether it was timeouts or full drops. SFS marked such runs as INVALID.
I can run whatever is needed to help to investigate the problem. If
you have a D script that will tell us more, please send it to me.
I appreciate your help.

-- Leon
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zfs and iscsi: cannot open device: I/O error

2007-03-05 Thread Rick McNeal
If you have questions about iSCSI, I would suggest sending them to  
[EMAIL PROTECTED]. I read that mailing list a little more  
often, so you'll get a quicker response.


On Feb 26, 2007, at 8:39 AM, cedric briner wrote:


 devfsadm -i iscsi # to create the device on sf3
 iscsiadm list target -Sv| egrep 'OS Device|Peer|Alias' # not empty
  Alias: vol-1
IP address (Peer): 10.194.67.111:3260
   OS Device Name:
 /dev/rdsk/c1t014005A267C12A0045E2F524d0s2
this is where my confusion began.
I don't know what the device c1t04d0s2 is for? I mean, what  
does it represent?




Normally the OS Device Name: would be exactly the same name that  
you would see when you run format. I don't know why you're seeing two  
different names. What version of Solaris are you running on the  
initiator?


The device names contain the Globally Unique IDentifier (GUID). The  
main benefit is that if you have multiple Solaris machines which can  
attach to the same device the pathname will be consistent across the  
machines.


I've found that the ``OS Device Name'' (c1t04d0s2) is created  
after the invocation:

devfsadm -i iscsi # to create the device on sf3

but no, this is not a device that you can use.
You can find the device only with the command:
format
   Searching for disks...done


   AVAILABLE DISK SELECTIONS:
   0. c0t0d0 IC35L120AVV207-0 cyl 59129 alt 2 hd 16 sec 255
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   1. c0t2d0 IC35L120-   VNC602A6G9E2T-0001-115.04GB
  /[EMAIL PROTECTED],0/[EMAIL PROTECTED]/[EMAIL PROTECTED],0
   2. c1t014005A267C12A0045E308D2d0 SUN-SOLARIS-1-6.68GB
  /scsi_vhci/[EMAIL PROTECTED]

and then if you create the zpool with:
zpool create tank c1t014005A267C12A0045E308D2d0
it works !!


BUT.. BUT... and re-BUT
Given this, and with all this virtualization... how can I link a  
device name on my iSCSI client with the device name on my  
iSCSI server?




Look at the Alias value which is reported by the initiator. You can  
use that to find the device on the storage array. This assumes that  
you don't create duplicate Alias strings of course.
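
For instance (a sketch; iscsitadm was the target-side admin CLI on these
builds, and exact output varies):

     # on the initiator:
     iscsiadm list target -Sv | egrep 'Alias|OS Device'
     # on the target server, match the same Alias to its backing store:
     iscsitadm list target -v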


Because, imagine that you are in my situation, where I want to have  
(let's say) 4 iSCSI servers with at most 16 disks attached per  
iSCSI server, and at least 2 iSCSI clients which  
consolidate this space with ZFS. And suddenly you can see  
with zpool that a disk is dead. So I have to be able to replace  
this disk, and for this I have to know on which one of the 4  
machines it resides and which disk it is.



So, do some of you know a little bit about this?



If you post iSCSI related questions to storage-discuss you'll find  
many people who've been using both the initiator and target and are  
quite knowledgeable. Also, the Solaris iSCSI developers read the  
storage-discuss list more frequently than this one.



Ced.
--

Cedric BRINER
Geneva - Switzerland


Rick McNeal

If ignorance is bliss, this lesson would appear to be a deliberate  
attempt on your part to deprive me of happiness, the pursuit of which  
is my unalienable right according to the Declaration of  
Independence.  I therefore assert my patriotic prerogative not to  
know this material.  I'll be out on the playground. -- Calvin



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-03-05 Thread Spencer Shepler


On Mar 5, 2007, at 11:17 AM, Leon Koll wrote:


On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:


Leon Koll writes:

  On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:
  
   Leon Koll writes:
 On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote:
 
 
  http://bugs.opensolaris.org/bugdatabase/view_bug.do? 
bug_id=6467988

 
  NFSD  threads are created  on a  demand  spike (all of   
them
  waiting  on I/O) but thentend to stick around   
servicing

  moderate loads.
 
  -r

 Hello Roch,
 It's not my case. NFS stops to service after some point.  
And the

 reason is in ZFS. It never happens with NFS/UFS.
 Shortly, my scenario:
 1st SFS run, 2000 requested IOPS. NFS is fine, ;low number  
of threads.
 2st SFS run, 4000 requested IOPS. NFS cannot serve all  
requests, no of

 threads jumps to max
 3rd SFS run, 2000 requested IOPS. NFS cannot serve all  
requests, no of

 threads jumps to max.
 System cannot get back to the same results under equal  
load (1st and 3rd).
 Reboot between 2nd and 3rd doesn't help. The only  
persistent thing is
 a directory structure that was created during the 2nd run  
(in SFS

 higher requested load - more directories/files created).
 I am sure it's a bug. I need help. I don't care that ZFS  
works N times
 worse than UFS. I really care that after heavy load  
everything is

 totally screwed.

 Thanks,
 -- Leon
  
   Hi Leon,
  
   How much is the slowdown between 1st and 3rd ? How filled is
 
  Typical case is:
  1st: 1996 IOPS, latency  2.7
  3rd: 1375 IOPS, latency 37.9
 

The large latency increase is the  side effect of requesting
more than what can be delivered. Queue builds up and latency
follow. So  it  should  not be  the  primary  focus IMO. The
Decrease in IOPS is the primary problem.

One hypothesis is that over the life of the FS we're moving
toward spreading access to the full disk platter. We can
imagine some fragmentation hitting as well. I'm not sure
how I'd test both hypothesis.

   the pool at each stage ? What does 'NFS stops to service'
   mean ?
 
  There is a lot of error messages on the NFS(SFS) client :
  sfs352: too many failed RPC calls - 416 good 27 bad
  sfs3132: too many failed RPC calls - 302 good 27 bad
  sfs3109: too many failed RPC calls - 533 good 31 bad
  sfs353: too many failed RPC calls - 301 good 28 bad
  sfs3144: too many failed RPC calls - 305 good 25 bad
  sfs3121: too many failed RPC calls - 311 good 30 bad
  sfs370: too many failed RPC calls - 315 good 27 bad
 

Can this be timing out or queue full drops ? Might be a side
effect of SFS requesting more than what can be delivered.


I don't know was it timeouts or full drops. SFS marked such runs as  
INVALID.

I can run whatever is needed to help to investigate the problem. If
you have a D script that will tell us more, please send it to me.
I appreciate your help.


The failed RPCs are indeed a result of the SFS client timing out
the requests it has made to the server.  The server is being
overloaded beyond its capabilities and the benchmark results
show that.  I agree with Roch that as the SFS benchmark adds
more data to the filesystems, additional latency is
added, and for this particular configuration the
server is being over-driven.

The helpful thing would be to run smaller increments in the
benchmark to determine where the response time increases
beyond what the SFS workload can handle.
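
(For example -- hedged, since the parameter names depend on the SFS version --
the sfs_rc run file can step the load in smaller increments, along the lines of:

     LOAD=500
     INCR_LOAD=250
     NUM_RUNS=10

rather than single 2000/4000-op points.)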

There have been a number of changes in ZFS recently that should
help with SFS performance measurement but fundamentally it
all depends on the configuration of the server (number of spindles
and CPU available).  So there may be a limit that is being
reached based on the hardware configuration.

What is your real goal here, Leon?  Are you trying to gather SFS
data to fit into sizing of a particular solution or just trying
to gather performance results for other general comparisons?
There are certainly better benchmarks than SFS for either
sizing and comparison reasons.

Spencer
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Cluster File System Use Cases

2007-03-05 Thread Rayson Ho

I read this paper on Sunday. Seems interesting:

The Architecture of PolyServe Matrix Server: Implementing a Symmetric
Cluster File System

http://www.polyserve.com/requestinfo_formq1.php?pdf=2

What interested me the most is that the metadata and lock management are spread
across all the nodes. I read the Parallel NFS (pNFS) presentation,
and it seems like pNFS still has the metadata on one server... (Lisa,
correct me if I am wrong).

http://opensolaris.org/os/community/os_user_groups/frosug/pNFS/FROSUG-pNFS.pdf

Rayson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Why number of NFS threads jumps to the max value?

2007-03-05 Thread Leon Koll

On 3/5/07, Spencer Shepler [EMAIL PROTECTED] wrote:


On Mar 5, 2007, at 11:17 AM, Leon Koll wrote:

 On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:

 Leon Koll writes:

   On 3/5/07, Roch - PAE [EMAIL PROTECTED] wrote:
   
Leon Koll writes:
  On 2/28/07, Roch - PAE [EMAIL PROTECTED] wrote:
  
  
   http://bugs.opensolaris.org/bugdatabase/view_bug.do?
 bug_id=6467988
  
   NFSD  threads are created  on a  demand  spike (all of
 them
   waiting  on I/O) but thentend to stick around
 servicing
   moderate loads.
  
   -r
 
  Hello Roch,
  It's not my case. NFS stops to service after some point.
 And the
  reason is in ZFS. It never happens with NFS/UFS.
  Shortly, my scenario:
  1st SFS run, 2000 requested IOPS. NFS is fine, ;low number
 of threads.
  2st SFS run, 4000 requested IOPS. NFS cannot serve all
 requests, no of
  threads jumps to max
  3rd SFS run, 2000 requested IOPS. NFS cannot serve all
 requests, no of
  threads jumps to max.
  System cannot get back to the same results under equal
 load (1st and 3rd).
  Reboot between 2nd and 3rd doesn't help. The only
 persistent thing is
  a directory structure that was created during the 2nd run
 (in SFS
  higher requested load - more directories/files created).
  I am sure it's a bug. I need help. I don't care that ZFS
 works N times
  worse than UFS. I really care that after heavy load
 everything is
  totally screwed.
 
  Thanks,
  -- Leon
   
Hi Leon,
   
How much is the slowdown between 1st and 3rd ? How filled is
  
   Typical case is:
   1st: 1996 IOPS, latency  2.7
   3rd: 1375 IOPS, latency 37.9
  

 The large latency increase is the  side effect of requesting
 more than what can be delivered. Queue builds up and latency
 follow. So  it  should  not be  the  primary  focus IMO. The
 Decrease in IOPS is the primary problem.

 One hypothesis is that over the life of the FS we're moving
 toward spreading access to the full disk platter. We can
 imagine some fragmentation hitting as well. I'm not sure
 how I'd test both hypothesis.

the pool at each stage ? What does 'NFS stops to service'
mean ?
  
   There is a lot of error messages on the NFS(SFS) client :
   sfs352: too many failed RPC calls - 416 good 27 bad
   sfs3132: too many failed RPC calls - 302 good 27 bad
   sfs3109: too many failed RPC calls - 533 good 31 bad
   sfs353: too many failed RPC calls - 301 good 28 bad
   sfs3144: too many failed RPC calls - 305 good 25 bad
   sfs3121: too many failed RPC calls - 311 good 30 bad
   sfs370: too many failed RPC calls - 315 good 27 bad
  

 Can this be timing out or queue full drops ? Might be a side
 effect of SFS requesting more than what can be delivered.

 I don't know was it timeouts or full drops. SFS marked such runs as
 INVALID.
 I can run whatever is needed to help to investigate the problem. If
 you have a D script that will tell us more, please send it to me.
 I appreciate your help.

The failed RPCs are indeed a result of the SFS client timing out
the requests it has made to the server.  The server is being
overloaded for its capabilities and the benchmark results
show that.  I agree with Roch that as the SFS benchmark adds
more data to the filesystems that additional latency is
added and for this particular configuration and the
server is being over-driven.

The helpful thing would be to run smaller increments in the
benchmark to determine where the response time increases
beyond what the SFS workload can handle.

There have been a number of changes in ZFS recently that should
help with SFS performance measurement but fundamentally it
all depends on the configuration of the server (number of spindles
and CPU available).  So there may be a limit that is being
reached based on the hardware configuration.

What is your real goal here, Leon?  Are you trying to gather SFS
data to fit into sizing of a particular solution or just trying
to gather performance results for other general comparisons?


Spencer,
I am using the SFS benchmark to emulate the real-world environment for the
NAS solution that I've built. SFS is able to create as many processes
(each one emulating a separate client) as one needs, plus it's rich in
meta operations.
My real goal is: quiet, peaceful nights after my solution starts to
work in production :)
What I see now: after some step, the directory structure/on-disk layout
of ZFS? becomes an obstacle that cannot be passed. I don't want
this to happen in the field, that's it.
And I am sure we have a bug case here.


There are certainly better benchmarks than SFS for either
sizing and comparison reasons.


I haven't found anything better: client-independent,
multi-proc./multi-threaded, meta-rich, comparable. An obvious drawback
is: $$. I think a free replacement for it is the almost unknown and
underestimated fstress ( http://www.cs.duke.edu/ari/fstress/ ).
Which ones are 

[zfs-discuss] old zfs pool and mounting

2007-03-05 Thread Michael Lee
Hi, 

I need to copy files from an old ZFS pool on an old hard drive to a new one on 
a new HD.
With UFS, you can just mount a partition from an old drive to copy files to a 
new drive. 
What's the equivalent process to do that with ZFS?

Thanks.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] old zfs pool and mounting

2007-03-05 Thread Robert Milkowski
Hello Michael,

Monday, March 5, 2007, 11:36:57 PM, you wrote:

ML Hi, 

ML I need to copy files from an old ZFS pool on an old hard drive to a new one 
on a new HD.
ML With UFS, you can just mount a partition from an old drive to copy files to 
a new drive.
ML What's the equivalent process to do that with ZFS?

How old are we talking?
If you mean ZFS version 1, which was in S10U1 (or was it v2?),
then just import both pools and copy the data.
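
Roughly (pool names below are hypothetical):

     # zpool import                   # scan attached disks for importable pools
     # zpool import olddata           # import under its original name
     # zpool import olddata oldpool   # ...or under a new name if it clashes
then copy from its mountpoint with cp/rsync, or use zfs send | zfs receive.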


-- 
Best regards,
 Robertmailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Re: Cluster File System Use Cases

2007-03-05 Thread Mike Gerdts

On 2/28/07, Dean Roehrich [EMAIL PROTECTED] wrote:

ASM was Storage-Tek's rebranding of SAM-QFS.  SAM-QFS is already a shared
(clustering) filesystem.  You need to upgrade :)  Look for Shared QFS.


ASM as Oracle states it is Automatic Storage Management.  To the best
of my knowledge, it shares no heritage with SAM-QFS.

http://www.oracle.com/technology/products/database/asm/index.html

Mike

--
Mike Gerdts
http://mgerdts.blogspot.com/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] DMU interfaces

2007-03-05 Thread Sanjeev Bagewadi

Manoj,

Welcome back on the alias :-)

I don't think the interfaces are documented. However, referring to the ZPL 
is a good place to start.
The ZPL code interacts with the DMU and so, obviously, it is using the DMU 
interfaces.


However, I am not sure whether there is any guarantee that they will not 
change.
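
A concrete starting point (source paths hedged from memory of the ON layout):

     usr/src/uts/common/fs/zfs/sys/dmu.h     # DMU entry points (dmu_read, dmu_write, dmu_tx_*)
     usr/src/uts/common/fs/zfs/zfs_vnops.c   # ZPL call sites

     # e.g., to see how the ZPL drives the DMU:
     grep -n 'dmu_tx_create\|dmu_read\|dmu_write' usr/src/uts/common/fs/zfs/zfs_vnops.c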


Thanks and regards,
Sanjeev.

Manoj Joseph wrote:


Hi,

I believe ZFS, at least in the design ;), provides APIs other than 
POSIX (for databases and other applications) to talk directly to the DMU.


Are such interfaces ready/documented? If this is documented somewhere, 
could you point me to it?


Regards,
Manoj




--
Solaris Revenue Products Engineering,
India Engineering Center,
Sun Microsystems India Pvt Ltd.
Tel:x27521 +91 80 669 27521 


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss