Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-13 Thread Michael
I've got the box on eval, and just pushing through its paces. Ideally I would 
be replicating to another x4500, but I don't have another one and didn't want 
to use 22 disks for another pool.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-12 Thread Matthew Ahrens
Michael Kucharski wrote:
 We have a x4500 setup as a single 4*( raid2z 9 + 2)+2 spare pool and have
 the files  system mounted over v5 krb5 NFS and accessed directly. The pool
 is a 20TB pool and is using . There are three filesystems, backup, test
 and home. Test has about 20 million files and uses 4TB. These files range
 from 100B to 200MB. Test has a cron job to take snapshots every 15 minutes
 from 1m on the hour. Every 15min at 2min on the hour a cron batch job runs
 to zfs send/recv to the backup filesystem. Home has only 100GB.

Are you doing this send|recv within the same pool?  What's the motivation for 
that?  Can't you just use zfs clone, which would be much faster and use 
less disk space?  Or if you want another copy (which seems unlikely since you 
can already tolerate any 2 disks failing in your pool), then use zfs set 
copies=2 fs.

--matt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-10 Thread Michael
Thanks. Looks like I have this bug. Is it a hardware problem combined with a 
software problem?

Oct  9 09:35:43 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct  9 09:35:43 zeta1  port 3: device reset
Oct  9 09:35:43 zeta1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd18):
Oct  9 09:35:43 zeta1   Error for Command: read(10)Error Level: 
Retryable
Oct  9 09:35:43 zeta1 scsi: [ID 107833 kern.notice] Requested Block: 
6063576   Error Block: 6063576
Oct  9 09:35:43 zeta1 scsi: [ID 107833 kern.notice] Vendor: ATA 
   Serial Number:
Oct  9 09:35:43 zeta1 scsi: [ID 107833 kern.notice] Sense Key: No 
Additional Sense
Oct  9 09:35:43 zeta1 scsi: [ID 107833 kern.notice] ASC: 0x0 (no additional 
sense info), ASCQ: 0x0, FRU: 0x0
Oct  9 09:35:43 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct  9 09:35:43 zeta1  port 3: device reset
Oct  9 09:35:43 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct  9 09:35:43 zeta1  port 3: link lost
Oct  9 09:35:43 zeta1 sata: [ID 801593 kern.notice] NOTICE: /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]:
Oct  9 09:35:43 zeta1  port 3: link established
Oct  9 09:35:43 zeta1 marvell88sx: [ID 812950 kern.warning] WARNING: 
marvell88sx4: error on port 3:
Oct  9 09:35:43 zeta1 marvell88sx: [ID 517869 kern.info]device 
disconnected
Oct  9 09:35:43 zeta1 marvell88sx: [ID 517869 kern.info]device connected
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-10 Thread Gary Gendel
I'm not sure.  But when I would re-run a scrub, I got the errors at the same 
block numbers, which indicated that the disk was really bad.  It wouldn't hurt 
to make the entry in the /etc/system file, reboot, and then try the scrub 
again. If the problem disappears then it is a driver bug.

Gary
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-09 Thread Michael Kucharski
Every day we see pause times of sometime 60 seconds to read 1K of a file for 
local reads as well as NFS in a test setup.

We have a x4500 setup as a single 4*( raid2z 9 + 2)+2 spare pool and have the 
files  system mounted over v5 krb5 NFS and accessed directly. The pool is a 
20TB pool and is using . There are three filesystems, backup, test and home. 
Test has about 20 million files and uses 4TB. These files range from 100B to 
200MB. Test has a cron job to take snapshots every 15 minutes from 1m on the 
hour. Every 15min at 2min on the hour a cron batch job runs to zfs send/recv to 
the backup filesystem. Home has only 100GB.

The test dir has 3 directories, 1 has 130,000 files other 2 have 10,000,000. We 
have 4 processes, 2 over NFS, 2 local, and 2 reading the dir with 130,000 
files, the other 2 reading the dir with 10,000,000. Every 35 seconds each 
process reads 1K of 64thK of 10 files and records the latency, then reads for 1 
second and record the throughput. At times of no other activity (outside the 
snapshot and send/recv times) we see read latencies of up to 60 seconds, maybe 
once a day at random times.

We are using an unpatched Solaris 10 08/07 build.

Pause times this long can to timeouts and jobs to fails which is problematic 
for us. Is this expected behaviour? Can anything be done to mitigate or 
diagnose the issue?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-09 Thread Gary Gendel
Are there any clues in the logs?  I have had a similar problem when a disk bad 
block was uncovered by zfs.  I've also seen this when using the Silicon Image 
driver without the recommended patch.

The former became evident when I ran a scrub. I saw the SCSI timeout errors pop 
up in the kern syslogs.  I solved this by replacing the disk.

Gary
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-09 Thread Michael
Excellent. 

Oct  9 13:36:01 zeta1 scsi: [ID 107833 kern.warning] WARNING: /[EMAIL 
PROTECTED],0/pci1022,[EMAIL PROTECTED]/pci11ab,[EMAIL PROTECTED]/[EMAIL 
PROTECTED],0 (sd13):
Oct  9 13:36:01 zeta1   Error for Command: readError Level: 
Retryable

Scrubbing now.

Big thanks ggendel.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS 60 second pause times to read 1K

2007-10-09 Thread Marc Bevand
Michael m.kucharski at bigfoot.com writes:
 
 Excellent. 
 
 Oct  9 13:36:01 zeta1 scsi: [ID 107833 kern.warning] WARNING:
 /pci at 2,0/pci1022,7458 at 8/pci11ab,11ab at 1/disk at 2,0 (sd13):
 Oct  9 13:36:01 zeta1   Error for Command: readError
Level: Retryable
 
 Scrubbing now.

This is only a part of the complete error message. Look a few lines above this
one. If you see something like:

  sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci8086,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
   port 1: device reset
  sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci8086,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
   port 1: link lost
  sata: [ID 801593 kern.notice] NOTICE: /[EMAIL PROTECTED],0/pci8086,[EMAIL 
PROTECTED]/pci11ab,[EMAIL PROTECTED]:
   port 1: link established
  marvell88sx: [ID 812950 kern.warning] WARNING: marvell88sx0: error on port 1:
  marvell88sx: [ID 517869 kern.info]   device disconnected
  marvell88sx: [ID 517869 kern.info]   device connected

Then it means you are probably affected by bug
http://bugs.opensolaris.org/view_bug.do?bug_id=6587133

This bug is fixed in Solaris Express build 73 and above, and will likely be
fixed in Solaris 10 Update 5. The workaround is to disable SATA NCQ and queuing
by adding set sata:sata_func_enable = 0x4 to /etc/system.

-marc

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss