Re: [zfs-discuss] modification to zdb to decompress blocks

2008-05-03 Thread Benjamin Brumaire
Hi,

Great stuff.

Will this change make it into OpenSolaris? Looking at the current code I 
couldn't find the modification.

I tried replacing zdb.c in the OpenSolaris main tree before compiling with 
nightly, but the compiler wasn't happy with it. Can you write down the right 
options?

bbr
 
 


Re: [zfs-discuss] ZFS still crashing after patch

2008-05-03 Thread Rustam
I don't think this is a hardware issue, though I can't rule it out. I'll 
try to explain why.

1. I've replaced all the memory modules, which are the most likely cause of 
such a problem.

2. There are many different applications running on that server (Apache, 
PostgreSQL, etc.). However, if you look at the four different crash dump stack 
traces, you see the same picture:

-- crash dump st1 --
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
spa_scrub_io_start+0xf1()
spa_scrub_cb+0x13d()

-- crash dump st2 --
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

-- crash dump st3 --
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()

-- crash dump st4 --
mutex_enter+0xb()
zio_buf_alloc+0x1a()
zio_read+0xba()
arc_read+0x3cc()
dbuf_prefetch+0x11d()
dmu_prefetch+0x107()
zfs_readdir+0x408()
fop_readdir+0x34()


All four crash dumps show the problem at zio_read/zio_buf_alloc. Three of them 
appeared during metadata prefetch (dmu_prefetch) and one during scrubbing. I 
don't think that's a coincidence. IMHO, the checksum errors are a result of 
this inconsistency.

I tend to think the problem is in ZFS, and that it exists even in the latest 
Solaris version (and maybe in OpenSolaris as well).
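
For reference, the stacks above came straight out of the saved crash dumps. 
Assuming savecore wrote them to /var/crash/<hostname> (the dump number 0 below 
is just a placeholder), the usual way to pull the basics out of each one is:

# cd /var/crash/`hostname`
# mdb unix.0 vmcore.0
> ::status
> ::panicinfo
> $C
> ::msgbuf

::status gives the panic string, ::panicinfo the panicking CPU and registers, 
$C the full backtrace with frame pointers, and ::msgbuf the console messages 
leading up to the panic.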


 
 Lots of CKSUM errors like you see is often indicative of bad hardware.
 Run memtest for 24-48 hours.

 -marc
 
 


[zfs-discuss] Endian relevance for decoding lzjb blocks

2008-05-03 Thread Benjamin Brumaire
I'm trying to decode an lzjb-compressed block and I'm having a hard time with 
big/little endian. I'm on x86, working with build 77.

#zdb - ztest
...
rootbp = [L0 DMU objset] 400L/200P DVA[0]=0:e0c98e00:200
...

## zdb -R ztest:c0d1s4:e0c98e00:200:
Found vdev: /dev/dsk/c0d1s4

ztest:c0d1s4:e0c98e00:200:
  0 1 2 3 4 5 6 7   8 9 a b c d e f  0123456789abcdef
00:  00000003020e0a00  dd0304050020b601  .......... .....
10:  c505048404040504  35b558231002047c  ........|...#X.5

Looking at this block with dd:
dd if=/dev/dsk/c0d1s4 iseek=7374023 bs=512 count=1 | od -x
000: 0a00 020e 0003 0000  b601 0020 0405 dd03

od -x swaps every two bytes, so what I actually have on disk is:
000: 000a 0e02 0300 0000  01b6 2000 0504 03dd

Compared with the zdb output, every group of 8 bytes is reversed.

Now I don't know how to pass this to my lzjb decoding program.

Should I read the 512 bytes and pass them:
   - from the end
   - from the start and reverse every 8 bytes
   - or something else
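
For what it's worth, all three views seem to agree if the zdb columns are read 
as 8-byte little-endian words of the on-disk stream and the od -x columns as 
2-byte little-endian words. This little test program (the 16 bytes are 
hard-coded from the dump above, purely for illustration, not part of zdb or 
any decoder) reproduces both outputs:

#include <stdio.h>
#include <stdint.h>
#include <string.h>

int
main(void)
{
	/* the first 16 bytes exactly as they sit on disk (x86) */
	uint8_t raw[16] = {
		0x00, 0x0a, 0x0e, 0x02, 0x03, 0x00, 0x00, 0x00,
		0x01, 0xb6, 0x20, 0x00, 0x05, 0x04, 0x03, 0xdd
	};
	uint64_t w64;
	uint16_t w16;
	int i;

	/* zdb -R shows each group of 8 bytes as one little-endian 64-bit word */
	for (i = 0; i < 16; i += 8) {
		memcpy(&w64, raw + i, sizeof (w64));
		printf("%016llx ", (unsigned long long)w64);
	}
	printf("\n");

	/* od -x shows each pair of bytes as one little-endian 16-bit word */
	for (i = 0; i < 16; i += 2) {
		memcpy(&w16, raw + i, sizeof (w16));
		printf("%04x ", w16);
	}
	printf("\n");
	return (0);
}

So I suspect the answer is to pass the 512 on-disk bytes to the decompressor 
from the start, exactly as dd reads them, without any reversing -- only the 
printed representations differ.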

thanks for any advice

bbr
 
 


Re: [zfs-discuss] modification to zdb to decompress blocks

2008-05-03 Thread Benjamin Brumaire
Thanks for the quick reaction. I've now got a working binary for my system.

I don't understand why these changes should have to go through a project. The 
hooks are already there, so once the code is written not much work has to be 
done. But that's another story. Let's decode lzjb blocks now :-)

bbr
 
 


Re: [zfs-discuss] Endian relevance for decoding lzjb blocks

2008-05-03 Thread [EMAIL PROTECTED]
Hi Benjamin,

Benjamin Brumaire wrote:
 I 'm trying to decode a lzjb compressed blocks and I have some hard times 
 regarding big/little endian. I'm on x86 working with build 77.

 #zdb - ztest
 ...
 rootbp = [L0 DMU objset] 400L/200P DVA[0]=0:e0c98e00:200
 ...

 ## zdb -R ztest:c0d1s4:e0c98e00:200:
 Found vdev: /dev/dsk/c0d1s4

 ztest:c0d1s4:e0c98e00:200:
   0 1 2 3 4 5 6 7   8 9 a b c d e f  0123456789abcdef
 00:  0003020e0a00  dd0304050020b601  .. .
 10:  c505048404040504  35b558231002047c  |...#X.5

   
Using the modified zdb, you should be able to do:

# zdb -R ztest:c0d1s4:e0c98e00:200:d,lzjb,400 2>/tmp/foo

Then you can od /tmp/foo.  I am not sure what happens if you run zdb against a 
zfs file system whose endianness differs from that of the machine on which you 
are running zdb.  It may just work...
The d,lzjb,400 says to use lzjb decompression with a logical (after 
decompression) size of 0x400 bytes.  It dumps the raw data to stderr, hence 
the 2>/tmp/foo.
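
If you want to decode the block by hand instead of via zdb, the decompressor 
itself is tiny. The sketch below follows the shape of lzjb_decompress() in 
usr/src/uts/common/fs/zfs/lzjb.c, but I am writing it from memory, so treat 
the real source as authoritative. Note that it consumes the compressed bytes 
exactly as they sit on disk, starting from the first byte:

#include <stddef.h>

#define	NBBY		8
#define	MATCH_BITS	6
#define	MATCH_MIN	3
#define	OFFSET_MASK	((1 << (16 - MATCH_BITS)) - 1)

int
lzjb_decompress_sketch(const void *s_start, void *d_start, size_t d_len)
{
	const unsigned char *src = s_start;
	unsigned char *dst = d_start;
	unsigned char *d_end = (unsigned char *)d_start + d_len;
	unsigned char *cpy;
	unsigned char copymap = 0;
	int copymask = 1 << (NBBY - 1);	/* forces a copymap reload first time */

	while (dst < d_end) {
		if ((copymask <<= 1) == (1 << NBBY)) {
			copymask = 1;
			copymap = *src++;	/* one flag bit per following item */
		}
		if (copymap & copymask) {
			/* back-reference: 6 bits of match length, 10 bits of offset */
			int mlen = (src[0] >> (NBBY - MATCH_BITS)) + MATCH_MIN;
			int offset = ((src[0] << NBBY) | src[1]) & OFFSET_MASK;
			src += 2;
			if ((cpy = dst - offset) < (unsigned char *)d_start)
				return (-1);	/* corrupt input */
			while (--mlen >= 0 && dst < d_end)
				*dst++ = *cpy++;
		} else {
			*dst++ = *src++;	/* literal byte */
		}
	}
	return (0);
}

Feed it the 0x200 physical bytes in on-disk order with d_len = 0x400 and it 
should produce the same thing the modified zdb writes out.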

max



Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Thanks Max, I have done a few tests with what you suggested and I have listed 
the output below. I wait a few minutes before deciding it has failed, and there 
is never any console output about anything failing, and nothing in any of the 
log files I've looked in: /var/adm/messages or /var/log/syslog. Maybe if I left 
it 2 hours I might see a message somewhere, but who knows?

This is a nasty problem, as:
1. It appears to be failing on different files, although I think I'm seeing a 
common pattern where it often fails on the third file.
2. It copied all the files successfully once (see the log below for run #2), 
but then I ran run #3 immediately afterwards and it failed again, so I list 
the debug output for run #3.
3. I cannot kill the hanging cp command, and then my whole ZFS filesystem is 
locked up, meaning I have to reboot. Even an 'ls' command often hangs because 
of the hanging 'cp'.
4. I cannot use 'shutdown -y -g 0 -i 5' to shut down the machine, as it seems 
to be blocked by the hanging cp command.
5. The only way to shut down the machine is to hit the reset button, and I 
don't like doing this when there are, theoretically, write operations 
occurring, or at least pending.

Anyway here is the long output. Perhaps if people reply they can avoid keeping 
this text as part of their reply or we'll be lost in a sea of dump output :)




output: run #1 (after reboot)

bash-3.2$ truss -topen cp -r testdir z
open(/var/ld/ld.config, O_RDONLY) Err#2 ENOENT
open(/lib/libc.so.1, O_RDONLY)= 3
open(/usr/lib/locale/en_GB.UTF-8/en_GB.UTF-8.so.3, O_RDONLY) = 3
open(/usr/lib/locale/common/methods_unicode.so.3, O_RDONLY) = 3
open(/lib/libsec.so.1, O_RDONLY)  = 3
open(/lib/libcmdutils.so.1, O_RDONLY) = 3
open(/lib/libavl.so.1, O_RDONLY)  = 3
open64(testdir/f06, O_RDONLY) = 4
open64(testdir/f15, O_RDONLY) = 4
open64(testdir/f12, O_RDONLY) = 4



# mdb -k
Loading modules: [ unix genunix specfs dtrace cpu.generic 
cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba 
s1394 nca lofs zfs random md sppp smbsrv nfs ptm crypto ipc ]
> ::pgrep cp
S    PID   PPID   PGID    SID    UID      FLAGS             ADDR NAME
R    910    909    909    869    501 0x4a004000 ff01d96e6e30 cp
> ff01d96e6e30::walk thread | ::threadlist -v
        ADDR         PROC          LWP CLS PRI        WCHAN
ff01d4371b60 ff01d96e6e30 ff01d9b28930   2  60 ff01f4aa52c0
  PC: _resume_from_idle+0xf1    CMD: cp -r testdir z
  stack pointer for thread ff01d4371b60: ff000949f260
  [ ff000949f260 _resume_from_idle+0xf1() ]
swtch+0x17f()
cv_wait+0x61()
zio_wait+0x5f()
dmu_buf_hold_array_by_dnode+0x214()
dmu_read+0xd4()
zfs_fillpage+0x15e()
zfs_getpage+0x187()
fop_getpage+0x9f()
segvn_fault+0x9ef()
as_fault+0x5ae()
pagefault+0x95()
trap+0x1286()
0xfb8001d9()
fuword8+0x21()
zfs_write+0x147()
fop_write+0x69()
write+0x2af()
write32+0x1e()
sys_syscall32+0x101() 
  
 


bash-3.2$ iostat -xce 1
                  extended device statistics       ---- errors ---        cpu
device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us sy wt id
cmdk0    33.5    5.7   431.0   32.6  0.4  0.2   14.5   6  12   0   0   0   0   4  2  0 94
sd0       1.3    4.3   111.1   45.0  0.1  0.0   17.3   0   1   0   0   0   0
sd1       2.0    4.4   210.0   45.1  0.1  0.0   14.5   0   1   0   0   0   0
sd2       1.3    4.3   111.1   45.2  0.1  0.0   17.4   0   1   0   0   0   0
sd3       0.0    0.0     0.0    0.0  0.0  0.0    0.0   0   0   0   5   0   5
                  extended device statistics       ---- errors ---        cpu
device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us sy wt id
cmdk0     0.0    0.0     0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0   1 48  0 51
sd0     325.6   56.3 36340.5  146.7 26.2  0.9   71.0  92  92   0   0   0   0
sd1     518.6   49.2 65734.5  117.1 25.9  0.9   47.0  86  85   0   0   0   0
sd2     327.6   57.3 36983.7  144.7 27.3  0.9   73.4  94  93   0   0   0   0
sd3       0.0    0.0     0.0    0.0  0.0  0.0    0.0   0   0   0   5   0   5
                  extended device statistics       ---- errors ---        cpu
device    r/s    w/s    kr/s   kw/s wait actv  svc_t  %w  %b s/w h/w trn tot  us sy wt id
cmdk0     0.0    0.0     0.0    0.0  0.0  0.0    0.0   0   0   0   0   0   0   0 43  0 57
sd0     301.1    1.0 33550.0    0.0 23.6  0.8   80.8  84  84   0   0   0   0
sd1     556.2    1.0 69661.1    0.0 26.1  0.8   48.3  92  83   0   0   0   0
sd2     300.1    5.0 33229.9    4.0 23.8  0.8   80.7  84  84   0   0   0   0
sd3       0.0    0.0     0.0    0.0  0.0  0.0    0.0   0   0   0   5   0   5

Re: [zfs-discuss] ZFS still crashing after patch

2008-05-03 Thread Robert Milkowski
Hello Rustam,

Saturday, May 3, 2008, 9:16:41 AM, you wrote:

R All four crash dumps show problem at zio_read/zio_buf_alloc. Three
R of these appeared during metadata prefetch (dmu_prefetch) and one
R during scrubbing. I don't think that it's coincidence. IMHO,
R checksum errors are the result of this inconsistency.

Which would happen if you had a hardware problem and were getting wrong 
checksums on both sides of your mirrors. Maybe the power supply?

Try memtest anyway, or SunVTS.



-- 
Best regards,
 Robert Milkowski                       mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com



Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Well, I had some more ideas and ran some more tests:

1. cp -r testdir ~/z1

This copied the testdir directory from the zfs pool into my home directory on 
the IDE boot drive, so not part of the zfs pool, and this worked.

2. cp -r ~/z1 .

This copied the files back from my home directory on the IDE boot disk and into 
the ZFS pool. This worked.

3. cp -r z1 z2 

This copied the files from the ZFS pool to another directory in the ZFS pool, 
and this has not worked -- it hanged again, but differently this time. It 
copied a couple of files, then hanged. The mouse wouldn't move and the keyboard 
was inactive; I hit loads of keys including ALT-TAB and finally the mouse came 
back, the copying continued for 2 or 3 more files and then it hanged again. 
This time no more files are being copied, and it's hanging on a different file 
than in previous runs.


So from these tests, it appears that copying the test directory out of the ZFS 
pool is successful, and copying it in from outside the pool is successful, but 
reading and writing the files completely within the pool is failing. My gut 
instinct is that reading and writing purely within the pool stresses the disks 
more, because of the extra I/O demand on the disks used for the ZFS pool, but 
this may be completely wrong.
 
 


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
The plot thickens. I replaced 'cp' with 'rsync' and it worked -- I ran it a few 
times and it hasn't hung so far.

So on the face of it, it appears that 'cp' is doing something that causes my 
system to hang when the files are read from and written to the same pool, but 
simply replacing 'cp' with 'rsync' works. Hmmm... does anyone have a clue about 
what I can do next to home in on the problem with 'cp'?
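
One thing I might try the next time 'cp' wedges (a generic fbt sketch I haven't 
actually run against this problem, so treat it as a starting point) is to watch 
how long the zio_wait() calls issued by cp take while the copy runs:

# dtrace -n '
fbt::zio_wait:entry /execname == "cp"/ { self->ts = timestamp; }
fbt::zio_wait:return /self->ts/ {
        @["zio_wait latency (ms)"] = quantize((timestamp - self->ts) / 1000000);
        self->ts = 0;
}
tick-10s { printa(@); }'

If cp really is stuck inside zio_wait(), the return probe for that call will 
simply never fire, which would itself confirm that an I/O never completed.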

Here is the output using 'rsync' :

bash-3.2$ truss -topen rsync -a z1 z2
open(/var/ld/ld.config, O_RDONLY) Err#2 ENOENT
open(/lib/libsocket.so.1, O_RDONLY)   = 3
open(/lib/libnsl.so.1, O_RDONLY)  = 3
open(/lib/libc.so.1, O_RDONLY)= 3
open(/usr/lib/locale/en_GB.UTF-8/en_GB.UTF-8.so.3, O_RDONLY) = 3
open(/usr/lib/locale/common/methods_unicode.so.3, O_RDONLY) = 3
open64(/etc/popt, O_RDONLY)   Err#2 ENOENT
open64(/export/home/simon/.popt, O_RDONLY)Err#2 ENOENT
open(/usr/lib/iconv/UTF-8%UTF-8.so, O_RDONLY) = 3
open64(/var/run/name_service_door, O_RDONLY)  = 3
open64(z1/testdir/f01, O_RDONLY)  = 5
open64(z1/testdir/f02, O_RDONLY)  = 5
open64(z1/testdir/f03, O_RDONLY)  = 5
open64(z1/testdir/f04, O_RDONLY)  = 5
open64(z1/testdir/f05, O_RDONLY)  = 5
open64(z1/testdir/f06, O_RDONLY)  = 5
open64(z1/testdir/f07, O_RDONLY)  = 5
open64(z1/testdir/f08, O_RDONLY)  = 5
open64(z1/testdir/f09, O_RDONLY)  = 5
open64(z1/testdir/f10, O_RDONLY)  = 5
open64(z1/testdir/f11, O_RDONLY)  = 5
open64(z1/testdir/f12, O_RDONLY)  = 5
open64(z1/testdir/f13, O_RDONLY)  = 5
open64(z1/testdir/f14, O_RDONLY)  = 5
open64(z1/testdir/f15, O_RDONLY)  = 5
open64(z1/testdir/f16, O_RDONLY)  = 5
Received signal #18, SIGCLD, in pollsys() [caught]
  siginfo: SIGCLD CLD_EXITED pid=910 status=0x
bash-3.2$
 
 


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Hi Simon,
Simon Breden wrote:
 The plot thickens. I replaced 'cp' with 'rsync' and it worked -- I ran it a 
 few times and it didn't hang so far.

 So on the face of it, it appears that 'cp' is doing something that causes my 
 system to hang if the files are read from and written to the same pool, but 
 simply replacing 'cp' with 'rsync' works. Hmmm... anyone have a clue about 
 what I can do next to home in on the problem with 'cp' ?  

 Here is the output using 'rsync' :

 bash-3.2$ truss -topen rsync -a z1 z2
 open(/var/ld/ld.config, O_RDONLY)   Err#2 ENOENT
   
The rsync command and cp command work very differently.  cp mmaps up to 8MB of 
the input file and writes from the address returned by mmap, faulting the pages 
in as it writes (unless you are a normal user on Indiana, in which case cp is 
GNU cp, which reads/writes; so, why are there two versions?).  Rsync forks and 
sets up a socketpair between the parent and child processes, then reads/writes.  
It should be much slower than cp, and put much less stress on the disks.

It would be great to have a way to reproduce this; I have not had any problems.  
How large is the directory you are copying?  Either the disk has not sent a 
response to an I/O operation, or the response was somehow lost.  If I could 
reproduce the problem, I might try to dtrace the commands being sent to the HBA 
and the responses coming back...  Hopefully someone here who has experience 
with the disks you are using will be able to help.
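
To make the difference concrete, here is a rough sketch of the two copy loops. 
It is illustrative only: the real cp and rsync sources differ in many details, 
and copy_mmap/copy_rw are made-up names. The 8MB window is the figure mentioned 
above.

#include <sys/types.h>
#include <sys/mman.h>
#include <unistd.h>

#define	CHUNK	(8 * 1024 * 1024)	/* cp maps the source in windows of up to 8MB */

/* mmap-style copy, roughly what Solaris /usr/bin/cp does */
static int
copy_mmap(int in, int out, off_t len)
{
	off_t off = 0;

	while (off < len) {
		size_t n = (len - off > CHUNK) ? CHUNK : (size_t)(len - off);
		void *p = mmap(NULL, n, PROT_READ, MAP_SHARED, in, off);

		if (p == MAP_FAILED)
			return (-1);
		/* write() faults the source pages in as it goes */
		if (write(out, p, n) != (ssize_t)n) {
			(void) munmap(p, n);
			return (-1);
		}
		(void) munmap(p, n);
		off += n;
	}
	return (0);
}

/* plain read/write copy, roughly what GNU cp and rsync's file I/O do */
static int
copy_rw(int in, int out)
{
	char buf[128 * 1024];
	ssize_t n;

	while ((n = read(in, buf, sizeof (buf))) > 0)
		if (write(out, buf, n) != n)
			return (-1);
	return (n < 0 ? -1 : 0);
}

The first variant has page faults on the source interleaved with writes to the 
destination, so on an intra-pool copy the same disks service both at once; the 
second just stages everything through a small buffer.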

max



Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Rob
oops, I lied... according to myself:

http://mail.opensolaris.org/pipermail/zfs-discuss/2008-January/045141.html

commands in the 'wait' column are queued in Solaris and those in 'actv' are in 
the drive's NCQ.

So the question is: where are the drives' commands getting dropped across 3 
disks at the same time?

And in all cases it's not a ZFS issue, but a disk, controller or 
[EMAIL PROTECTED] issue.

Rob


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Thanks Max, and the fact that rsync stresses the system less would help explain 
why rsync works and cp hangs. The directory was around 11GB in size.

If Sun engineers are interested in this problem then I'm happy to run whatever 
commands they give me -- after all, I have a pure goldmine here for them to 
debug ;-) And it *is* running on a ZFS filesystem. Opportunities like this 
don't come along every day :) Tempted? :)

Well, if I can't tempt Sun, then for anyone who has the same disks, I would be 
interested to see what happens on your machine:
Model Number: WD7500AAKS-00RBA0
Firmware revision: 4G30

I use three of these disks in a RAIDZ1 vdev within the pool.
 
 


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Hi Simon,
Simon Breden wrote:
 Thanks Max, and the fact that rsync stresses the system less would help 
 explain why rsync works, and cp hangs. The directory was around 11GB in size.

 If Sun engineers are interested in this problem then I'm happy to run 
 whatever commands they give me -- after all, I have a pure goldmine here for 
 them to debug ;-) And it *is* running on a ZFS filesystem. Opportunities like 
 this don't come along every day :) Tempted? :)

 Well, if I can't tempt Sun, then for anyone who has the same disks, I would 
 be interested to see what happens on your machine:
 Model Number: WD7500AAKS-00RBA0
 Firmware revision: 4G30

 I use three of these disks in a RAIDZ1 vdev within the pool.
  
   
I think Rob Logan is probably correct, and there is a problem with the disks, 
not ZFS.  Have you tried this with a different file system (UFS), or with 
multiple dd commands running at the same time against the raw disks?
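
Something like this would exercise the raw disks concurrently without going 
through ZFS at all. It only reads, so it will not disturb the pool; the c2t*d0 
names are placeholders for whatever format(1M) shows for the three RAIDZ disks:

#!/bin/sh
# read the three raw disks in parallel, bypassing ZFS
for d in c2t0d0 c2t1d0 c2t2d0; do
	dd if=/dev/rdsk/${d}s0 of=/dev/null bs=1024k count=8192 &
done
wait

Watching iostat -xce in another window while that runs would show whether the 
disks (or controller) misbehave under the same kind of parallel load outside 
of ZFS.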

max



Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Dave
I have similar, but not exactly the same drives:

format> inq
Vendor:   ATA
Product:  WDC WD7500AYYS-0
Revision: 4G30

Same firmware revision. I have no problems with drive performance, although I 
use them under UFS and as backing stores for iSCSI disks.

FYI, I had random lockups and crashes on my Tyan MB with the MCP55 chipset. I 
bought Supermicro AOC-SAT2-MV8s and moved all my disks to them. Haven't had a 
problem since.

http://de.opensolaris.org/jive/thread.jspa?messageID=204736

--
Dave

On 05/03/2008 01:44 PM, Simon Breden wrote:
 @Max: I've not tried this with other file systems, and not with multiple dd 
 commands at the same time with raw disks. I suppose this is not possible to 
 do with my disks which are currently part of this RAIDZ1 vdev in the pool 
 without corrupting data? I'll assume not.
 
 @Rob: OK, let's assume that, like you say, it's not a ZFS issue, but in fact 
 a drive, firmware etc issue. That said, where should I create a new thread -- 
 in storage-discuss ? I will refer to these 2 threads here for all the gory 
 details ;-)
 
 If this can be proven to be a disk problem then I want to return them under 
 warranty and get some different ones. Normally these disks have absolutely 
 excellent user feedback on newegg.com, so I'm quite surprised if the disks 
 are ALL bad.
 
 I wonder if, in fact, there could be some issue between the motherboard's 
 BIOS and the drives, or is this not possible? The motherboard is an Asus 
 M2N-SLI Deluxe and it uses an NVidia controller (MCP 55?) which is part of 
 the NVidia 570 SLI chipset, again not exactly an exotic, unused chipset.
 
 If it's possible for the BIOS to affect the disk interface then I have seen 
 that a new BIOS is available, which I could try.
 
 Also I could update to snv_b87, which is the latest one, although I first saw 
 this problem with that build (87), so the OS upgrade might not help.
  
  


Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Wow, thanks Dave. Looks like you've had this hell too :)

So that makes me happier that the disks and pool are probably OK, but it does 
seem to be an issue with the NVidia MCP 55 chipset, or at least perhaps the 
nv_sata driver. From reading the bug list below, the problem might be a more 
general disk driver issue, perhaps not limited to the nv_sata driver.

I looked at the post you listed, and followed a chain of bug reports:

http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6658565 (Accepted)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6642154 (dup of 6662881)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6662881 (fixed in snv_87)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6669134 (fixed in snv_90)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6662400 (dup of 6669134)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6671523 (Accepted)
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6672202 (Need more info)

Maybe I'll try snv_87 again, or just wait until snv_90 is released.

My workaround until this issue is fixed is to use 'rsync' to do intra-pool 
copies, as it seems to stress the system less and thus avoids the ZFS file 
system lockup.
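
For anyone hitting the same thing, the substitution is literally just:

bash-3.2$ rsync -a z1 z2        # instead of: cp -r z1 z2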

Thanks to everyone who helped.

I might post a link to this thread in the storage-discuss group to see if I can 
get any further help, in case anyone there knows more details about this.

That Supermicro AOC MV8 card looks good, but I would prefer not to have to buy 
new hardware to fix what should hopefully turn out to be a driver problem.
 
 