Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-04 Thread Simon Breden
I have moved this saga to storage-discuss now, as this doesn't appear to be a ZFS issue, and it can be found here: http://www.opensolaris.org/jive/thread.jspa?threadID=59201

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Thanks Max, I have done a few tests with what you suggest and I have listed the output below. I wait a few minutes before deciding it's failed, and there is never any console output about anything failing, and nothing in any log files I've looked in: /var/adm/messages or /var/log/syslog. Maybe

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Well, I had some more ideas and ran some more tests: 1. cp -r testdir ~/z1 This copied the testdir directory from the zfs pool into my home directory on the IDE boot drive, so not part of the zfs pool, and this worked. 2. cp -r ~/z1 . This copied the files back from my home directory on the

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
The plot thickens. I replaced 'cp' with 'rsync' and it worked -- I ran it a few times and it didn't hang so far. So on the face of it, it appears that 'cp' is doing something that causes my system to hang if the files are read from and written to the same pool, but simply replacing 'cp' with

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Hi Simon, Simon Breden wrote: The plot thickens. I replaced 'cp' with 'rsync' and it worked -- I ran it a few times and it didn't hang so far. So on the face of it, it appears that 'cp' is doing something that causes my system to hang if the files are read from and written to the same pool,

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Rob
oops, I lied... according to myself http://mail.opensolaris.org/pipermail/zfs-discuss/2008-January/045141.html wait are queued in Solaris and active 1 are in the drive's NCQ. so the question is: Where are the drives' commands getting dropped across 3 disks at the same time? and in all cases

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Thanks Max, and the fact that rsync stresses the system less would help explain why rsync works, and cp hangs. The directory was around 11GB in size. If Sun engineers are interested in this problem then I'm happy to run whatever commands they give me -- after all, I have a pure goldmine here

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread [EMAIL PROTECTED]
Hi Simon, Simon Breden wrote: Thanks Max, and the fact that rsync stresses the system less would help explain why rsync works, and cp hangs. The directory was around 11GB in size. If Sun engineers are interested in this problem then I'm happy to run whatever commands they give me -- after

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Dave
I have similar, but not exactly the same drives: format inq Vendor: ATA Product: WDC WD7500AYYS-0 Revision: 4G30 Same firmware revision. I have no problems with drive performance, although I use them under UFS and for backing stores for iscsi disks. FYI, I had random lockups and crashes on

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-03 Thread Simon Breden
Wow, thanks Dave. Looks like you've had this hell too :) So, that makes me happy that the disks and pool are probably OK, but it does seem an issue with the NVidia MCP 55 chipset, or at least perhaps the nv_sata driver. From reading the bug list below, it seems the problem might be a more

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Simon Breden
OK, I tried replying by email, and got a message that a moderator will approve the message sometime... but that was a few hours ago, so I'm reverting to this forum software again :) Here's the reply I emailed: Hi Richard, I ran the format command, selected the number of one of the disks in the

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Simon Breden
Thanks Max, I have not been able to find any new firmware for these drives (Western Digital WD7500AAKS) so I have sent an email to Western Digital to enquire about firmware updates. I'll see what they reply with, but I'm not too hopeful. In the meantime I decided to copy the files one at a

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread max
Hi Simon, One quick note. You don't have to cp each file one at a time to see which one it hangs on. Just run truss. It should be the last file that it opened. To see this with truss, do: truss cp -r ... Don't worry about all the truss output. You are probably only concerned with the
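Max's tip can be scripted. What follows is only a sketch: it assumes the truss output was saved to a file (for example with truss -o) and that the open calls show up in the usual open("path", ...) or open64("path", ...) form. The function name last_opened is made up for illustration.

```shell
#!/bin/sh
# last_opened: read truss output on stdin and print the path from the
# last open()/open64() call seen. Hypothetical helper, not part of truss.
last_opened() {
    sed -n 's/^open[0-9]*("\([^"]*\)".*/\1/p' | tail -1
}

# Example (on the Solaris box; the file name is an assumption):
#   truss -o /tmp/cp.truss cp -r dir1 dir2
#   last_opened < /tmp/cp.truss
```

If the copy hangs, the path printed should be the file cp was working on at the time.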

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Rob Logan
or work around the NCQ bug in the drive's FW by typing:
su
echo set sata:sata_max_queue_depth = 0x1 >> /etc/system
reboot
Rob
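Appending to /etc/system more than once leaves duplicate lines behind, so a guarded append may be safer. A minimal sketch, assuming a grep that supports -q, -x and -F (on Solaris that may mean /usr/xpg4/bin/grep); the function name add_ncq_tunable is hypothetical.

```shell
#!/bin/sh
# add_ncq_tunable FILE: append the queue-depth line to FILE unless an
# identical line is already there. Hypothetical helper; FILE would be
# /etc/system on the real machine.
add_ncq_tunable() {
    line='set sata:sata_max_queue_depth = 0x1'
    grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}

# Example (as root, followed by a reboot):
#   add_ncq_tunable /etc/system
```

Running it twice leaves a single copy of the line in the file.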

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Cindy . Swearingen
Simon, I think you should review the checksum error reports from the fmdump output (dated 4/30) that you supplied previously. You can get more details by using fmdump -ev. Use zpool status -v to identify checksum errors as well. Cindy Simon Breden wrote: Thanks Max, I have not been able
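To pick non-zero error counters out of a long zpool status -v listing, a small awk filter can help. This is a rough sketch, assuming the READ, WRITE and CKSUM counters are plain integers in columns 3 to 5 of each device row; the function name pool_errors is made up.

```shell
#!/bin/sh
# pool_errors: read `zpool status -v` output on stdin and print every
# row whose READ, WRITE or CKSUM counter is non-zero.
# Assumes the counters are plain integers in fields 3, 4 and 5.
pool_errors() {
    awk '$3 ~ /^[0-9]+$/ && $4 ~ /^[0-9]+$/ && $5 ~ /^[0-9]+$/ &&
         ($3 + $4 + $5) > 0 { print $1, $3, $4, $5 }'
}

# Example:
#   zpool status -v tank | pool_errors
```

No output means no device row had a non-zero counter.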

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Simon Breden
Thanks Cindy. Here's the zpool status -v output:
# zpool status -v tank
  pool: tank
 state: ONLINE
 scrub: none requested
config:
        NAME      STATE   READ WRITE CKSUM
        tank      ONLINE     0     0     0
          raidz1  ONLINE     0     0     0
            c1t1d0

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Cindy . Swearingen
Okay, thanks. I wanted to rule out that the checksum errors reported on 4/30 were persistent enough to be picked up by zpool status. ZFS is generally quick to identify device problems. Since fmdump doesn't show any add'l recent errors either, then I think you can rule out hardware problems other

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Simon Breden
OK then, thanks Cindy. I have 2 current lines of investigation left (at least): 1. assumption the problem could relate to a drive firmware bug 2. there's a new BIOS for the motherboard available which might possibly have some effect For the idea that it's a drive firmware bug, I'm currently

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-02 Thread Simon Breden
Assuming that this problem could be related to a drive firmware bug I have: 1. turned off NCQ -- or in fact limited the queue depth to 1 2. used truss with the cp command I found this for NCQ: http://blogs.sun.com/erickustarz/entry/ncq_tunable = NCQ

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Sorry for the delay. Here is the output for a couple of seconds:
# iostat -xce 1
extended device statistics ---- errors --- cpu
device r/s w/s kr/s kw/s wait actv svc_t %w %b s/w h/w trn tot us sy wt id
cmdk0 1.5 0.7 20.8 4.2 0.0

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Richard Elling
Simon Breden wrote: Sorry for the delay. Here is the output for a couple of seconds: This is the smoking gun... # iostat -xce 1 extended device statistics ---- errors --- cpu device r/s w/s kr/s kw/s wait actv svc_t %w %b s/w h/w trn

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Rob Logan
hmm, three drives with 35 io requests in the queue and none active? remind me not to buy a drive with that FW.. 1) upgrade the FW in the drives or 2) turn off NCQ with: echo set sata:sata_max_queue_depth = 0x1 >> /etc/system Rob
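The pattern Rob describes (requests waiting, none active) can be filtered out of the iostat output automatically. A sketch only, assuming the usual iostat -xce column order where wait is the 6th field and actv the 7th; the function name stuck_queues is made up.

```shell
#!/bin/sh
# stuck_queues: read `iostat -xce` output on stdin and print the device
# name and wait count for rows with queued requests but none active.
# Assumes wait is field 6 and actv is field 7; header rows are skipped
# because their second field is not a number.
stuck_queues() {
    awk '$2 ~ /^[0-9.]+$/ && $6 + 0 > 0 && $7 + 0 == 0 { print $1, $6 }'
}

# Example:
#   iostat -xce 1 | stuck_queues
```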

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Thanks a lot Richard. To give a bit more info, I've copied my /var/adm/messages from booting up the machine: And @picker: I guess the 35 requests are stacked up waiting for the hanging request to be serviced? The question I have is where do I go from now, to get some more info on what is

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon, Simon Breden wrote: Thanks a lot Richard. To give a bit more info, I've copied my /var/adm/messages from booting up the machine: And @picker: I guess the 35 requests are stacked up waiting for the hanging request to be serviced? The question I have is where do I go from now, to

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
This list seems out of sync (delayed) with email messages I receive. Why is that? Which are the best tools to use when reading / replying to these posts? Anyway from my email I can see that Max has sent me a question about truss -- here is my reply: Hi Max, I haven't used truss before, but

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon, Simon Breden wrote: Hi Max, I haven't used truss before, but give me the command line + switches and I'll be happy to run it. Simon # truss -p pid_from_cp where pid_from_cp is... the pid of the cp process that is hung. The pid you can get from ps. I am curious if the cp is

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
This mailing list seems broken and out of sync -- your post is as 'Guest' and appears as a new post in the main zfs-discuss list -- and the main thread is out of sync with the replies, and I just got a java exception trying to post to the main thread -- what's going on here? This message

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Hi Max, I re-ran the cp command and when it hanged I ran 'ps -el', looked up the cp command, got its PID and then ran: # truss -p PID_of_cp and it output nothing at all -- i.e. it hanged too -- just showing a flashing cursor. The system is still operational as I am typing into the browser.

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon, Simon Breden wrote: Hi Max, I re-ran the cp command and when it hanged I ran 'ps -el' looked up the cp command, got it's PID and then ran: # truss -p PID_of_cp and it output nothing at all -- i.e. it hanged too -- just showing a flashing cursor. The system is still

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Keep getting Java exceptions posting to the proper thread for this -- just lost an hour --- WTF??? Had to reply to my own post as Max's reply (which I saw in my email inbox) has not appeared here. Again, what is wrong with this forum software -- it seems so buggy, or am I missing something

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Simon Breden
Just to reduce my stress levels and to give the webmaster some useful info to help fix this broken forum: I tried posting a reply to the main thread for 'cp -r hanged copying a directory' and got the following error -- seems like it can't find the parent thread/message's id in the database at

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread [EMAIL PROTECTED]
Hi Simon, Simon Breden wrote: Thanks for your advice Max, and here is my reply to your suggestion: # mdb -k Loading modules: [ unix genunix specfs dtrace cpu.generic cpu_ms.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs ip hook neti sctp arp usba s1394 nca lofs zfs random md sppp smbsrv nfs

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Richard Elling
Simon Breden wrote: Thanks a lot Richard. To give a bit more info, I've copied my /var/adm/messages from booting up the machine: And @picker: I guess the 35 requests are stacked up waiting for the hanging request to be serviced? The question I have is where do I go from now, to get

Re: [zfs-discuss] cp -r hanged copying a directory

2008-05-01 Thread Richard Elling
[forget the BUI forum, e-mail works better, IMHO] Simon Breden wrote: Thanks a lot Richard. To give a bit more info, I've copied my /var/adm/messages from booting up the machine: I don't see any major issues related to this problem in the messages. And @picker: I guess the 35 requests

Re: [zfs-discuss] cp -r hanged copying a directory

2008-04-28 Thread Simon Breden
I don't like the sound of broken hardware :( I did the cp -r dir1 dir2 again and when it hanged I issued 'fmdump -e' like you said -- here is the output: # fmdump -e TIME CLASS fmdump: /var/fm/fmd/errlog is empty # I also checked /var/adm/messages and I didn't see anything in

Re: [zfs-discuss] cp -r hanged copying a directory

2008-04-28 Thread Rob Logan
> I did the cp -r dir1 dir2 again and when it hanged
when it's hung, can you type: iostat -xce 1 in another window and is there a 100 in the %b column? when you reset and try the cp again, and look at iostat -xce 1 on the second hang, is the same disk at 100 in %b? if all your windows are hung,
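Watching the %b column by eye gets tedious with one sample a second, so here is a rough filter for it. It assumes the usual iostat -xce layout where %b is the 10th field of each device row; the function name busy_devices is hypothetical.

```shell
#!/bin/sh
# busy_devices: read `iostat -xce` output on stdin and print the name of
# every device whose %b column has reached 100.
# Assumes %b is field 10; header rows are skipped because their second
# field is not a number.
busy_devices() {
    awk 'NF >= 10 && $2 ~ /^[0-9.]+$/ && $10 + 0 >= 100 { print $1 }'
}

# Example:
#   iostat -xce 1 | busy_devices
```

Comparing the names printed on the first and second hang answers the question of whether it is the same disk each time.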

Re: [zfs-discuss] cp -r hanged copying a directory

2008-04-27 Thread Richard Elling
Simon Breden wrote: I installed b87 today and then I made a copy of a directory. To my surprise, a few seconds later the drive access light went out. Upon inspection, only a couple of the files had been copied, and the cp command appeared to have hung. I did: cp -r dir1 dir2 ps -el