Thank you. I'll investigate as soon as I can get decent access. Rebooting the nodes seems to have temporarily resolved the issue; however, things are not completely back to normal yet.

-Daniel
On Thu, Dec 9, 2010 at 8:49 AM, Sunil Mushran <sunil.mush...@oracle.com> wrote:
> http://oss.oracle.com/git/?p=ocfs2-1.4.git;a=commitdiff;h=1f667766cb67ed05b4d706aa82e8ad0b12eaae8b
>
> That specific error has been addressed in the upcoming 1.4.8.
>
> Attach the logs and all other info to a bugzilla.
>
> On 12/08/2010 05:07 PM, Daniel McDonald wrote:
>>
>> Hello,
>>
>> I'm writing from the other side of the world from where my systems are,
>> so details are coming in slowly. We have a 6 TB OCFS2 volume shared
>> across 20 or so nodes, all running OEL 5.4 with ocfs2-1.4.4. The system
>> has worked fairly well for the last 6-8 months. Something has happened
>> over the last few weeks that has brought write performance nearly to a
>> halt. I'm not sure how to proceed, and a very poor internet connection
>> is hindering me further. I've verified that the disk array is in good
>> health. I'm seeing a few odd kernel log messages; an example is included
>> below. I have not been able to check all nodes due to limited time and
>> slow internet in my present location. Any assistance would be greatly
>> appreciated. I should be able to provide log files in about 12 hours.
>> At this moment, load averages on each node are 0.00 to 0.09.
>>
>> Here is a test write and the associated iostat -xm 5 output. Previously
>> I was obtaining > 90 MB/s:
>>
>> $ dd if=/dev/zero of=/home/testdump count=1000 bs=1024k
>>
>> ...and the associated iostat output:
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            0.10    0.00    0.43   12.25    0.00   87.22
>>
>> Device:   rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await   svctm   %util
>> sda         0.00    1.80    0.00   8.40    0.00    0.04      9.71      0.01    0.64    0.05    0.04
>> sda1        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda2        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda3        0.00    1.80    0.00   8.40    0.00    0.04      9.71      0.01    0.64    0.05    0.04
>> sdc         0.00    0.00  115.80   0.60    0.46    0.00      8.04      0.99    8.48    8.47   98.54
>> sdc1        0.00    0.00  115.80   0.60    0.46    0.00      8.04      0.99    8.48    8.47   98.54
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            0.07    0.00    0.55   12.25    0.00   87.13
>>
>> Device:   rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await   svctm   %util
>> sda         0.00    0.40    0.00   0.80    0.00    0.00     12.00      0.00    2.00    1.25    0.10
>> sda1        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda2        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda3        0.00    0.40    0.00   0.80    0.00    0.00     12.00      0.00    2.00    1.25    0.10
>> sdc         0.00    0.00  112.80   0.40    0.44    0.00      8.03      0.98    8.68    8.69   98.38
>> sdc1        0.00    0.00  112.80   0.40    0.44    0.00      8.03      0.98    8.68    8.69   98.38
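For comparison, here is a minimal sketch of a direct-I/O variant of the write test above. It is not from the original thread: the oflag=direct flag, the /home/testdump.direct file name, and the ten-sample iostat run are assumptions, added only to show how the page cache could be taken out of the measurement.

# Hypothetical direct-I/O variant of the write test (not from the original
# post). oflag=direct bypasses the page cache, so dd's reported rate
# reflects what actually reaches the OCFS2 volume rather than buffered
# throughput.
$ dd if=/dev/zero of=/home/testdump.direct count=1000 bs=1024k oflag=direct

# In a second terminal, capture ten 5-second samples while the test runs:
$ iostat -xm 5 10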
>> Here is a test read and the associated iostat output. I'm intentionally
>> reading from a different test file so as to avoid caching effects:
>>
>> $ dd if=/home/someothertestdump of=/dev/null bs=1024k
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            0.10    0.00    3.60   10.85    0.00   85.45
>>
>> Device:   rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await   svctm   %util
>> sda         0.00    3.79    0.00   1.40    0.00    0.02     29.71      0.00    1.29    0.43    0.06
>> sda1        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda2        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda3        0.00    3.79    0.00   1.40    0.00    0.02     29.71      0.00    1.29    0.43    0.06
>> sdc         7.98    0.20  813.17   1.00  102.50    0.00    257.84      1.92    2.34    1.19   96.71
>> sdc1        7.98    0.20  813.17   1.00  102.50    0.00    257.84      1.92    2.34    1.19   96.67
>>
>> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>>            0.07    0.00    3.67   10.22    0.00   86.03
>>
>> Device:   rrqm/s  wrqm/s     r/s    w/s   rMB/s   wMB/s  avgrq-sz  avgqu-sz   await   svctm   %util
>> sda         0.00    0.20    0.00   0.40    0.00    0.00     12.00      0.00    0.50    0.50    0.02
>> sda1        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda2        0.00    0.00    0.00   0.00    0.00    0.00      0.00      0.00    0.00    0.00    0.00
>> sda3        0.00    0.20    0.00   0.40    0.00    0.00     12.00      0.00    0.50    0.50    0.02
>> sdc         6.60    0.20  829.00   1.00  104.28    0.00    257.32      1.90    2.31    1.17   97.28
>> sdc1        6.60    0.20  829.00   1.00  104.28    0.00    257.32      1.90    2.31    1.17   97.28
>>
>> Here is an example of the kernel messages mentioned above:
>>
>> Dec  7 14:07:50 growler kernel: (dlm_wq,4793,4):dlm_deref_lockres_worker:2344 ERROR: 84B7C6421A6C4280AB87F569035C5368:O0000000000000016296ce900000000: node 14 trying to drop ref but it is already dropped!
>> Dec  7 14:07:50 growler kernel: lockres: O0000000000000016296ce900000000, owner=0, state=0
>> Dec  7 14:07:50 growler kernel: last used: 0, refcnt: 6, on purge list: no
>> Dec  7 14:07:50 growler kernel: on dirty list: no, on reco list: no, migrating pending: no
>> Dec  7 14:07:50 growler kernel: inflight locks: 0, asts reserved: 0
>> Dec  7 14:07:50 growler kernel: refmap nodes: [ 21 ], inflight=0
>> Dec  7 14:07:50 growler kernel: granted queue:
>> Dec  7 14:07:50 growler kernel: type=3, conv=-1, node=21, cookie=21:213370, ref=2, ast=(empty=y,pend=n), bast=(empty=y,pend=n), pending=(conv=n,lock=n,cancel=n,unlock=n)
>> Dec  7 14:07:50 growler kernel: converting queue:
>> Dec  7 14:07:50 growler kernel: blocked queue:
>>
>> Here is the df output:
>>
>> r...@growler:~$ df
>> Filesystem           1K-blocks        Used   Available Use% Mounted on
>> /dev/sda3            245695888    29469416   203544360  13% /
>> /dev/sda1               101086       15133       80734  16% /boot
>> tmpfs                 33005580           0    33005580   0% /dev/shm
>> /dev/sdc1           5857428444  5234400436   623028008  90% /home
>>
>> Thanks
>> -Daniel
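As a starting point for the bugzilla attachment Sunil requested, here is a rough sketch of the kind of information that could be collected from one node. It is not from the original thread: the package name, output paths, and grep pattern are assumptions, /dev/sdc1 is taken from the df output above, and most of these commands need root.

# Kernel and ocfs2-tools versions on this node.
$ uname -r > /tmp/ocfs2-report.txt
$ rpm -q ocfs2-tools >> /tmp/ocfs2-report.txt

# Cluster layout and the OCFS2 devices this node can see.
$ cat /etc/ocfs2/cluster.conf >> /tmp/ocfs2-report.txt
$ mounted.ocfs2 -d >> /tmp/ocfs2-report.txt

# Superblock/feature summary for the shared volume (assumes /dev/sdc1).
$ debugfs.ocfs2 -R "stats" /dev/sdc1 >> /tmp/ocfs2-report.txt

# Pull the ocfs2/o2dlm kernel messages, such as the dlm_deref_lockres_worker
# error above, out of syslog for attachment.
$ grep -iE 'ocfs2|o2dlm|o2net|dlm' /var/log/messages > /tmp/ocfs2-dlm-messages.txt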