Here is some more information that could help debug these issues. I just ran ps axl, which shows (in the WCHAN column) the kernel function in which each process is sleeping.
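A side note on the filter: `grep D` matches any line that contains a capital D, so the header, the grep process itself and the o2hb threads with a D in their name show up as well. Below is a minimal sketch of filtering on the STAT column instead. It assumes the 13-column Linux `ps axl` layout (F through COMMAND); the function names `parse_ps_axl`, `uninterruptible` and `blocked_processes` are my own and purely illustrative.

```python
import subprocess


def parse_ps_axl(text):
    """Parse `ps axl` output into a list of dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        # COMMAND is the last column and may contain spaces, so cap the split.
        fields = line.split(None, len(header) - 1)
        if len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
    return rows


def uninterruptible(rows):
    """Keep only processes whose STAT starts with D (uninterruptible sleep)."""
    return [r for r in rows if r["STAT"].startswith("D")]


def blocked_processes():
    """Run `ps axl` (assumed to be on PATH) and return the D-state rows."""
    out = subprocess.run(
        ["ps", "axl"], capture_output=True, text=True, check=True
    ).stdout
    return uninterruptible(parse_ps_axl(out))
```

Calling `blocked_processes()` on the affected server and printing `PID`, `STAT`, `WCHAN` and `COMMAND` for each row would give just the stuck processes. The WCHAN names in `ps axl` are cut to six characters; as far as I know, procps accepts a width suffix such as `ps -eo pid,stat,wchan:32,args` to show them in full.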
[cld@BO01 ~]$ ps axl | grep D
F   UID   PID  PPID PRI  NI     VSZ     RSS WCHAN  STAT TTY      TIME COMMAND
0   500  1502 23156  20   0  103308     840 -      R+   pts/3    0:00 grep D
1     0  1697     2   0 -20       0       0 msleep S<   ?        1:33 [o2hb-DBAAFEC1F3]
1     0  1712     2   0 -20       0       0 msleep S<   ?        1:35 [o2hb-D465A573D9]
1     0  1757     2   0 -20       0       0 msleep S<   ?        1:34 [o2hb-1D67B38925]
1     0 17525     2   0 -20       0       0 msleep S<   ?        0:03 [o2hb-83D827E8AB]
1     0 17532     2  20   0       0       0 jbd2_j D    ?        0:01 [jbd2/sdg-65]
4     0 19524     1  20   0 1587484  919776 sync_b Dl   pts/0   20:49 ruby /usr/local/rvm/gems/ruby-1.9.2-p180/bin/rake fixes:covers[6] --trace
4     0 24576     1  20   0 1449236  705832 ocfs2_ Dl   ?      136:36 ruby /usr/local/rvm/gems/ruby-1.9.2-p180/bin/rake jobs:work RAILS_ENV=production QUEUE=p_covers --trace
0   500 24712     1  20   0 1873140 1134856 start_ Dl   ?     3488:11 ruby /usr/local/rvm/gems/ruby-1.9.2-p180/bin/rake jobs:work RAILS_ENV=production QUEUE=pl_covers_soap --trace
0   500 24714     1  20   0 1249756  511468 start_ Dl   ?     1002:28 ruby /usr/local/rvm/gems/ruby-1.9.2-p180/bin/rake jobs:work RAILS_ENV=production QUEUE=pl_covers --trace

Does anyone have any tips to help debug these issues?

Adelino

On Mon, Apr 9, 2012 at 7:23 PM, Adelino Monteiro <adelino.monte...@gmail.com> wrote:
> Hello,
>
> The slowdowns are new, but we're also only now beginning to have more
> reads; before, we were mainly filling up the filesystem. Most of the
> files (I would say 95%) are not changed after copying, they are only
> read. Here is a simple df -h from the mounted partitions:
>
> /dev/sdd   14T  3.2T   11T  24% /mnt/3
> /dev/sde   14T  1.2T   13T   9% /mnt/4
> /dev/sdh   14T  2.8T   11T  21% /mnt/7
> /dev/sdb   14T   13T  1.5T  90% /mnt/1
> /dev/sdf   14T   11T  3.0T  79% /mnt/5
>
> As you can see, only one of them is 90% full, but the problems are
> on all of them now.
>
> In order to see if the problem was somehow related to the partition,
> we copied the contents (a simple cp /mnt/6/* /mnt/5) from one
> partition to another and, surprisingly or not, the issue is also on the
> "new" partition.
>
> I just tried to copy 240 MB to this partition and after 9 min of
> waiting the copy just went on and 30 seconds later all was copied.
>
> The hardware this runs on is a DELL MD 3200 in a VMware ESX 5 environment.
>
> I would love to give you some numbers; just let me know the commands I
> need to run.
>
> Adelino
>
> On Mon, Apr 9, 2012 at 5:35 PM, Joel Becker <jl...@evilplan.org> wrote:
>> On Thu, Apr 05, 2012 at 05:11:59PM +0100, Adelino Monteiro wrote:
>>> Hello all,
>>>
>>> For 4 months now I've been using OCFS in an environment with 7 partitions
>>> of 14 TB each, running Oracle Linux 6.2, and until last week
>>> everything was fine.
>>> Now, however, we're running into severe performance problems when doing
>>> simple copies.
>>>
>>> I have one of the 7 partitions mounted as RW on one server and 4
>>> servers with RO. I did a simple cp of various files on the RW server
>>> and during that copy the process got into D state and a simple df, for
>>> instance, blocked. It took minutes for something that should be
>>> immediate. This is happening on any of those partitions.
>>
>> Hey Adelino,
>> I'd love to understand your problems. You say you've been
>> running these systems for four months. Are the slowdowns new?
>> Was anything happening on the RO servers at the time?
>> Especially touching the same files or directories? How full are the
>> filesystems? How much change do they have (that is, are the files
>> long-lived or constantly being deleted and created)?
>>
>> Joel
>>
>> --
>>
>> "Also, all of life's big problems include the words 'indictment' or
>> 'inoperable.' Everything else is small stuff."
>> - Alton Brown
>>
>> http://www.jlbec.org/
>> jl...@evilplan.org
>
>
> --
> Cumprimentos / Best Regards
>
> Adelino Monteiro

--
Cumprimentos / Best Regards

Adelino Monteiro

_______________________________________________
Ocfs2-users mailing list
Ocfs2-users@oss.oracle.com
http://oss.oracle.com/mailman/listinfo/ocfs2-users