Thanks, Mike
On 5/31/13 4:23 PM, Becky Ligon wrote:
Mike:

Thanks for letting us onto your system. We ran some more tests and it seems that file creation during the touch command is taking more time than it should, while metadata ops seem okay. I dumped some more OFS debug data and will be looking at it over the weekend. I want to pinpoint the precise places in the code that I *think* are taking time and then rerun more tests. This may mean putting up a new copy of OFS with more specific debugging in it, if that is okay with you. I also have more ideas on other tests that we can run to verify where the problem is occurring.

Is it okay if I log onto your system over the weekend?

Becky

On Fri, May 31, 2013 at 3:24 PM, Becky Ligon wrote:

Mike:

From the data you just sent, we see spikes in the touches as well as the removes, with the removes being more frequent. For example, on the rm data, there is a spike of about 2 orders of magnitude (100x) about every 10 ops, which can result in a 10x average slow down, even though most of the operations finish quite quickly. We do not normally see this, and we don't see it on our systems here, so we are trying to decide what might cause this so we can direct our efforts.

At this point, we are trying to further diagnose the problem. Would it be possible for us to log onto your system to look around and possibly run some more tests?

I am sorry for the inconvenience this is causing, but rest assured, several of us developers are trying to figure out the difference in performance between your system and ours. (We haven't been able to recreate your problem as of yet.)

Becky

On Fri, May 31, 2013 at 2:34 PM, Michael Robbert wrote:

My terminal buffers weren't big enough to copy and paste all of that output, but hopefully the attached will have enough info for you to get an idea of what I'm seeing. I am beginning to feel like we're just running around in circles here.
I can do these kinds of tests with and without cache until I'm blue in the face, but nothing is going to change until we figure out why un-cached meta data access is so slow. What are we doing to track that down?

Thanks,
Mike

On 5/31/13 12:05 PM, Becky Ligon wrote:

Mike:

There is something going on with your system, as I am able to touch 500 files in 12.5 seconds and delete them in 8.8 seconds on our cluster. Did you remove all of ATTR entries from your conf file and restart the servers?

If not, please do so and then capture the output from the following and send it to me:

    for i in `seq 1 500`; do time touch myfile${i}; done

and then

    for i in myfile*; do time rm -f ${i}; done

Thanks,
Becky

On Fri, May 31, 2013 at 12:02 PM, Michael Robbert wrote:

top - 09:54:53 up 6 days, 19:11, 1 user, load average: 0.00, 0.00, 0.00
Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 12289220k total, 1322196k used, 10967024k free, 85820k buffers
Swap: 2104432k total, 232k used, 2104200k free, 965636k cached

They all look very similar to this. 232k swap used on all of them throughout a touch/rm of 100 files. Ganglia doesn't show any change over time with cache on or off.

Mike

On 5/31/13 9:30 AM, Becky Ligon wrote:

Michael:

Can you send me a screen shot of "top" from your servers when the metadata is running on the local disk? I'd like to see how much memory is available. I'm wondering if 1GB for your DB cache is too high, possibly causing excessive swapping.

Becky

On Fri, May 24, 2013 at 6:06 PM, Michael Robbert wrote:

We recently noticed a performance problem with our OrangeFS server.
Here are the server stats:

3 servers, built identically with identical hardware

[root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
2.8.7-orangefs (mode: aio-threaded)
[root@orangefs02 ~]# uname -r
2.6.18-308.16.1.el5.584g0000

4 core E5603 1.60GHz
12GB of RAM

OrangeFS is being served to clients using bmi_tcp over DDR Infiniband. Backend storage is PanFS with 2x10Gig connections on the servers. Performance to the backend looks fine using bonnie++: >100MB/sec write and ~250MB/s read to each stack, and ~300 creates/sec.

The OrangeFS clients are running kernel version 2.6.18-238.19.1.el5.

The biggest problem I have right now is that deletes are taking a long time. Almost 1 sec per file.

[root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_______truncerr1e-11]# find N2/|wc -l
137
[root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_______truncerr1e-11]# time rm -rf N2

real 1m31.096s
user 0m0.000s
sys 0m0.015s

Similar results for file creates:

[root@fatcompute-11-32 ]# date;for i in `seq 1 50`;do touch file${i};done;date
Fri May 24 16:04:17 MDT 2013
Fri May 24 16:05:05 MDT 2013

What else do you need to know? Which debug flags? What should we be looking at? I don't see any load on the servers and I've restarted server and rebooted server nodes.
Thanks for any pointers,
Mike Robbert
Colorado School of Mines

--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
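
The per-operation timing loops and the spike arithmetic discussed in the thread (one operation in ten running about 100x slower gives a mean of (9 + 100) / 10 = 10.9x) could be checked with something like the sketch below. It is not from the thread: the function names time_ops and summarize are hypothetical, it assumes GNU date (for %N nanoseconds) and awk, and the "more than 10x the median" cutoff for a spike is an arbitrary choice for illustration.

```shell
#!/bin/bash
# Sketch only: time each operation individually and summarize the latencies.
# Assumptions (not from the thread): GNU date with %N support, awk available,
# and a hypothetical cutoff of 10x the median for calling an operation a spike.

# Read file names from stdin, run the given command once per name,
# and print one latency in milliseconds per line.
time_ops() {
    local name start end
    while read -r name; do
        start=$(date +%s%N)
        "$@" "$name"
        end=$(date +%s%N)
        echo $(( (end - start) / 1000000 ))
    done
}

# Read latencies in milliseconds from stdin and print the sample count,
# mean, median, and the number of samples above 10x the median.
summarize() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit }
            median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            threshold = (median > 0 ? median : 1) * 10
            for (i = 1; i <= NR; i++) if (v[i] > threshold) spikes++
            printf "n=%d mean=%.1fms median=%.1fms spikes=%d\n", NR, sum / NR, median, spikes
        }'
}

# Example usage, run from a directory on the OrangeFS mount:
#   seq 1 500 | sed 's/^/myfile/' | time_ops touch | summarize
#   printf '%s\n' myfile* | time_ops rm -f | summarize
```

A median that stays low while the mean is roughly ten times higher would match the pattern described above, where most operations finish quickly and a few slow ones dominate the total.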
_______________________________________________ Pvfs2-users mailing list [email protected] http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
