Let me run my last tests on orangefs01-ib0 to see if it is really the kernel or not.
Becky

On Mon, Jun 3, 2013 at 11:30 AM, Michael Robbert <[email protected]> wrote:
> I misspoke slightly in that last email. I think the kernel versions we're
> tied to are 2.6.18.*, not just -308. We're still running
> 2.6.18-275.12.1.el5.573g0000 on our other system, so we can try that if
> you'd like.
>
> Thanks,
> Mike
>
> On 6/3/13 9:16 AM, Michael Robbert wrote:
>> We are confined to kernels from Scyld Clusterware in the 2.6.18-308.*
>> range. Our PanFS modules were purchased as a one-time deal to get them
>> to work with Scyld 5.x. They put in some work to make them
>> version-number independent, but I've tried non-Scyld kernels and other
>> versions of Scyld and it doesn't work.
>>
>> Mike
>>
>> On 6/2/13 8:50 PM, Becky Ligon wrote:
>>> All:
>>>
>>> The area of the code where we thought more time was being spent than
>>> seemed reasonable is the metafile dspace create and the local datafile
>>> dspace create in the create state machine. In both of these
>>> operations, the code executes a function called
>>> dbpf_dspace_create_store_handle, which does the following:
>>>
>>> 1. db->get against BDB to see if the new handle already has a dspace
>>>    entry, which it shouldn't and doesn't.
>>> 2. A call to the "access" system call, which tells us whether the
>>>    bstream file for the given handle already exists, which it doesn't.
>>> 3. db->put against BDB to store the dspace entry for the new handle.
>>> 4. An insert into the attribute cache.
>>>
>>> In reviewing a more detailed debug log of these functions, I
>>> discovered that most of the time these four operations execute in less
>>> than 0.5 ms. When the time is greater than that, the culprit is always
>>> the "access" call alone, or the "access" call along with interrupts
>>> from the job_timer state machine.
>>>
>>> At this point, I am thinking that there may be a problem with the
>>> version of Linux running on the machines. As noted in my previous
>>> email, 2.6.18-308.16.1.el5 is known to have issues with the kernel
>>> dcache mechanism, which leads me to believe there could be other
>>> issues as well.
>>>
>>> In the morning, I will run the same tests on a newer kernel (RHEL 6.3)
>>> and compare "access" times between the two kernels.
>>>
>>> Becky
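The kernel comparison described above can also be reproduced outside of OrangeFS with a small standalone program that times access() against a path that does not exist, the same check dbpf_dspace_create_store_handle makes during a create. The following is only a rough sketch, not OrangeFS source: the storage path, file name, and iteration count are made-up placeholders, and on RHEL 5 it needs to be linked with -lrt for clock_gettime(). Running it on both 2.6.18-308.16.1.el5 and a newer RHEL 6.3 kernel against the same backend directory should show whether the multi-millisecond outliers follow the kernel.

/*
 * Minimal sketch (not OrangeFS source): time repeated access() calls
 * against a bstream-style path that does not exist, the same check the
 * create path performs.  Path and iteration count are placeholders.
 * Build on RHEL 5 with:  gcc -O2 -o access_bench access_bench.c -lrt
 */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define ITERATIONS 1000

static double elapsed_ms(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    /* Hypothetical storage path; point this at the real storage space. */
    const char *path =
        "/local/orangefs/storage/bstreams/no_such_handle.bstream";
    double total = 0.0, worst = 0.0;
    int i;

    for (i = 0; i < ITERATIONS; i++) {
        struct timespec t0, t1;
        double ms;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        (void) access(path, F_OK);      /* expected to fail with ENOENT */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        ms = elapsed_ms(t0, t1);
        total += ms;
        if (ms > worst)
            worst = ms;
    }

    printf("access() x %d: avg %.4f ms, worst %.4f ms\n",
           ITERATIONS, total / ITERATIONS, worst);
    return 0;
}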
>>> On Fri, May 31, 2013 at 7:22 PM, Becky Ligon <[email protected]> wrote:
>>>
>>> Thanks, Mike!
>>>
>>> I ran some more tests hoping that the null-aio trove method would
>>> eliminate disk issues, but null-aio, as I just discovered, still allows
>>> files to be created. Doh! So I will be looking more in depth at our
>>> file creation process, which includes metadata updates and file
>>> creation on disk.
>>>
>>> BTW: I noticed that you are running 2.6.18-308.16.1.el5.584g0000 on
>>> your servers, and there is a known Linux bug concerning dcache
>>> processing that creates a kernel panic when OrangeFS is unmounted.
>>> This bug affects other software, too, not just ours. Have you had any
>>> problems along these lines? Our recommendation for those who want to
>>> stay on RHEL 5 is to use 2.6.18-308.
>>>
>>> Becky
>>>
>>> On Fri, May 31, 2013 at 6:33 PM, Michael Robbert <[email protected]> wrote:
>>>
>>> Yes, please do. You have free rein on the nodes that I listed in my
>>> email to you until this problem is solved.
>>>
>>> Thanks,
>>> Mike
>>>
>>> On 5/31/13 4:23 PM, Becky Ligon wrote:
>>>
>>> Mike:
>>>
>>> Thanks for letting us onto your system.
>>>
>>> We ran some more tests, and it seems that file creation during the
>>> touch command is taking more time than it should, while metadata ops
>>> seem okay. I dumped some more OFS debug data and will be looking at it
>>> over the weekend. I want to pinpoint the precise places in the code
>>> that I *think* are taking time and then rerun more tests. This may
>>> mean putting up a new copy of OFS with more specific debugging in it,
>>> if that is okay with you. I also have more ideas on other tests that
>>> we can run to verify where the problem is occurring.
>>>
>>> Is it okay if I log onto your system over the weekend?
>>>
>>> Becky
>>>
>>> On Fri, May 31, 2013 at 3:24 PM, Becky Ligon <[email protected]> wrote:
>>>
>>> Mike:
>>>
>>> From the data you just sent, we see spikes in the touches as well as
>>> the removes, with the removes being more frequent.
>>>
>>> For example, on the rm data there is a spike of about two orders of
>>> magnitude (100x) roughly every 10 ops, which can result in a 10x
>>> average slowdown even though most of the operations finish quite
>>> quickly. We do not normally see this, and we don't see it on our
>>> systems here, so we are trying to decide what might cause it so we can
>>> direct our efforts.
>>>
>>> At this point, we are trying to further diagnose the problem. Would it
>>> be possible for us to log onto your system to look around and possibly
>>> run some more tests?
>>>
>>> I am sorry for the inconvenience this is causing, but rest assured,
>>> several of us developers are trying to figure out the difference in
>>> performance between your system and ours. (We haven't been able to
>>> recreate your problem as of yet.)
>>>
>>> Becky
>>>
>>> On Fri, May 31, 2013 at 2:34 PM, Michael Robbert <[email protected]> wrote:
>>>
>>> My terminal buffers weren't big enough to copy and paste all of that
>>> output, but hopefully the attached will have enough info for you to
>>> get an idea of what I'm seeing.
>>>
>>> I am beginning to feel like we're just running around in circles here.
>>> I can do these kinds of tests with and without cache until I'm blue in
>>> the face, but nothing is going to change until we figure out why
>>> un-cached metadata access is so slow. What are we doing to track that
>>> down?
>>>
>>> Thanks,
>>> Mike
>>>
>>> On 5/31/13 12:05 PM, Becky Ligon wrote:
>>>
>>> Mike:
>>>
>>> There is something going on with your system, as I am able to touch
>>> 500 files in 12.5 seconds and delete them in 8.8 seconds on our
>>> cluster.
>>>
>>> Did you remove all of the ATTR entries from your conf file and restart
>>> the servers?
>>>
>>> If not, please do so, and then capture the output from the following
>>> and send it to me:
>>>
>>>   for i in `seq 1 500`; do time touch myfile${i}; done
>>>
>>> and then
>>>
>>>   for i in myfile*; do time rm -f ${i}; done
>>>
>>> Thanks,
>>> Becky
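The per-operation numbers those loops produce can also be captured without the fork/exec overhead of time(1) by timing each create and unlink directly against the client mount, which makes the roughly every-tenth-op outliers easy to pick out of the output. This is a minimal sketch under assumptions: /mnt/orangefs, the myfile name pattern, and the file count are placeholders rather than values taken from this thread, and it needs -lrt on RHEL 5 as above.

/*
 * Sketch of a per-operation latency probe, run from an OrangeFS client.
 * The mount point, file names, and file count are placeholders.
 */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

#define NFILES 500
#define MOUNT  "/mnt/orangefs"   /* placeholder mount point */

static double ms_since(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int main(void)
{
    char path[256];
    struct timespec t0, t1;
    int i, fd;

    for (i = 0; i < NFILES; i++) {           /* equivalent of "touch" */
        snprintf(path, sizeof(path), MOUNT "/myfile%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fd = open(path, O_CREAT | O_WRONLY, 0644);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        if (fd >= 0)
            close(fd);
        printf("create %3d  %8.2f ms\n", i, ms_since(t0, t1));
    }

    for (i = 0; i < NFILES; i++) {           /* equivalent of "rm -f" */
        snprintf(path, sizeof(path), MOUNT "/myfile%d", i);
        clock_gettime(CLOCK_MONOTONIC, &t0);
        unlink(path);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("remove %3d  %8.2f ms\n", i, ms_since(t0, t1));
    }
    return 0;
}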
>>> On Fri, May 31, 2013 at 12:02 PM, Michael Robbert <[email protected]> wrote:
>>>
>>> top - 09:54:53 up 6 days, 19:11,  1 user,  load average: 0.00, 0.00, 0.00
>>> Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
>>> Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>> Mem:  12289220k total,  1322196k used, 10967024k free,    85820k buffers
>>> Swap:  2104432k total,      232k used,  2104200k free,   965636k cached
>>>
>>> They all look very similar to this: 232k of swap used on all of them
>>> throughout a touch/rm of 100 files. Ganglia doesn't show any change
>>> over time with cache on or off.
>>>
>>> Mike
>>>
>>> On 5/31/13 9:30 AM, Becky Ligon wrote:
>>>
>>> Michael:
>>>
>>> Can you send me a screen shot of "top" from your servers when the
>>> metadata is running on the local disk? I'd like to see how much memory
>>> is available. I'm wondering if 1GB for your DB cache is too high,
>>> possibly causing excessive swapping.
>>>
>>> Becky
>>>
>>> On Fri, May 24, 2013 at 6:06 PM, Michael Robbert <[email protected]> wrote:
>>>
>>> We recently noticed a performance problem with our OrangeFS servers.
>>>
>>> Here are the server stats: 3 servers, built identically with identical
>>> hardware.
>>>
>>> [root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
>>> 2.8.7-orangefs (mode: aio-threaded)
>>>
>>> [root@orangefs02 ~]# uname -r
>>> 2.6.18-308.16.1.el5.584g0000
>>>
>>> 4-core E5603 1.60GHz
>>> 12GB of RAM
>>>
>>> OrangeFS is being served to clients using bmi_tcp over DDR Infiniband.
>>> Backend storage is PanFS with 2x10Gig connections on the servers.
>>> Performance to the backend looks fine using bonnie++: >100MB/sec write
>>> and ~250MB/s read to each stack, and ~300 creates/sec.
>>>
>>> The OrangeFS clients are running kernel version 2.6.18-238.19.1.el5.
>>>
>>> The biggest problem I have right now is that deletes are taking a long
>>> time, almost 1 second per file.
>>>
>>> [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_truncerr1e-11]# find N2/|wc -l
>>> 137
>>> [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_truncerr1e-11]# time rm -rf N2
>>>
>>> real    1m31.096s
>>> user    0m0.000s
>>> sys     0m0.015s
>>>
>>> Similar results for file creates:
>>>
>>> [root@fatcompute-11-32 ]# date;for i in `seq 1 50`;do touch file${i};done;date
>>> Fri May 24 16:04:17 MDT 2013
>>> Fri May 24 16:05:05 MDT 2013
>>>
>>> What else do you need to know? Which debug flags? What should we be
>>> looking at? I don't see any load on the servers, and I've restarted
>>> the servers and rebooted the server nodes.
>>>
>>> Thanks for any pointers,
>>> Mike Robbert
>>> Colorado School of Mines
-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
