Let me run my last tests on orangefs01-ib0 to see if it is really the
kernel or not.
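
In case it helps anyone reproduce the comparison, below is the kind of
access() timing loop I plan to use.  It is only a rough sketch: the
iteration count, the file name, and the default path are placeholders, so
point it at a file (or a deliberately missing file) under the server's
storage space.

    /* Rough access() latency probe: times ITERS access() calls against one
     * path and reports the average and worst-case latency.  The default
     * path below is a placeholder. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    #define ITERS 10000

    int main(int argc, char **argv)
    {
        const char *path = (argc > 1) ? argv[1] : "/tmp/does-not-exist";
        struct timespec t0, t1;
        double us, total_us = 0.0, worst_us = 0.0;
        int i;

        for (i = 0; i < ITERS; i++) {
            clock_gettime(CLOCK_MONOTONIC, &t0);
            access(path, F_OK);
            clock_gettime(CLOCK_MONOTONIC, &t1);

            us = (t1.tv_sec - t0.tv_sec) * 1e6 +
                 (t1.tv_nsec - t0.tv_nsec) / 1e3;
            total_us += us;
            if (us > worst_us)
                worst_us = us;
        }

        printf("%d access() calls: avg %.1f us, worst %.1f us\n",
               ITERS, total_us / ITERS, worst_us);
        return 0;
    }

Compile with something like "gcc -O2 -o access_probe access_probe.c -lrt"
(-lrt is needed for clock_gettime on older glibc).  If the occasional
multi-millisecond times show up on the 2.6.18 kernel but not on RHEL 6.3,
that points at the kernel rather than at OrangeFS.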

Becky


On Mon, Jun 3, 2013 at 11:30 AM, Michael Robbert <[email protected]> wrote:

> I misspoke slightly in that last email. I think the kernel versions we're
> tied to are 2.6.18.*, not just -308. We're still running
> 2.6.18-275.12.1.el5.573g0000 on our other system, so we can try that if
> you'd like.
>
> Thanks,
> Mike
>
>
> On 6/3/13 9:16 AM, Michael Robbert wrote:
>
>> We are confined to kernels from Scyld Clusterware in the 2.6.18-308.*
>> range. Our PanFS modules were purchased as a one-time deal to get it to
>> work with Scyld 5.x. They put in some work to make it version-number
>> independent, but I've tried non-Scyld kernels and other versions of Scyld
>> and it doesn't work.
>>
>> Mike
>>
>> On 6/2/13 8:50 PM, Becky Ligon wrote:
>>
>>> All:
>>>
>>> The area of the code where we thought more time was being spent than
>>> seemed reasonable was in the metafile dspace create and the local
>>> datafile dspace create contained in the create state machine.  In both
>>> of these operations, the code executes a function called
>>> dbpf_dspace_create_store_handle, which does the following four steps
>>> (sketched below):
>>>
>>> 1.  db->get against BDB to see if the new handle already has a dspace
>>>     entry, which it shouldn't and doesn't.
>>> 2.  A system call to "access", which tells us whether the bstream file
>>>     for the given handle already exists, which it doesn't.
>>> 3.  db->put against BDB to store the dspace entry for the new handle.
>>> 4.  An insert into the attribute cache.
>>>
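>>> To make the ordering concrete, here is a stripped-down sketch of that
>>> sequence.  This is not the actual trove code; the handle value, the
>>> database file name, and the bstream path below are all made up, and the
>>> real dbpf code carries a lot more state.
>>>
>>>     /* Illustrative sketch of the create path described above: one BDB
>>>      * get, one access() call, one BDB put, then (in the real server)
>>>      * an attribute cache insert. */
>>>     #include <db.h>        /* Berkeley DB */
>>>     #include <unistd.h>    /* access() */
>>>     #include <stdio.h>
>>>     #include <string.h>
>>>
>>>     int main(void)
>>>     {
>>>         DB *dbp;
>>>         DBT key, data;
>>>         unsigned long long handle = 1048576ULL;       /* made-up handle */
>>>         const char *bstream = "/tmp/bstreams/00001048576.bstream"; /* made up */
>>>         char attrs[64] = "dspace attributes would go here";
>>>         int ret;
>>>
>>>         if (db_create(&dbp, NULL, 0) != 0 ||
>>>             dbp->open(dbp, NULL, "dataspace_attributes.db", NULL,
>>>                       DB_BTREE, DB_CREATE, 0600) != 0)
>>>             return 1;
>>>
>>>         memset(&key, 0, sizeof(key));
>>>         memset(&data, 0, sizeof(data));
>>>         key.data = &handle;
>>>         key.size = sizeof(handle);
>>>
>>>         /* 1. db->get: the new handle should not already have an entry */
>>>         ret = dbp->get(dbp, NULL, &key, &data, 0);
>>>         if (ret != DB_NOTFOUND)
>>>             printf("unexpected: handle already present (ret=%d)\n", ret);
>>>
>>>         /* 2. access(): the bstream file should not exist yet */
>>>         if (access(bstream, F_OK) == 0)
>>>             printf("unexpected: bstream already exists\n");
>>>
>>>         /* 3. db->put: store the dspace entry for the new handle */
>>>         memset(&data, 0, sizeof(data));
>>>         data.data = attrs;
>>>         data.size = sizeof(attrs);
>>>         ret = dbp->put(dbp, NULL, &key, &data, 0);
>>>         if (ret != 0)
>>>             printf("put failed (ret=%d)\n", ret);
>>>
>>>         /* 4. the real server now inserts the attributes into its cache */
>>>
>>>         dbp->close(dbp, 0);
>>>         return 0;
>>>     }
>>>
>>> (Build against Berkeley DB with something like "gcc -o create_sketch
>>> create_sketch.c -ldb".)  The only point is the ordering: step 2 is the
>>> access() call that is showing the spikes.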
>>>
>>> In reviewing a more detailed debug log of these functions, I discovered
>>> that most of the time these four operations execute in less than 0.5ms.
>>> When the time is greater than that, the culprit is always the "access"
>>> call alone or the "access" call along with interrupts from the job_timer
>>> state machine.
>>>
>>> At this point, I am thinking that there may be a problem with the
>>> version of Linux running on the machines.  As noted in my previous
>>> email, 2.6.18-308.16.1.el5 is known to have issues with the kernel
>>> dcache mechanism, which leads me to believe there could be other issues
>>> as well.
>>>
>>> In the morning, I will run the same tests on a newer kernel (RHEL 6.3)
>>> and compare "access" times between the two kernels.
>>>
>>> Becky
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Fri, May 31, 2013 at 7:22 PM, Becky Ligon <[email protected]> wrote:
>>>
>>>     Thanks, Mike!
>>>
>>>     I ran some more tests hoping that the null-aio trove method would
>>>     eliminate disk issues, but null-aio, as I just discovered, still
>>>     allows files to be created. Doh!  So, I will be looking in more
>>>     depth at our file creation process, which includes metadata updates
>>>     and file creation on disk.
>>>
>>>     BTW:  I noticed that you are running 2.6.18-308.16.1.el5.584g0000
>>>     on your servers, and there is a known Linux bug concerning dcache
>>>     processing that causes a kernel panic when OrangeFS is unmounted.
>>>     This bug affects other software, too, not just ours.  Have you had
>>>     any problems along these lines?  Our recommendation for those who
>>>     want to stay on RHEL 5 is to use 2.6.18-308.
>>>
>>>     Becky
>>>
>>>
>>>
>>>     On Fri, May 31, 2013 at 6:33 PM, Michael Robbert <[email protected]> wrote:
>>>
>>>         Yes, please do. You have free rein on the nodes that I listed
>>>         in my email to you until this problem is solved.
>>>
>>>         Thanks,
>>>         Mike
>>>
>>>
>>>         On 5/31/13 4:23 PM, Becky Ligon wrote:
>>>
>>>             Mike:
>>>
>>>             Thanks for letting us onto your system.
>>>
>>>             We ran some more tests and it seems that file creation during the
>>>             touch command is taking more time than it should, while metadata
>>>             ops seem okay.  I dumped some more OFS debug data and will be
>>>             looking at it over the weekend.  I want to pinpoint the precise
>>>             places in the code that I *think* are taking time and then rerun
>>>             more tests.  This may mean putting up a new copy of OFS with more
>>>             specific debugging in it, if that is okay with you.  I also have
>>>             more ideas on other tests that we can run to verify where the
>>>             problem is occurring.
>>>
>>>             Is it okay if I log onto your system over the weekend?
>>>
>>>             Becky
>>>
>>>
>>>             On Fri, May 31, 2013 at 3:24 PM, Becky Ligon
>>>             <[email protected]> wrote:
>>>
>>>                  Mike:
>>>
>>>                  From the data you just sent, we see spikes in the touches
>>>                  as well as the removes, with the removes being more frequent.
>>>
>>>                  For example, in the rm data there is a spike of about two
>>>                  orders of magnitude (100x) roughly every 10 ops.  Even though
>>>                  most of the operations finish quite quickly, that is enough
>>>                  to produce about a 10x average slowdown (nine fast ops plus
>>>                  one op at 100x the cost averages out to roughly 11x the fast
>>>                  time).  We do not normally see this, and we don't see it on
>>>                  our systems here, so we are trying to decide what might cause
>>>                  it so we can direct our efforts.
>>>
>>>                  At this point, we are trying to further diagnose the
>>>                  problem.  Would it be possible for us to log onto your
>>>                  system to look around and possibly run some more tests?
>>>
>>>                  I am sorry for the inconvenience this is causing, but rest
>>>                  assured, several of our developers are trying to figure out
>>>                  the difference in performance between your system and ours.
>>>                  (We haven't been able to recreate your problem as of yet.)
>>>
>>>
>>>                  Becky
>>>
>>>
>>>
>>>                  On Fri, May 31, 2013 at 2:34 PM, Michael Robbert
>>>                  <[email protected]> wrote:
>>>
>>>                      My terminal buffers weren't big enough to copy and
>>>                      paste all of that output, but hopefully the attached
>>>                      will have enough info for you to get an idea of what
>>>                      I'm seeing.
>>>
>>>                      I am beginning to feel like we're just running around
>>>                      in circles here. I can do these kinds of tests with and
>>>                      without cache until I'm blue in the face, but nothing is
>>>                      going to change until we figure out why uncached
>>>                      metadata access is so slow. What are we doing to track
>>>                      that down?
>>>
>>>                      Thanks,
>>>                      Mike
>>>
>>>
>>>                      On 5/31/13 12:05 PM, Becky Ligon wrote:
>>>
>>>                          Mike:
>>>
>>>                          There is something going on with your system, as I
>>>                          am able to touch 500 files in 12.5 seconds and
>>>                          delete them in 8.8 seconds on our cluster.
>>>
>>>                          Did you remove all of the ATTR entries from your
>>>                          conf file and restart the servers?
>>>
>>>                          If not, please do so, then capture the output from
>>>                          the following and send it to me:
>>>
>>>                          for i in `seq 1 500`; do time touch myfile${i}; done
>>>
>>>                          and then
>>>
>>>                          for i in myfile*; do time rm -f ${i}; done
>>>
>>>
>>>                          Thanks,
>>>                          Becky
>>>
>>>
>>>                          On Fri, May 31, 2013 at 12:02 PM, Michael Robbert
>>>                          <[email protected]> wrote:
>>>
>>>                               top - 09:54:53 up 6 days, 19:11,  1 user,  load average: 0.00, 0.00, 0.00
>>>                               Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
>>>                               Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
>>>                               Mem:  12289220k total,  1322196k used, 10967024k free,    85820k buffers
>>>                               Swap:  2104432k total,      232k used,  2104200k free,   965636k cached
>>>
>>>                               They all look very similar to this. 232k swap
>>>                               used on all of them throughout a touch/rm of
>>>                               100 files. Ganglia doesn't show any change over
>>>                               time with cache on or off.
>>>
>>>                               Mike
>>>
>>>
>>>                               On 5/31/13 9:30 AM, Becky Ligon wrote:
>>>
>>>                                   Michael:
>>>
>>>                                   Can you send me a screen shot of "top" from
>>>                                   your servers when the metadata is running on
>>>                                   the local disk?  I'd like to see how much
>>>                                   memory is available.  I'm wondering if 1GB
>>>                                   for your DB cache is too high, possibly
>>>                                   causing excessive swapping.
>>>
>>>                                   Becky
>>>
>>>
>>>                                   On Fri, May 24, 2013 at 6:06 PM, Michael Robbert
>>>                                   <[email protected]> wrote:
>>>
>>>                                        We recently noticed a performance
>>>                                        problem with our OrangeFS server.
>>>
>>>                                        Here are the server stats:
>>>                                        3 servers, built identically with identical hardware
>>>
>>>                                        [root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
>>>                                        2.8.7-orangefs (mode: aio-threaded)
>>>
>>>                                        [root@orangefs02 ~]# uname -r
>>>                                        2.6.18-308.16.1.el5.584g0000
>>>
>>>                                        4 core E5603 1.60GHz
>>>                                        12GB of RAM
>>>
>>>                                        OrangeFS is being served to clients
>>>                                        using bmi_tcp over DDR Infiniband.
>>>                                        Backend storage is PanFS with 2x10Gig
>>>                                        connections on the servers.
>>>                                        Performance to the backend looks fine
>>>                                        using bonnie++.  >100MB/sec write and
>>>                                        ~250MB/s read to each stack.  ~300
>>>                                        creates/sec.
>>>
>>>                                        The OrangeFS clients are running
>>>                                        kernel version 2.6.18-238.19.1.el5.
>>>
>>>                                        The biggest problem I have right now
>>>                                        is that deletes are taking a long
>>>                                        time.  Almost 1 sec per file.
>>>
>>>                                        [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_truncerr1e-11]# find N2/|wc -l
>>>                                        137
>>>                                        [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_truncerr1e-11]# time rm -rf N2
>>>
>>>                                        real    1m31.096s
>>>                                        user    0m0.000s
>>>                                        sys     0m0.015s
>>>
>>>                                        Similar results for file creates:
>>>
>>>                                        [root@fatcompute-11-32 ]# date;for i in `seq 1 50`;do touch file${i};done;date
>>>                                        Fri May 24 16:04:17 MDT 2013
>>>                                        Fri May 24 16:05:05 MDT 2013
>>>
>>>                                        What else do you need to know?  Which
>>>                                        debug flags?  What should we be looking
>>>                                        at?  I don't see any load on the
>>>                                        servers, and I've restarted the servers
>>>                                        and rebooted the server nodes.
>>>
>>>                                        Thanks for any pointers,
>>>                                        Mike Robbert
>>>                                        Colorado School of Mines
>>>
>>>
>>>
>>>
>>>                                        _______________________________________________
>>>                                        Pvfs2-users mailing list
>>>                                        [email protected]
>>>                                        http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>>
>>>
>>>
>>>                                   --
>>>                                   Becky Ligon
>>>                                   OrangeFS Support and Development
>>>                                   Omnibond Systems
>>>                                   Anderson, South Carolina
>>>
>>>
>>>
>>>
>>>
>>>                          --
>>>                          Becky Ligon
>>>                          OrangeFS Support and Development
>>>                          Omnibond Systems
>>>                          Anderson, South Carolina
>>>
>>>
>>>
>>>
>>>                  --
>>>                  Becky Ligon
>>>                  OrangeFS Support and Development
>>>                  Omnibond Systems
>>>                  Anderson, South Carolina
>>>
>>>
>>>
>>>
>>>             --
>>>             Becky Ligon
>>>             OrangeFS Support and Development
>>>             Omnibond Systems
>>>             Anderson, South Carolina
>>>
>>>
>>>
>>>
>>>
>>>     --
>>>     Becky Ligon
>>>     OrangeFS Support and Development
>>>     Omnibond Systems
>>>     Anderson, South Carolina
>>>
>>>
>>>
>>>
>>> --
>>> Becky Ligon
>>> OrangeFS Support and Development
>>> Omnibond Systems
>>> Anderson, South Carolina
>>>
>>>
>>
>>
>> _______________________________________________
>> Pvfs2-users mailing list
>> [email protected]
>> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>>
>>
>
> _______________________________________________
> Pvfs2-users mailing list
> [email protected]
> http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
>
>


-- 
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
