Mike:

When I run the same "touch" test using local storage for the metadata and data stores, I get great response times, paralleling what I get on our own cluster. So the kernel version doesn't seem to make a difference where the "access" system call is concerned.
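If you'd like to see the raw "access" latency difference for yourself, outside of OrangeFS entirely, a small standalone timer along these lines should show it. This is just a sketch I'm suggesting, not anything from our tree; the directory argument is a placeholder, so point it once at your local storage path and once at a directory on the PanFS mount:

/*
 * Rough sketch: time repeated access() calls on a file that does not
 * exist under the given directory, which is close to what the server
 * does when it checks for an existing bstream file.  The directory
 * argument is a placeholder -- run it against local storage and then
 * against a PanFS-backed directory and compare the numbers.
 */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    char path[4096];
    struct timespec t0, t1;
    double us, total_us = 0.0, worst_us = 0.0;
    long i, iters;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <directory> <iterations>\n", argv[0]);
        return 1;
    }
    iters = atol(argv[2]);
    if (iters <= 0)
        iters = 500;

    snprintf(path, sizeof(path), "%s/no-such-bstream-file", argv[1]);

    for (i = 0; i < iters; i++) {
        clock_gettime(CLOCK_MONOTONIC, &t0);
        access(path, F_OK);   /* expected to fail with ENOENT */
        clock_gettime(CLOCK_MONOTONIC, &t1);

        us = (t1.tv_sec - t0.tv_sec) * 1e6 +
             (t1.tv_nsec - t0.tv_nsec) / 1e3;
        total_us += us;
        if (us > worst_us)
            worst_us = us;
    }

    printf("%ld access() calls: avg %.1f us, worst %.1f us\n",
           iters, total_us / iters, worst_us);
    return 0;
}

Compile with something like "cc -O2 access_timer.c -o access_timer -lrt" (the -lrt is needed for clock_gettime on RHEL 5) and compare the output from the two runs.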
I ran some tests last night where I removed the system call to "access", which removes the calls to PanFS, and I got great response. The problem, therefore, appears to be running the "access" system call against PanFS. The Berkeley DB has nothing to do with your issue. Let me discuss my findings with the team and get back to you on this.
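For reference, the create path I described in my June 2 message (quoted below) boils down to roughly the following sequence. This is only a simplified sketch with illustrative names and arguments -- the DBT/attribute setup and most error handling are stripped out, and it is not the actual TROVE source:

/* Simplified sketch of the four steps in dbpf_dspace_create_store_handle
 * as described in the quoted message below; names and arguments are
 * illustrative only. */
#include <db.h>        /* Berkeley DB */
#include <stdint.h>
#include <string.h>
#include <unistd.h>

int create_store_handle_sketch(DB *dspace_db, uint64_t handle,
                               const char *bstream_path)
{
    DBT key, data;
    memset(&key, 0, sizeof(key));
    memset(&data, 0, sizeof(data));
    key.data = &handle;
    key.size = sizeof(handle);

    /* 1. db->get against BDB: the new handle should not have a dspace
     *    entry yet. */
    if (dspace_db->get(dspace_db, NULL, &key, &data, 0) != DB_NOTFOUND)
        return -1;

    /* 2. access() system call: does a bstream file for this handle
     *    already exist?  This is the call that goes to PanFS and is
     *    where the slow cases show up in the debug logs. */
    if (access(bstream_path, F_OK) == 0)
        return -1;

    /* 3. db->put against BDB to store the dspace entry for the new
     *    handle (the real code fills 'data' with the dspace attributes). */
    if (dspace_db->put(dspace_db, NULL, &key, &data, 0) != 0)
        return -1;

    /* 4. Insert into the attribute cache (omitted in this sketch). */
    return 0;
}

In the debug logs, step 2 is the one that accounts for the time whenever these four operations take longer than about 0.5ms.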
BTW, I wasn't able to restart your servers using the crm command. Can you see what's going on with that?

Thanks so much for your time and patience!
Becky

On Mon, Jun 3, 2013 at 11:43 AM, Becky Ligon <[email protected]> wrote:

> Let me run my last tests on orangefs01-ib0 to see if it is really the kernel or not.
>
> Becky
>
> On Mon, Jun 3, 2013 at 11:30 AM, Michael Robbert <[email protected]> wrote:
>
>> I misspoke slightly in that last email. I think that the kernel versions we're tied to are 2.6.18.*, not just -308. We're still running 2.6.18-275.12.1.el5.573g0000 on our other system, so we can try that if you'd like.
>>
>> Thanks,
>> Mike
>>
>> On 6/3/13 9:16 AM, Michael Robbert wrote:
>>
>>> We are confined to kernels from Scyld Clusterware in the 2.6.18-308.* range. Our PanFS modules were purchased as a one-time deal to get them to work with Scyld 5.x. They put in some work to make it version-number independent, but I've tried non-Scyld and other versions of Scyld and it doesn't work.
>>>
>>> Mike
>>>
>>> On 6/2/13 8:50 PM, Becky Ligon wrote:
>>>
>>>> All:
>>>>
>>>> The area of the code where we thought more time was being spent than seemed reasonable was in the metafile dspace create and the local datafile dspace create, contained in the create state machine. In both of these operations, the code executes a function called dbpf_dspace_create_store_handle, which does the following:
>>>>
>>>> 1. db->get against BDB to see if the new handle already has a dspace entry...which it shouldn't and doesn't.
>>>> 2. Issues a system call to "access", which tells us whether the bstream file for the given handle already exists...which it doesn't.
>>>> 3. db->put against BDB to store the dspace entry for the new handle.
>>>> 4. Inserts into the attribute cache.
>>>>
>>>> In reviewing a more detailed debug log of these functions, I discovered that most of the time these four operations execute in less than 0.5ms. When the time is greater than that, the culprit is always the "access" call alone, or the "access" call along with interrupts from the job_timer state machine.
>>>>
>>>> At this point, I am thinking that there may be a problem with the version of Linux running on the machines. As noted in my previous email, 2.6.18-308.16.1.el5 is known to have issues with the kernel dcache mechanism, which leads me to believe there could be other issues as well.
>>>>
>>>> In the morning, I will run the same tests on a newer kernel (RHEL 6.3) and compare "access" times between the two kernels.
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 31, 2013 at 7:22 PM, Becky Ligon <[email protected]> wrote:
>>>>
>>>> Thanks, Mike!
>>>>
>>>> I ran some more tests hoping that the null-aio trove method would eliminate disk issues, but null-aio, as I just discovered, still allows files to be created. Doh! So I will be looking more in depth at our file creation process, which includes metadata updates and file creation on the disk.
>>>>
>>>> BTW: I noticed that you are running 2.6.18-308.16.1.el5.584g0000 on your servers, and there is a known Linux bug concerning dcache processing that creates a kernel panic when OrangeFS is unmounted. This bug affects other software, too, not just ours. Have you had any problems along these lines? Our recommendation for those who want to stay on RHEL 5 is to use 2.6.18-308.
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 31, 2013 at 6:33 PM, Michael Robbert <[email protected]> wrote:
>>>>
>>>> Yes, please do. You have free rein on the nodes that I listed in my email to you until this problem is solved.
>>>>
>>>> Thanks,
>>>> Mike
>>>>
>>>> On 5/31/13 4:23 PM, Becky Ligon wrote:
>>>>
>>>> Mike:
>>>>
>>>> Thanks for letting us onto your system.
>>>>
>>>> We ran some more tests, and it seems that file creation during the touch command is taking more time than it should, while metadata ops seem okay. I dumped some more OFS debug data and will be looking at it over the weekend. I want to pinpoint the precise places in the code that I *think* are taking time and then rerun more tests. This may mean putting up a new copy of OFS with more specific debugging in it, if that is okay with you. I also have more ideas on other tests that we can run to verify where the problem is occurring.
>>>>
>>>> Is it okay if I log onto your system over the weekend?
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 31, 2013 at 3:24 PM, Becky Ligon <[email protected]> wrote:
>>>>
>>>> Mike:
>>>>
>>>> From the data you just sent, we see spikes in the touches as well as the removes, with the removes being more frequent.
>>>>
>>>> For example, in the rm data there is a spike of about 2 orders of magnitude (100x) about every 10 ops, which can result in a 10x average slowdown, even though most of the operations finish quite quickly. We do not normally see this, and we don't see it on our systems here, so we are trying to decide what might cause it so we can direct our efforts.
>>>>
>>>> At this point, we are trying to further diagnose the problem. Would it be possible for us to log onto your system to look around and possibly run some more tests?
>>>>
>>>> I am sorry for the inconvenience this is causing, but rest assured, several of us developers are trying to figure out the difference in performance between your system and ours. (We haven't been able to recreate your problem as of yet.)
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 31, 2013 at 2:34 PM, Michael Robbert <[email protected]> wrote:
>>>>
>>>> My terminal buffers weren't big enough to copy and paste all of that output, but hopefully the attached will have enough info for you to get an idea of what I'm seeing.
>>>>
>>>> I am beginning to feel like we're just running around in circles here. I can do these kinds of tests with and without cache until I'm blue in the face, but nothing is going to change until we figure out why uncached metadata access is so slow. What are we doing to track that down?
>>>> Thanks,
>>>> Mike
>>>>
>>>> On 5/31/13 12:05 PM, Becky Ligon wrote:
>>>>
>>>> Mike:
>>>>
>>>> There is something going on with your system, as I am able to touch 500 files in 12.5 seconds and delete them in 8.8 seconds on our cluster.
>>>>
>>>> Did you remove all of the ATTR entries from your conf file and restart the servers?
>>>>
>>>> If not, please do so and then capture the output from the following and send it to me:
>>>>
>>>> for i in `seq 1 500`; do time touch myfile${i}; done
>>>>
>>>> and then
>>>>
>>>> for i in myfile*; do time rm -f ${i}; done
>>>>
>>>> Thanks,
>>>> Becky
>>>>
>>>> On Fri, May 31, 2013 at 12:02 PM, Michael Robbert <[email protected]> wrote:
>>>>
>>>> top - 09:54:53 up 6 days, 19:11, 1 user, load average: 0.00, 0.00, 0.00
>>>> Tasks: 156 total, 1 running, 155 sleeping, 0 stopped, 0 zombie
>>>> Cpu(s): 0.1%us, 0.2%sy, 0.0%ni, 99.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
>>>> Mem: 12289220k total, 1322196k used, 10967024k free, 85820k buffers
>>>> Swap: 2104432k total, 232k used, 2104200k free, 965636k cached
>>>>
>>>> They all look very similar to this. 232k of swap used on all of them throughout a touch/rm of 100 files. Ganglia doesn't show any change over time with cache on or off.
>>>>
>>>> Mike
>>>>
>>>> On 5/31/13 9:30 AM, Becky Ligon wrote:
>>>>
>>>> Michael:
>>>>
>>>> Can you send me a screen shot of "top" from your servers when the metadata is running on the local disk? I'd like to see how much memory is available. I'm wondering if 1GB for your DB cache is too high, possibly causing excessive swapping.
>>>>
>>>> Becky
>>>>
>>>> On Fri, May 24, 2013 at 6:06 PM, Michael Robbert <[email protected]> wrote:
>>>>
>>>> We recently noticed a performance problem with our OrangeFS servers.
>>>>
>>>> Here are the server stats: 3 servers, built identically with identical hardware.
>>>>
>>>> [root@orangefs02 ~]# /usr/sbin/pvfs2-server --version
>>>> 2.8.7-orangefs (mode: aio-threaded)
>>>>
>>>> [root@orangefs02 ~]# uname -r
>>>> 2.6.18-308.16.1.el5.584g0000
>>>>
>>>> 4-core E5603 1.60GHz
>>>> 12GB of RAM
>>>>
>>>> OrangeFS is being served to clients using bmi_tcp over DDR InfiniBand. Backend storage is PanFS with 2x10Gig connections on the servers. Performance to the backend looks fine using bonnie++: >100MB/sec write and ~250MB/s read to each stack, and ~300 creates/sec.
>>>>
>>>> The OrangeFS clients are running kernel version 2.6.18-238.19.1.el5.
>>>> The biggest problem I have right now is that deletes are taking a long time, almost 1 sec per file:
>>>>
>>>> [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_________truncerr1e-11]# find N2/|wc -l
>>>> 137
>>>> [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_________truncerr1e-11]# time rm -rf N2
>>>>
>>>> real 1m31.096s
>>>> user 0m0.000s
>>>> sys 0m0.015s
>>>>
>>>> Similar results for file creates:
>>>>
>>>> [root@fatcompute-11-32 ]# date;for i in `seq 1 50`;do touch file${i};done;date
>>>> Fri May 24 16:04:17 MDT 2013
>>>> Fri May 24 16:05:05 MDT 2013
>>>>
>>>> What else do you need to know? Which debug flags? What should we be looking at? I don't see any load on the servers, and I've restarted the servers and rebooted the server nodes.
>>>>
>>>> Thanks for any pointers,
>>>> Mike Robbert
>>>> Colorado School of Mines
--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina
_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
