Yes, please do. You have free rein on the nodes I listed in my email until this problem is solved.

Thanks,
Mike

On 5/31/13 4:23 PM, Becky Ligon wrote:
Mike:

Thanks for letting us onto your system.

We ran some more tests and it seems that file creation during the touch
command is taking more time than it should, while metadata ops seem
okay.   I dumped some more OFS debug data and will be looking at it over
the weekend.  I want to pinpoint the precise places in the code that I
*think* are taking time and then rerun more tests.  This may mean
putting up a new copy of OFS with more specific debugging in it, if that
is okay with you.  I also have more ideas on other tests that we can run
to verify where the problem is occurring.

Is it okay if I log onto your system over the weekend?

Becky


On Fri, May 31, 2013 at 3:24 PM, Becky Ligon <[email protected]> wrote:

    Mike:

     From the data you just sent, we see spikes in the touches as well
    as the removes, with the removes being more frequent.

    For example, in the rm data there is a spike of about two orders of
    magnitude (100x) roughly every 10 ops, which can produce a 10x
    average slowdown even though most of the operations finish quite
    quickly.  We do not normally see this, and we don't see it on our
    systems here, so we are trying to determine what might cause it so
    we can direct our efforts.
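To put numbers on that (a hypothetical illustration, not measured data): nine ops at 0.01s plus one 100x spike at 1.00s average out to ~0.109s per op, close to a 10x slowdown of the mean even though 90% of the ops are fast.

```shell
# Hypothetical per-op timings: nine fast ops (0.01s) and one 100x spike (1.00s).
# The mean is dominated by the single slow op.
{ for i in `seq 1 9`; do echo 0.01; done; echo 1.00; } |
  awk '{ sum += $1 } END { printf "mean: %.3f s over %d ops\n", sum/NR, NR }'
# prints: mean: 0.109 s over 10 ops
```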

    At this point, we are trying to further diagnose the problem.  Would
    it be possible for us to log onto your system to look around and
    possibly run some more tests?

    I am sorry for the inconvenience this is causing, but rest assured,
    several of us developers are trying to figure out the difference in
    performance between your system and ours.  (We haven't been able to
    recreate your problem as of yet.)


    Becky



    On Fri, May 31, 2013 at 2:34 PM, Michael Robbert <[email protected]> wrote:

        My terminal buffers weren't big enough to copy and paste all of
        that output, but hopefully the attached will have enough info
        for you to get an idea of what I'm seeing.
        I am beginning to feel like we're just running around in circles
        here. I can do these kinds of tests with and without cache until
        I'm blue in the face, but nothing is going to change until we
        figure out why uncached metadata access is so slow. What are we
        doing to track that down?

        Thanks,
        Mike


        On 5/31/13 12:05 PM, Becky Ligon wrote:

            Mike:

            There is something going on with your system, as I am able to
            touch 500 files in 12.5 seconds and delete them in 8.8 seconds
            on our cluster.

            Did you remove all of the ATTR entries from your conf file and
            restart the servers?

            If not, please do so and then capture the output from the
            following and send it to me:

            for i in `seq 1 500`; do time touch myfile${i}; done

            and then

            for i in myfile*; do time rm -f ${i}; done
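If terminal scrollback is a concern, the same loops can append each timing to a file instead (a sketch; the log-file names are my own, and `bash -c` is used so the `time` keyword is available even when pasted into a non-bash shell):

```shell
# Append each per-file timing to a log; `time` writes to stderr, so
# redirect stderr of the bash -c invocation.  Log names are illustrative.
for i in `seq 1 500`; do bash -c "time touch myfile${i}" 2>> touch_times.log; done
for i in myfile*;     do bash -c "time rm -f ${i}"       2>> rm_times.log;    done
```

The resulting touch_times.log and rm_times.log can then be sent whole, spikes and all.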


            Thanks,
            Becky


            On Fri, May 31, 2013 at 12:02 PM, Michael Robbert
            <[email protected]> wrote:

                 top - 09:54:53 up 6 days, 19:11,  1 user,  load average: 0.00, 0.00, 0.00
                 Tasks: 156 total,   1 running, 155 sleeping,   0 stopped,   0 zombie
                 Cpu(s):  0.1%us,  0.2%sy,  0.0%ni, 99.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
                 Mem:  12289220k total,  1322196k used, 10967024k free,    85820k buffers
                 Swap:  2104432k total,      232k used,  2104200k free,   965636k cached

                 They all look very similar to this: 232k swap used on all
                 of them throughout a touch/rm of 100 files. Ganglia doesn't
                 show any change over time with cache on or off.

                 Mike


                 On 5/31/13 9:30 AM, Becky Ligon wrote:

                     Michael:

                     Can you send me a screenshot of "top" from your
                     servers when the metadata is running on the local
                     disk?  I'd like to see how much memory is available.
                     I'm wondering if 1GB for your DB cache is too high,
                     possibly causing excessive swapping.
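A quick way to put a number on that before grabbing top (a convenience sketch; Linux-only, reading straight from /proc/meminfo):

```shell
# Report swap currently in use, in kB, from /proc/meminfo (Linux).
# If the DB cache is forcing the box to swap, this climbs during a test.
awk '/^SwapTotal:/ { total = $2 } /^SwapFree:/ { free = $2 }
     END { printf "swap used: %d kB\n", total - free }' /proc/meminfo
```

Running it before, during, and after a touch/rm loop would show whether swap use moves at all.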

                     Becky


                     On Fri, May 24, 2013 at 6:06 PM, Michael Robbert
                     <[email protected]> wrote:

                          We recently noticed a performance problem with
            our OrangeFS
                     server.

                          Here are the server stats:
                          3 servers, built with identical hardware

                          [root@orangefs02 ~]# /usr/sbin/pvfs2-server
            --version
                          2.8.7-orangefs (mode: aio-threaded)

                          [root@orangefs02 ~]# uname -r
                          2.6.18-308.16.1.el5.584g0000

                          4 core E5603 1.60GHz
                          12GB of RAM

                          OrangeFS is being served to clients using bmi_tcp
                          over DDR InfiniBand.  Backend storage is PanFS with
                          2x10Gig connections on the servers.  Performance to
                          the backend looks fine using bonnie++: >100MB/s
                          write and ~250MB/s read to each stack, and ~300
                          creates/sec.

                          The OrangeFS clients are running kernel version
                          2.6.18-238.19.1.el5.

                          The biggest problem I have right now is that
                          deletes are taking a long time: almost 1 sec per
                          file.

                          [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_______truncerr1e-11]# find N2/|wc -l
                          137
                          [root@fatcompute-11-32 L_10_V0.2_eta0.3_wRes_______truncerr1e-11]# time rm -rf N2

                          real    1m31.096s
                          user    0m0.000s
                          sys     0m0.015s

                          Similar results for file creates:

                          [root@fatcompute-11-32 ]# date;for i in `seq 1 50`;do touch file${i};done;date
                          Fri May 24 16:04:17 MDT 2013
                          Fri May 24 16:05:05 MDT 2013

                          What else do you need to know? Which debug flags?
                          What should we be looking at?
                          I don't see any load on the servers, and I've
                          restarted the server processes and rebooted the
                          server nodes.

                          Thanks for any pointers,
                          Mike Robbert
                          Colorado School of Mines



              _______________________________________________
              Pvfs2-users mailing list
              [email protected]
              http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users



                     --
                     Becky Ligon
                     OrangeFS Support and Development
                     Omnibond Systems
                     Anderson, South Carolina





            --
            Becky Ligon
            OrangeFS Support and Development
            Omnibond Systems
            Anderson, South Carolina




    --
    Becky Ligon
    OrangeFS Support and Development
    Omnibond Systems
    Anderson, South Carolina




--
Becky Ligon
OrangeFS Support and Development
Omnibond Systems
Anderson, South Carolina



_______________________________________________
Pvfs2-users mailing list
[email protected]
http://www.beowulf-underground.org/mailman/listinfo/pvfs2-users
