Re: [MTT users] MTT username/password and report upload
I am also interested in some help moving to the Python client. Since a few of us who are interested are going to be at the face-to-face next week, could Howard (or someone familiar with setting it up for OMPI testing) give us a presentation on it? I'll add it to the agenda. The initial hurdle of getting started has prevented me from taking the leap, but maybe next week is a good opportunity to do so. On Tue, Mar 13, 2018 at 9:14 PM, Kawashima, Takahiro < t-kawash...@jp.fujitsu.com> wrote: > Jeff, > > Thank you. I received the password. I cannot remember receiving it > before... > > My colleague was working with the Perl client before, but the work was > suspended when his job changed. That is why we currently use the Perl > client. We want to change to the Python client if the change > does not require much work. > > Takahiro Kawashima, > MPI development team, > Fujitsu > > > Yes, it's trivial to reset the Fujitsu MTT password -- I'll send you a > mail off-list with the new password. > > > > If you're just starting up with MTT, you might want to use the Python > client instead. That's where 95% of ongoing development is occurring. > > > > If all goes well, I plan to sit down with Howard + Ralph next week and > try converting Cisco's Perl config to use the Python client. > > > > > > > On Mar 12, 2018, at 10:23 PM, Kawashima, Takahiro < > t-kawash...@jp.fujitsu.com> wrote: > > > > > > Hi, > > > > > > Fujitsu has resumed the work of running MTT on our machines. > > > > > > Could you give me a username/password to upload reports to the server? > > > I suppose the username is "fujitsu". > > > > > > Our machines are not connected to the Internet directly. > > > Is there a document for uploading the report outside the MTT run? > > > We are using the Perl client currently. > > > > > > I'll attend the OMPI developer's meeting next week. > > > I hope we can talk about it. > > > > > > Takahiro Kawashima, > > > MPI development team, > > > Fujitsu > > ___ > mtt-users mailing list > mtt-users@lists.open-mpi.org > https://lists.open-mpi.org/mailman/listinfo/mtt-users > -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/mtt-users
Re: [MTT users] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)
MTT Service has been successfully moved to the AWS servers. Everything looks good so far, so folks can start using it. Note that the DNS record had to be updated. This -should- have propagated everywhere by now. I had to clear my browser cache to get it to resolve correctly the first time. This move requires -no- changes to any of your MTT client setups. Let me know if you have any issues. -- Josh On Fri, Oct 21, 2016 at 9:53 PM, Josh Hursey <jjhur...@open-mpi.org> wrote: > I have taken down the MTT Reporter at mtt.open-mpi.org while we finish up > the migration. I'll send out another email when everything is up and > running again. > > On Fri, Oct 21, 2016 at 10:17 AM, Josh Hursey <jjhur...@open-mpi.org> > wrote: > >> Reminder that the MTT will go offline starting at *Noon US Eastern (11 >> am US Central) today*. >> >> Any MTT client submissions to the MTT database will return an error >> during this window of downtime. I will try to keep the MTT Reporter >> interface as available as possible (although permalinks will not be >> available) at the normal URL. >> https://mtt.open-mpi.org >> However, there will be a time when that will go down as well. I'll send a >> note when that occurs. >> >> I will send another email once MTT is back online. >> >> Thank you for your patience. Let me know if you have any questions. >> >> -- Josh >> >> >> On Wed, Oct 19, 2016 at 10:14 AM, Josh Hursey <jjhur...@open-mpi.org> >> wrote: >> >>> Based on current estimates we need to extend the window of downtime for >>> MTT to 24 hours. >>> >>> *Start time*: *Fri., Oct. 21, 2016 at Noon US Eastern* (11 am US >>> Central) >>> *End time*: *Sat., Oct. 22, 2016 at Noon US Eastern* (estimated) >>> >>> I will send an email just before taking down the MTT site on Friday, and >>> another once it is back up on Sat. >>> >>> During this time all of the MTT services will be down - MTT Reporter and >>> MTT submission interface. If you have an MTT client running during this >>> time you will receive an error message if you try to submit results to the >>> MTT server. >>> >>> Let me know if you have any questions or concerns. >>> >>> >>> On Tue, Oct 18, 2016 at 10:59 AM, Josh Hursey <jjhur...@open-mpi.org> >>> wrote: >>> >>>> We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. >>>> >>>> We hit a snag with the AWS configuration that we are working through. >>>> >>>> On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey <jjhur...@open-mpi.org> >>>> wrote: >>>> >>>>> I will announce this on the Open MPI developer's teleconf on Tuesday, >>>>> before the move. >>>>> >>>>> Geoff - Please add this item to the agenda. >>>>> >>>>> >>>>> Short version: >>>>> --- >>>>> MTT server (mtt.open-mpi.org) will be going down for maintenance on >>>>> Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT >>>>> Reporter and the MTT client submission interface will not be accessible. I >>>>> will send an email out when the service is back online. >>>>> >>>>> >>>>> Longer version: >>>>> --- >>>>> We need to move the MTT Server/Database from the IU server to the AWS >>>>> server. This move will be completely transparent to users submitting to >>>>> the >>>>> database, except for a window of downtime to move the database. >>>>> >>>>> I estimate that moving the database will take about two hours. So I >>>>> have blocked off three hours to give us time to test, and redirect the DNS >>>>> record. >>>>> >>>>> Once the service comes back online, you should be able to access MTT >>>>> using the mtt.open-mpi.org URL. No changes are needed in your MTT >>>>> client setup, and all permalinks are expected to still work after the >>>>> move. >>>>> >>>>> >>>>> Let me know if you have any questions or concerns about the move. >>>>> >>>>> -- >>>>> Josh Hursey >>>>> IBM Spectrum MPI Developer -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
Re: [MTT users] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)
I have taken down the MTT Reporter at mtt.open-mpi.org while we finish up the migration. I'll send out another email when everything is up and running again. On Fri, Oct 21, 2016 at 10:17 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > Reminder that the MTT will go offline starting at *Noon US Eastern (11 am > US Central) today*. > > Any MTT client submissions to the MTT database will return an error during > this window of downtime. I will try to keep the MTT Reporter interface as > available as possible (although permalinks will not be available) at the > normal URL. > https://mtt.open-mpi.org > However, there will be a time when that will go down as well. I'll send a > note when that occurs. > > I will send another email once MTT is back online. > > Thank you for your patience. Let me know if you have any questions. > > -- Josh > > > On Wed, Oct 19, 2016 at 10:14 AM, Josh Hursey <jjhur...@open-mpi.org> > wrote: > >> Based on current estimates we need to extend the window of downtime for >> MTT to 24 hours. >> >> *Start time*: *Fri., Oct. 21, 2016 at Noon US Eastern* (11 am US Central) >> *End time*: *Sat., Oct. 22, 2016 at Noon US Eastern* (estimated) >> >> I will send an email just before taking down the MTT site on Friday, and >> another once it is back up on Sat. >> >> During this time all of the MTT services will be down - MTT Reporter and >> MTT submission interface. If you have an MTT client running during this >> time you will receive an error message if you try to submit results to the >> MTT server. >> >> Let me know if you have any questions or concerns. >> >> >> On Tue, Oct 18, 2016 at 10:59 AM, Josh Hursey <jjhur...@open-mpi.org> >> wrote: >> >>> We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. >>> >>> We hit a snag with the AWS configuration that we are working through. >>> >>> On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey <jjhur...@open-mpi.org> >>> wrote: >>> >>>> I will announce this on the Open MPI developer's teleconf on Tuesday, >>>> before the move. >>>> >>>> Geoff - Please add this item to the agenda. >>>> >>>> >>>> Short version: >>>> --- >>>> MTT server (mtt.open-mpi.org) will be going down for maintenance on >>>> Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT >>>> Reporter and the MTT client submission interface will not be accessible. I >>>> will send an email out when the service is back online. >>>> >>>> >>>> Longer version: >>>> --- >>>> We need to move the MTT Server/Database from the IU server to the AWS >>>> server. This move will be completely transparent to users submitting to the >>>> database, except for a window of downtime to move the database. >>>> >>>> I estimate that moving the database will take about two hours. So I >>>> have blocked off three hours to give us time to test, and redirect the DNS >>>> record. >>>> >>>> Once the service comes back online, you should be able to access MTT >>>> using the mtt.open-mpi.org URL. No changes are needed in your MTT >>>> client setup, and all permalinks are expected to still work after the move. >>>> >>>> >>>> Let me know if you have any questions or concerns about the move. >>>> >>>> -- >>>> Josh Hursey >>>> IBM Spectrum MPI Developer -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
Re: [MTT users] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)
Reminder that the MTT will go offline starting at *Noon US Eastern (11 am US Central) today*. Any MTT client submissions to the MTT database will return an error during this window of downtime. I will try to keep the MTT Reporter interface as available as possible (although permalinks will not be available) at the normal URL. https://mtt.open-mpi.org However, there will be a time when that will go down as well. I'll send a note when that occurs. I will send another email once MTT is back online. Thank you for your patience. Let me know if you have any questions. -- Josh On Wed, Oct 19, 2016 at 10:14 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > Based on current estimates we need to extend the window of downtime for > MTT to 24 hours. > > *Start time*: *Fri., Oct. 21, 2016 at Noon US Eastern* (11 am US Central) > *End time*: *Sat., Oct. 22, 2016 at Noon US Eastern* (estimated) > > I will send an email just before taking down the MTT site on Friday, and > another once it is back up on Sat. > > During this time all of the MTT services will be down - MTT Reporter and > MTT submission interface. If you have an MTT client running during this > time you will receive an error message if you try to submit results to the > MTT server. > > Let me know if you have any questions or concerns. > > > On Tue, Oct 18, 2016 at 10:59 AM, Josh Hursey <jjhur...@open-mpi.org> > wrote: > >> We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. >> >> We hit a snag with the AWS configuration that we are working through. >> >> On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey <jjhur...@open-mpi.org> >> wrote: >> >>> I will announce this on the Open MPI developer's teleconf on Tuesday, >>> before the move. >>> >>> Geoff - Please add this item to the agenda. >>> >>> >>> Short version: >>> --- >>> MTT server (mtt.open-mpi.org) will be going down for maintenance on >>> Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT >>> Reporter and the MTT client submission interface will not be accessible. I >>> will send an email out when the service is back online. >>> >>> >>> Longer version: >>> --- >>> We need to move the MTT Server/Database from the IU server to the AWS >>> server. This move will be completely transparent to users submitting to the >>> database, except for a window of downtime to move the database. >>> >>> I estimate that moving the database will take about two hours. So I have >>> blocked off three hours to give us time to test, and redirect the DNS >>> record. >>> >>> Once the service comes back online, you should be able to access MTT >>> using the mtt.open-mpi.org URL. No changes are needed in your MTT client >>> setup, and all permalinks are expected to still work after the move. >>> >>> >>> Let me know if you have any questions or concerns about the move. >>> >>> -- >>> Josh Hursey >>> IBM Spectrum MPI Developer -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
Re: [MTT users] MTT Server Downtime - Fri., Oct. 21, 2016 (Updated)
Based on current estimates we need to extend the window of downtime for MTT to 24 hours. *Start time*: *Fri., Oct. 21, 2016 at Noon US Eastern* (11 am US Central) *End time*: *Sat., Oct. 22, 2016 at Noon US Eastern* (estimated) I will send an email just before taking down the MTT site on Friday, and another once it is back up on Sat. During this time all of the MTT services will be down - MTT Reporter and MTT submission interface. If you have an MTT client running during this time you will receive an error message if you try to submit results to the MTT server. Let me know if you have any questions or concerns. On Tue, Oct 18, 2016 at 10:59 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. > > We hit a snag with the AWS configuration that we are working through. > > On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey <jjhur...@open-mpi.org> > wrote: > >> I will announce this on the Open MPI developer's teleconf on Tuesday, >> before the move. >> >> Geoff - Please add this item to the agenda. >> >> >> Short version: >> --- >> MTT server (mtt.open-mpi.org) will be going down for maintenance on >> Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT >> Reporter and the MTT client submission interface will not be accessible. I >> will send an email out when the service is back online. >> >> >> Longer version: >> --- >> We need to move the MTT Server/Database from the IU server to the AWS >> server. This move will be completely transparent to users submitting to the >> database, except for a window of downtime to move the database. >> >> I estimate that moving the database will take about two hours. So I have >> blocked off three hours to give us time to test, and redirect the DNS >> record. >> >> Once the service comes back online, you should be able to access MTT >> using the mtt.open-mpi.org URL. No changes are needed in your MTT client >> setup, and all permalinks are expected to still work after the move. >> >> >> Let me know if you have any questions or concerns about the move. >> >> -- >> Josh Hursey >> IBM Spectrum MPI Developer -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
Re: [MTT users] MTT Server Downtime - Tues., Oct. 18, 2016
We are moving this downtime to *Friday, Oct. 21 from 2-5 pm US Eastern*. We hit a snag with the AWS configuration that we are working through. On Sun, Oct 16, 2016 at 9:53 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > I will announce this on the Open MPI developer's teleconf on Tuesday, > before the move. > > Geoff - Please add this item to the agenda. > > > Short version: > --- > MTT server (mtt.open-mpi.org) will be going down for maintenance on > Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT > Reporter and the MTT client submission interface will not be accessible. I > will send an email out when the service is back online. > > > Longer version: > --- > We need to move the MTT Server/Database from the IU server to the AWS > server. This move will be completely transparent to users submitting to the > database, except for a window of downtime to move the database. > > I estimate that moving the database will take about two hours. So I have > blocked off three hours to give us time to test, and redirect the DNS > record. > > Once the service comes back online, you should be able to access MTT using > the mtt.open-mpi.org URL. No changes are needed in your MTT client setup, > and all permalinks are expected to still work after the move. > > > Let me know if you have any questions or concerns about the move. > > > -- > Josh Hursey > IBM Spectrum MPI Developer > -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
[MTT users] MTT Server Downtime - Tues., Oct. 18, 2016
I will announce this on the Open MPI developer's teleconf on Tuesday, before the move. Geoff - Please add this item to the agenda. Short version: --- MTT server (mtt.open-mpi.org) will be going down for maintenance on Tuesday, Oct. 18, 2016 from 2-5 pm US Eastern. During this time the MTT Reporter and the MTT client submission interface will not be accessible. I will send an email out when the service is back online. Longer version: --- We need to move the MTT Server/Database from the IU server to the AWS server. This move will be completely transparent to users submitting to the database, except for a window of downtime to move the database. I estimate that moving the database will take about two hours. So I have blocked off three hours to give us time to test, and redirect the DNS record. Once the service comes back online, you should be able to access MTT using the mtt.open-mpi.org URL. No changes are needed in your MTT client setup, and all permalinks are expected to still work after the move. Let me know if you have any questions or concerns about the move. -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
Re: [MTT users] Open MPI - MTT perf
The MTT Perl client (./client/mtt) has some performance gathering and reporting built in as modules. These modules know how to break down the results from, say, NetPipe and package them up to send to the MTT Server for storage. The available modules are here: https://github.com/open-mpi/mtt/tree/master/lib/MTT/Test/Analyze/Performance There are a couple of examples in this sample file on how to set it up in your nightly runs (links to NetPipe below, but there are others in the file): https://github.com/open-mpi/mtt/blob/master/samples/perl/ompi-core-template.ini#L458-L462 https://github.com/open-mpi/mtt/blob/master/samples/perl/ompi-core-template.ini#L532-L541 https://github.com/open-mpi/mtt/blob/master/samples/perl/ompi-core-template.ini#L692-L704 The MTT Reporter will display the performance in the Test Run results column. https://mtt.open-mpi.org/index.php?do_redir=2349 Currently, the report generated by MTT could use some work. I actually don't know if the Performance button works anymore. We started brainstorming ways to improve this aspect of MTT at the Open MPI developers meeting in Aug. 2016. We are just getting started on that. If you are interested in participating in those discussions, let us know. Otherwise we will try to keep the list updated on any new performance features. On Thu, Sep 1, 2016 at 2:01 AM, Christoph Niethammer <nietham...@hlrs.de> wrote: > Hi Jeff, > > I noticed that there is a perf field in the MTT database. > Unfortunately I could not find anything in the wiki. > Can anyone give me a link or some information about it? > > How do I report results there? And can one automatically detect > degradations/compare version results? > > Best > Christoph Niethammer > > -- > > Christoph Niethammer > High Performance Computing Center Stuttgart (HLRS) > Nobelstrasse 19 > 70569 Stuttgart > > Tel: ++49(0)711-685-87203 > email: nietham...@hlrs.de > http://www.hlrs.de/people/niethammer > ___ > mtt-users mailing list > mtt-users@lists.open-mpi.org > https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users > -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
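For orientation, the wiring in those template links boils down to a "Test run" section that names an Analyze/Performance module. The sketch below is an illustration rather than a copy of the template - the section name, test_build value, and np are placeholders, and the analyze_module field spelling is an assumption here - so treat the linked ompi-core-template.ini as the authoritative reference:

[Test run: netpipe]
test_build = netpipe
np = 2
# Hand NetPipe's output to the matching module under
# lib/MTT/Test/Analyze/Performance so latency/bandwidth numbers
# are parsed and submitted along with the run results
analyze_module = NetPipe

The other modules in that directory follow the same pattern; only the module name changes.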
Re: [MTT users] Pyclient fails to report to IU database
This should be the correct (current) url: https://mtt.open-mpi.org/submit/cpy/api/ We might want to change it in the future, but the following is just for the perl client: https://mtt.open-mpi.org/submit/ It looks like the service at IU has been down since May 13. I just restarted it so you should be fine now. It should have auto-restarted, so we'll have to look into why that didn't happen when we move it over to the new server next week. On Wed, Aug 10, 2016 at 6:14 PM, r...@open-mpi.org <r...@open-mpi.org> wrote: > Hi folks > > I hit the following error when trying to upload results from Pyclient to > IU Reporter: > > <<<<<<< Response -->>>>>> > Result: 404: text/html; charset=iso-8859-1 > {'date': 'Wed, 10 Aug 2016 23:07:03 GMT', 'content-length': '296', > 'content-type': 'text/html; charset=iso-8859-1', 'connection': 'close', > 'server': 'Apache/2.2.15 (Red Hat)'} > Not Found > <<<<<<< Raw Output (Start) >>>>>> > > > 404 Not Found > > Not Found > The requested URL /submit//submit was not found on this server. > > Apache/2.2.15 (Red Hat) Server at mtt.open-mpi.org Port > 443 > > > <<<<<<< Raw Output (End ) >>>>>> > > Here was my .ini stage: > > [Reporter:IUdatabase] > plugin = IUDatabase > > realm = OMPI > username = intel > pwfile = /home/common/mttpwd.txt > platform = bend-rsh > hostname = rhc00[1-2] > url = https://mtt.open-mpi.org/submit/ > email = r...@open-mpi.org > > #------ > > Is the URL incorrect? > Ralph > > ___ > mtt-users mailing list > mtt-users@lists.open-mpi.org > https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users > -- Josh Hursey IBM Spectrum MPI Developer ___ mtt-users mailing list mtt-users@lists.open-mpi.org https://rfd.newmexicoconsortium.org/mailman/listinfo/mtt-users
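Putting the corrected URL together with the .ini stage above, the fixed Reporter section would presumably read as follows; every field except the url line is unchanged from the original:

[Reporter:IUdatabase]
plugin = IUDatabase
realm = OMPI
username = intel
pwfile = /home/common/mttpwd.txt
platform = bend-rsh
hostname = rhc00[1-2]
# Python client endpoint; the bare /submit/ URL is only for the Perl client
url = https://mtt.open-mpi.org/submit/cpy/api/
email = r...@open-mpi.org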
Re: [MTT users] Python client
I think this is fine. If we do start to organize ourselves for a formal release then we might want to move to pull requests to keep the branch stable for a bit, but for now this is ok with me. The Python client looks like it will be a nice addition. Hopefully, I will have the REST submission interface finished in the next couple of months and that will make it easier to submit to the DB. I probably won't get cycles to finish that work until after Jan 12. I plan to have it done before the end of Jan - let me know if you need it before then. On Thu, Dec 10, 2015 at 1:10 PM, Ralph Castain wrote: > Hey folks > > I'm working on the Python client and it is coming along pretty well. The > code is completely separate from the Perl-based client and doesn't interact > with it, so I would like to push it into the repo on an ongoing basis so > others can look at it and comment as I go rather than hold it until it is > "complete". > > Any objections? Obviously, it won't be fully functional at times - mostly > available for architectural review and directional suggestions. > > Ralph > > > ___ > mtt-users mailing list > mtt-us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users > Link to this post: > http://www.open-mpi.org/community/lists/mtt-users/2015/12/0834.php > -- Joshua Hursey Assistant Professor of Computer Science University of Wisconsin-La Crosse http://cs.uwlax.edu/~jjhursey
Re: [MTT users] Actual releases?
I think that would be good. I won't have any cycles to help until after the first of the year. We started working towards a release way back when, but I think we got stuck with the license to package up the graphing library for the MTT Reporter. We could just remove that feature from the release since the new reporter will do something different. Releasing where we are now should be pretty straightforward if folks just want a versioned tarball posted. We would have to assess how to get MTT into a more packaged configuration (e.g., rpm) if folks want that. On Wed, Dec 9, 2015 at 11:36 AM, Ralph Castain wrote: > Hey folks > > There is interest in packaging MTT in the OpenHPC distribution. However, > we don't actually have "releases" of MTT. Any objection to actually > tagging/releasing versions? > > Ralph > > > ___ > mtt-users mailing list > mtt-us...@open-mpi.org > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users > Searchable archives: > http://www.open-mpi.org/community/lists/mtt-users/2015/12/0832.php > -- Joshua Hursey Assistant Professor of Computer Science University of Wisconsin-La Crosse http://cs.uwlax.edu/~jjhursey
Re: [MTT users] [OMPI devel] Open MPI MTT is moving
The MTT server migration went well this weekend. I have updated the Open MPI website to redirect you appropriately to the new MTT Reporter. You will need to update your .ini files to submit your tests to the new server at the address below: https://mtt.open-mpi.org/submit/ Let me know if you experience any problems with the new server. -- Josh On Fri, Nov 2, 2012 at 9:26 AM, Josh Hursey <jjhur...@open-mpi.org> wrote: > Reminder that we will be shutting down the MTT submission and reporter > services this weekend to migrate it to another machine. The MTT > services will go offline at COB today, and be brought back by Monday > morning. > > > On Wed, Oct 31, 2012 at 7:54 AM, Jeff Squyres <jsquy...@cisco.com> wrote: >> *** IF YOU RUN MTT, YOU NEED TO READ THIS. >> >> Due to some server re-organization at Indiana University (read: our gracious >> hosting provider), we are moving the Open MPI community MTT database to a >> new server. Instead of being found under www.open-mpi.org/mtt/, the OMPI >> MTT results will soon be located under mtt.open-mpi.org. >> >> Josh and I have been running tests on the new server and we think it is >> ready; it's now time to move the rest of the community to it. >> >> 1. In order to make this change, we need some "quiet time" where no one is >> submitting new MTT results. As such, we will be shutting down >> MTT/disallowing new MTT submissions over this upcoming weekend: from COB >> Friday, 2 Nov 2012 through Monday morning, 5 Nov 2012 (all times US Central). >> >> ** Translation: don't expect submissions or queries to work after about 5pm >> on Friday through about 8am Monday (US Central). >> >> Super obvious translation: turn off your MTT runs this weekend. >> >> 2. After this weekend, you will need to update your MTT submission URL from: >> >> https://www.open-mpi.org/mtt/submit/ >> >> to >> >> https://mtt.open-mpi.org/submit/ >> >> Thanks! >> >> -- >> Jeff Squyres >> jsquy...@cisco.com >> For corporate legal information go to: >> http://www.cisco.com/web/about/doing_business/legal/cri/ >> >> >> ___ >> devel mailing list >> de...@open-mpi.org >> http://www.open-mpi.org/mailman/listinfo.cgi/devel > > > > -- > Joshua Hursey > Assistant Professor of Computer Science > University of Wisconsin-La Crosse > http://cs.uwlax.edu/~jjhursey -- Joshua Hursey Assistant Professor of Computer Science University of Wisconsin-La Crosse http://cs.uwlax.edu/~jjhursey
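In practice the update is a one-line edit to the Reporter section of each .ini file. The section below is illustrative (your section name and other fields will differ); only the mttdatabase_url value needs to change:

[Reporter: IU database]
module = MTTDatabase
# old value: mttdatabase_url = https://www.open-mpi.org/mtt/submit/
mttdatabase_url = https://mtt.open-mpi.org/submit/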
Re: [MTT users] [OMPI devel] Open MPI MTT is moving
Reminder that we will be shutting down the MTT submission and reporter services this weekend to migrate it to another machine. The MTT services will go offline at COB today, and be brought back by Monday morning. On Wed, Oct 31, 2012 at 7:54 AM, Jeff Squyres wrote: > *** IF YOU RUN MTT, YOU NEED TO READ THIS. > > Due to some server re-organization at Indiana University (read: our gracious > hosting provider), we are moving the Open MPI community MTT database to a new > server. Instead of being found under www.open-mpi.org/mtt/, the OMPI MTT > results will soon be located under mtt.open-mpi.org. > > Josh and I have been running tests on the new server and we think it is > ready; it's now time to move the rest of the community to it. > > 1. In order to make this change, we need some "quiet time" where no one is > submitting new MTT results. As such, we will be shutting down > MTT/disallowing new MTT submissions over this upcoming weekend: from COB > Friday, 2 Nov 2012 through Monday morning, 5 Nov 2012 (all times US Central). > > ** Translation: don't expect submissions or queries to work after about 5pm > on Friday through about 8am Monday (US Central). > > Super obvious translation: turn off your MTT runs this weekend. > > 2. After this weekend, you will need to update your MTT submission URL from: > > https://www.open-mpi.org/mtt/submit/ > > to > > https://mtt.open-mpi.org/submit/ > > Thanks! > > -- > Jeff Squyres > jsquy...@cisco.com > For corporate legal information go to: > http://www.cisco.com/web/about/doing_business/legal/cri/ > > > ___ > devel mailing list > de...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Joshua Hursey Assistant Professor of Computer Science University of Wisconsin-La Crosse http://cs.uwlax.edu/~jjhursey
Re: [MTT users] Fwd: [Alert] Found server-side submit error messages
This is probably me. I haven't had a chance to do anything about it yet. Hopefully tomorrow. I'm running the release branch (I believe); does this option exist for the release branch yet? -- Josh On Oct 30, 2008, at 11:36 AM, Ethan Mallove wrote: On Wed, Oct/29/2008 09:15:37AM, Ethan Mallove wrote: On Tue, Oct/28/2008 06:17:12PM, Jeff Squyres wrote: Should we set a default value of report_after_n_results to, say, 50 or 100? We should. -Ethan On Oct 28, 2008, at 6:15 PM, Jeff Squyres wrote: That host is in one of IU's clusters (odin). Tim/Josh -- this is you guys... Got another submit.php failure alert last night from IU. If the IU tests are running on the MTT trunk, an "svn up" on it should eliminate the issue. (report_after_n_results now defaults to 100 - see r1239.) -Ethan On Oct 28, 2008, at 3:45 PM, Ethan Mallove wrote: Folks, I got an alert from the http-log-checker.pl script. Somebody appears to have lost some MTT results. (Possibly due to an oversized database submission to submit/index.php?) There's an open ticket for this (see https://svn.open-mpi.org/trac/mtt/ticket/375). Currently there exists a simple workaround for this problem. Put the below line in the problematic "Test run" section(s). This will prevent oversized submissions by directing MTT to submit the results in batches of 50 results instead of an entire section at a time, which can reach 400+ for an Intel test run section. report_after_n_results = 50 It's hard to know whose errors are in the HTTP error log with only the IP address. If you want to verify they are or are not yours, visit a bogus URL off open-mpi.org, e.g., www.open-mpi.org/what-is-foobars-ip-address, and ping me about it. This will write your IP address to the log file, and then that IP address can be matched against the submit.php errors. -Ethan - Forwarded message from Ethan Mallove - From: Ethan Mallove Date: Tue, 28 Oct 2008 08:00:41 -0400 To: ethan.mall...@sun.com, http-log-checker.pl-no-re...@open-mpi.org Subject: [Alert] Found server-side submit error messages Original-recipient: rfc822;ethan.mall...@sun.com This email was automatically sent by http-log-checker.pl. You have received it because some error messages were found in the HTTP(S) logs that might indicate some MTT results were not successfully submitted by the server-side PHP submit script (even if the MTT client has not indicated a submission error). ### # # The below log messages matched "gz.*submit/index.php" in # /var/log/httpd/www.open-mpi.org/ssl_error_log # ### [client 129.79.240.114] PHP Warning: gzeof(): supplied argument is not a valid stream resource in /nfs/rontok/xraid/data/osl/www/www.open-mpi.org/mtt/submit/index.php on line 1923 [client 129.79.240.114] PHP Warning: gzgets(): supplied argument is not a valid stream resource in /nfs/rontok/xraid/data/osl/www/www.open-mpi.org/mtt/submit/index.php on line 1924 ... - End forwarded message - ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
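For reference, the workaround line goes inside the INI section for the affected test run. The sketch below is illustrative - the section name and test_build value are placeholders, not taken from any actual config:

[Test run: intel]
test_build = intel
# Submit results to the server in batches of 50 instead of a whole
# section at once, so a 400+-result section cannot produce an
# oversized POST to submit/index.php
report_after_n_results = 50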
Re: [MTT users] MTT server side problem
Pasha, Looking at the patch I'm a little bit concerned. The "get_table_fields()" function is, as you mentioned, no longer used and so should be removed. However, the other functions are critical to the submission script, particularly 'do_pg_connect', which opens the connection to the backend database. Are you using the current development trunk (mtt/trunk) or the stable release branch (mtt/branches/ompi-core-testers)? Can you send us the error messages that you were receiving? Cheers, Josh On May 7, 2008, at 4:49 AM, Pavel Shamis (Pasha) wrote: Hi, I upgraded the server side (the mtt is still running, so I don't know yet if the problem was resolved). During the upgrade I had some problems with the submit/index.php script; it had some duplicated functions and some of them were broken. Please review the attached patch. Pasha Ethan Mallove wrote: On Tue, May/06/2008 06:29:33PM, Pavel Shamis (Pasha) wrote: I'm not sure which cron jobs you're referring to. Do you mean these? https://svn.open-mpi.org/trac/mtt/browser/trunk/server/php/cron I talked about this one: https://svn.open-mpi.org/trac/mtt/wiki/ServerMaintenance I'm guessing you would only be concerned with the below periodic-maintenance.pl script, which just runs ANALYZE/VACUUM queries. I think you can start that up whenever you want (and it should optimize the Reporter). https://svn.open-mpi.org/trac/mtt/browser/trunk/server/sql/cron/periodic-maintenance.pl -Ethan The only thing there are the regular mtt-resu...@open-mpi.org email alerts and some out-of-date DB monitoring junk. You can ignore that stuff. Josh, are there some nightly (DB pruning/cleaning/vacuuming?) cron jobs that Pasha should be running? -Ethan Thanks. Ethan Mallove wrote: Hi Pasha, I thought this issue was solved in r1119 (see below). Do you have the latest mtt/server scripts? https://svn.open-mpi.org/trac/mtt/changeset/1119/trunk/server/php/submit -Ethan On Tue, May/06/2008 03:26:43PM, Pavel Shamis (Pasha) wrote: About the issue: 1. On the client side I see "*** WARNING: MTTDatabase client did not get a serial". As a result of the error, some MTT results are not visible via the web reporter. 2. On the server side I found the following error message: [client 10.4.3.214] PHP Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 23592960 bytes) in /.autodirect/swgwork/MTT/mtt/submit/index.php(79) : eval()'d code on line 77515 [Mon May 05 19:26:05 2008] [notice] caught SIGTERM, shutting down [Mon May 05 19:30:54 2008] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec) [Mon May 05 19:30:54 2008] [notice] Digest: generating secret for digest authentication ... [Mon May 05 19:30:54 2008] [notice] Digest: done [Mon May 05 19:30:54 2008] [notice] LDAP: Built with OpenLDAP LDAP SDK [Mon May 05 19:30:54 2008] [notice] LDAP: SSL support unavailable My memory limit in the php.ini file was set to 256MB! Any ideas? Thanks. -- Pavel Shamis (Pasha) Mellanox Technologies ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users -- Pavel Shamis (Pasha) Mellanox Technologies

Index: submit/index.php
===================================================================
--- submit/index.php (revision 1200)
+++ submit/index.php (working copy)
@@ -1,6 +1,7 @@
+# Copyright (c) 2008 Mellanox Technologies. All rights reserved.
 #
 #
@@ -24,8 +25,7 @@
 if (file_exists("$topdir/config.inc")) {
 ini_set("memory_limit", "32M");
 $topdir = '..';
-$ompi_home = '/l/osl/www/doc/www.open-mpi.org';
-include_once("$ompi_home/dbpassword.inc");
+include_once("$topdir/database.inc");
 include_once("$topdir/reporter.inc");
@@ -1465,60 +1465,6 @@ function get_table_indexes($table_name,
     return simple_select($sql_cmd);
 }
-# Function used to determine which _POST fields
-# to INSERT. Prevent non-existent fields from being
-# INSERTed
-function get_table_fields($table_name) {
-
-    global $dbname;
-    global $id;
-
-    # These indexes are special in that they link phases
-    # together and hence, can and do show up in _POST
-    if ($table_name == "test_build")
-        $special_indexes = array("mpi_install$id");
-    elseif ($table_name == "test_run")
-        $special_indexes = array("test_build$id");
-
-    # Crude way to tell whether a field is an index
-    $is_not_index_clause =
-        "\n\t (table_name = '$table_name' AND NOT " .
-        "\n\t (data_type = 'integer' AND " .
-        "\n\t column_name ~ '_id$' AND " .
-
Re: [MTT users] MTT server side problem
Pasha, All of the scripts can be run whenever. They should not be saving state between runs, so there should not be any bad effects on the database from starting them up late in the game. The 'periodic-maintenance.pl' script is a PostgreSQL cleaning/vacuuming script that helps the database run a bit faster by doing some analysis on itself. Out of all the scripts this one is probably the most important for performance of the MTT Reporter. Looking at the ServerMaintenance wiki page, it seems to be in need of updating. Nothing major is missing, but there are some new cron scripts that should be added. I'll put that on my todo list. Cheers, Josh On May 6, 2008, at 11:29 AM, Pavel Shamis (Pasha) wrote: I'm not sure which cron jobs you're referring to. Do you mean these? https://svn.open-mpi.org/trac/mtt/browser/trunk/server/php/cron I talked about this one: https://svn.open-mpi.org/trac/mtt/wiki/ServerMaintenance The only thing there are the regular mtt-resu...@open-mpi.org email alerts and some out-of-date DB monitoring junk. You can ignore that stuff. Josh, are there some nightly (DB pruning/cleaning/vacuuming?) cron jobs that Pasha should be running? -Ethan Thanks. Ethan Mallove wrote: Hi Pasha, I thought this issue was solved in r1119 (see below). Do you have the latest mtt/server scripts? https://svn.open-mpi.org/trac/mtt/changeset/1119/trunk/server/php/submit -Ethan On Tue, May/06/2008 03:26:43PM, Pavel Shamis (Pasha) wrote: About the issue: 1. On the client side I see "*** WARNING: MTTDatabase client did not get a serial". As a result of the error, some MTT results are not visible via the web reporter. 2. On the server side I found the following error message: [client 10.4.3.214] PHP Fatal error: Allowed memory size of 33554432 bytes exhausted (tried to allocate 23592960 bytes) in /.autodirect/swgwork/MTT/mtt/submit/index.php(79) : eval()'d code on line 77515 [Mon May 05 19:26:05 2008] [notice] caught SIGTERM, shutting down [Mon May 05 19:30:54 2008] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec) [Mon May 05 19:30:54 2008] [notice] Digest: generating secret for digest authentication ... [Mon May 05 19:30:54 2008] [notice] Digest: done [Mon May 05 19:30:54 2008] [notice] LDAP: Built with OpenLDAP LDAP SDK [Mon May 05 19:30:54 2008] [notice] LDAP: SSL support unavailable My memory limit in the php.ini file was set to 256MB! Any ideas? Thanks. -- Pavel Shamis (Pasha) Mellanox Technologies ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users -- Pavel Shamis (Pasha) Mellanox Technologies ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users -- Pavel Shamis (Pasha) Mellanox Technologies
[MTT users] BLACS Support
Has anyone tried to use the BLACS tests in ompi-tests with MTT? IU is considering adding it to our testing matrix and wanted to hear of any experiences. Cheers, Josh
Re: [MTT users] Reporter problems
I'm seeing between 12 and 20 seconds on a fairly idle machine. We can likely do better. I'll dig into it this week[end] and see what I can do. 12-20 seconds isn't too bad, though, considering the amount of data that query is returning. :) On Jan 30, 2008, at 2:52 PM, Ethan Mallove wrote: I don't remember a "past 24 hour" summary taking "24 seconds". Are we seeing a slowdown due to an accumulation of results? I thought the week-long table partitions would prevent this type of effect? -Ethan On Wed, Jan/30/2008 11:00:46AM, Josh Hursey wrote: This maintenance is complete. The reporter should be operating as normal. There are a few other maintenance items, but I am pushing them to the weekend since it will result in a bit of a slowdown again. Thanks for your patience. Cheers, Josh On Jan 29, 2008, at 9:47 AM, Josh Hursey wrote: The reporter should be responding much better now. I tweaked the maintenance scripts so they no longer push nearly as hard on the database. They are still running, but the query you specified seems to run in approx. 15-20 sec. with the current load. -- Josh On Jan 29, 2008, at 8:38 AM, Josh Hursey wrote: For the next 24-48 hours this is to be expected. Sorry :( I started some maintenance work last night, and it is taking a bit longer than I expected (due to integrity constraint checking most likely). The maintenance scripts are pushing fairly hard on the database, so I would expect some slowdown with the reporter (and maybe client submits). If this becomes a substantial problem for anyone please let me know, and I may be able to shift this work to the weekend. In the meantime I'll see if I can reduce the load a bit. -- Josh On Jan 29, 2008, at 7:44 AM, Tim Prins wrote: Hi, Using the reporter this morning it is being awfully slow, as in it is taking about 3 minutes to do a top-level summary search for: Date: past 24 hours Org: IU Platform name: IU_Odin I don't know whether this is a known problem or not. I seem to recall that after the last database upgrade such a search was taking just a few seconds. Thanks, Tim ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] [MTT devel] Test runs not getting into database
As a quick followup here: the problem seems to be with how the mtt-relay is reporting the server name to the submission site. I implemented a quick workaround (allowing this particular host to connect), but we are working on a real solution at the moment. I have filed a bug about this: http://svn.open-mpi.org/trac/mtt/ticket/305 -- Josh On Sep 6, 2007, at 10:04 AM, Josh Hursey wrote: Weird, this looks like a mirror issue again. Below is some more debug output from MTT on BigRed: <> *** Reporter initializing Evaluating: MTTDatabase Initializing reporter module: MTTDatabase Evaluating: require MTT::Reporter::MTTDatabase Evaluating: $ret = ::Reporter::MTTDatabase::Init(@args) Evaluating: XXUsernameXX Evaluating: XXPasswordXX Evaluating: http://s10c2b3.dim:8008/ Evaluating: OMPI Evaluating: 1 Evaluating: IU_BigRed Set HTTP credentials for realm "OMPI" MTTDatabase getting a client serial number... MTTDatabase trying proxy: / Default (none) MTTDatabase got response: Sorry, this page is not mirrored. Please see the original version of this page on the main Open MPI web site: http://www.open-mpi.org/mtt/submit/index.php *** WARNING: MTTDatabase did not get a serial Making dir: /N/ptl01/mpiteam/bigred/20070906-CronTest-cron/pb_0/mttdatabase-submit (cwd: /N/ptl01/mpiteam/bigred/20070906-CronTest-cron/pb_0) <> In the INI file we have the following for the reporter so we can do the redirect through the head node (s10c2b3.dim): <> [Reporter: IU database] module = MTTDatabase mttdatabase_realm = OMPI mttdatabase_url = http://s10c2b3.dim:8008/ mttdatabase_username = XXUsernameXX mttdatabase_password = XXPasswordXX mttdatabase_platform = IU_BigRed mttdatabase_keep_debug_files = 1 <> It looks like IU is using the trunk version of the mtt-relay, and the branch version of the MTT client. The mtt-relay code is the same on both the trunk and the branch. The relay seems to be submitting to: https://www.open-mpi.org/mtt/submit/index.php Any thoughts on why this might be happening? It looks like the mirror check is messed up again. -- Josh On Sep 5, 2007, at 11:31 PM, Josh Hursey wrote: yeah I'll try to take a look at it tomorrow. I suspect that something is going wrong with the relay, but I can't really think of what it might be at the moment. -- Josh On Sep 5, 2007, at 9:11 PM, Jeff Squyres wrote: Josh / Ethan -- Not getting a serial means that the client is not getting a value back from the server that it can parse into a serial. Can you guys dig into this and see why the mtt dbdebug file that Tim has at the end of this message is not getting a serial? Thanks... On Sep 5, 2007, at 9:24 AM, Tim Prins wrote: Here is the smallest one. Let me know if you need anything else. Tim Jeff Squyres wrote: Can you send any one of those mtt database files? We'll need to figure out if this is a client or a server problem. :-( On Sep 5, 2007, at 7:40 AM, Tim Prins wrote: Hi, BigRed has not gotten its test results into the database for a while. This is running the ompi-core-testers branch. We run by passing the results through the mtt-relay. The mtt-output file has lines like: *** WARNING: MTTDatabase did not get a serial; phases will be isolated from each other in the reports Reported to MTTDatabase: 1 successful submit, 0 failed submits (total of 1 result) I have the database submit files if they would help. Thanks, Tim ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users $VAR1 = { 'exit_signal_1' => -1, 'duration_1' => '5 seconds', 'mpi_version' => '1.3a1r16038', 'trial' => 0, 'mpi_install_section_name_1' => 'bigred 32 bit gcc', 'client_serial' => undef, 'hostname' => 's1c2b12', 'result_stdout_1' => '/bin/rm -f *.o *~ PI* core IMB-IO IMB-EXT IMB-MPI1 exe_io exe_ext exe_mpi1 touch IMB_declare.h touch exe_mpi1 *.c; rm -rf exe_io exe_ext make MPI1 CPP=MPI1 make[1]: Entering directory `/N/ptl01/mpiteam/bigred/20070905-Wednesday/pb_0/installs/d7Ri/tests/imb/IMB_2.3/src\' mpicc -I. -DMPI1 -O -c IMB.c mpicc -I. -DMPI1 -O -c IMB_declare.c mpicc -I. -DMPI1 -O -c IMB_init.c mpicc -I. -DMPI1 -O -c IMB_mem_manager.c mpicc -I. -DMPI1 -O -c IMB_parse_name_mpi1.c mpicc -I. -DMPI1 -O -c IMB_benchlist.c mpicc -I. -DMPI1 -O -c IMB_strgs.c mpicc -I. -DMPI1 -O -c IMB_err_handler.c mpicc -I. -DMPI1 -O -c IMB_g_info.c mpicc -I. -DMPI1 -O -c IMB_warm_up.c mpicc -I. -DMPI1 -O -c IMB_output.c mpicc -I. -DMPI1 -O -c IMB_pingpong.c mpicc -I. -DMPI1 -O -c IMB_pingping.c mpicc -I. -DMPI1 -O -c IMB_allreduce.c mpicc -I. -DMPI1 -O -c IMB_reduce_scatter.c mpicc -I. -DMPI1 -O -c IMB_reduce.c mpicc -I. -DMPI1 -O -c IMB_exchange.c mpicc -I. -DMPI1 -O -c IMB_bcast.c mpicc -I. -DMPI1 -O -c IMB_barrier.c mpicc -I. -DMPI1 -O -c IMB_allgather.c mpicc -I. -DMPI1 -O -c IMB_allgatherv.c mpicc -I. -DMPI1 -O -c IMB_alltoall.c mpicc -I. -DMPI1 -O -c IMB_sendrecv.c mpicc -I. -DMPI1 -O -c IMB_init_transfer.c mpicc -I. -DMPI1 -O -c IMB_chk_diff.c mpicc -I.
Re: [MTT users] Test runs not getting into database
yeah I'll try to take a look at it tomorrow. I suspect that something is going wrong with the relay, but I can't really think of what it might be at the moment. -- Josh On Sep 5, 2007, at 9:11 PM, Jeff Squyres wrote: Josh / Ethan -- Not getting a serial means that the client is not getting a value back from the server that it can parse into a serial. Can you guys dig into this and see why the mtt dbdebug file that Tim has at the end of this message is not getting a serial? Thanks... On Sep 5, 2007, at 9:24 AM, Tim Prins wrote: Here is the smallest one. Let me know if you need anything else. Tim Jeff Squyres wrote: Can you send any one of those mtt database files? We'll need to figure out if this is a client or a server problem. :-( On Sep 5, 2007, at 7:40 AM, Tim Prins wrote: Hi, BigRed has not gotten its test results into the database for a while. This is running the ompi-core-testers branch. We run by passing the results through the mtt-relay. The mtt-output file has lines like: *** WARNING: MTTDatabase did not get a serial; phases will be isolated from each other in the reports Reported to MTTDatabase: 1 successful submit, 0 failed submits (total of 1 result) I have the database submit files if they would help. Thanks, Tim ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users $VAR1 = { 'exit_signal_1' => -1, 'duration_1' => '5 seconds', 'mpi_version' => '1.3a1r16038', 'trial' => 0, 'mpi_install_section_name_1' => 'bigred 32 bit gcc', 'client_serial' => undef, 'hostname' => 's1c2b12', 'result_stdout_1' => '/bin/rm -f *.o *~ PI* core IMB-IO IMB-EXT IMB-MPI1 exe_io exe_ext exe_mpi1 touch IMB_declare.h touch exe_mpi1 *.c; rm -rf exe_io exe_ext make MPI1 CPP=MPI1 make[1]: Entering directory `/N/ptl01/mpiteam/bigred/20070905-Wednesday/pb_0/installs/d7Ri/tests/imb/IMB_2.3/src\' mpicc -I. -DMPI1 -O -c IMB.c mpicc -I. -DMPI1 -O -c IMB_declare.c mpicc -I. -DMPI1 -O -c IMB_init.c mpicc -I. -DMPI1 -O -c IMB_mem_manager.c mpicc -I. -DMPI1 -O -c IMB_parse_name_mpi1.c mpicc -I. -DMPI1 -O -c IMB_benchlist.c mpicc -I. -DMPI1 -O -c IMB_strgs.c mpicc -I. -DMPI1 -O -c IMB_err_handler.c mpicc -I. -DMPI1 -O -c IMB_g_info.c mpicc -I. -DMPI1 -O -c IMB_warm_up.c mpicc -I. -DMPI1 -O -c IMB_output.c mpicc -I. -DMPI1 -O -c IMB_pingpong.c mpicc -I. -DMPI1 -O -c IMB_pingping.c mpicc -I. -DMPI1 -O -c IMB_allreduce.c mpicc -I. -DMPI1 -O -c IMB_reduce_scatter.c mpicc -I. -DMPI1 -O -c IMB_reduce.c mpicc -I. -DMPI1 -O -c IMB_exchange.c mpicc -I. -DMPI1 -O -c IMB_bcast.c mpicc -I. -DMPI1 -O -c IMB_barrier.c mpicc -I. -DMPI1 -O -c IMB_allgather.c mpicc -I. -DMPI1 -O -c IMB_allgatherv.c mpicc -I. -DMPI1 -O -c IMB_alltoall.c mpicc -I. -DMPI1 -O -c IMB_sendrecv.c mpicc -I. -DMPI1 -O -c IMB_init_transfer.c mpicc -I. -DMPI1 -O -c IMB_chk_diff.c mpicc -I. -DMPI1 -O -c IMB_cpu_exploit.c mpicc -o IMB-MPI1 IMB.o IMB_declare.o IMB_init.o IMB_mem_manager.o IMB_parse_name_mpi1.o IMB_benchlist.o IMB_strgs.o IMB_err_handler.o IMB_g_info.o IMB_warm_up.o IMB_output.o IMB_pingpong.o IMB_pingping.o IMB_allreduce.o IMB_reduce_scatter.o IMB_reduce.o IMB_exchange.o IMB_bcast.o IMB_barrier.o IMB_allgather.o IMB_allgatherv.o IMB_alltoall.o IMB_sendrecv.o IMB_init_transfer.o IMB_chk_diff.o IMB_cpu_exploit.o make[1]: Leaving directory `/N/ptl01/mpiteam/bigred/20070905-Wednesday/pb_0/installs/d7Ri/tests/imb/IMB_2.3/src\' ', 'mpi_name' => 'ompi-nightly-trunk', 'number_of_results' => '1', 'phase' => 'Test Build', 'compiler_version_1' => '3.3.3', 'exit_value_1' => 0, 'result_message_1' => 'Success', 'start_timestamp_1' => 'Wed Sep 5 04:16:52 2007', 'compiler_name_1' => 'gnu', 'suite_name_1' => 'imb', 'test_result_1' => 1, 'mtt_client_version' => '2.1devel', 'fields' => 'compiler_name,compiler_version,duration,exit_signal,exit_value,mpi_get_section_name,mpi_install_id,mpi_install_section_name,mpi_name,mpi_version,phase,result_message,result_stdout,start_timestamp,suite_name,test_result', 'mpi_install_id' => undef, 'platform_name' => 'IU_BigRed', 'local_username' => 'mpiteam', 'mpi_get_section_name_1' => 'ompi-nightly-trunk' }; ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users -- Jeff Squyres Cisco Systems ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] Database submit error
Short Version: -- I just finished the fix, and the submit script is back up and running. This was a bug that arose in testing, but somehow did not get propagated to the production database. Long Version: - The new database uses partition tables to archive test results. As part of this there are some complex rules to mask the partition table complexity from the users of the db. There was a bug in the insert rule in which the 'id' of the submitted result (mpi_install, test_build, and test_run) was a different value than expected, since the 'id' was not translated properly to the partition table setup. The fix was to drop all rules and replace them with the correct versions. The submit errors you saw below were caused by integrity checks in the submit script that keep data from being submitted that do not have a proper lineage (e.g., you cannot submit a test_run without having submitted a test_build and an mpi_install result). The bug caused the client and the server to become confused about what the proper 'id' should be, and when the submit script attempted to 'guess' the correct run it was unsuccessful and errored out. So sorry this bug lived this long, but it should be fixed now. -- Josh On Aug 28, 2007, at 10:16 AM, Jeff Squyres wrote: Josh found the problem and is in the process of fixing it. DB submits are currently disabled while Josh is working on the fix. More specific details coming soon. Unfortunately, it looks like all data from last night will be junk. :-( You might as well kill any MTT scripts that are still running from last night. On Aug 28, 2007, at 9:14 AM, Jeff Squyres wrote: Josh and I are investigating -- the total runs in the db in the summary report from this morning is far too low. :-( On Aug 28, 2007, at 9:13 AM, Tim Prins wrote: It installed and the tests built and made it into the database: http://www.open-mpi.org/mtt/reporter.php?do_redir=293 Tim Jeff Squyres wrote: Did you get a correct MPI install section for mpich2? On Aug 28, 2007, at 9:05 AM, Tim Prins wrote: Hi all, I am working with the jms branch, and when trying to use mpich2, I get the following submit error: *** WARNING: MTTDatabase server notice: mpi_install_section_name is not in mtt database. MTTDatabase server notice: number_of_results is not in mtt database. MTTDatabase server notice: phase is not in mtt database. MTTDatabase server notice: test_type is not in mtt database. MTTDatabase server notice: test_build_section_name is not in mtt database. MTTDatabase server notice: variant is not in mtt database. MTTDatabase server notice: command is not in mtt database. MTTDatabase server notice: fields is not in mtt database. MTTDatabase server notice: resource_manager is not in mtt database. MTT submission for test run MTTDatabase server notice: Invalid test_build_id (47368) given. Guessing that it should be -1 MTTDatabase server error: ERROR: Unable to find a test_build to associate with this test_run. MTTDatabase abort: (Tried to send HTTP error) 400 MTTDatabase abort: No test_build associated with this test_run *** WARNING: MTTDatabase did not get a serial; phases will be isolated from each other in the reports Reported to MTTDatabase: 1 successful submit, 0 failed submits (total of 12 results) This happens for each test run section. Thanks, Tim ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] trouble with new reporter
I don't think so. But it is possible that they got perturbed a bit with the upgrade, I guess. :/ On Aug 27, 2007, at 4:31 PM, Jeff Squyres wrote: Josh -- did our cookies change? On Aug 27, 2007, at 4:29 PM, Tim Prins wrote: Hmm... I just tried this at home and it works. Maybe I need to get rid of old cookies? Tim On Monday 27 August 2007 02:30:17 pm Jeff Squyres wrote: Is this an effect of "preferences" cookies not propagating properly? On Aug 27, 2007, at 2:26 PM, Josh Hursey wrote: Weird. I just tried this and it worked fine for me. Showing 25 skampi runs for IU all trials. Can you try it again? -- Josh On Aug 27, 2007, at 2:11 PM, Tim Prins wrote: All, First, I have to say the new faster reporter is very nice. However, I am running into some difficulty with trial runs. Here is what I did: 1. went to www.open-mpi.org/mtt/reporter.php 2. Clicked preferences, toggled show trial runs 3. typed 'IU' into org 4. Press summary So far so good, I see the performance results I expect. But then if I click on the performance results, I get "no data available for the specified query" Thanks, Tim ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] trouble with new reporter
Weird. I just tried this and it worked fine for me. Showing 25 skampi runs for IU all trials. Can you try it again? -- Josh

On Aug 27, 2007, at 2:11 PM, Tim Prins wrote:

All, First, I have to say the new faster reporter is very nice. However, I am running into some difficulty with trial runs. Here is what I did:
1. Went to www.open-mpi.org/mtt/reporter.php
2. Clicked preferences, toggled show trial runs
3. Typed 'IU' into org
4. Pressed summary
So far so good, I see the performance results I expect. But then if I click on the performance results, I get 'no data available for the specified query'.

Thanks, Tim

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] MTT database performance boost
That's awesome. Good work :) -- Josh

On Mar 1, 2007, at 11:59 AM, Ethan Mallove wrote:

Folks, If some of you hadn't already noticed, reports (see http://www.open-mpi.org/mtt/) on Test Runs have been taking upwards of 5-7 minutes to load as of late. This was due in part to some database design issues (compounded by the fact that we now have nearly 3 million test results archived, dating back to November). To mitigate the performance issues, there is now a sliding-window n-day "speedy" database that will be used automatically for recent reports. (Currently n=7, but there are only 2 days' worth of "speedy" data as of this email.) Reports which date back earlier than the sliding window will take some time, as they will be coming from the slower "archive" database.

Cheers, Ethan

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] [devel-core] MTT 2.0 tutorial teleconference
I'll be there as well.

On Jan 4, 2007, at 3:44 PM, Tim Mattox wrote:

I'll be there for the call on Tuesday. We are looking forward to switching IU to MTT 2.0. The new report/results pages are great!
-- Tim Mattox - http://homepage.mac.com/tmattox/ tmat...@gmail.com || timat...@open-mpi.org I'm a bright... http://www.the-brights.net/

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
Re: [MTT users] Corrupted MTT database or incorrect query
On Nov 13, 2006, at 10:27 AM, Ethan Mallove wrote:

I can infer that you have an MPI Install section labeled "odin 64 bit gcc". A few questions:

* What is the mpi_get for that section (or how does that parameter get filled in by your automated scripts)?

I attached the generated INI file for you to look at. nightly-trunk-64-gcc.ini-gen Description: Binary data

It is the same value for all parallel runs of GCC+64bit (same value for all branches).

* Do you start with a fresh scratch tree every run?

Yep. Every run, and all of the parallel runs.

* Could you email me your scratch/installs/mpi_installs.xml files?

The attached mpi_installs.xml is from the trunk+gcc+64bit parallel scratch directory.

I checked on how widespread this issue is, and found that 18,700 out of 474,000 Test Run rows in the past month have an mpi_version/command (v1.2-trunk) mismatch, occurring in both directions (version=1.2, command=trunk and vice versa). They occur on these clusters: Cisco MPI development cluster, IU Odin, IU - Thor - TESTING.

Interesting... There *is* that race condition in which one mtt submission could overwrite another's index. Do you have "trunk" and "1.2" runs submitting to the database at the same time?

Yes we do. :( The parallel blocks, as we call them, are separate scratch directories in which MTT is running concurrently. Meaning that we have N parallel-block scratch directories, each running one instance of MTT. So it is possible (and highly likely) that when the reporter phase fires, all of the N parallel blocks are firing it at about the same time.

Without knowing how the reporter is doing the inserts into the database I don't think I can help much more than that on debugging. When the reporter fires for the DB:
- Does it start a transaction for the connection, do the inserts, then commit?
- Does it ship the inserts to the server and then allow it to run them, or does the client do all of the individual inserts?

-- Josh

On Sun, Nov/12/2006 06:04:17PM, Jeff Squyres (jsquyres) wrote:

I feel somewhat better now. Ethan - can you fix?

-Original Message-
From: Tim Mattox [[1]mailto:timat...@open-mpi.org]
Sent: Sunday, November 12, 2006 05:34 PM Eastern Standard Time
To: General user list for the MPI Testing Tool
Subject: [MTT users] Corrupted MTT database or incorrect query

Hello, I just noticed that the MTT summary page is presenting incorrect information for our recent runs at IU. It is showing failures for the 1.2b1 that actually came from the trunk! See the first entry in this table:

http://www.open-mpi.org/mtt/reporter.php?_start_test_timestamp=2006-11-12%2019:12:02%20through%202006-11-12%2022:12:02_platform_id=contains_platform_id=IU_phase=runs_success=fail_atom=*by_test_case=Table_agg_timestamp=-_mpi_name=All_mpi_version=All_os_name=All_os_version=All_platform_hardware=All_platform_id=All_platform_id=off&1-page=off_bookmarks_bookmarks

Click on the [i] in the upper right (the first entry) to get the popup window which shows the MPIRun cmd as:

mpirun -mca btl tcp,sm,self -np 6 --prefix /san/homedirs/mpiteam/mtt-runs/odin/20061112-Testing-NOCLN/parallel-block-3/installs/ompi-nightly-trunk/odin_64_bit_gcc/1.3a1r12559/install dynamic/spawn

Note the path has "1.3a1r12559" in the name... it's a run from the trunk, yet the table showed this as a 1.2b1 run. There are several of these misattributed errors. This would explain why Jeff saw some ddt errors on the 1.2 branch yesterday, but was unable to reproduce them. They were from the trunk!
-- Tim Mattox - [2]http://homepage.mac.com/tmattox/ tmat...@gmail.com || timat...@open-mpi.org I'm a bright... [3]http://www.the-brights.net/

References
1. mailto:timat...@open-mpi.org
2. http://homepage.mac.com/tmattox/
3. http://www.the-brights.net/
4. http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

-- -Ethan

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
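For context on the race discussed above, here is a minimal sketch of how N "parallel blocks" end up reporting at roughly the same time; the paths, block names, and ini file names are invented for illustration:

    #!/bin/sh
    # Each parallel block is an independent MTT instance with its own
    # scratch tree. Launched together, their reporter phases all fire
    # near the end of the run, within the same window -- which is when
    # one submit can step on another's index if the server does not
    # serialize its inserts.
    for block in parallel-block-1 parallel-block-2 parallel-block-3; do
        ( cd /san/homedirs/mpiteam/mtt-runs/$block && \
          ./mtt/client/mtt --file $block.ini --scratch . --verbose ) &
    done
    wait  # all blocks finish (and report) at about the same time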
Re: [MTT users] Fwd: [mtt-results] Nightly MPI Install Failures
IU/Thor Short Story:
-
The IU/Thor tests are borked because of the scheduler. Ignore these results for now.

IU/Thor Longer Story:
-
SLURM is set up to kill any job that's 'idle' for more than N min, where N is kinda small. We are compiling, but SLURM is not looking at the compile but at the MTT script, which is pretty much doing nothing until the compile completes. Thus SLURM thinks that MTT is 'idle' and kills the allocation :( We fixed this on Odin, but our sysadmin needs to make the change to Thor. It is one line in a config file, but getting him to do much is like pulling teeth with telepathy some days :/ Ignore Thor for now. It should be running alongside Odin in the next day or two.

-- Josh

On Nov 3, 2006, at 11:23 AM, Jeff Squyres wrote:

I see some failures from LANL and IU/thor that *look* like the tests were aborted before they completed (e.g., "rm -rf" of the scratch dir while MTT was running). Can someone from both organizations confirm that these are bogus results?

Begin forwarded message:
From: mtt-resu...@osl.iu.edu
Date: November 3, 2006 9:00:12 AM EST
To: mtt-resu...@open-mpi.org
Subject: [mtt-results] Nightly MPI Install Failures
Reply-To: MPI Test Tool result submissions

Query Description:
  Current Time (GMT): 2006-11-03 14:00:11
  Date Range (GMT): 2006-11-02 14:00:11 through 2006-11-03 14:00:11
  Count: success/fail, by test case

Summary of MPI Installs that failed:
  Hardware: sun4u | Os: SunOS | Os ver: SunOS 5.10 | Mpi: Open MPI trunk | Mpi rev: 1.3a1r12408 | Cluster: Sun 32-bit | Compiler: sun | Compiler ver: 5.7 | Pass: 0 | Fail: 1

Details

Config args: --enable-shared --enable-mpi-f90 --with-mpi-f90-size=trivial CC=cc CXX=CC FC=f90 F77=f77 CFLAGS=-xarch=v8plusa -xO5 -xmemalign=8s CXXFLAGS=-xarch=v8plusa -xO5 -xmemalign=8s FFLAGS=-xarch=v8plusa -xO5 -xmemalign=8s FCFLAGS=-xarch=v8plusa -xO5 -xmemalign=8s -KPIC

Stdout:
libtool: compile: cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c base/io_base_close.c -KPIC -DPIC -o base/.libs/io_base_close.o
libtool: compile: cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c base/io_base_component_list.c -KPIC -DPIC -o base/.libs/io_base_component_list.o
libtool: compile: cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c base/io_base_delete.c -KPIC -DPIC -o base/.libs/io_base_delete.o
source=base/io_base_find_available.c object=base/io_base_find_available.lo libtool=yes \
DEPDIR=.deps depmode=none /bin/bash ../../../config/depcomp \
/bin/bash ../../../libtool --tag=CC --mode=compile cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c -o base/io_base_find_available.lo base/io_base_find_available.c
source=base/io_base_open.c object=base/io_base_open.lo libtool=yes \
DEPDIR=.deps depmode=none /bin/bash ../../../config/depcomp \
/bin/bash ../../../libtool --tag=CC --mode=compile cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c -o base/io_base_open.lo base/io_base_open.c
source=base/io_base_request.c object=base/io_base_request.lo libtool=yes \
DEPDIR=.deps depmode=none /bin/bash ../../../config/depcomp \
/bin/bash ../../../libtool --tag=CC --mode=compile cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c -o base/io_base_request.lo base/io_base_request.c
source=base/io_base_register_datarep.c object=base/io_base_register_datarep.lo libtool=yes \
DEPDIR=.deps depmode=none /bin/bash ../../../config/depcomp \
/bin/bash ../../../libtool --tag=CC --mode=compile cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c -o base/io_base_register_datarep.lo base/io_base_register_datarep.c
libtool: compile: cc -DHAVE_CONFIG_H -I. -I. -I../../../opal/include -I../../../orte/include -I../../../ompi/include -I../../../ompi/include -I../../../opal/libltdl -I../../.. -DNDEBUG -xarch=v8plusa -xO5 -xmemalign=8s -mt -c base/io_base_request.c -KPIC -DPIC -o base/.libs/io_base_request.o
libtool: compile:
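For reference, the one-line scheduler change Josh mentions is most likely SLURM's InactiveLimit knob; this is a best guess, and the config path and value below are assumptions:

    # InactiveLimit is the number of seconds an allocation may sit with
    # no active srun before SLURM kills it; 0 disables the check.
    $ grep InactiveLimit /etc/slurm/slurm.conf
    InactiveLimit=0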
Re: [MTT users] Discussion on teleconf yesterday?
The discussion started with the bug characteristics of v1.2 versus the trunk. It seemed from the call that IU was the only institution that can assess this via MTT, as no one else spoke up. Since people were interested in seeing things that were breaking, I suggested that I start forwarding the IU internal MTT reports (run nightly and weekly) to the test...@open-mpi.org. This was met by Brian insisting that it would result in "thousands" of emails to the development list. I clarified that it is only 3 - 4 messages a day from IU. However, if all other institutions do this then it would be a bunch of email (where 'a bunch' would still be less than 'thousands'). That's how we got to a 'we need a single summary presented to the group' comment. It should be noted that we brought up IU sending to the 'testing@open-mpi.org' list as a band-aid until MTT could do it better.

This single summary can be email or a webpage that people can check. Rich said that he would prefer a webpage, and no one else really had a comment.

That got us talking about the current summary page that MTT generates. Tim M mentioned that with the current website it is difficult to figure out how to get the answers you need. I agree, it is hard [usability-wise] for someone to go to the summary page and answer the question "So what failed from IU last night, and how does that differ from yesterday -- e.g., what regressed and progressed yesterday at IU?". The website is flexible enough to do it, but having a couple of basic summary pages would be nice for basic users. What those should look like we can discuss further.

The IU group really likes the emails that we currently generate: a plain-text summary of the previous run. I posted copies on the MTT bug tracker here: http://svn.open-mpi.org/trac/mtt/ticket/61 Currently we have not put the work in to aggregate the runs, so for each ini file that we run we get 1 email to the IU group. This is fine for the moment, but as we add the rest of the clusters and dimensions in the testing matrix we will need MTT to aggregate the results for us and generate such an email.

So I think the general feel of the discussion is that we need the following from MTT:
- A 'basic' summary page providing answers to some general frequently asked queries. The current interface is too advanced for the current users.
- A summary email [in plain-text preferably] similar to the one that IU generated, showing an aggregation of the previous night's results for (a) all reporters (b) my institution [so I can track them down and file bugs].
- 1 email a day on the previous night's testing results.

Some relevant bugs currently in existence:
http://svn.open-mpi.org/trac/mtt/ticket/92
http://svn.open-mpi.org/trac/mtt/ticket/61
http://svn.open-mpi.org/trac/mtt/ticket/94

The other concern is that, given the frequency of testing, as bugs appear from the testing someone needs to make sure the bug tracker is updated. I think the group is unclear about how this is done. Meaning: when MTT identifies a test as failed, who is responsible for putting the bug in the bug tracker? The obvious solution is the institution that identified the bug. [Warning: My opinion] But then that becomes unwieldy for IU, since we have a large testing matrix and would need to commit someone to doing this every day (and it may take all day to properly track a set of bugs). Also this kind of punishes an institution for testing more, instead of providing an incentive to test.

-- Page Break -- Context switch --

In case you all want to know what we are doing here at IU.
I attached to this email our planned MTT testing matrix. Currently we have BigRed and Odin running the complete matrix less the BLACS tests. Wotan and Thor will come online as we get more resources to support them. In order to do such a complex testing matrix we have various .ini files that we use. And since some of the dimensions in the matrix are large, we break some of the tests into a couple .ini files that are submitted concurrently to have them run in a reasonable time.

     | BigRed   | Odin     | Thor  | Wotan
-----+----------+----------+-------+-------
Sun  | N        | N        | IMB   | BLACS
Mon  | N BLACS  | N        | N     | N
Tues | N        | N IMB*   | N     | N
Wed  | N IMB*   | N        | N     | N
Thur | N        | N BLACS  | N     | N
Fri  | N        | N        | N     | N
Sat  | N Intel* | N Intel* | BLACS | IMB

N = Nightly run
* = Large runs

All runs start at 2 am on the day listed.

= BigRed =

Nightly
---
- Branches: trunk, v1.2
- Configurations: All 64 and 32 bit builds
  * MX, LoadLeveler, No debug, gcc 3.x
- Test Suites
  * Trivial
  * IBM suite
- Processing Elements/tasks/cores/...
  * # < 8 hours
  * 7 nodes/28 tasks [to start with]
- Runtime Parameters
  * PML ob1/BTL mx,sm,self
  * PML cm /MTL mx

Weekly: Monday 2am
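To make the "2 am" schedule concrete, a hypothetical crontab entry for one cluster's nightly kickoff might look like this (the wrapper script name and log path are invented):

    # m h dom mon dow  command
    0 2 * * *  /u/mpiteam/local/bin/run-nightly-mtt.sh >> /u/mpiteam/mtt-cron.log 2>&1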
Re: [MTT users] Ibm test suite build "failing"
Yea I believe so. From what I can tell it looks pretty much the same IIRC. -- Josh On Oct 2, 2006, at 7:35 PM, Jeff Squyres wrote: Josh -- Is your "failed" IBM build looking like this? http://www.open-mpi.org/~emallove/svn/mtt/trunk/server/php/ reporter.php? _start_test_timestamp=2006-10-01%2000:00:00%20through%202006-10-02% 2021:27:2 7_agg_timestamp=- _phase=builds_success=Fail_platform_hardwar e=All_os_name=All_os_version=All_mpi_name=All_mpi_name =off ef_mpi_version=All_mpi_version=off_platform_id=All_platfor m_id=o n_atom=*by_test_case=Table_test_build_section_name=off_t est_bu ild_section_name=All_title=Details%20of%20Test%20Builds%20that %20faile d&1-page=off_bookmarks_bookmarks (hopefully that'll wrap ok and you can click on it...) It's a development web page, so don't bookmark it, but it looks like Sun is having the same problems you are (trunk). The build outwardly succeeds -- no error messages are shown -- but MTT is reporting that it fails (click on the "[I]" to see the stdout/stderr). -- Jeff Squyres Server Virtualization Business Unit Cisco Systems ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
Re: [MTT users] Running the v1.1 nightly
So I dug into this about as much as I can, and found that the v1.1 build seems to have completed successfully, but doesn't contain the 'success = 1' line. I attached the build dumps in case that helps figure this one out. ibm-build.tar.bz2 Description: Binary data

Any thoughts on why this might happen?

Josh

On Sep 29, 2006, at 3:40 PM, Josh Hursey wrote:

Has anyone been using MTT to test the v1.1 nightly? I have been trying to run the [trivial,ibm] tests against [trunk,v1.2,v1.1]. MTT will build all the sources and all the tests with all the sources. It will then run the trivial tests against all three sources, but only the ibm tests against the trunk and v1.2. I looked at the logs produced and there didn't seem to be any errors with the ibm+v1.1 test build. Would there be any other reason this would happen?

Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
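If anyone else wants to check their scratch trees for the same symptom, something like the following works; the exact location of the marker inside the scratch tree is an assumption based on this thread:

    # Look for which build dumps recorded the success marker; a build
    # missing 'success = 1' is the symptom described above.
    $ grep -rn "success = " ~/mtt-scratch/installs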
[MTT users] Running the v1.1 nightly
Has anyone been using MTT to test the v1.1 nightly? I have been trying to run the [trivial,ibm] tests against [trunk,v1.2,v1.1]. MTT will build all the sources and all the tests with all the sources. It will then run the trivial tests against all three sources, but only the ibm tests against the trunk and v1.2. I looked at the logs produced and there didn't seem to be any errors with the ibm+v1.1 test build. Would there be any other reason this would happen? Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
Re: [MTT users] Post run result submission
So the login node is the only one that has a window to the outside world. I can't access the outside world from within an allocation. So my script does:

- Login Node:
  1) Get MPI Tarballs
- 1 Compute node:
  0) Allocate a compute node to compile.
  1) Build/Install MPI builds
  2) Deallocate compute node
- Login Node:
  1) Get MPI Test sources
- N Compute Nodes:
  0) Allocate N compute Nodes to run the tests on
  1) Build/Install Tests
  2) Run the tests...
- Login Node:
  0) Check to make sure we are all done (scheduler didn't kill the job, etc.).
  1) Report results to MTT *

* This is what I am missing currently. I currently have the "Reporter: IU Database" section commented out, so that once the tests finish they don't try to post to the database, since they can't see the outside world.

On Sep 26, 2006, at 3:17 PM, Ethan Mallove wrote:

On Tue, Sep/26/2006 02:01:41PM, Josh Hursey wrote:

I'm setting up MTT on BigRed at IU, and due to some visibility requirements of the compute nodes I segment the MTT operations. Currently I have a perl script that does all the svn and wget interactions from the login node, then compiles and runs on the compute nodes. This all seems to work fine. Now I am wondering how to get the textfile results that were generated back to the MTT database once the run has finished.

If you run the "MPI Install", "Test build", and "Test run" sections from the same machine (call it the "Install-Build-Run" node), I would think you could then additionally run the "Reporter: IU Database" section. Or can you not do the HTTP POST from the Install-Build-Run node?

-Ethan

I know HLRS deals with this situation; is there a supported way of doing this yet, or is it a future work item still? Currently I have a method to send a summary email to our team after the results are generated, so this isn't a show stopper for IU or anything, just something so we can share our results with the rest of the team.

Cheers, Josh

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
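Sketched as a script, the staged flow above looks roughly like this; only --mpi-get and --mpi-install appear elsewhere in this archive, so the remaining steps are shown as comments rather than invented flags, and the paths and ini file name are hypothetical:

    #!/bin/sh
    # Staged MTT run for a cluster whose compute nodes cannot reach
    # the outside world.
    INI=/u/mpiteam/local/etc/ompi-iu-bigred-core.ini

    # Login node (has outside access): fetch the MPI tarballs
    # (and, in a later stage, the test sources).
    ./client/mtt --file $INI --mpi-get

    # One compute node, via the scheduler: build/install the MPI sources.
    #   <submit a job that runs the MPI install phase>

    # N compute nodes: build the tests and run them (no outside access).
    #   <submit a job that runs the test build + test run phases>

    # Back on the login node: fire the reporter phase to HTTP POST the
    # results to the MTT database -- the missing piece discussed above.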
Re: [MTT users] How goes the MTT?
Things are working well at IU. We have nightly and weekly runs going smoothly on our Odin cluster. Have not started using it on BigRed (our LoadLeveler scheduled environment), but hope to do that soonish. Cheers, Josh On Sep 19, 2006, at 10:18 AM, Ethan Mallove wrote: Folks, Just checking in with everyone on how the new client is working out for you. Any feature requests and/or bugs for the client and/or reports? I see submissions from HLRS and IU, so I am assuming things are working (to some degree). Do keep us apprised of any issues, observations, complaints, etc. Thanks! -Ethan ___ mtt-users mailing list mtt-us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users
Re: [MTT users] New stuff
After iterating a bit with Jeff, it seems that the error indicates that I had a malformed ini file. I accidentally left a bit of the old script in there when I updated. :[ After removing that and doing a sanity check for other bits, things are working once again. Thanks :) Josh

On Sep 14, 2006, at 5:36 PM, Josh Hursey wrote:

Here you go:

[mpiteam@odin ~/mtt]$ ./client/mtt --mpi-get --mpi-install --scratch /u/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18 --file /u/mpiteam/local/etc/ompi-iu-odin-core.ini --verbose --print-time --debug | tee ~/mtt.out
Debug is 1, Verbose is 1
*** MTT: ./client/mtt --mpi-get --mpi-install --scratch /u/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18 --file /u/mpiteam/local/etc/ompi-iu-odin-core.ini --verbose --print-time --debug
Scratch: /u/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18
Scratch resolved: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18
Making dir: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18/sources (cwd: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18)
Making dir: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18/installs (cwd: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18)
Reading ini file: /u/mpiteam/local/etc/ompi-iu-odin-core.ini
*** WARNING: Could not read INI file: /u/mpiteam/local/etc/ompi-iu-odin-core.ini; skipping

[mpiteam@odin ~/mtt]$ cat ~/mtt.out
Debug is 1, Verbose is 1
*** MTT: ./client/mtt --mpi-get --mpi-install --scratch /u/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18 --file /u/mpiteam/local/etc/ompi-iu-odin-core.ini --verbose --print-time --debug
Scratch: /u/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18
Scratch resolved: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18
Making dir: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18/sources (cwd: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18)
Making dir: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18/installs (cwd: /san/homedirs/mpiteam/mtt-runs/Testing-09-14-2006-17-14-18)
Reading ini file: /u/mpiteam/local/etc/ompi-iu-odin-core.ini
*** WARNING: Could not read INI file: /u/mpiteam/local/etc/ompi-iu-odin-core.ini; skipping

[mpiteam@odin ~/mtt]$ ls -l ~/local/etc/ompi-iu-odin-core.ini
-rw-r- 1 mpiteam projects 13741 Sep 14 17:01 /u/mpiteam/local/etc/ompi-iu-odin-core.ini

On Sep 14, 2006, at 5:33 PM, Ethan Mallove wrote:

On Thu, Sep/14/2006 05:20:23PM, Josh Hursey wrote:

Maybe I jumped the gun a bit, but I just updated and tried to run mtt and get the following error message when I run:

Reading ini file: /u/mpiteam/local/etc/ompi-iu-odin-core.ini
*** WARNING: Could not read INI file: /u/mpiteam/local/etc/ompi-iu-odin-core.ini; skipping

The file exists and was working previously. Any thoughts on why this might happen?

Never seen this one. I think I need more details. Could you do:

$ client/mtt -f file.ini | tee mtt.out
$ cat mtt.out
$ ls -l file.ini

I assume the mtt.out is very short if it's dying while trying to read the ini.

Thanks! -Ethan

Cheers, Josh

On Sep 14, 2006, at 2:53 PM, Jeff Squyres wrote:

Howdy MTT users! We have a bunch of important updates for you, including some that *REQUIRE* action tomorrow morning (15 Sep 2006: update your client and update your INI files). Please go read the full text of the announcement here: http://svn.open-mpi.org/trac/mtt/wiki/News-14-Sep-2006 As usual, please let us know if you have any questions, comments, feedback, etc. Thanks!
-- Jeff Squyres Server Virtualization Business Unit Cisco Systems

___
mtt-users mailing list
mtt-us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-users

Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
Re: [MTT users] Tests timing out
This fixes the hanging and gets me running (and passing) some/most of the tests [Trivial and ibm]. Yay! I have a 16 processor job running on Odin at the moment that seems to be going well so far. Thanks for your help. Want me to file a bug about the tcsh problem below? -- Josh

On Aug 30, 2006, at 2:30 PM, Jeff Squyres wrote:

Bah! This is the result of perl expanding $? to 0 -- it seems that I need to escape $? so that it's not output as 0. Sorry about that! So is this just for the sourcing files, or for your overall (hanging) problems?

On 8/30/06 2:28 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

So here are the results of my exploration. I have things running now. The problem was that the user that I am running under does not set the LD_LIBRARY_PATH variable at any point. So when MTT tries to export the variable it does:

if (0LD_LIBRARY_PATH == 0) then
    setenv LD_LIBRARY_PATH /san//install/lib
else
    setenv LD_LIBRARY_PATH /san//install/lib:$LD_LIBRARY_PATH
endif

So this causes tcsh to emit the error that LD_LIBRARY_PATH is not defined. So it is not set, due to the error. I fixed this by always declaring it (as "") in the .cshrc file. However, MTT could do a sanity check to see if the variable is defined before trying to check its value. Something like:

if ($?LD_LIBRARY_PATH) then
else
    setenv LD_LIBRARY_PATH ""
endif

if (0LD_LIBRARY_PATH == 0) then
    setenv LD_LIBRARY_PATH /san//install/lib
else
    setenv LD_LIBRARY_PATH /san//install/lib:$LD_LIBRARY_PATH
endif

or something of the sort.

As another note, could we start a "How to debug MTT" wiki page with some of the information that Jeff sent in this message regarding the dumping of env vars? I think that would be helpful when getting things started. Thanks for all your help, I'm sure I'll have more questions in the near future. Cheers, Josh

On Aug 30, 2006, at 12:31 PM, Jeff Squyres wrote:

On 8/30/06 12:10 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

MTT directly sets environment variables in its own environment (via $ENV{whatever} = "foo") before using fork/exec to launch compiles and runs. Hence, the forked children inherit the environment variables that we set (e.g., PATH and LD_LIBRARY_PATH). So if you source the env vars files that MTT drops, that should be sufficient.

Does it drop them to a file, or are they printed in the debugging output anywhere? I'm having a bit of trouble finding these strings in the output.

It does not put these in the -debug output. The files that it drops are in the scratch dir. You'll need to go into scratch/installs, and then it depends on what your INI file section names are. You'll go to: /installs And there should be files named "mpi_installed_vars.[csh|sh]" that you can source, depending on your shell. It should set PATH and LD_LIBRARY_PATH. The intent of these files is for exactly this purpose -- for a human to test borked MPI installs inside the MTT scratch tree.

As for setting the values on *remote* nodes, we do it solely via the --prefix option. I wonder if --prefix is broken under SLURM...? That might be something to check -- you might be inadvertently mixing installations of OMPI...?

Yep I'll check it out. Cheers, Josh

On 8/30/06 10:36 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

I'm trying to replicate the MTT environment as much as possible, and have a couple of questions. Assume there is no mpirun in my PATH/LD_LIBRARY_PATH when I start MTT. After MTT builds Open MPI, how does it export these variables so that it can build the tests? How does it export these when it runs those tests (solely via --prefix)?
Cheers, Josh

On Aug 30, 2006, at 10:25 AM, Josh Hursey wrote:

I already tried that. However, I'm trying it in a couple different ways and getting some mixed results. Let me formulate the error cases and get back to you. Cheers, Josh

On Aug 30, 2006, at 10:17 AM, Ralph H Castain wrote:

Well, why don't you try first separating this from MTT? Just run the command manually in batch mode and see if it works. If that works, then the problem is with MTT. Otherwise, we have a problem with notification. Or are you saying that you have already done this? Ralph

On 8/30/06 8:03 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

Yet another point (sorry for the spam). This may not be an MTT issue but a broken ORTE on the trunk :( When I try to run in an allocation (srun -N 16 -A) things run fine. But if I try to run in batch mode (srun -N 16 -b myscript.sh) then I see the same hang as in MTT. Seems that mpirun is not getting properly notified of the completion of the job. :( I'll try to investigate a bit further today. Any thoughts on what might be causing this? Cheers, Josh

On Aug 30, 2006, at 9:54 AM, Josh Hursey wrote:

Forgot this bit in my mail. With the mpirun just hanging out there I at
Re: [MTT users] Tests timing out
On Aug 30, 2006, at 11:36 AM, Jeff Squyres wrote:

(sorry -- been afk much of this morning)

MTT directly sets environment variables in its own environment (via $ENV{whatever} = "foo") before using fork/exec to launch compiles and runs. Hence, the forked children inherit the environment variables that we set (e.g., PATH and LD_LIBRARY_PATH). So if you source the env vars files that MTT drops, that should be sufficient.

Does it drop them to a file, or are they printed in the debugging output anywhere? I'm having a bit of trouble finding these strings in the output.

As for setting the values on *remote* nodes, we do it solely via the --prefix option. I wonder if --prefix is broken under SLURM...? That might be something to check -- you might be inadvertently mixing installations of OMPI...?

Yep I'll check it out. Cheers, Josh

On 8/30/06 10:36 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

I'm trying to replicate the MTT environment as much as possible, and have a couple of questions. Assume there is no mpirun in my PATH/LD_LIBRARY_PATH when I start MTT. After MTT builds Open MPI, how does it export these variables so that it can build the tests? How does it export these when it runs those tests (solely via --prefix)? Cheers, Josh

On Aug 30, 2006, at 10:25 AM, Josh Hursey wrote:

I already tried that. However, I'm trying it in a couple different ways and getting some mixed results. Let me formulate the error cases and get back to you. Cheers, Josh

On Aug 30, 2006, at 10:17 AM, Ralph H Castain wrote:

Well, why don't you try first separating this from MTT? Just run the command manually in batch mode and see if it works. If that works, then the problem is with MTT. Otherwise, we have a problem with notification. Or are you saying that you have already done this? Ralph

On 8/30/06 8:03 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

Yet another point (sorry for the spam). This may not be an MTT issue but a broken ORTE on the trunk :( When I try to run in an allocation (srun -N 16 -A) things run fine. But if I try to run in batch mode (srun -N 16 -b myscript.sh) then I see the same hang as in MTT. Seems that mpirun is not getting properly notified of the completion of the job. :( I'll try to investigate a bit further today. Any thoughts on what might be causing this? Cheers, Josh

On Aug 30, 2006, at 9:54 AM, Josh Hursey wrote:

Forgot this bit in my mail. With the mpirun just hanging out there I attached GDB and got the following stack trace:

(gdb) bt
#0 0x003d1b9bd1af in poll () from /lib64/tls/libc.so.6
#1 0x002a956e6389 in opal_poll_dispatch (base=0x5136d0, arg=0x513730, tv=0x7fbfffee70) at poll.c:191
#2 0x002a956e28b6 in opal_event_base_loop (base=0x5136d0, flags=5) at event.c:584
#3 0x002a956e26b7 in opal_event_loop (flags=5) at event.c:514
#4 0x002a956db7c7 in opal_progress () at runtime/opal_progress.c:259
#5 0x0040334c in opal_condition_wait (c=0x509650, m=0x509600) at ../../../opal/threads/condition.h:81
#6 0x00402f52 in orterun (argc=9, argv=0x7fb0b8) at orterun.c:444
#7 0x004028a3 in main (argc=9, argv=0x7fb0b8) at main.c:13

Seems that mpirun is waiting for things to complete :/

On Aug 30, 2006, at 9:53 AM, Josh Hursey wrote:

On Aug 30, 2006, at 7:19 AM, Jeff Squyres wrote:

On 8/29/06 8:57 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

Does this apply to *all* tests, or only some of the tests (like allgather)?

All of the tests: Trivial and ibm. They all timeout :(

Blah. The trivial tests are simply "hello world", so they should take just about no time at all. Is this running under SLURM?
I put the code in there to know how many procs to use in SLURM but have not tested it in eons. I doubt that's the problem, but that's one thing to check.

Yep it is in SLURM. And it seems that the 'number of procs' code is working fine, as it changes with different allocations.

Can you set a super-long timeout (e.g., a few minutes), and while one of the trivial tests is running, do some ps's on the relevant nodes and see what, if anything, is running? E.g., mpirun, the test executable on the nodes, etc.

Without setting a long timeout, it seems that mpirun is running, but nothing else, and only on the launching node. When a test starts you see the mpirun launching properly:

$ ps aux | grep ...
USER    PID   %CPU %MEM VSZ    RSS   TTY STAT START TIME COMMAND
mpiteam 15117 0.5  0.8  113024 33680 ?
Re: [MTT users] Tests timing out
I'm trying to replicate the MTT environment as much as possible, and have a couple of questions. Assume there is no mpirun in my PATH/LD_LIBRARY_PATH when I start MTT. After MTT builds Open MPI, how does it export these variables so that it can build the tests? How does it export these when it runs those tests (solely via --prefix)?

Cheers, Josh

On Aug 30, 2006, at 10:25 AM, Josh Hursey wrote:

I already tried that. However, I'm trying it in a couple different ways and getting some mixed results. Let me formulate the error cases and get back to you. Cheers, Josh

On Aug 30, 2006, at 10:17 AM, Ralph H Castain wrote:

Well, why don't you try first separating this from MTT? Just run the command manually in batch mode and see if it works. If that works, then the problem is with MTT. Otherwise, we have a problem with notification. Or are you saying that you have already done this? Ralph

On 8/30/06 8:03 AM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

Yet another point (sorry for the spam). This may not be an MTT issue but a broken ORTE on the trunk :( When I try to run in an allocation (srun -N 16 -A) things run fine. But if I try to run in batch mode (srun -N 16 -b myscript.sh) then I see the same hang as in MTT. Seems that mpirun is not getting properly notified of the completion of the job. :( I'll try to investigate a bit further today. Any thoughts on what might be causing this? Cheers, Josh

On Aug 30, 2006, at 9:54 AM, Josh Hursey wrote:

Forgot this bit in my mail. With the mpirun just hanging out there I attached GDB and got the following stack trace:

(gdb) bt
#0 0x003d1b9bd1af in poll () from /lib64/tls/libc.so.6
#1 0x002a956e6389 in opal_poll_dispatch (base=0x5136d0, arg=0x513730, tv=0x7fbfffee70) at poll.c:191
#2 0x002a956e28b6 in opal_event_base_loop (base=0x5136d0, flags=5) at event.c:584
#3 0x002a956e26b7 in opal_event_loop (flags=5) at event.c:514
#4 0x002a956db7c7 in opal_progress () at runtime/opal_progress.c:259
#5 0x0040334c in opal_condition_wait (c=0x509650, m=0x509600) at ../../../opal/threads/condition.h:81
#6 0x00402f52 in orterun (argc=9, argv=0x7fb0b8) at orterun.c:444
#7 0x004028a3 in main (argc=9, argv=0x7fb0b8) at main.c:13

Seems that mpirun is waiting for things to complete :/

On Aug 30, 2006, at 9:53 AM, Josh Hursey wrote:

On Aug 30, 2006, at 7:19 AM, Jeff Squyres wrote:

On 8/29/06 8:57 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

Does this apply to *all* tests, or only some of the tests (like allgather)?

All of the tests: Trivial and ibm. They all timeout :(

Blah. The trivial tests are simply "hello world", so they should take just about no time at all. Is this running under SLURM? I put the code in there to know how many procs to use in SLURM but have not tested it in eons. I doubt that's the problem, but that's one thing to check.

Yep it is in SLURM. And it seems that the 'number of procs' code is working fine, as it changes with different allocations.

Can you set a super-long timeout (e.g., a few minutes), and while one of the trivial tests is running, do some ps's on the relevant nodes and see what, if anything, is running? E.g., mpirun, the test executable on the nodes, etc.

Without setting a long timeout, it seems that mpirun is running, but nothing else, and only on the launching node. When a test starts you see the mpirun launching properly:

$ ps aux | grep ...
USER    PID   %CPU %MEM VSZ    RSS   TTY STAT START TIME COMMAND
mpiteam 15117 0.5  0.8  113024 33680 ?   S    09:32 0:06 perl ./client/mtt --debug --scratch /u/mpiteam/tmp/mtt-scratch --file /u/mpiteam/local/etc/ompi-iu-odin-core.ini --verbose --print-time
mpiteam 15294 0.0  0.0  0      0     ?   Z    09:32 0:00 [sh]
mpiteam 28453 0.2  0.0  38072  3536  ?   S    09:50 0:00 mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11497/install collective/allgather_in_place
mpiteam 28454 0.0  0.0  41716  2040  ?   Sl   09:50 0:00 srun --nodes=16 --ntasks=16 --nodelist=odin022,odin021,odin020,odin019,odin018,odin017,odin016,odin015,odin014,odin013,odin012,odin011,odin010,odin009,odin008,odin007 orted --no-daemonize --bootproxy 1 --ns-nds slurm --name 0.0.1 --num_procs 16 --vpid_start 0 --universe mpit...@odin007.cs.indiana.edu:default-universe-28453 --nsreplica "0.0.0;tcp://129.79.240.107:40904" --gprreplica "0.0.0;tcp://129.79.240.107:40904"
mpiteam 28455 0.0  0.0  23212  1804  ?   Ssl  09:50 0:00 srun --nodes=16 --ntasks=16 --nodelist=odin022,odin021,odin020,odin019,odin018,odin017,odin016,odin015,odin014,odin013,odin012,odin011,odin010,odin009,odin008,odin007 orted --no-daemonize --bootproxy 1 --ns-nds slurm --name 0.0.1 --num_procs 16 --vpid_start 0 --universe mpit...@odin007.cs.indiana.edu:
Re: [MTT users] Tests timing out
On Aug 29, 2006, at 6:57 PM, Jeff Squyres wrote:

On 8/29/06 1:55 PM, "Josh Hursey" <jjhur...@open-mpi.org> wrote:

So I'm having trouble getting tests to complete without timing out in MTT. It seems that the tests timeout and hang in MTT, but complete normally outside of MTT.

Does this apply to *all* tests, or only some of the tests (like allgather)?

All of the tests: Trivial and ibm. They all timeout :(

Here are some details:
Build: Open MPI Trunk (1.3a1r11481)
Tests: Trivial, ibm
BTL: tcp, self
Nodes/processes: 16 nodes (32 processors) on the Odin Cluster at IU

In MTT all of the tests timeout:

Running command: mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather
Timeout: 1 - 1156872348 (vs. now: 1156872028)
Past timeout! 1156872348 < 1156872349
Past timeout! 1156872348 < 1156872349
[snipped]
: returning 0
String now: 0
*** WARNING: Test: allgather, np=32, variant=1: TIMED OUT (failed)

Outside of MTT using the same build the test runs and completes normally:

$ cd ~/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/tests/ibm/ibm/
$ mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather

Where is mpirun in your path? MTT actually drops sourceable files in the top-level install dir (i.e., the 1.3a1r11481) that you can source in your shell to set the PATH/LD_LIBRARY_PATH for that install. Can you source it and try to run again?

Yep, I exported the PATH/LD_LIBRARY_PATH to the one cited in the --prefix argument before running manually.

How long does it take to run manually -- just a few seconds, or a long time (that could potentially time out)?

Just a few seconds (say 5 or so).

-- Jeff Squyres Server Virtualization Business Unit Cisco Systems

Josh Hursey jjhur...@open-mpi.org http://www.open-mpi.org/
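For anyone else debugging this class of hang, the by-hand reproduction recipe from this thread boils down to the following; the install path is abbreviated from the messages above:

    # Source the env file MTT drops in the top-level install dir, then
    # run the test directly with a timer to compare against MTT's timeout.
    $ cd ~/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481
    $ . ./mpi_installed_vars.sh   # sh variant; sets PATH and LD_LIBRARY_PATH
    $ cd tests/ibm/ibm
    $ time mpirun -mca btl tcp,self -np 32 collective/allgather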
[MTT users] Tests timing out
Hey all,

So I'm having trouble getting tests to complete without timing out in MTT. It seems that the tests timeout and hang in MTT, but complete normally outside of MTT. Here are some details:

Build: Open MPI Trunk (1.3a1r11481)
Tests: Trivial, ibm
BTL: tcp, self
Nodes/processes: 16 nodes (32 processors) on the Odin Cluster at IU

In MTT all of the tests timeout:

Running command: mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather
Timeout: 1 - 1156872348 (vs. now: 1156872028)
Past timeout! 1156872348 < 1156872349
Past timeout! 1156872348 < 1156872349
Command complete, exit status: 72057594037927935
Evaluating: ((_exit_status(), 0), (_exit_status(), 77))
Got name: test_exit_status
Got args:
_do: $ret = MTT::Values::Functions::test_exit_status()
_exit_status returning: 72057594037927935
String now: ((72057594037927935, 0), (_exit_status(), 77))
Got name: eq
Got args: 72057594037927935, 0
_do: $ret = MTT::Values::Functions::eq(72057594037927935, 0)
got: 72057594037927935 0
: returning 0
String now: (0, (_exit_status(), 77))
Got name: test_exit_status
Got args:
_do: $ret = MTT::Values::Functions::test_exit_status()
_exit_status returning: 72057594037927935
String now: (0, (72057594037927935, 77))
Got name: eq
Got args: 72057594037927935, 77
_do: $ret = MTT::Values::Functions::eq(72057594037927935, 77)
got: 72057594037927935 77
: returning 0
String now: (0, 0)
Got name: or
Got args: 0, 0
_do: $ret = MTT::Values::Functions::or(0, 0)
got: 0 0
: returning 0
String now: 0
*** WARNING: Test: allgather, np=32, variant=1: TIMED OUT (failed)

Outside of MTT using the same build the test runs and completes normally:

$ cd ~/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/tests/ibm/ibm/
$ mpirun -mca btl tcp,self -np 32 --prefix /san/homedirs/mpiteam/tmp/mtt-scratch/installs/ompi-nightly-trunk/odin_gcc_warnings/1.3a1r11481/install collective/allgather
$

Any thoughts on why this might be happening in MTT but not outside of it?

Cheers, Josh