Re: [OMPI devel] UD BTL alltoall hangs
On Aug 29, 2007, at 7:05 PM, Andrew Friedley wrote:

> $ mpirun -debug -np 2 -bynode -debug-daemons ./NPmpi
> --
> Internal error -- the orte_base_user_debugger MCA parameter was not able to be found. Please contact the Open RTE developers; this should not happen.
> --
>
> Grepping for that param in ompi_info shows:
>
> MCA orte: parameter "orte_base_user_debugger" (current value: "totalview @mpirun@ -a @mpirun_args@ : ddt -n @np@ -start @executable@ @executable_argv@ @single_app@ : fxp @mpirun@ -a @mpirun_args@")

This has been broken for a while. It's a long story to explain, but a fix is on the way. Until then you should use this command: "tv8 mpirun -a -np 2 -bynode `pwd`/NPmpi". The `pwd` is really important for some reason; otherwise TotalView is unable to find the executable. The problem is that the name of the process will be "./NPmpi" and TotalView does not have access to the path where the executable was launched (at least that's the reason I think). Once you do this, you should be good to go.

george.

> What's going on? I also tried running totalview directly, using a line like this:
>
> totalview mpirun -a -np 2 -bynode -debug-daemons ./NPmpi
>
> Totalview comes up and seems to be debugging the mpirun process, with only one thread. It doesn't seem to be aware that this is an MPI job with other MPI processes. Any ideas?
>
> Andrew
>
> George Bosilca wrote:
>> The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear alltoall. As the name states, this means that every node will post one receive from every other node and then will start sending the respective fragment to every other node. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side.
>>
>> Do you have TotalView installed on odin? If yes, there is a simple way to see how many sends are pending and where ... That might pinpoint [at least] the process where you should look to see what's wrong.
>>
>> george.
>>
>> On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:
>>> I'm having a problem with the UD BTL and hoping someone might have some input to help solve it.
>>>
>>> What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this:
>>>
>>> mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144
>>>
>>> hangs consistently when testing 256-byte messages. There are two things I can do to make the hang go away until running at larger scale. First is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete.
>>>
>>> The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.
>>>
>>> It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs and don't have a good way of debugging that (suggestions appreciated).
>>>
>>> Furthermore, both ob1 and dr PMLs result in the same behavior, except that DR eventually trips a watchdog timeout, fails the BTL, and terminates the job. Other collectives such as allreduce and allgather do not hang -- only alltoall. I can also reproduce the hang on LLNL's Atlas machine.
>>>
>>> Can anyone else reproduce this (Torsten might have to make a copy of nbcbench available)? Anyone have any ideas as to what's wrong?
>>>
>>> Andrew

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
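The queueing behaviour Andrew describes for 'btl_ofud_sd_num' can be pictured with a small toy model. This is only a sketch in Python, not Open MPI code: the names SendQueueModel, post_send and progress are made up for illustration. The single assumption taken from the message is that at most sd_num sends may be outstanding, and that further frags wait on a list until completions free a slot.

```python
from collections import deque


class SendQueueModel:
    """Toy model of a BTL that caps outstanding sends (hypothetical names)."""

    def __init__(self, sd_num=128):
        self.sd_num = sd_num      # cap on outstanding send WQEs
        self.outstanding = 0      # sends posted but not yet completed
        self.pending = deque()    # frags waiting for a free slot

    def post_send(self, frag):
        """Post the frag if a slot is free, otherwise queue it."""
        if self.outstanding < self.sd_num:
            self.outstanding += 1
            return True
        self.pending.append(frag)
        return False

    def progress(self, completions):
        """Retire up to `completions` sends, then post queued frags into freed slots."""
        done = min(completions, self.outstanding)
        self.outstanding -= done
        while self.pending and self.outstanding < self.sd_num:
            self.pending.popleft()
            self.outstanding += 1
        return done


if __name__ == "__main__":
    btl = SendQueueModel(sd_num=128)
    # Post more frags than there are send slots.
    for peer in range(254):
        btl.post_send(("frag", peer))
    print(btl.outstanding, len(btl.pending))
    btl.progress(completions=100)
    print(btl.outstanding, len(btl.pending))
```

In this model every queued frag is eventually posted as long as completions keep arriving, so raising sd_num only changes how soon frags start to queue. That is consistent with the report that a larger value postpones the hang rather than fixing it, but the model says nothing about where the real hang comes from.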
Re: [OMPI devel] UD BTL alltoall hangs
Thanks for the suggestion, though that appears to hang with no output whatsoever.

Andrew

Aurelien Bouteiller wrote:
> You should try mpirun -np 2 -bynode totalview ./NPmpi
>
> Aurelien
>
> On 29 Aug 2007, at 13:05, Andrew Friedley wrote:
>> OK, I've never used totalview before. So doing some FAQ reading I got an xterm on an Atlas node (odin doesn't have totalview AFAIK). Trying a simple netpipe run just to get familiar with things results in this:
>>
>> $ mpirun -debug -np 2 -bynode -debug-daemons ./NPmpi
>> --
>> Internal error -- the orte_base_user_debugger MCA parameter was not able to be found. Please contact the Open RTE developers; this should not happen.
>> --
>>
>> Grepping for that param in ompi_info shows:
>>
>> MCA orte: parameter "orte_base_user_debugger" (current value: "totalview @mpirun@ -a @mpirun_args@ : ddt -n @np@ -start @executable@ @executable_argv@ @single_app@ : fxp @mpirun@ -a @mpirun_args@")
>>
>> What's going on? I also tried running totalview directly, using a line like this:
>>
>> totalview mpirun -a -np 2 -bynode -debug-daemons ./NPmpi
>>
>> Totalview comes up and seems to be debugging the mpirun process, with only one thread. It doesn't seem to be aware that this is an MPI job with other MPI processes. Any ideas?
>>
>> Andrew
>>
>> George Bosilca wrote:
>>> The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear alltoall. As the name states, this means that every node will post one receive from every other node and then will start sending the respective fragment to every other node. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side.
>>>
>>> Do you have TotalView installed on odin? If yes, there is a simple way to see how many sends are pending and where ... That might pinpoint [at least] the process where you should look to see what's wrong.
>>>
>>> george.
>>>
>>> On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:
>>>> I'm having a problem with the UD BTL and hoping someone might have some input to help solve it.
>>>>
>>>> What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this:
>>>>
>>>> mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144
>>>>
>>>> hangs consistently when testing 256-byte messages. There are two things I can do to make the hang go away until running at larger scale. First is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete.
>>>>
>>>> The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.
>>>>
>>>> It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs and don't have a good way of debugging that (suggestions appreciated).
>>>>
>>>> Furthermore, both ob1 and dr PMLs result in the same behavior, except that DR eventually trips a watchdog timeout, fails the BTL, and terminates the job. Other collectives such as allreduce and allgather do not hang -- only alltoall. I can also reproduce the hang on LLNL's Atlas machine.
>>>>
>>>> Can anyone else reproduce this (Torsten might have to make a copy of nbcbench available)? Anyone have any ideas as to what's wrong?
>>>>
>>>> Andrew
Re: [OMPI devel] SM BTL hang issue
Hmmm, interesting, since my version doesn't abort at all.

--td

Li-Ta Lo wrote:
> On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
>> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core system. It'll print out the following and then hang:
>> Target duration (seconds): 10.00
>> # of messages sent in that time: 589207
>> Microseconds per message: 16.972
>
> I know almost nothing about FORTRAN, but the stack dump told me it got a NULL pointer reference when accessing the "me" variable in the do .. while loop. How can this happen?
>
> [ollie@exponential ~]$ mpirun -np 2 a.out 100
> [exponential:22145] *** Process received signal ***
> [exponential:22145] Signal: Segmentation fault (11)
> [exponential:22145] Signal code: Address not mapped (1)
> [exponential:22145] Failing at address: (nil)
> [exponential:22145] [ 0] [0xb7f2a440]
> [exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
> [exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
> [exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0) [0x4e75ef70]
> [exponential:22145] [ 4] a.out [0x8048aa1]
> [exponential:22145] *** End of error message ***
>
>       call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
>      $     MPI_COMM_WORLD,ier)
>  804909e:  8b 45 d4    mov    0xffd4(%ebp),%eax
>  80490a1:  83 c0 01    add    $0x1,%eax
>
> It is compiled with g77/g90.
>
> Ollie
Re: [OMPI devel] SM BTL hang issue
On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core system. It'll print out the following and then hang:
> Target duration (seconds): 10.00
> # of messages sent in that time: 589207
> Microseconds per message: 16.972

I know almost nothing about FORTRAN, but the stack dump told me it got a NULL pointer reference when accessing the "me" variable in the do .. while loop. How can this happen?

[ollie@exponential ~]$ mpirun -np 2 a.out 100
[exponential:22145] *** Process received signal ***
[exponential:22145] Signal: Segmentation fault (11)
[exponential:22145] Signal code: Address not mapped (1)
[exponential:22145] Failing at address: (nil)
[exponential:22145] [ 0] [0xb7f2a440]
[exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
[exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
[exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0) [0x4e75ef70]
[exponential:22145] [ 4] a.out [0x8048aa1]
[exponential:22145] *** End of error message ***

      call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
     $     MPI_COMM_WORLD,ier)
 804909e:  8b 45 d4    mov    0xffd4(%ebp),%eax
 80490a1:  83 c0 01    add    $0x1,%eax

It is compiled with g77/g90.

Ollie
Re: [OMPI devel] UD BTL alltoall hangs
OK, I've never used totalview before. So doing some FAQ reading I got an xterm on an Atlas node (odin doesn't have totalview AFAIK). Trying a simple netpipe run just to get familiar with things results in this:

$ mpirun -debug -np 2 -bynode -debug-daemons ./NPmpi
--
Internal error -- the orte_base_user_debugger MCA parameter was not able to be found. Please contact the Open RTE developers; this should not happen.
--

Grepping for that param in ompi_info shows:

MCA orte: parameter "orte_base_user_debugger" (current value: "totalview @mpirun@ -a @mpirun_args@ : ddt -n @np@ -start @executable@ @executable_argv@ @single_app@ : fxp @mpirun@ -a @mpirun_args@")

What's going on? I also tried running totalview directly, using a line like this:

totalview mpirun -a -np 2 -bynode -debug-daemons ./NPmpi

Totalview comes up and seems to be debugging the mpirun process, with only one thread. It doesn't seem to be aware that this is an MPI job with other MPI processes. Any ideas?

Andrew

George Bosilca wrote:
> The first step will be to figure out which version of the alltoall you're using. I suppose you use the default parameters, and then the decision function in the tuned component says it is using the linear alltoall. As the name states, this means that every node will post one receive from every other node and then will start sending the respective fragment to every other node. This will lead to a lot of outstanding sends and receives. I doubt that the receives can cause a problem, so I expect the problem is coming from the send side.
>
> Do you have TotalView installed on odin? If yes, there is a simple way to see how many sends are pending and where ... That might pinpoint [at least] the process where you should look to see what's wrong.
>
> george.
>
> On Aug 29, 2007, at 12:37 AM, Andrew Friedley wrote:
>> I'm having a problem with the UD BTL and hoping someone might have some input to help solve it.
>>
>> What I'm seeing is hangs when running alltoall benchmarks with nbcbench or an LLNL program called mpiBench -- both hang exactly the same way. With the code on the trunk, running nbcbench on IU's odin using 32 nodes and a command line like this:
>>
>> mpirun -np 128 -mca btl ofud,self ./nbcbench -t MPI_Alltoall -p 128-128 -s 1-262144
>>
>> hangs consistently when testing 256-byte messages. There are two things I can do to make the hang go away until running at larger scale. First is to increase the 'btl_ofud_sd_num' MCA param from its default value of 128. This allows you to run with more procs/nodes before hitting the hang, but AFAICT doesn't fix the actual problem. What this parameter does is control the maximum number of outstanding send WQEs posted at the IB level -- when the limit is reached, frags are queued on an opal_list_t and later sent by progress as IB sends complete.
>>
>> The other way I've found is to play games with calling mca_btl_ud_component_progress() in mca_btl_ud_endpoint_post_send(). In fact I replaced the CHECK_FRAG_QUEUES() macro used around btl_ofud_endpoint.c:77 with a version that loops on progress until a send WQE slot is available (as opposed to queueing). Same result -- I can run at larger scale, but still hit the hang eventually.
>>
>> It appears that when the job hangs, progress is being polled very quickly, and after spinning for a while there are no outstanding send WQEs or queued sends in the BTL. I'm not sure where further up things are spinning/blocking, as I can't produce the hang at less than 32 nodes / 128 procs and don't have a good way of debugging that (suggestions appreciated).
>>
>> Furthermore, both ob1 and dr PMLs result in the same behavior, except that DR eventually trips a watchdog timeout, fails the BTL, and terminates the job. Other collectives such as allreduce and allgather do not hang -- only alltoall. I can also reproduce the hang on LLNL's Atlas machine.
>>
>> Can anyone else reproduce this (Torsten might have to make a copy of nbcbench available)? Anyone have any ideas as to what's wrong?
>>
>> Andrew
Re: [OMPI devel] SM BTL hang issue
If you are going to look at it, I will not bother with this.

Rich

On 8/29/07 10:47 AM, "Gleb Natapov" wrote:

> On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote:
>> Gleb,
>> Are you looking at this?
> Not today. And I need the code to reproduce the bug. Is this possible?
>
>> Rich
>>
>> On 8/29/07 9:56 AM, "Gleb Natapov" wrote:
>>
>>> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>>>> Is this trunk or 1.2?
>>> Oops. I should read more carefully :) This is trunk.
>>>
>>>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>>>> I have a program that does a simple bucket brigade of sends and receives where rank 0 is the start and repeatedly sends to rank 1 until a certain amount of time has passed and then it sends an all-done packet.
>>>>>
>>>>> Running this under np=2 always works. However, when I run with greater than 2 using only the SM btl the program usually hangs and one of the processes has a long stack that has a lot of the following 3 calls in it:
>>>>>
>>>>> [25] opal_progress(), line 187 in "opal_progress.c"
>>>>> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>>>> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>>>>
>>>>> When stepping through the ompi_fifo_write_to_head routine it looks like the fifo has overflowed.
>>>>>
>>>>> I am wondering if what is happening is rank 0 has sent a bunch of messages that have exhausted the resources such that one of the middle ranks which is in the process of sending cannot send and therefore never gets to the point of trying to receive the messages from rank 0?
>>>>>
>>>>> Is the above a possible scenario or are messages periodically bled off the SM BTL's fifos?
>>>>>
>>>>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably if I raise the shared memory space used by the BTL. This is using the trunk.
>>>>>
>>>>> --td
>>>>
>>>> --
>>>> Gleb.
>>>
>>> --
>>> Gleb.
>
> --
> Gleb.
Re: [OMPI devel] SM BTL hang issue
Gleb,
Are you looking at this?

Rich

On 8/29/07 9:56 AM, "Gleb Natapov" wrote:

> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>> Is this trunk or 1.2?
> Oops. I should read more carefully :) This is trunk.
>
>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>> I have a program that does a simple bucket brigade of sends and receives where rank 0 is the start and repeatedly sends to rank 1 until a certain amount of time has passed and then it sends an all-done packet.
>>>
>>> Running this under np=2 always works. However, when I run with greater than 2 using only the SM btl the program usually hangs and one of the processes has a long stack that has a lot of the following 3 calls in it:
>>>
>>> [25] opal_progress(), line 187 in "opal_progress.c"
>>> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>>
>>> When stepping through the ompi_fifo_write_to_head routine it looks like the fifo has overflowed.
>>>
>>> I am wondering if what is happening is rank 0 has sent a bunch of messages that have exhausted the resources such that one of the middle ranks which is in the process of sending cannot send and therefore never gets to the point of trying to receive the messages from rank 0?
>>>
>>> Is the above a possible scenario or are messages periodically bled off the SM BTL's fifos?
>>>
>>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably if I raise the shared memory space used by the BTL. This is using the trunk.
>>>
>>> --td
>>
>> --
>> Gleb.
>
> --
> Gleb.
Re: [OMPI devel] SM BTL hang issue
Trunk.

--td

Gleb Natapov wrote:
> Is this trunk or 1.2?
>
> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>> I have a program that does a simple bucket brigade of sends and receives where rank 0 is the start and repeatedly sends to rank 1 until a certain amount of time has passed and then it sends an all-done packet.
>>
>> Running this under np=2 always works. However, when I run with greater than 2 using only the SM btl the program usually hangs and one of the processes has a long stack that has a lot of the following 3 calls in it:
>>
>> [25] opal_progress(), line 187 in "opal_progress.c"
>> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>
>> When stepping through the ompi_fifo_write_to_head routine it looks like the fifo has overflowed.
>>
>> I am wondering if what is happening is rank 0 has sent a bunch of messages that have exhausted the resources such that one of the middle ranks which is in the process of sending cannot send and therefore never gets to the point of trying to receive the messages from rank 0?
>>
>> Is the above a possible scenario or are messages periodically bled off the SM BTL's fifos?
>>
>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably if I raise the shared memory space used by the BTL. This is using the trunk.
>>
>> --td
>
> --
> Gleb.
Re: [OMPI devel] SM BTL hang issue
Is this trunk or 1.2?

On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
> I have a program that does a simple bucket brigade of sends and receives where rank 0 is the start and repeatedly sends to rank 1 until a certain amount of time has passed and then it sends an all-done packet.
>
> Running this under np=2 always works. However, when I run with greater than 2 using only the SM btl the program usually hangs and one of the processes has a long stack that has a lot of the following 3 calls in it:
>
> [25] opal_progress(), line 187 in "opal_progress.c"
> [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
> [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>
> When stepping through the ompi_fifo_write_to_head routine it looks like the fifo has overflowed.
>
> I am wondering if what is happening is rank 0 has sent a bunch of messages that have exhausted the resources such that one of the middle ranks which is in the process of sending cannot send and therefore never gets to the point of trying to receive the messages from rank 0?
>
> Is the above a possible scenario or are messages periodically bled off the SM BTL's fifos?
>
> Note, I have seen np=3 pass sometimes and I can get it to pass reliably if I raise the shared memory space used by the BTL. This is using the trunk.
>
> --td

--
Gleb.
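For readers without the Fortran source, the bucket-brigade pattern Terry describes can be sketched in plain Python. This is an illustration only, under several assumptions: threads and queue.Queue stand in for MPI ranks and MPI_Send/MPI_Recv, a fixed message count replaces the timed loop, the last rank is assumed to only receive, and the function name bucket_brigade is made up. Because the queues here are unbounded, the sketch cannot reproduce the SM BTL hang; it only shows the communication shape.

```python
import queue
import threading


def bucket_brigade(nranks, nmessages):
    """Pass keep_going tokens down a chain of ranks; return per-rank receive counts."""
    # inbox[r] holds messages sent to rank r (rank 0 never receives).
    inbox = [queue.Queue() for _ in range(nranks)]
    received = [0] * nranks

    def head():
        # Rank 0: send keep_going=True repeatedly, then one all-done packet.
        for _ in range(nmessages):
            inbox[1].put(True)
        inbox[1].put(False)

    def relay(me):
        # Other ranks: receive from me-1 and forward to me+1 if there is one.
        while True:
            keep_going = inbox[me].get()
            received[me] += 1
            if me + 1 < nranks:
                inbox[me + 1].put(keep_going)
            if not keep_going:
                break

    threads = [threading.Thread(target=head)]
    threads += [threading.Thread(target=relay, args=(r,)) for r in range(1, nranks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


if __name__ == "__main__":
    # Needs at least 2 ranks, like the original test.
    print(bucket_brigade(nranks=3, nmessages=1000))
```

Note that a middle rank has to both receive and send on every iteration, which is the position Terry suspects gets stuck once rank 0 has flooded the shared-memory resources.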
Re: [MTT devel] Thoughts on tagging...
Humm. How do we deal with row # now that we don't have temporary tables? I remember having to hack around this a bit to get it to work.

What I was initially thinking was that we would tag each row with its corresponding triplet [mpi_install_id, test_build_id, test_run_id], then use the appropriate ID when executing the query. All three for all three phases, test_run_id for test_run phase, etc.

The tag table fields would look something like:
tag_id, tag, mpi_install_id, test_build_id, test_run_id

So a single triplet can still be associated with multiple tags.

Something to ponder some more before next week's meeting.

-- Josh

On Aug 27, 2007, at 11:45 AM, Ethan Mallove wrote:

> With regards to a Gmail-style interface for labeling, got a comment/concern on the SQL. IIRC, when cherry-picking was implemented (for performance reports), we attempted to compose a WHERE clause that would effect the cherry-pick. This led to some wild WHERE clauses. E.g., consider the following example:
>
> +---+----+----+----+----+
> | # | A  | B  | C  | D  |
> +---+----+----+----+----+
> | 1 | 1  | 2  | 3  | 4  |
> | 2 | 5  | 6  | 7  | 8  |
> | 3 | 9  | 10 | 11 | 12 |
> | 4 | 28 | 10 | 11 | 12 |
> | 5 | 29 | 10 | 11 | 12 |
> +---+----+----+----+----+
>
> If we want to cherry-pick lines 1, 2 and 3, then our WHERE clause will look like this:
>
> SELECT blah blah
> FROM blah,blah
> WHERE (A=1 AND B=2 AND C=3 AND D=4) OR
>       (A=5 AND B=6 AND C=7 AND D=8) OR
>       (A=9 AND B=10 AND C=11 AND D=12)
>
> We eventually chose to do the filtering in PHP on row # since there does not seem to be a good way to filter by row # in SQL. The point being, a nasty WHERE clause *could* lead to a long tag operation.
>
> Given the above, a good starting point might be to restrict tagging to the following:
>
> 1. Allow tagging only on entire reports
> 2. Allow tagging only on a single row at a time
>
> My $0.02.
>
> -Ethan
>
> On Fri, Aug/24/2007 03:07:54PM, Jeff Squyres wrote:
>> I volunteered to do this on the call today. Here's my thoughts on tagging:
>>
>> 1. From the client, it would be nice to be able to specify a comma-delimited list of tags at any phase. Tags would be inherited by successive phases if not explicitly overridden. E.g., if you specify a "foo" tag in an MPI get, it'll be used in all phases that use that MPI get. Tags can be specified in one of three forms:
>>
>> +foo: means to *add* this tag to the existing/inherited set
>> -foo: means to *remove* this tag from the existing/inherited set
>> foo: if any tag does not have a +/- prefix, then the inherited set is cleared, effectively making the current set of tags be only the non-prefixed tags and +tags
>>
>> For example:
>>
>> [MPI Get: AAA]
>> # + and - have little meaning for MPI Get
>> tags = foo, bar, baz
>>
>> [Test Get: BBB]
>> # + and - have little meaning for Test Get
>> tags = yar, fweezle, bozzle
>>
>> [Test Build: CCC]
>> # Test build inherits tags from MPI Get and Test Get
>> tags = +fa-schizzle, -yar
>> # Resulting tag set: foo, bar, baz, fweezle, bozzle, fa-schizzle
>>
>> [Test build: DDD]
>> # Override everything
>> tags = yowza, gurple
>> # Resulting tag set: yowza, gurple
>>
>> 2. For the reporter, I think we only want authenticated users to be able to create / manipulate tags. Authentication can be via SVN username / password or the HTTPS submit username / password; I don't have strong preferences. Anyone can query on tags, of course.
>>
>> 3. We should have easy "add these results to a tag" and "remove these results from a tag" operations, similar to GMail/labels. I think the rule should be that if you can show MPI details (i.e., not the summary page), you can add/remove tags. Perhaps something as simple as a text box with two buttons: Add tag, Remove tag.
>>
>> 3a. Example: you drill down to a set of test runs. You type in "jeff results" in the text box and click the "add tag" button. This adds the tag "jeff results" to all the result rows that are checked (it is not an error if the "jeff results" tag already exists on some/all of the result rows).
>>
>> 3b. Example: you drill down to a set of test runs. You type in "jeff results" in the text box and click on the "remove tag" button. This removes the tag "jeff results" from all the result rows that are checked (it is not an error if the "jeff results" tag is not on some/all of the result rows).
>>
>> 4. Per Gmail index label listing, it would be nice to see a list of tags that exist on a given result row. It could be as simple as adding another show/hide column for the tags on a given result row. But it gets a little more complicated because one row may represent many different results -- do we show the union of tags for all the rollup rows? Maybe we can use different colors / attributes to represent "this tag exists on *some* of the results in this row" vs. "this tag exists on *all* of the results in this row"...?
>>
>> 4a. If the tags are listed as a column, they should also (of course) be clickable so that if you click on them, you get the entire set of
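
Going back to item 1 of Jeff's proposal, the +/- inheritance rules can be sketched in a few lines of Python. This is not MTT client code (the client is not shown in this thread): the function name resolve_tags is made up, and the handling of a "-tag" that appears alongside non-prefixed tags is an assumption, since the proposal does not spell that case out. The sketch reproduces the two "Resulting tag set" lines from the example above.

```python
def resolve_tags(inherited, spec):
    """Apply a comma-delimited tag spec to an inherited list of tags.

    "+foo" adds a tag, "-foo" removes one, and any non-prefixed tag
    clears the inherited set before the spec is applied.
    """
    items = [t.strip() for t in spec.split(",") if t.strip()]
    plain = [t for t in items if t[0] not in "+-"]
    # Any non-prefixed tag means the inherited set is dropped.
    result = [] if plain else list(inherited)
    for item in items:
        if item.startswith("-"):
            name = item[1:]
            if name in result:
                result.remove(name)
        else:
            name = item[1:] if item.startswith("+") else item
            if name not in result:
                result.append(name)
    return result


if __name__ == "__main__":
    mpi_get = resolve_tags([], "foo, bar, baz")
    test_get = resolve_tags([], "yar, fweezle, bozzle")
    inherited = mpi_get + test_get
    print(resolve_tags(inherited, "+fa-schizzle, -yar"))
    print(resolve_tags(inherited, "yowza, gurple"))
```

Only the first character of each item is treated as a prefix, so a tag such as "fa-schizzle" with an embedded hyphen is kept intact.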
Re: [MTT devel] --trial pruning for v3 schema?
Yeah, I think we should be OK for now leaving the trial results in the db. Let's revisit this in the future if things start to slow down again.

-- Josh

On Aug 28, 2007, at 3:28 PM, Jeff Squyres wrote:

> Let's see how the speed goes as we keep accumulating trials. If it becomes a problem, we can re-examine pruning. But unless it becomes a problem, let's just keep the philosophy of "keep everything". Disk space is cheap. Hopefully, dbv3 will let us store arbitrary amounts of data with no loss in performance. :-) :-) :-)
>
> On Aug 28, 2007, at 11:26 AM, Ethan Mallove wrote:
>
>> Are we not going to prune "trial" results in the new schema? We previously pruned trial results to improve query speed, but for small date range intervals this might not be worthwhile with the new schema. Also, I assume a "trial" MPI install could key to a "non-trial" test build, which would mean we can only prune out test runs if we decide to prune at all.
>>
>> -Ethan
>
> --
> Jeff Squyres
> Cisco Systems

___
mtt-devel mailing list
mtt-de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/mtt-devel