Re: parallelizing crashtest runs (was: minutes of ESC call ...)
On 03/11/14 00:28, Markus Mohrhard wrote:
> The new script should scale nearly perfectly. There are still a few
> enhancements on my list, so if anyone is interested in Python tasks
> please talk to me.

I could be completely off, but this reminds me of running an update on Gentoo. make can restrict itself to x processes at a time (the advice is number of processors plus one), or (and I don't know how this is done) it can monitor load and only fire off new processes if the load is below a target level (again, I'd guess it should default to the number of processors). I don't know how practical it would be for someone to try to code that...

Cheers,
Wol
___
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
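The load-based throttling Wol describes (make's -l/--load-average behaviour) can be sketched in a few lines of Python. This is an illustration, not anything from the crashtest scripts; the function name and polling interval are made up:

```python
import os
import subprocess
import time

def spawn_when_load_allows(commands, max_load=None, poll_seconds=5):
    """Start one subprocess per command, but only launch a new one while
    the 1-minute load average is below max_load (defaulting to the number
    of processors, mirroring make's -l/--load-average option)."""
    if max_load is None:
        max_load = os.cpu_count()
    running = []
    started = 0
    for cmd in commands:
        while os.getloadavg()[0] >= max_load:
            time.sleep(poll_seconds)
        running.append(subprocess.Popen(cmd))
        started += 1
        # Reap finished children so the bookkeeping list stays small.
        running = [p for p in running if p.poll() is None]
    for p in running:
        p.wait()
    return started
```

Note that os.getloadavg() is only available on Unix-like systems, which matches where the crashtest runs happen anyway.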
Re: parallelizing crashtest runs (was: minutes of ESC call ...)
Hey,

On Fri, Oct 31, 2014 at 2:45 PM, Christian Lohmaier lohma...@googlemail.com wrote:
> Hi Markus, *,
>
> On Fri, Oct 31, 2014 at 2:38 PM, Markus Mohrhard
> markus.mohrh...@googlemail.com wrote:
>> The quick and ugly one is to partition the directories into 100-file
>> directories. I have a script for that, as I have done exactly that for
>> the memcheck run on the 70-core Largo server. It is a quick and ugly
>> implementation. The clean and much better solution is to move away from
>> directory-based invocation and partition by files on the fly.
>
> Yeah, I also thought of keeping the per-directory/filetype processing,
> but instead of running multiple dirs at once, rather dividing the set of
> files of a given dir into number-of-workers chunks.
>
>> I have a proof-of-concept somewhere on my machine and will push a
>> working version during the next days.
>
> nice :-)

So a working version is currently running on the VM. The version in the repo will be updated as soon as the script finishes without a problem. It now parallelizes nearly perfectly, as it divides the work into 100-file chunks and works through them. This means that after the last update of the test files we have 641 jobs that are put into a queue, and we process as many jobs in parallel as we want (5 on the VM at the moment).

Additionally, the updated version of the script no longer hard-codes a mapping from file extension to component and instead queries LibreOffice to see which component opened the file. That allows us to remove quite a few mappings and means all file types will be imported; the old version only imported file types that were registered.

The new script should scale nearly perfectly. There are still a few enhancements on my list, so if anyone is interested in Python tasks please talk to me.

Regards,
Markus
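The chunked-queue scheme Markus describes (a flat list of 100-file jobs consumed by a fixed number of workers) can be sketched roughly as below. Function names, the placeholder job body, and the worker count are illustrative only, not the actual crashtest script:

```python
import concurrent.futures
import os

CHUNK_SIZE = 100   # files per job, as in the updated script
NUM_WORKERS = 5    # parallel jobs on the VM at the moment

def collect_chunks(root):
    """Flatten all test files into 100-file jobs, ignoring which
    directory they live in, so workers never sit idle on one big dir."""
    files = [os.path.join(dirpath, name)
             for dirpath, _, names in os.walk(root)
             for name in names]
    return [files[i:i + CHUNK_SIZE]
            for i in range(0, len(files), CHUNK_SIZE)]

def run_chunk(chunk):
    # Placeholder: the real script imports each file with LibreOffice
    # here and records crashes; we just report the chunk size.
    return len(chunk)

def run_all(root):
    chunks = collect_chunks(root)
    with concurrent.futures.ProcessPoolExecutor(max_workers=NUM_WORKERS) as pool:
        return sum(pool.map(run_chunk, chunks))
```

With ~64,100 test files this yields the 641 jobs mentioned above, and the pool keeps all workers busy until the queue drains.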
parallelizing crashtest runs (was: minutes of ESC call ...)
Hi *,

On Thu, Oct 30, 2014 at 5:39 PM, Michael Meeks michael.me...@collabora.com wrote:
> * Crashtest futures / automated test scripts (Markus)
>   + call on Tuesday; new testing hardware.
>   + result - get a Manitu server; leave room in the budget for
>     on-demand Amazon instances (with spot pricing) if there is
>     special need at some point.
[...]

When I played with the crashtest setup, I noticed some limitations in its current layout that prevent just using lots of cores/high parallelism to get faster results.

The problem is that it is parallelized per directory, but the number of files per directory is not evenly distributed at all. So when the script decides to start the odt tests last, the whole set of odt files will be tested in only one thread, leaving the other CPU cores idling with nothing to do.

I did add a sorting statement to the script so it starts with the directories with the most files[1], but even with that you run into the problem that towards the end of the test run not all cores are used. As the AMD Opterons in the Manitu machines are less capable per CPU, this limits how much you can accelerate the run just by assigning more cores to it.

I didn't look into the overall setup to know whether segmenting the large directories into smaller ones is easy to do or not (i.e. instead of having one odt dir with 10500+ files, have 20 with ~500 each).

ciao
Christian

[1] added a sorted() call that uses the number of files in a directory as the sort key:

    import os

    def get_numfiles(directory):
        return len(os.listdir(directory))

    def get_directories():
        d = '.'
        directories = [o for o in os.listdir(d)
                       if os.path.isdir(os.path.join(d, o))]
        return sorted(directories, key=get_numfiles, reverse=True)
Re: parallelizing crashtest runs (was: minutes of ESC call ...)
Hi Christian,

On Fri, 2014-10-31 at 14:23 +0100, Christian Lohmaier wrote:
> When I played with the crashtest setup, I noticed some limitations in
> its current layout that prevent just using lots of cores/high
> parallelism to get faster results.

Oh - these sound a bit silly =)

> The problem is that it is parallelized per directory, but the number of
> files per directory is not evenly distributed at all. So when the
> script decides to start the odt tests last, the whole set of odt files
> will be tested in only one thread, leaving the other CPU cores idling
> with nothing to do.

Interesting; if we know how many cores we have, surely we can just get each thread to do a 'readdir', divide that into n chunks, and tackle the N'th of those (?) Or is the reason we do it per directory to make stitching the reports together simpler?

> I didn't look into the overall setup to know whether segmenting the
> large directories into smaller ones is easy to do or not (i.e. instead
> of having one odt dir with 10500+ files, have 20 with ~500 each).

Presumably there is no real reason to do anything odd to the file system - we can partition the work in whatever way seems best (?), with some better, but still simple, algorithm for partitioning / reporting?

But - honestly, it's no use asking me - this is Markus' baby - I'm sure he has a plan =)

ATB,

Michael.

--
michael.me...@collabora.com , Pseudo Engineer, itinerant idiot
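The "each thread takes the N'th chunk of a readdir" idea Michael raises needs no coordination beyond agreeing on worker indices, as this small sketch shows (the function name is made up for illustration):

```python
import os

def nth_chunk(directory, n, num_workers):
    """Return the slice of files in directory assigned to worker n.
    Every worker lists the directory itself; as long as all of them
    sort the listing the same way, the strided slices are disjoint
    and together cover every file."""
    files = sorted(os.listdir(directory))  # identical order in every worker
    return files[n::num_workers]
```

Striding (files[n::num_workers]) rather than cutting contiguous blocks also spreads any expensive files across workers instead of clustering them.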
Re: parallelizing crashtest runs (was: minutes of ESC call ...)
Hey,

On Fri, Oct 31, 2014 at 2:23 PM, Christian Lohmaier lohma...@googlemail.com wrote:
> When I played with the crashtest setup, I noticed some limitations in
> its current layout that prevent just using lots of cores/high
> parallelism to get faster results.
>
> The problem is that it is parallelized per directory, but the number of
> files per directory is not evenly distributed at all. So when the
> script decides to start the odt tests last, the whole set of odt files
> will be tested in only one thread, leaving the other CPU cores idling
> with nothing to do.
[...]
> I didn't look into the overall setup to know whether segmenting the
> large directories into smaller ones is easy to do or not (i.e. instead
> of having one odt dir with 10500+ files, have 20 with ~500 each).

This is currently a known limitation, but there are two solutions to the problem.

The quick and ugly one is to partition the directories into 100-file directories. I have a script for that, as I have done exactly that for the memcheck run on the 70-core Largo server. It is a quick and ugly implementation.

The clean and much better solution is to move away from directory-based invocation and partition by files on the fly. I have a proof-of-concept somewhere on my machine and will push a working version during the next days. This would even gain us about half a day on our current setup, as the ods and odt directories are normally the last two running, for about half a day longer than the rest of the script.

With both solutions this scales perfectly. We have already tested it on the Largo server, where I was able to keep a load of 70 for exactly a week (with memcheck, but that only affects the overall runtime).

Regards,
Markus
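The "quick and ugly" workaround Markus mentions - splitting one oversized directory into 100-file subdirectories so the existing per-directory scheduler balances better - could look roughly like this (subdirectory naming and function name are illustrative; this is not his actual script):

```python
import os
import shutil

def split_directory(src, chunk_size=100):
    """Move the files in src into numbered subdirectories holding at
    most chunk_size files each (src/part000, src/part001, ...), so a
    per-directory scheduler sees many small jobs instead of one huge one."""
    files = sorted(f for f in os.listdir(src)
                   if os.path.isfile(os.path.join(src, f)))
    for i in range(0, len(files), chunk_size):
        part = os.path.join(src, "part%03d" % (i // chunk_size))
        os.makedirs(part, exist_ok=True)
        for name in files[i:i + chunk_size]:
            shutil.move(os.path.join(src, name), os.path.join(part, name))
```

Applied to the odt directory with 10500+ files, this would yield ~106 subdirectories, which the existing directory-based invocation can then spread across all workers.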
Re: parallelizing crashtest runs (was: minutes of ESC call ...)
Hi Markus, *,

On Fri, Oct 31, 2014 at 2:38 PM, Markus Mohrhard markus.mohrh...@googlemail.com wrote:
> The quick and ugly one is to partition the directories into 100-file
> directories. I have a script for that, as I have done exactly that for
> the memcheck run on the 70-core Largo server. It is a quick and ugly
> implementation.
>
> The clean and much better solution is to move away from directory-based
> invocation and partition by files on the fly.

Yeah, I also thought of keeping the per-directory/filetype processing, but instead of running multiple dirs at once, rather dividing the set of files of a given dir into number-of-workers chunks.

> I have a proof-of-concept somewhere on my machine and will push a
> working version during the next days.

nice :-)

ciao
Christian