[m5-dev] Cron m5t...@zizzer /z/m5/regression/do-regression quick
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/inorder-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-atomic passed.
* build/ALPHA_SE/tests/fast/quick/20.eio-short/alpha/eio/simple-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-atomic-mp passed.
* build/ALPHA_SE/tests/fast/quick/01.hello-2T-smt/alpha/linux/o3-timing passed.
* build/ALPHA_SE/tests/fast/quick/30.eio-mp/alpha/eio/simple-timing-mp passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby passed.
* build/ALPHA_SE/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby passed.
* build/ALPHA_SE/tests/fast/quick/50.memtest/alpha/linux/memtest passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MOESI_hammer/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_directory/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_SE_MOESI_CMP_token/tests/fast/quick/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-atomic-dual passed.
* build/ALPHA_FS/tests/fast/quick/10.linux-boot/alpha/linux/tsunami-simple-timing-dual passed.
* build/ALPHA_FS/tests/fast/quick/80.netperf-stream/alpha/linux/twosys-tsunami-simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/inorder-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/o3-timing passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-atomic passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing-ruby passed.
* build/MIPS_SE/tests/fast/quick/00.hello/mips/linux/simple-timing passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/o3-timing passed.
* build/POWER_SE/tests/fast/quick/00.hello/power/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC_SE/tests/fast/quick/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC_SE/tests/fast/quick/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed.
*
[m5-dev] Implementation of findTagInSet
I am looking at possible performance optimizations in Ruby. As you can see from the mail excerpt below, the function findTagInSet() consumes a lot of time. I am thinking of making the changes suggested by Brad. I have questions for m5-dev members, in particular for Derek and Steve: how did you arrive at the conclusion that findTagInSet() is a problem? What benchmarks and profiling tools did you use?

Thanks
Nilay

-- Forwarded message --
Date: Mon, 20 Sep 2010 22:57:39 -0500
From: Beckmann, Brad brad.beckm...@amd.com
To: 'Nilay Vaish' ni...@cs.wisc.edu
Cc: Daniel Gibson gib...@cs.wisc.edu
Subject: RE: Performance Optimizations in Ruby

== CacheMemory findTagInSet ==

Recently Steve mentioned to me that a huge percentage of time was being spent in CacheMemory's findTagInSet function. Right now that function uses a hashmap across the entire cache to map tags to way ids. I think Derek recently implemented this change in hopes of improving performance, and it might have helped for small caches, but I don't think it helps for larger caches. There are a couple of possible solutions: per-set hashmaps, or reordering the ways so that the MRU blocks are at the lower ids and using a loop. I think we should investigate both solutions and see which is better.

___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Implementation of findTagInSet
I just compiled m5.prof and ran it (I forget what workload I ran on it, probably one of the PARSEC benchmarks; it probably doesn't matter a lot). If you've never used gprof before, this is a great time to learn!

Steve

On Tue, Nov 2, 2010 at 10:40 AM, Nilay Vaish ni...@cs.wisc.edu wrote:

I am looking at possible performance optimizations in Ruby. As you can see from the mail excerpt below, the function findTagInSet() consumes a lot of time. [...]
[m5-dev] build_dir has been deprecated
I went to build ALPHA_FS just now, and I must have upgraded scons as part of my most recent system update, because now I get a bunch of the following warnings:

scons: warning: The build_dir keyword has been deprecated; use the variant_dir keyword instead.
File "/home/gblack/m5/repos/m5/build/ALPHA_FS/SConscript", line 251, in <module>

Things still seem to work, but do we want to change build_dir to variant_dir to clean that up? Would that break compatibility with an old scons version we still want to support?

Gabe
Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.
Do you mean (1) or (2)? I thought that with (1) the stats would not change. My bias would be (2), but (1) seems livable enough. In either case it would be nice to put in a warn_once() if we don't already have one, so it's obvious that SW prefetches are being ignored.

Steve

On Sun, Oct 31, 2010 at 9:45 AM, Ali Saidi sa...@umich.edu wrote:

Any input? Otherwise I'm going with (1) and have new stats to go with it.

Ali

On Oct 27, 2010, at 12:02 AM, Ali Saidi wrote:

Hmmm... three emails when one should have done. There are three options:

1. Make them actual no-ops (e.g., stop marking them as mem refs, data prefetches, etc.). The instruction count will stay the same here. The functionality will stay the same. The instructions will be further away from working -- not that I think anyone will make them work in the future.

2. Leave them in their half-baked memop state, where they're memops that never call read() and don't write back anything, so the instruction count is different since the inst count gets incremented after the op completes. This is what I currently have.

3. Make them actually work. I've tried to muck with this without success for a while now.

Ali

On Oct 26, 2010, at 11:58 PM, Ali Saidi wrote:

The other portion of this is that when I try to make them act like loads, but not actually write a register, I break the O3 CPU in ways that 4 hours has not been able to explain.

Ali

On Oct 26, 2010, at 10:42 PM, Ali Saidi wrote:

The count gets smaller because, since they don't actually access memory, they never complete and therefore never increment the instruction count.

Ali

On Oct 26, 2010, at 9:53 PM, Steve Reinhardt wrote:

I vote for updating the stats... it's really wrong that we ignored them previously.

On Tue, Oct 26, 2010 at 5:47 PM, Ali Saidi sa...@umich.edu wrote:

Ok. So next question. With the CPU model treating prefetches as normal memory instructions, the # of instructions changes for the timing simple CPU because the inst count stat is incremented in completeAccess(). So, one option is to update the stats to reflect the new count. The other option would be to stop marking the prefetch instructions as memory ops, in which case they would just execute as nops. Any thoughts?

Ali

On Oct 24, 2010, at 12:14 AM, Steve Reinhardt wrote:

No, we've lived with Alpha prefetches the way they are for long enough now; I don't see where fixing them buys us that much.

Steve

On Sat, Oct 23, 2010 at 6:13 PM, Ali Saidi sa...@umich.edu wrote:

Sounds good to me. I'll take a look at what I need to do to implement it. Any arguments with the Alpha prefetch instructions staying nops?

Ali

On Oct 22, 2010, at 6:52 AM, Steve Reinhardt wrote:

On Tue, Oct 19, 2010 at 11:14 PM, Ali Saidi sa...@umich.edu wrote:

I think the prefetch should be sent to the TLB unconditionally, and then if the prefetch faults the CPU should toss the instruction, rather than the TLB returning no fault and the CPU, I guess, checking if the PA is set? I agree that we should override the fault in the CPU. Are we violently agreeing?

OK, it's becoming a little clearer to me now. I think we're agreeing that the TLB should be oblivious to whether an access is a prefetch or not, so that's a start. The general picture I'd like to see is that once a prefetch returns from the TLB, the CPU does something like:

    if (inst->fault == NoFault) {
        // access the cache
    } else if (inst->isPrefetch()) {
        // maybe set a flag if necessary
        inst->fault = NoFault;
    }

...so basically everywhere else down the pipeline where we check for faults, we don't have to explicitly except prefetches from normal fault handling. If there are points past this one where we really care to know whether a prefetch accessed the cache or not, then maybe we need a flag to remember that (sort of a dynamic version of the NO_ACCESS static flag), but I don't know if that's really necessary. Clearly, if the cache access doesn't happen right there, then we can add the flag and use it later to decide whether to access the cache. Anyway, this is the flavor I was going for... any issues with it?

Steve
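[Editor's note: the control flow Steve sketches could be made concrete roughly as follows. The Fault and Inst types here are simplified stand-ins invented for illustration; gem5's real Fault objects and dynamic-instruction classes are considerably richer.]

```cpp
#include <cassert>

// Minimal stand-in for gem5's fault mechanism: either no fault or a
// TLB translation fault.
enum class Fault { NoFault, TlbFault };

// Minimal stand-in for a dynamic instruction.
struct Inst {
    Fault fault = Fault::NoFault;
    bool prefetch = false;      // is this a SW prefetch?
    bool squashed = false;      // dynamic "never accessed" flag
    bool accessedCache = false; // did the cache access happen?
    bool isPrefetch() const { return prefetch; }
};

// Called once translation returns: faulting prefetches are quietly
// dropped (fault cleared, access skipped) so later pipeline stages
// never need to special-case them; other faults fall through to
// normal fault handling.
void handleTranslation(Inst &inst)
{
    if (inst.fault == Fault::NoFault) {
        inst.accessedCache = true;   // proceed with the cache access
    } else if (inst.isPrefetch()) {
        inst.fault = Fault::NoFault; // swallow the fault
        inst.squashed = true;        // remember the access never happened
    }
}
```

The `squashed` flag plays the role of the "dynamic version of the NO_ACCESS static flag" Steve mentions, for any later stage that needs to know the prefetch never touched the cache.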
Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.
Unfortunately, the stats change in all cases. For (1) the instructions no longer have IsMemRef set, which means num_refs changes for all CPUs, and the change causes some minor changes in the O3. With (2) they're half-baked, so the models call initiateAcc() but it doesn't actually initiate the access, so completeAcc() is never called and thus they aren't counted as part of the instruction count. (2) isn't ideal, since half-calling initiateAcc() might lead to some problems down the road. I'll post a diff today.

Ali

On Tue, 2 Nov 2010 12:18:08 -0700, Steve Reinhardt wrote:

Do you mean (1) or (2)? I thought that with (1) the stats would not change. [...]
Re: [m5-dev] Review Request: ARM: Mark prefetches as such and allow timing CPU to handle them.
There is a NO_ACCESS reference that is implemented in the simple CPU, but not in the O3 CPU. Ultimately, it's a couple of lines of code difference between (1) and (2). I've also written the ISA description for (3), so I'm even up for revisiting it after we get prefetches working with either x86 or ARM in O3 and seeing if it's fixed.

Ali

On Tue, 2 Nov 2010 13:12:17 -0700, Steve Reinhardt wrote:

Aren't there other NO_ACCESS references (in other ISAs) that call initiateAcc() but not completeAcc()? If so, then that by itself doesn't seem like justification to avoid solution (2). If not, then I suppose I agree with you.

Steve

On Tue, Nov 2, 2010 at 12:54 PM, Ali Saidi wrote:

Unfortunately, the stats change in all cases. [...]