[gem5-dev] Ruby Virtual Network Error in MOESI_CMP Protocol
Hi, I am running into a problem in which a message from the network is sent to the supposedly unused MessageBuffers in the MOESI_CMP_directory protocol. The message buffers I am referring to are the foo/goo buffers in *-L1cache.sm. I know there is a recent patch that removes these from the code because they are unused, but I am using a slightly older changeset of M5 from before that patch was applied: http://reviews.gem5.org/r/490/

Specifically, I am running into the assert(false) failure here:

in_port(goo_in, RequestMsg, goo) {
  if (goo_in.isReady()) {
    peek(goo_in, RequestMsg) {
      assert(false);
    }
  }
}

Looking at that patch, all it does is remove those instances of the port. I tried commenting them out manually, but then I get an error about a virtual network not being connected:

panic: Ordering property of fromNet node 3 j 1 has not been set @ cycle 2453317927500 [enqueue:build/ALPHA_FS_MOESI_CMP_directory/mem/ruby/buffers/MessageBuffer.cc, line 170]

I double-checked that the virtual network numbers did not change, and they did not. Debugging with the ProtocolTrace flag, the last thing that appears is a message enqueued on the forwardFromDir message buffer in *-dir.sm. (I have made some changes to the coherence protocol, in terms of transitions/new states, but did not add any new message buffers.) I am not quite sure how/when these virtual networks are connected, as long as I enqueue/dequeue the message from the correct buffer.

My questions are: 1) How does the network decide which buffer to enqueue messages onto when there are multiple network="From" buffers? Then I can figure out why the goo message buffer is being used. 2) If a message buffer is full, does the message get enqueued on another buffer going in the same direction (e.g. to the L1 cache)?

Thanks. Malek ___ gem5-dev mailing list gem5-dev@m5sim.org http://m5sim.org/mailman/listinfo/gem5-dev
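[Editorial sketch for question 1): my understanding is that the network picks the destination buffer purely by virtual network number, so if the protocol still declares the goo buffer on vnet 1, anything routed on vnet 1 lands in it whether or not any in_port drains it. The toy model below is illustrative only — class and method names are invented, not gem5 code — but it shows both that mapping and why an unconfigured ordering property triggers the panic quoted above.]

```python
class MessageBuffer:
    def __init__(self, name):
        self.name = name
        self.ordered = None          # unset until the protocol config sets it
        self.queue = []

    def enqueue(self, msg):
        # mirrors: "panic: Ordering property of fromNet ... has not been set"
        if self.ordered is None:
            raise RuntimeError("Ordering property of %s not set" % self.name)
        self.queue.append(msg)


class FromNetNode:
    """One node's set of network->controller buffers, indexed by vnet."""
    def __init__(self):
        self.buffers = {}            # vnet number -> MessageBuffer

    def register(self, vnet, buf, ordered):
        buf.ordered = ordered
        self.buffers[vnet] = buf

    def deliver(self, vnet, msg):
        # The network chooses the buffer purely by vnet number, so a
        # message routed on vnet 1 lands in whatever buffer was declared
        # there -- even an "unused" one like goo.
        self.buffers[vnet].enqueue(msg)


node = FromNetNode()
goo = MessageBuffer("goo")
node.register(vnet=1, buf=goo, ordered=False)
node.deliver(1, "forwarded-request")
```

Under this model, deleting the goo buffer without also removing its vnet declaration leaves vnet 1 pointing at a buffer whose ordering property is never set, which matches the panic Malek sees.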
Re: [gem5-dev] Ruby Virtual Network Error in MOESI_CMP Protocol
It is node 3, vnet 1, but vnet 1 corresponds to the MessageBuffer goo (network="From", virtual_network="1", ordered="false"), and goo corresponds to the goo_in port:

in_port(goo_in, RequestMsg, goo) {
  if (goo_in.isReady()) {
    peek(goo_in, RequestMsg) {
      assert(false);
    }
  }
}

As for node 3, it just comes up on that one because of how the thread gets scheduled onto that node. Malek

On Tue, May 17, 2011 at 3:39 PM, Korey Sewell ksew...@umich.edu wrote: panic: Ordering property of fromNet node 3 j 1 has not been set @ cycle 2453317927500 [enqueue:build/ALPHA_FS_MOESI_CMP_directory/mem/ruby/buffers/MessageBuffer.cc, line 170] Are you sure that's not node 3, vnet 1??? -- - Korey
[m5-dev] Ruby Store/Coalescing Buffer Implementation for TimingSimpleCPU
Hello, I am interested in implementing a store buffer (coalescing buffer) for Ruby's memory model in M5/GEM5 for use in my current research. I want to be able to coalesce speculative and non-speculative stores to the same cache line and then flush them to the cache during certain acquire/release constructs. I see that there was an existing directory called storebuffer, but it was removed not too long ago. Reading the associated thread on the mailing list, it seems it was removed because it is not in use (given that O3 is not yet functional with Ruby), nor was it ever actually used in the original GEMS implementation. Here is the link to that thread: http://www.mail-archive.com/m5-dev@m5sim.org/msg10575.html

In further reading of that thread, I see that there is/was general consensus that the Ruby store buffer will be merged with M5 O3's LSQ. For my research, the O3 CPU model is not a requirement, although store buffers are typically used only in out-of-order execution. My specific question is: A) would it be better/easier to implement a new buffer (similar to the MessageBuffer class) on the Ruby side, or B) actually reuse M5's existing O3 LSQ buffer in the Timing CPU model?

I think that A) might be the easier route, for the following reasons: 1) the Sequencer class already has functionality to support coalescing stores to the same cache line (from reading the previous storebuffer thread); 2) it would make the coalescing buffer CPU-model independent; 3) it avoids changing the Timing CPU code, which could mess up how the CPU model handles other memory-related things (ISA-dependent memory references, split data requests, prefetching, etc.); 4) it lets me keep this a Ruby-only change, as opposed to a change on the M5 side of things.
However, my hesitations with this approach are: 1) the Sequencer is the interface between the CPU core and the Ruby memory model (converting M5 requests to Ruby requests and so on), so 'logically' I guess it might make more sense to implement the store buffer before Ruby sees the store requests, and just have the Sequencer do its thing with the coalescing; 2) the conclusion of the previous storebuffer thread was that work is currently?/will be done implementing the store buffer on the M5 side of things.

If I go with approach A), I know I would have to change which message buffer L1 uses to communicate with L2, so that instead of sending stores through the L2 request buffer, I would send them as follows:

L1 -> Coalescing Buffer -> L2 Request Network Buffer -> L2

instead of

L1 -> L2 Request Network Buffer -> L2

But I am not sure how exactly I would go about this if I also want the coalescing buffer to sit between the CPU core and L1. Could those familiar with Ruby comment on my thoughts/offer suggestions? Thanks Malek ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
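[Editorial sketch to make option A) concrete: a minimal coalescing store buffer that merges byte-granular stores to the same 64-byte line and drains everything downstream on a release. All names are illustrative, not gem5 APIs; the 64-byte line size is assumed.]

```python
BLOCK_SIZE = 64  # assumed cache line size

class CoalescingStoreBuffer:
    def __init__(self):
        # line-aligned address -> {offset within line: byte value}
        self.entries = {}

    def store(self, addr, data: bytes):
        """Merge a store into the entry for its cache line."""
        line = addr & ~(BLOCK_SIZE - 1)
        entry = self.entries.setdefault(line, {})
        for i, b in enumerate(data):
            # later stores to the same byte overwrite earlier ones
            entry[(addr + i) - line] = b

    def flush(self, send_to_cache):
        """On an acquire/release, drain all coalesced lines downstream."""
        for line, byte_map in sorted(self.entries.items()):
            send_to_cache(line, dict(byte_map))
        self.entries.clear()


sb = CoalescingStoreBuffer()
sb.store(0x1000, b"\xaa\xbb")   # two bytes in one line
sb.store(0x1001, b"\xcc")       # coalesces: overwrites byte at 0x1001
sb.store(0x2000, b"\x11")       # different line -> separate entry
flushed = []
sb.flush(lambda line, data: flushed.append((line, data)))
```

The same drain hook could in principle feed either the L2 request network buffer (Ruby side) or a port toward L1, which is the placement question raised above.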
[m5-dev] Ruby Question about MESI_CMP and implications of a Blocking Protocol
Hi, 1) I was wondering if the MESI_CMP protocol is currently implemented as a 'blocking' protocol, similar to how the MOESI_CMP version is. This link on the GEMS page indicates that the MOESI_CMP one is blocking, but doesn't say anything about the MESI_CMP version: http://www.cs.wisc.edu/gems/doc/gems-wiki/moin.cgi/Protocols?action=fullsearch&context=180&value=blocking&titlesearch=Titles In the MOESI_CMP version there are CoherenceResponseType messages such as 'Unblock' / 'Exclusive_Unblock' which seem to enforce the blocking aspect, and I also see these types in the MESI version, but merely having them does not necessarily enforce blocking in all possible situations.

2) Even if the MESI version is non-blocking: because Ruby currently only works with the Timing CPU model, only one request can be issued to the memory model at a time anyway (and I believe the CPU stalls until that request COMPLETES). But in general, is it possible to have multiple outstanding/in-progress requests in the Ruby memory model even though the Timing CPU is blocking? For example, can core 0 do a STORE X to L1 while L1 does a writeback of data Y to L2? I suspect that in a blocking cache coherence protocol I cannot do that, but I just wanted to confirm. Malek
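[Editorial sketch of the 'blocking' notion being asked about, purely illustrative and not gem5 code: a blocking controller admits one in-flight transaction per cache line and stalls further requests to that line until an Unblock arrives. In this toy model, requests to *different* lines can overlap; whether a real protocol permits that depends on its transient states.]

```python
class BlockingController:
    """Toy model: at most one in-flight transaction per cache line."""
    def __init__(self):
        self.busy = set()        # line addresses in a transient state

    def request(self, line_addr):
        if line_addr in self.busy:
            return "stalled"     # must wait for an Unblock for this line
        self.busy.add(line_addr)
        return "issued"

    def unblock(self, line_addr):
        # e.g. an Unblock / Exclusive_Unblock closes the transaction
        self.busy.discard(line_addr)


c = BlockingController()
```

For example, c.request(0x40) followed by a second c.request(0x40) stalls, but c.request(0x80) still issues, which is the "STORE X while writing back Y" scenario for two different lines.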
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Korey, I don't seem to have encountered that deadlock threshold when booting the old changeset. I tried both 16- and 20-core configurations just now and they seem to work, although they do take a really, really long time compared to ~1-4 cores. I have also previously booted 64 cores, some time ago, and that also worked, but it took several hours. In general though, that threshold is just a fixed number, and as the CMP machine gets bigger, the 5 seems to be way too low, and would have to be multiplied by a factor of 2-3? I used the default crossbar topology; maybe you hit the deadlock threshold using Mesh? Malek

On Fri, Mar 18, 2011 at 12:12 PM, Korey Sewell ksew...@umich.edu wrote: message below Why did it work before the block size patch? - When the ChunkGenerator sees the block size is 0, it doesn't split the request into multiple packets and sends the whole dma request at once. That is fine because the DMASequencer splits the request into multiple requests and only responds to the dma port when the entire request is complete. With regards to the old changeset that boots with the block size = 0, I was not able to boot a large-scale CMP system (more than 16 cores) due to the deadlock threshold being triggered. I'm assuming that Brad has a read on how to fix that problem, so I'll probably start working on what is causing that deadlock so hopefully we can kind of pipeline the bug fixes. -- - Korey
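[Editorial sketch of the block-size effect Korey describes: a simplified stand-in for the chunking logic, not the actual M5 ChunkGenerator implementation. With a nonzero block size a request is split at line boundaries; with block size 0 the whole request goes out as one chunk.]

```python
def chunks(base, length, chunk_size):
    """Split [base, base+length) at chunk_size boundaries.

    chunk_size == 0 models the pre-patch behavior: no splitting,
    the entire request is emitted as a single chunk.
    """
    if chunk_size == 0:
        return [(base, length)]
    out, addr, end = [], base, base + length
    while addr < end:
        next_boundary = (addr // chunk_size + 1) * chunk_size
        upper = min(next_boundary, end)
        out.append((addr, upper - addr))
        addr = upper
    return out
```

With the 8192-byte DMA writes from the traces in this thread and a 64-byte line, this yields 128 chunks, versus one chunk when the reported block size is 0.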
Re: [m5-dev] Ruby FS - DMA Controller problem?
Brad/Korey, An update on what I have seen. I did notice that in the failing case, the DMASequencer thinks that the request is completed (length of request == 64) when in fact it should be 8192. The 8192 reflects the byte sector size, and what is interesting is that a DPRINTF(IdeDisk) in ide_disk.cc right before it fails indicates that the request length is 8192. So there is something wrong with the transfer in the RubyPorts. I have a feeling it might also be linked with the TimingSimpleCPU changes for handling split requests; although Alpha does not support split requests, that is independent of the DMA transfers. Also, comparing Ruby traces (with and without the failing changeset), the first PRD BaseAddr is consistent between them, but not consistent between Ruby and M5. So the fact that the PRD BaseAddr is 'wrong' in the one case does not prevent it from booting the kernel. Not really sure if that helps any more. Malek

On Tue, Mar 15, 2011 at 6:50 PM, Korey Sewell ksew...@umich.edu wrote: Sorry for the confusion, I definitely garbled up some terminology. I meant that M5 ran with the atomic model to compare with the timing Ruby model. M5-atomic maybe runs in 10-15 mins and then Ruby 20-30 mins. I am able to get to the problem point in the Ruby simulation (bad DMA access) in about 20 mins. I am able to get to that same problem point in M5-atomic mode in about 10 mins, so as to see what to compare against and what values are being set/unset incorrectly.

On Tue, Mar 15, 2011 at 6:22 PM, Beckmann, Brad brad.beckm...@amd.com wrote: I'm confused. Korey, I thought this DMA problem only existed with Ruby? If so, how were you able to reproduce it using atomic mode? Ruby does not work with the atomic cpu model. Please clarify, thanks! Brad

-Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Korey Sewell Sent: Tuesday, March 15, 2011 12:09 PM To: M5 Developer List Subject: Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Brad/Malek, I've been able to regenerate this error in about 20 mins now (instead of hours) by running things in atomic mode. Not sure if that helps or not...

On Tue, Mar 15, 2011 at 3:03 PM, Beckmann, Brad brad.beckm...@amd.com wrote: How is it that you are able to run the memtester in FS mode? I see the ruby_mem_tester.py in /configs/example/ but it seems that it is only configured for SE mode as far as Ruby is concerned? I don't run it in FS mode. Since the DMA bug manifests only after hours of execution, I wanted to first verify that the DMA protocol support was solid using the mem tester. Somewhat surprisingly, I found several bugs in MOESI_CMP_directory's support of DMA. It turns out that the initial DMA support in that protocol wasn't very well thought out. Now I fixed those bugs, but since the DMA problem also arises with the MOESI_hammer protocol, I'm confident that my patches don't fix the real problem. Brad -- - Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Brad, How is it that you are able to run the memtester in FS mode? I see the ruby_mem_tester.py in /configs/example/ but it seems that it is only configured for SE mode as far as Ruby is concerned? Also, how would the default block size be '0' without that problem changeset? If it was 0, doesn't that mean it's not passing the data from the DMA transfer? It would have to be at least 1? Malek

On Mon, Mar 14, 2011 at 5:32 PM, Beckmann, Brad brad.beckm...@amd.com wrote: Hi Malek, Just to reiterate, I don't think my patches will fix the underlying problem. Instead, my patches just fix various corner cases in the protocols. I suspect these corner cases are never actually reached in real execution. The fact that your dma traces point out that the Ruby and Classic configurations use different base addresses makes me think this might be a problem with configuration and device registration. We should investigate further. Brad

-Original Message- From: Malek Musleh [mailto:malek.mus...@gmail.com] Sent: Monday, March 14, 2011 9:11 AM To: M5 Developer List Cc: Beckmann, Brad Subject: Re: [m5-dev] Ruby FS - DMA Controller problem? Hi Korey/Brad, I commented out the following lines:

In RubyPort.hh: unsigned deviceBlockSize() const;

In RubyPort.cc: unsigned RubyPort::M5Port::deviceBlockSize() const { return (unsigned) RubySystem::getBlockSizeBytes(); }

I also did a diff trace between M5 and Ruby using the IdeDisk trace flag as indicated earlier on.
In the Ruby trace, it stalls at this:

2398589225000: system.disk0: Write to disk at offset: 0x1 data 0
239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

waiting for the interrupt to be posted. However, a comparison between the M5 and Ruby traces suggests that they differ on the following lines:

Ruby trace:
239858940: system.disk0: Write to disk at offset: 0x2 data 0x10
2398589575000: system.disk0: Write to disk at offset: 0x3 data 0
2398589742000: system.disk0: Write to disk at offset: 0x4 data 0
2398589909000: system.disk0: Write to disk at offset: 0x5 data 0
2398590088000: system.disk0: Write to disk at offset: 0x6 data 0xe0
2398596763500: system.disk0: Write to disk at offset: 0x7 data 0xc8
2398597916500: system.disk0: PRD: baseAddr:0x87298000 (0x7298000) byteCount:8192 (16) eot:0x8000 sector:0
2398597916500: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

M5 trace:
2237623634000: system.disk0: Write to disk at offset: 0x7 data 0xc8
2237624206501: system.disk0: PRD: baseAddr:0x87392000 (0x7392000) byteCount:8192 (16) eot:0x8000 sector:0
2237624206501: system.disk0: doDmaWrite, diskDelay: 100 totalDiskDelay: 116

Note that the PRD baseAddr each one tries to access is different, which I would think should be the same; there is no reason why it should differ. The 0 or 1 block size, and the sequential retries, are forcing the DMA timer to time out the request, and thus it fails in the dma inconsistent state.
I have attached both sets of traces in case it sheds any more light on the cause of the problem. In any case, it might not matter too much now, since Brad was able to reproduce the problem and has a patch for it, but it may be of use for future M5 changes. Malek

On Mon, Mar 14, 2011 at 11:54 AM, Beckmann, Brad brad.beckm...@amd.com wrote: Thanks Malek. Very interesting. Yes, this 5-line changeset seems rather benign, but actually has huge ramifications. With this change, the RubyPort passes the correct block size to the cpu/device models. Without it, I believe the block size defaults to 0 or 1...I can't remember which. While that seems rather inconsequential, I noticed when I made this change that the memtester behaved quite differently. In particular, it keeps issuing requests until sendTiming returns false, instead of just one request per cpu at a time. Therefore another patch in this series added the retry mechanism to the RubyPort. I'm still not sure exactly what the problem is with ruby+dma, but I suspect that the dma devices are behaving differently now that the RubyPort passes the correct block size. I was able to spend a few hours on this over the weekend. I am now able to reproduce the error and I have a few protocol bug fixes queued up. However, I don't think those fixes actually solved the main issue. I don't think I'll be able to get to it today
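[Editorial sketch of the retry mechanism Brad describes: the sender keeps issuing until sendTiming returns false, stashes the rejected packet, and resends it when a retry callback arrives. Names are hypothetical, not the actual RubyPort code.]

```python
class Port:
    """Toy port with a finite in-flight capacity."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.inflight = []

    def sendTiming(self, pkt):
        if len(self.inflight) >= self.capacity:
            return False          # port full: caller must hold pkt and retry
        self.inflight.append(pkt)
        return True


class Issuer:
    """Keeps issuing until a send is rejected, then waits for recvRetry."""
    def __init__(self, port):
        self.port = port
        self.pending = None       # the one packet blocked on a retry

    def issue(self, pkt):
        if not self.port.sendTiming(pkt):
            self.pending = pkt    # stash until the port signals a retry

    def recvRetry(self):
        if self.pending is not None and self.port.sendTiming(self.pending):
            self.pending = None


port = Port(capacity=2)
issuer = Issuer(port)
for p in ("a", "b", "c"):         # third send is rejected and stashed
    issuer.issue(p)
port.inflight.pop(0)              # a response frees a slot...
issuer.recvRetry()                # ...and the stashed packet is resent
```

This is the "issue until sendTiming returns false" behavior that the memtester exercised once the correct block size was reported.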
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Brad, I found the problem that was causing this error. Specifically, it is this changeset:

changeset: 7909:eee578ed2130
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:18 2011 -0800
summary: Ruby: Fix to return cache block size to CPU for split data transfers

Link: http://reviews.m5sim.org/r/393/diff/#index_header

Previously, I mentioned it was a couple of changesets prior to this one, but the changes between them are related, so it wasn't as obvious what was happening. In fact, this corresponds to the assert() for the block size you had put in to deal with x86 unaligned accesses, but later removed because of LL/SC in Alpha. It's not clear to me why this is causing a problem, or rather why this doesn't return the default 64-byte block size from the Ruby system, but commenting out those lines of code allowed it to work. Maybe Korey could confirm? Malek

On Wed, Mar 9, 2011 at 8:24 PM, Beckmann, Brad brad.beckm...@amd.com wrote: I still have not been able to reproduce the problem, but I haven't tried in a few weeks. So does this happen when booting up the system, independent of what benchmark you are running? If so, could you send me your command line? I'm sure the disk image and kernel binaries between us are different, so I don't necessarily think I'll be able to reproduce your problem, but at least I'll be able to isolate it. Brad

-Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Malek Musleh Sent: Wednesday, March 09, 2011 4:41 PM To: M5 Developer List Subject: Re: [m5-dev] Ruby FS - DMA Controller problem? Hi Korey, I ran into a similar problem with a different benchmark/boot-up attempt. There is another thread on m5-dev with 'Ruby FS failing with recent changesets' as the subject. I was able to track down the changeset it was coming from, but I did not look further into why it was causing the problem.
Brad said he would take a look at it, but I am not sure if he was able to reproduce the problem. Malek

On Wed, Mar 9, 2011 at 7:08 PM, Korey Sewell ksew...@umich.edu wrote: Hi all, I'm trying to run Ruby in FS mode for the FFT benchmark. However, I've been unable to fully boot the kernel, and I error with a panic in the IDE disk controller:

panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1 @ cycle 62640732569001 [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]

Has anybody run into a similar error, or does anyone have any suggestions for debugging the problem? I can run the same code using the M5 memory system and FFT finishes properly, so it's definitely a Ruby-specific thing. To track this down, I could diff instruction traces (M5 vs. Ruby) or maybe even diff trace output from the IdeDisk trace flags, but those routes seem a bit heavy-handed considering the amount of trace output generated. The command line this was run with is:

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b fft_64t_base -n 1

The output in system.terminal is:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
hdc: M5 IDE Disk, ATA DISK drive
hdc: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 2866752 sectors (1467 MB), CHS=2844/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
unknown partition table
hdb: max request size: 128KiB
hdb: 1008000 sectors (516 MB), CHS=1000/16/63
hdb:4hdb: dma_timer_expiry: dma status == 0x65
hdb: DMA interrupt recovery
hdb: lost interrupt

Thanks again, any help or thoughts would be well appreciated.
-- - Korey
Re: [m5-dev] Ruby FS - DMA Controller problem?
Hi Korey, I ran into a similar problem with a different benchmark/boot-up attempt. There is another thread on m5-dev with 'Ruby FS failing with recent changesets' as the subject. I was able to track down the changeset it was coming from, but I did not look further into why it was causing the problem. Brad said he would take a look at it, but I am not sure if he was able to reproduce the problem. Malek

On Wed, Mar 9, 2011 at 7:08 PM, Korey Sewell ksew...@umich.edu wrote: Hi all, I'm trying to run Ruby in FS mode for the FFT benchmark. However, I've been unable to fully boot the kernel, and I error with a panic in the IDE disk controller:

panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1 @ cycle 62640732569001 [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]

Has anybody run into a similar error, or does anyone have any suggestions for debugging the problem? I can run the same code using the M5 memory system and FFT finishes properly, so it's definitely a Ruby-specific thing. To track this down, I could diff instruction traces (M5 vs. Ruby) or maybe even diff trace output from the IdeDisk trace flags, but those routes seem a bit heavy-handed considering the amount of trace output generated.
The command line this was run with is:

build/ALPHA_FS_MOESI_CMP_directory/m5.opt configs/example/ruby_fs.py -b fft_64t_base -n 1

The output in system.terminal is:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
hdc: M5 IDE Disk, ATA DISK drive
hdc: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 2866752 sectors (1467 MB), CHS=2844/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
unknown partition table
hdb: max request size: 128KiB
hdb: 1008000 sectors (516 MB), CHS=1000/16/63
hdb:4hdb: dma_timer_expiry: dma status == 0x65
hdb: DMA interrupt recovery
hdb: lost interrupt

Thanks again, any help or thoughts would be well appreciated. -- - Korey
Re: [m5-dev] Ruby FS Fails with recent Changesets
Hi Brad, I tested your latest changeset, and it seems that it 'solves' the handleResponse error I was getting when running 3 or more cores, but the dma_expiry error is still there. Now the error is consistent, no matter what number of cores I try to run with:

For more information see: http://www.m5sim.org/warn/3e0eccba
panic: Inconsistent DMA transfer state: dmaState = 2 devState = 1 @ cycle 62411238889001 [doDmaTransfer:build/ALPHA_FS_MOESI_CMP_directory/dev/ide_disk.cc, line 323]
Memory Usage: 382600 KBytes

- M5 Terminal ---
hda: max request size: 128KiB
hda: 101808 sectors (52 MB), CHS=101/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
unknown partition table
hdb: max request size: 128KiB
hdb: 4177920 sectors (2139 MB), CHS=4144/16/63
hdb:4hdb: dma_timer_expiry: dma status == 0x65
hdb: DMA interrupt recovery
hdb: lost interrupt

The panic seems to suggest an inconsistent DMA state, so I tried reverting to an older changeset (before the DMA changes were pushed out) such as 7936, and even 7930, but no such luck. The changeset that I know works from last week or so is changeset 7842. The changeset summaries between 7842 and 7930 seem to indicate a lot of changes 'unrelated' to the DMA, such as O3, InOrderCPU, and x86 changes. That being said, I did not do a diff on those intermediate changesets to verify whether a related file was slightly modified in the process. I might be able to spend some more time trying changesets until I narrow down which one it's coming from, but maybe the new panic message might give you some indication of how to fix it? (I think the panic message appeared now and not before because I let the simulation terminate itself when running overnight, as opposed to me killing it once I saw the dma_expiry message on the M5 terminal.) Malek

On Wed, Feb 9, 2011 at 7:00 PM, Beckmann, Brad brad.beckm...@amd.com wrote: Hi Malek, Yes, thanks for letting us know.
I'm pretty sure I know what the problem is. Previously, if an SC operation failed, the RubyPort would convert the request packet to a response packet, bypass writing the functional view of memory, and pass it back up to the CPU. In my most recent patches I generalized the mechanism that converts request packets to response packets and avoids writing functional memory. However, I forgot to remove the duplicate request-to-response conversion for failed SC requests. Therefore, I bet you are encountering that assertion error on that duplicate call. It should be a simple one-line change that fixes your problem. I'll push it momentarily, and it would be great if you could confirm that my change does indeed fix your problem. Brad

-Original Message- From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Gabe Black Sent: Wednesday, February 09, 2011 3:54 PM To: M5 Developer List Subject: Re: [m5-dev] Ruby FS Fails with recent Changesets Thanks for letting us know. If it wouldn't be too much trouble, could you please try some other changesets near the one that isn't working and try to determine which one specifically broke things? A bunch of changes went in recently, so it would be helpful to narrow things down. I'm not very involved with Ruby right now personally, but I assume that would be useful information for the people that are. Gabe

On 02/09/11 14:51, Malek Musleh wrote: Hello, I first started using the Ruby model in M5 about a week or so ago, and was able to boot in FS mode (up to 64 cores, once applying the BigTsunami patches). In order to keep up with the changes in the Ruby code, I have started fetching recent updates from the dev repo. However, after fetching the recent changesets (from the last 2 days), Ruby FS does not boot. I tried both MESI_CMP_directory and MOESI_CMP_directory.
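[Editorial sketch of why the duplicate conversion trips the `needsResponse()' assertion reported elsewhere in this thread: once a packet has been converted to a response, needsResponse() is no longer true, so a second conversion fails the check. This is a toy stand-in, not the actual M5 Packet class.]

```python
class Packet:
    """Toy packet: converting request->response twice trips the assert."""
    def __init__(self):
        self.is_request = True

    def needsResponse(self):
        return self.is_request

    def makeResponse(self):
        # models the check behind "Assertion `needsResponse()' failed"
        assert self.needsResponse()
        self.is_request = False


pkt = Packet()
pkt.makeResponse()    # first (legitimate) conversion succeeds
```

A second pkt.makeResponse() on the same packet raises the assertion, which is the failure mode of the leftover duplicate conversion for failed SC requests.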
If running 2 cores or less, I get this at the terminal screen after letting it run for some time:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 101808 sectors (52 MB), CHS=101/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65 --- problem

When running 3 or more cores, I get the following assertion failure:

info: kernel located at: /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
Listening for system connection on port 3456
0: system.tsunami.io.rtc: Real-time clock set to Thu Jan 1 00:00:00 2009
0: system.remote_gdb.listener: listening for remote gdb #0
Re: [m5-dev] Ruby FS Fails with recent Changesets
I should note that I did not try the Mesh2D configuration to see if it results in the same error, and although I specify the topology to be Crossbar, I believe Crossbar is already the default. Malek

On Thu, Feb 10, 2011 at 5:26 PM, Malek Musleh malek.mus...@gmail.com wrote: Hi Brad, I tested the different changesets and have narrowed down where it begins. The last changeset that works (since 7842) is 7905. At 7906 this is the error:

command line: ./build/ALPHA_FS_MOESI_CMP_directory/m5.opt ./configs/example/ruby_fs.py -n 4 --topology Crossbar
Global frequency set at 1 ticks per second
info: kernel located at: /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
Listening for system connection on port 3456
0: system.tsunami.io.rtc: Real-time clock set to Thu Jan 1 00:00:00 2009
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
0: system.remote_gdb.listener: listening for remote gdb #1 on port 7001
0: system.remote_gdb.listener: listening for remote gdb #2 on port 7002
0: system.remote_gdb.listener: listening for remote gdb #3 on port 7003
REAL SIMULATION
info: Entering event queue @ 0. Starting simulation...
info: Launching CPU 1 @ 835461000
info: Launching CPU 2 @ 846156000
info: Launching CPU 3 @ 856768000
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
1349195500: system.terminal: attach terminal 0
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
m5.opt: build/ALPHA_FS_MOESI_CMP_directory/mem/ruby/system/RubyPort.cc:230: virtual bool RubyPort::M5Port::recvTiming(Packet*): Assertion `Address(ruby_request.paddr).getOffset() + ruby_request.len = RubySystem::getBlockSizeBytes()' failed.
Program aborted at cycle 2406378289516
Aborted

The same error occurs for 7907 - 7908.
At changeset 7909 is where the dma_expiry error first shows up:

7909:
hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 101808 sectors (52 MB), CHS=101/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65
hda: DMA interrupt recovery
hda: lost interrupt
unknown partition table
hdb: max request size: 128KiB
hdb: 4177920 sectors (2139 MB), CHS=4144/16/63

I tested changeset 7920, and that's where I notice the handleResponse() error:

7920:
M5 compiled Feb 10 2011 14:49:49
M5 revision 39c86a8306d2+ 7920+ default
M5 started Feb 10 2011 14:53:38
M5 executing on sherpa05
command line: ./build/ALPHA_FS_MOESI_CMP_directory/m5.opt ./configs/example/ruby_fs.py -n 4 --topology Crossbar
Global frequency set at 1 ticks per second
info: kernel located at: /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
Listening for system connection on port 3456
0: system.tsunami.io.rtc: Real-time clock set to Thu Jan 1 00:00:00 2009
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
0: system.remote_gdb.listener: listening for remote gdb #1 on port 7001
0: system.remote_gdb.listener: listening for remote gdb #2 on port 7002
0: system.remote_gdb.listener: listening for remote gdb #3 on port 7003
REAL SIMULATION
info: Entering event queue @ 0. Starting simulation...
info: Launching CPU 1 @ 835461000
info: Launching CPU 2 @ 846156000
info: Launching CPU 3 @ 856768000
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
1128875500: system.terminal: attach terminal 0
warn: Prefetch instrutions is Alpha do not do anything
For more information see: http://www.m5sim.org/warn/3e0eccba
m5.opt: build/ALPHA_FS_MOESI_CMP_directory/mem/packet.hh:590: void Packet::makeResponse(): Assertion `needsResponse()' failed.
Program aborted at cycle 36235566500
Aborted

Note that I have not tested changesets 7911-7918. I have tested the MOESI_CMP_directory protocol on all of these with m5.opt. I have also tested MESI_CMP_directory for some of them and got the same messages.

This is my command line:

./build/ALPHA_FS_MOESI_CMP_directory/m5.opt ./configs/example/ruby_fs.py -n 4 --topology Crossbar

The error comes about 15 minutes into booting the kernel. Note that it takes a while for the io to be scheduled:

io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)

In all cases where the dma_expiry occurs (which does not include changesets 7906-7908), the last thing that appears is this:

ide0 at 0x8410-0x8417,0x8422
Re: [m5-dev] Ruby FS Fails with recent Changesets
Ah yes, sorry about that. I have 2 directories: one is a clean copy of the m5 repo, and one is a private one into which I fetch changes from the clean one. The changesets I referred to all correspond to the strictly clean m5 repo. Nevertheless, here is a copy from an hg log of the changesets I have referred to.

changeset: 7922:7532067f818e
user: Brad Beckmann brad.beckm...@amd.com
date: Sun Feb 06 22:14:19 2011 -0800
summary: ruby: support to stallAndWait the mandatory queue

changeset: 7921:351f1761765f
user: Brad Beckmann brad.beckm...@amd.com
date: Sun Feb 06 22:14:19 2011 -0800
summary: ruby: minor fix to deadlock panic message

changeset: 7920:39c86a8306d2
user: Brad Beckmann brad.beckm...@amd.com
date: Sun Feb 06 22:14:19 2011 -0800
summary: boot: script that creates a checkpoint after Linux boot up

changeset: 7919:3a02353d6e43
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:19 2011 -0800
summary: garnet: Split network power in ruby.stats

changeset: 7910:8a92b39be50e
user: Brad Beckmann brad.beckm...@amd.com
date: Sun Feb 06 22:14:18 2011 -0800
summary: ruby: Fix RubyPort to properly handle retrys

changeset: 7909:eee578ed2130
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:18 2011 -0800
summary: Ruby: Fix to return cache block size to CPU for split data transfers

changeset: 7908:4e83ebb67794
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:18 2011 -0800
summary: Ruby: Add support for locked memory accesses in X86_FS

changeset: 7907:d648b8409d4c
user: Joel Hestness hestn...@cs.utexas.edu
date: Sun Feb 06 22:14:18 2011 -0800
summary: Ruby: Update the Ruby request type names for LL/SC

changeset: 7906:5ccd97218ca0
user: Brad Beckmann brad.beckm...@amd.com
date: Sun Feb 06 22:14:18 2011 -0800
summary: ruby: Assert for x86 misaligned access

changeset: 7905:00ad807ed2ca
user: Brad Beckmann brad.beckm...@amd.com
date: Sun Feb 06 22:14:18 2011 -0800
summary: ruby: x86 fs config support

Malek

On Thu, Feb 10, 2011 at 5:35 PM,
Gabe Black gbl...@eecs.umich.edu wrote:

Numbers like 7905 are only meaningful in a strict sense in your own tree since different trees might number things differently. The longer hex value is universal. It's possible the trees are similar enough that those would match, but there's no guarantee.

Gabe
Re: [m5-dev] Ruby FS Fails with recent Changesets
/libpython2.5.so.1.0
#20 0x7f2c1384fdac in PyRun_StringFlags () from /usr/lib/libpython2.5.so.1.0
#21 0x007d2d41 in m5Main (argc=6, argv=0x7fff1c323a88) at build/ALPHA_FS_MOESI_CMP_directory/sim/init.cc:248
#22 0x0040a717 in main (argc=6, argv=0x7fff1c323a88) at build/ALPHA_FS_MOESI_CMP_directory/sim/main.cc:57

I thought of trying to checkpoint to a given point in the process, but I noticed that checkpointing is not yet supported (per a comment in ruby_fs.py, as well as another thread on dev about this). Out of curiosity, is that something currently in the works, or would it require a lot of time to implement? Let me know if there is anything else that I can help with.

Malek

On Thu, Feb 10, 2011 at 7:45 PM, Beckmann, Brad brad.beckm...@amd.com wrote:

Ah, ok, this assert problem makes sense to me. I suspect this is one of those situations where unexpected-but-normal operations prove the assert is incorrect. As the comment associated with patch 7906 says, I added that assert because I wanted to make sure that unaligned x86 cpu accesses were not passed to the Ruby sequencer. However, I didn't realize at the time that DMA accesses go through the same path (i.e. the dma sequencer also inherits from RubyPort). While the normal cpu sequencer cannot handle unaligned accesses, the dma sequencer can. Therefore, that assert is incorrect and should be removed. Though I've been running a lot of FS simulations lately, none of them have had any DMA activity, so I haven't encountered the error myself.

I will try to check in a fix as soon as I can, but right now I'm having trouble connecting to m5sim.org. As soon as that problem is resolved, I'll push the fix (removing the assert). Now, that should fix your problem with 7906, but I'm not sure if that will fix your dma_expiry error. So do you encounter that error running .fast and/or .debug? Can you provide me a call stack for the error? Is there an easy way for me to reproduce it? I doubt the topology makes a bit of difference.
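Brad's explanation can be sketched as a small predicate. This is a hypothetical, simplified model of the block-alignment check from changeset 7906, not the actual RubyPort code: the constant `kBlockSizeBytes` and the helper names below are invented for illustration. The point is that a CPU request must fit within one cache block, while a DMA request legitimately may not, so an assert on the shared RubyPort path fires as soon as DMA traffic starts.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical block size; the real value comes from
// RubySystem::getBlockSizeBytes().
constexpr uint64_t kBlockSizeBytes = 64;

// Offset of a physical address within its cache block
// (the role Address::getOffset() plays in the real code).
uint64_t blockOffset(uint64_t paddr) {
    return paddr % kBlockSizeBytes;
}

// The condition the 7906 assert enforced: the request must not
// spill past the end of its cache block.  CPU sequencer requests
// satisfy this; DMA requests (e.g. a multi-KiB disk transfer)
// need not, which is why the assert was wrong on the DMA path.
bool fitsInBlock(uint64_t paddr, uint64_t len) {
    return blockOffset(paddr) + len <= kBlockSizeBytes;
}
```

Under this model, the fix Brad describes is simply to stop applying the check on the DMA sequencer's path rather than to change the check itself.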
And yes, you can get a protocol trace by specifying the ProtocolTrace trace-flag.

Brad

-Original Message-
From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf Of Malek Musleh
Sent: Thursday, February 10, 2011 2:51 PM
To: M5 Developer List
Subject: Re: [m5-dev] Ruby FS Fails with recent Changesets
[m5-dev] Ruby FS Fails with recent Changesets
Hello,

I first started using the Ruby Model in M5 about a week or so ago, and was able to boot in FS mode (up to 64 cores once applying the BigTsunami patches). In order to keep up with the changes in the Ruby code, I have started fetching recent updates from the dev repo. However, after fetching the updates to the recent changesets (from the last 2 days), Ruby FS does not boot. I tried both MESI_CMP_directory and MOESI_CMP_directory.

If running 2 cores or less, I get this at the terminal screen after letting it run for some time:

hda: M5 IDE Disk, ATA DISK drive
hdb: M5 IDE Disk, ATA DISK drive
hda: UDMA/33 mode selected
hdb: UDMA/33 mode selected
ide0 at 0x8410-0x8417,0x8422 on irq 31
ide1 at 0x8418-0x841f,0x8426 on irq 31
ide_generic: please use probe_mask=0x3f module parameter for probing all legacy ISA IDE ports
ide2 at 0x1f0-0x1f7,0x3f6 on irq 14
ide3 at 0x170-0x177,0x376 on irq 15
hda: max request size: 128KiB
hda: 101808 sectors (52 MB), CHS=101/16/63
hda:4hda: dma_timer_expiry: dma status == 0x65  --- problem

When running 3 or more cores, I get the following assertion failure:

info: kernel located at: /home/musleh/M5/m5_system_2.0b3/binaries/vmlinux
Listening for system connection on port 3456
0: system.tsunami.io.rtc: Real-time clock set to Thu Jan 1 00:00:00 2009
0: system.remote_gdb.listener: listening for remote gdb #0 on port 7000
0: system.remote_gdb.listener: listening for remote gdb #1 on port 7001
0: system.remote_gdb.listener: listening for remote gdb #2 on port 7002
0: system.remote_gdb.listener: listening for remote gdb #3 on port 7003
REAL SIMULATION
info: Entering event queue @ 0.  Starting simulation...
info: Launching CPU 1 @ 834794000
info: Launching CPU 2 @ 845489000
info: Launching CPU 3 @ 856101000
m5.opt: build/ALPHA_FS_MESI_CMP_directory/mem/packet.hh:590: void Packet::makeResponse(): Assertion `needsResponse()' failed.
Program aborted at cycle 97716
Aborted

The top of the tree is this last changeset:

changeset: 7939:215c8be67063
tag: tip
user: Brad Beckmann brad.beckm...@amd.com
date: Tue Feb 08 18:07:54 2011 -0800
summary: regess: protocol regression tester updates

I am not sure whether those whom it concerns are aware of it, or if there is an updated changeset already in the works for this, but I figured I would bring it to your attention.

Malek
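The failing assert above guards a command-state transition inside Packet. As a rough illustration only (a toy model, not the real src/mem/packet.hh; the `Cmd` enum and helper functions here are invented for the sketch), makeResponse() is only legal on a packet whose command actually expects a response, so calling it on a command that carries no response trips the assert:

```cpp
#include <cassert>

// Toy model of the Packet command state machine (names invented for
// illustration; the real logic lives in gem5's src/mem/packet.hh).
enum class Cmd { ReadReq, ReadResp, WritebackDirty };

// Only some commands expect a response to come back.
bool needsResponse(Cmd c) {
    return c == Cmd::ReadReq;
}

// Converting a request into its response is only legal when the
// request actually wants one -- otherwise the assert fires, which
// is the shape of the failure reported in this thread.
Cmd makeResponse(Cmd c) {
    assert(needsResponse(c));
    return Cmd::ReadResp;
}
```

In this toy model, a makeResponse() call on something like a writeback would abort exactly the way the boot run above does, suggesting that somewhere a response is being fabricated for a packet that never asked for one.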