Re: [m5-dev] semantics of translating unaligned accesses

2008-10-30 Thread Gabe Black
...? [EMAIL PROTECTED] wrote: Anybody? Quoting Gabe Black [EMAIL PROTECTED]: Hacking on this a little more, I've come across another wrinkle. The functions which finish a timing memory access expect to get a packet which has their data. That means that if multiple accesses are

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-30 Thread Steve Reinhardt
You'd have to look at which fields of the packet actually get used across the various ISAs... for example, Alpha store conditionals use the extra data slot (forgot exactly what we call it) to get the bits that say whether the SC succeeded. The CPU will have to do some combining no matter what, so

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-30 Thread Gabe Black
That's fine. I wanted to bring it up since it seemed somewhat hacky like you said. Gabe Steve Reinhardt wrote: You'd have to look at which fields of the packet actually get used across the various ISAs... for example, Alpha store conditionals use the extra data slot (forgot exactly what we

[m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Gabe Black
I injured my eye the other day and had some visitors from out of town, so I've been out of commission for the last several days. Now that I'm mostly back in the saddle, I've started working on getting the simple timing CPU working under x86. It's been going smoothly, but one issue that's come

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Ali Saidi
I think I must be missing something. How are x86 unaligned accesses handled now? 1 unaligned access -> 3 aligned accesses that are combined? What chops them into pieces? Microcode? Some other means? The issue is that an access could span a page boundary so half of it could be in the TLB
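To make the page-boundary concern concrete: an access needs one translation per virtual page it touches, so an access that straddles a boundary needs two, and one of them can hit in the TLB while the other misses. A minimal sketch, assuming a hypothetical helper name `pages_touched` and a 4 KiB page size chosen only for illustration:

```python
from typing import List


def pages_touched(vaddr: int, size: int, page_size: int = 4096) -> List[int]:
    """Return the virtual page numbers covered by [vaddr, vaddr + size).

    Hypothetical helper for illustration; page_size is an assumed value.
    Each returned page needs its own translation, so a result with two
    entries means two TLB lookups that may hit or miss independently.
    """
    if size < 1:
        raise ValueError("size must be at least 1 byte")
    first = vaddr // page_size
    last = (vaddr + size - 1) // page_size  # page holding the final byte
    return list(range(first, last + 1))
```

For example, a 4-byte access at 0xffe touches pages 0 and 1, while the same access at 0x1000 stays within page 1.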

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Gabe Black
In the atomic simple CPU, they're broken up at the block-size-aligned boundaries of the CPU's peer by the CPU and done as x different accesses, usually 1 or 2. There may be some weird case where it's more than 2, but I can't think of any off the top of my head. The CPU then composites the results into
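The splitting described above can be sketched as follows: walk the requested byte range and cut it wherever it crosses a block-size-aligned boundary. This is an illustrative sketch, not the M5 code; the function name `split_access` is hypothetical and the block size is assumed to be a power of two. It also shows when more than two pieces could arise: only when the access is larger than a block.

```python
from typing import List, Tuple


def split_access(addr: int, size: int, block_size: int) -> List[Tuple[int, int]]:
    """Split [addr, addr + size) into (addr, size) pieces that never cross
    a block_size-aligned boundary.

    Hypothetical helper; assumes block_size is a power of two.
    """
    if block_size <= 0 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    pieces = []
    end = addr + size
    while addr < end:
        # First aligned boundary strictly after the block containing addr.
        next_boundary = (addr & ~(block_size - 1)) + block_size
        piece_end = min(end, next_boundary)
        pieces.append((addr, piece_end - addr))
        addr = piece_end
    return pieces
```

With an assumed 64-byte block, an 8-byte access at 0x3c becomes two 4-byte pieces at 0x3c and 0x40, and an aligned access at 0x40 stays whole.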

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Steve Reinhardt
There's also the possibility that these accesses could be atomic, right? On Mon, Oct 27, 2008 at 5:33 PM, Gabe Black [EMAIL PROTECTED] wrote: In the atomic simple CPU, they're broken up at the block-size-aligned boundaries of the CPU's peer by the CPU and done as x different accesses, usually 1

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Steve Reinhardt
OK... so if we have to have the ability to lock a block in the cache for the duration of the instruction, then how those blocks get in there wrt TLB accesses is moot? That sounds plausible to me. Just want to make sure that if there is an interaction that we don't design something now that we

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Ali Saidi
I still don't get what is doing the cutting into pieces. Is it done by the CPU, or does the CPU execute two instructions to fill portions of a register? Ali On Oct 27, 2008, at 6:19 PM, Gabe Black wrote: I think so, if I'm understanding you right. I looked in the manuals, and in the

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Gabe Black
The CPU. Ali Saidi wrote: I still don't get what is doing the cutting into pieces. Is it done by the CPU, or does the CPU execute two instructions to fill portions of a register? Ali On Oct 27, 2008, at 6:19 PM, Gabe Black wrote: I think so, if I'm understanding you right. I

Re: [m5-dev] semantics of translating unaligned accesses

2008-10-27 Thread Gabe Black
Hacking on this a little more, I've come across another wrinkle. The functions which finish a timing memory access expect to get a packet which has their data. That means that if multiple accesses are required, the CPU will have to take all the packets it gets as responses and build a larger
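The combining step described above can be sketched like this: each response packet carries the data for one piece, possibly arriving out of order, and the CPU copies each piece into its offset within a buffer covering the original access. This is a hypothetical illustration with made-up names (`combine_responses`), representing a response as a plain (address, bytes) pair rather than an actual M5 packet, and it ignores extra fields such as the store-conditional result.

```python
from typing import Iterable, Tuple


def combine_responses(addr: int, size: int,
                      responses: Iterable[Tuple[int, bytes]]) -> bytes:
    """Assemble per-piece response data into one buffer for [addr, addr + size).

    Hypothetical helper: each response is (piece_addr, piece_data) and may
    arrive in any order. Raises ValueError if a piece falls outside the
    original access or if the pieces do not cover it exactly.
    """
    buf = bytearray(size)
    filled = 0
    for piece_addr, data in responses:
        offset = piece_addr - addr
        if offset < 0 or offset + len(data) > size:
            raise ValueError("piece lies outside the original access")
        buf[offset:offset + len(data)] = data
        filled += len(data)
    if filled != size:
        raise ValueError("pieces do not cover the access exactly")
    return bytes(buf)
```

For example, the two 4-byte pieces of an 8-byte access at 0x3c can arrive high half first and still reassemble in address order; interpreting the result little-endian, as x86 does, gives the register value.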