Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
(I wrote most of this email before I saw your second message, so I'm not sure whether it still applies or not, but I figured I'd send it anyway.) Hi Gabe, I certainly agree with your goals, but I will quibble with some of the details... On Thu, Jul 22, 2010 at 2:17 PM, Gabriel Michael Black wrote: > To me there's actually a spectrum of ISA dependence, incompletely described > below: > > 1. If ISA == suchandsuch do this, otherwise do that. > 2. If ISA has characteristic such and such, do this, otherwise do that. > 3. Here, ISA, you take care of this. > 4. ISA data parameterizing non-ISA stuff. > > Number 1 is the worst since it's hard to maintain, can be cumbersome to > specify, and it isn't always clear -why- ISA suchandsuch needs a particular > behavior. I could go into a lot more detail about why I think #1 is the worst---I think you're being too kind here---but since we agree it's bad I won't bother. > I'd say number 2 is second worst, and that's what this is an example of. > It's better since it's obvious why the code is separated out and there can > be sharing between CPU models, etc., but at the same time it increases the > CPU's awareness of the ISA on it and partially breaks down the barriers of > abstraction. It also sets up special case code paths where, for instance, > only x86 on the timing CPU would possibly exhibit a certain bug. If someone > changes things for ARM and everything seems to work, they could be subtley > breaking x86 which they aren't familiar with and weren't thinking about when > they made their change. I don't really agree with this analysis... I think you can also look at #2 as a special case of #4 where the ISA parameter is a bool rather than (for example) an int. I agree there's a qualitative difference in terms of how dramatically control flow is affected, but I think it's a perspective that makes it psychologically more appealing. And the fact that it sets up special code paths is orthogonal: if there's a bug in how unaligned accesses are handled, that's only going to show up in x86 regardless of whether that code path is explicitly blocked by a HasUnalignedAccess flag or just implicitly never gets exercised because sreqLow is always NULL for non-x86 ISAs. Overall I don't disagree with your ranking; in fact we've already discussed moving the unaligned access handling for both the TLB and cache into the ISA-specific TLB and a shim, respectively, which would be an example of moving this code from case 2 to case 3. However I view it as more of a gulf, where #1 is really bad, and 2-4 are reasonable things to do, and while in general 4>3>2 there are things that are just easier to do via #2 and there's no need to unnaturally avoid that. > I think it's bad news to have a big list of yes or no checkboxes in each ISA > directory which turn on and off behaviors. This is especially true when it's > ambiguous whether to say yes or no, if the behavior changes based on > circumstances, etc., and if/when the checkbox is interpretted subtley (or > not so subtley) differently by the consumer. I'm not advocating for a huge list of ambiguous checkboxes... I'm just advocating for #2 in preference to #1, and also saying that #2 is really not so bad that we need to bend over backward to avoid it when it seems natural. Steve ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
Actually, maybe this falls into a subcategory of 2. which is why it seems more acceptable. 2.a If the ISA says it's safe to use this optimization which doesn't affect behavior and which could be turned off and only result in lower simulator performance, skip this code/check/whatever. This still has some of the drawbacks I mentioned for the list-of-checkboxes model, but at least if it doesn't fit there's always a failsafe option. These would be optional hints, like telling the compiler a function never returns. This may have been what you were talking about in the first place, in which case great, we're on the same page. I'd been debating whether to send out my little list anyway since I think it's productive to spend a few cycles thinking about this stuff. Gabe Quoting Gabriel Michael Black : To me there's actually a spectrum of ISA dependence, incompletely described below: 1. If ISA == suchandsuch do this, otherwise do that. 2. If ISA has characteristic such and such, do this, otherwise do that. 3. Here, ISA, you take care of this. 4. ISA data parameterizing non-ISA stuff. Number 1 is the worst since it's hard to maintain, can be cumbersome to specify, and it isn't always clear -why- ISA suchandsuch needs a particular behavior. I'd say number 2 is second worst, and that's what this is an example of. It's better since it's obvious why the code is separated out and there can be sharing between CPU models, etc., but at the same time it increases the CPU's awareness of the ISA on it and partially breaks down the barriers of abstraction. It also sets up special case code paths where, for instance, only x86 on the timing CPU would possibly exhibit a certain bug. If someone changes things for ARM and everything seems to work, they could be subtley breaking x86 which they aren't familiar with and weren't thinking about when they made their change. 3. Three is better in some ways and worse in others. It's clear what's happening, there's a lot of flexibility, and the CPU isn't actually aware of -how- something is being done, just that, say, this would be a good time to check for interrupts, whatever that means. It's not as great because you have more complex interactions between ISAs and CPUs, and you have to do a bit more work in the ISA. It can also be hard to make some of this functionality work sensibly in order, out of order, single cycle, multi cycle, timing mode, atomic mode, etc. etc. 4. This one is the best when you can get away with it. This is where you, say, make your integer register file 32 registers because the ISA says that's how many it needs. Basically the only draw back to this is that behavior changes a little based on each ISA, but if you can get away with it this is the safest. I think having a multi ISA simulator that will be modified by its end users, especially one with as large a cross product of options as ours, needs to try to be safe and simple before being as absolutely fast as it can be. You know what they say about premature optimization. Ideally we should design things so the big, unnecessary pieces of code just aren't part of the equation because they're in the ISA defined objects, control just doesn't go that way when it wouldn't make sense, etc. And if, for instance, a pointer should always be 0, the code should still behave correctly. The code should do its job correctly no matter -why- it's being asked to do it. I think it's bad news to have a big list of yes or no checkboxes in each ISA directory which turn on and off behaviors. This is especially true when it's ambiguous whether to say yes or no, if the behavior changes based on circumstances, etc., and if/when the checkbox is interpretted subtley (or not so subtley) differently by the consumer. In this particular limitted case it seems relatively ok although not necessarily advisable, but it's a slippery slope I don't consider us have a completely clean history with. Gabe Quoting Steve Reinhardt : In generalI think this is the kind of ISA hook we should be using... in the sense that checking TheISA::HasUnalignedMemAcc is much better than "(TheISA == x86 || TheISA == Power)". I think it's useful not only to avoid the overhead of a dynamic check for an ISA that doesn't need it, but also to clarify that this is code that never gets executed on those ISAs. Maybe for a little one-liner like this it's not a big deal either way, but for bigger hunks of code I think that clarification is potentially useful. Steve On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black wrote: Yes it does, and that sounds reasonable to me. I'd still like to see us use ISA hooks as minimally as possible, but this seems ok. Gabe Quoting Timothy M Jones : Oh, ok, I see where you're going with that. However, the main idea of having TheISA::HasUnalignedMemAcc was that it is a constant specific to each ISA. Therefor
Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
To me there's actually a spectrum of ISA dependence, incompletely described below: 1. If ISA == suchandsuch do this, otherwise do that. 2. If ISA has characteristic such and such, do this, otherwise do that. 3. Here, ISA, you take care of this. 4. ISA data parameterizing non-ISA stuff. Number 1 is the worst since it's hard to maintain, can be cumbersome to specify, and it isn't always clear -why- ISA suchandsuch needs a particular behavior. I'd say number 2 is second worst, and that's what this is an example of. It's better since it's obvious why the code is separated out and there can be sharing between CPU models, etc., but at the same time it increases the CPU's awareness of the ISA on it and partially breaks down the barriers of abstraction. It also sets up special case code paths where, for instance, only x86 on the timing CPU would possibly exhibit a certain bug. If someone changes things for ARM and everything seems to work, they could be subtley breaking x86 which they aren't familiar with and weren't thinking about when they made their change. 3. Three is better in some ways and worse in others. It's clear what's happening, there's a lot of flexibility, and the CPU isn't actually aware of -how- something is being done, just that, say, this would be a good time to check for interrupts, whatever that means. It's not as great because you have more complex interactions between ISAs and CPUs, and you have to do a bit more work in the ISA. It can also be hard to make some of this functionality work sensibly in order, out of order, single cycle, multi cycle, timing mode, atomic mode, etc. etc. 4. This one is the best when you can get away with it. This is where you, say, make your integer register file 32 registers because the ISA says that's how many it needs. Basically the only draw back to this is that behavior changes a little based on each ISA, but if you can get away with it this is the safest. I think having a multi ISA simulator that will be modified by its end users, especially one with as large a cross product of options as ours, needs to try to be safe and simple before being as absolutely fast as it can be. You know what they say about premature optimization. Ideally we should design things so the big, unnecessary pieces of code just aren't part of the equation because they're in the ISA defined objects, control just doesn't go that way when it wouldn't make sense, etc. And if, for instance, a pointer should always be 0, the code should still behave correctly. The code should do its job correctly no matter -why- it's being asked to do it. I think it's bad news to have a big list of yes or no checkboxes in each ISA directory which turn on and off behaviors. This is especially true when it's ambiguous whether to say yes or no, if the behavior changes based on circumstances, etc., and if/when the checkbox is interpretted subtley (or not so subtley) differently by the consumer. In this particular limitted case it seems relatively ok although not necessarily advisable, but it's a slippery slope I don't consider us have a completely clean history with. Gabe Quoting Steve Reinhardt : In generalI think this is the kind of ISA hook we should be using... in the sense that checking TheISA::HasUnalignedMemAcc is much better than "(TheISA == x86 || TheISA == Power)". I think it's useful not only to avoid the overhead of a dynamic check for an ISA that doesn't need it, but also to clarify that this is code that never gets executed on those ISAs. Maybe for a little one-liner like this it's not a big deal either way, but for bigger hunks of code I think that clarification is potentially useful. Steve On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black wrote: Yes it does, and that sounds reasonable to me. I'd still like to see us use ISA hooks as minimally as possible, but this seems ok. Gabe Quoting Timothy M Jones : Oh, ok, I see where you're going with that. However, the main idea of having TheISA::HasUnalignedMemAcc was that it is a constant specific to each ISA. Therefore, the compiler should really recognise this and optimise it away wherever it's used. In this case, for ISAs that don't have unaligned memory accesses the whole 'if' block should disappear. For ISAs that do have them then the condition should be reduced to just checking sreqLow. Therefore it's better for the first set of ISAs for this to be kept in. Does that make sense? Tim On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black wrote: I think you missed my point. If the check of TheISA::HasUnalignedMemAcc is redundant, we shouldn't be checking it at all. It's a free, though very small, performance bump, but more significantly it removes a direct dependence on the ISA. Gabe Timothy M. Jones wrote: changeset 3bd51d6ac9ef in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef description: O
Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
In generalI think this is the kind of ISA hook we should be using... in the sense that checking TheISA::HasUnalignedMemAcc is much better than "(TheISA == x86 || TheISA == Power)". I think it's useful not only to avoid the overhead of a dynamic check for an ISA that doesn't need it, but also to clarify that this is code that never gets executed on those ISAs. Maybe for a little one-liner like this it's not a big deal either way, but for bigger hunks of code I think that clarification is potentially useful. Steve On Thu, Jul 22, 2010 at 1:10 PM, Gabriel Michael Black wrote: > Yes it does, and that sounds reasonable to me. I'd still like to see us use > ISA hooks as minimally as possible, but this seems ok. > > Gabe > > Quoting Timothy M Jones : > >> Oh, ok, I see where you're going with that. However, the main idea of >> having TheISA::HasUnalignedMemAcc was that it is a constant specific to each >> ISA. Therefore, the compiler should really recognise this and optimise it >> away wherever it's used. In this case, for ISAs that don't have unaligned >> memory accesses the whole 'if' block should disappear. For ISAs that do >> have them then the condition should be reduced to just checking sreqLow. >> Therefore it's better for the first set of ISAs for this to be kept in. >> Does that make sense? >> >> Tim >> >> On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black >> wrote: >> >>> I think you missed my point. If the check of TheISA::HasUnalignedMemAcc >>> is redundant, we shouldn't be checking it at all. It's a free, though >>> very small, performance bump, but more significantly it removes a direct >>> dependence on the ISA. >>> >>> Gabe >>> >>> Timothy M. Jones wrote: changeset 3bd51d6ac9ef in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef description: O3CPU: Fix a bug where stores in the cpu where never marked as split. diffstat: src/cpu/o3/lsq_unit.hh | 6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diffs (16 lines): diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh --- a/src/cpu/o3/lsq_unit.hh Thu Jul 22 18:47:52 2010 +0100 +++ b/src/cpu/o3/lsq_unit.hh Thu Jul 22 18:52:02 2010 +0100 @@ -822,6 +822,12 @@ storeQueue[store_idx].sreqLow = sreqLow; storeQueue[store_idx].sreqHigh = sreqHigh; storeQueue[store_idx].size = sizeof(T); + + // Split stores can only occur in ISAs with unaligned memory accesses. If + // a store request has been split, sreqLow and sreqHigh will be non-null. + if (TheISA::HasUnalignedMemAcc && sreqLow) { + storeQueue[store_idx].isSplit = true; + } assert(sizeof(T) <= sizeof(storeQueue[store_idx].data)); T gData = htog(data); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev >>> >>> ___ >>> m5-dev mailing list >>> m5-dev@m5sim.org >>> http://m5sim.org/mailman/listinfo/m5-dev >>> >> >> >> -- >> Timothy M. Jones >> http://homepages.inf.ed.ac.uk/tjones1 >> >> The University of Edinburgh is a charitable body, registered in >> Scotland, with registration number SC005336. >> >> ___ >> m5-dev mailing list >> m5-dev@m5sim.org >> http://m5sim.org/mailman/listinfo/m5-dev >> > > > ___ > m5-dev mailing list > m5-dev@m5sim.org > http://m5sim.org/mailman/listinfo/m5-dev > ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
Yes it does, and that sounds reasonable to me. I'd still like to see us use ISA hooks as minimally as possible, but this seems ok. Gabe Quoting Timothy M Jones : Oh, ok, I see where you're going with that. However, the main idea of having TheISA::HasUnalignedMemAcc was that it is a constant specific to each ISA. Therefore, the compiler should really recognise this and optimise it away wherever it's used. In this case, for ISAs that don't have unaligned memory accesses the whole 'if' block should disappear. For ISAs that do have them then the condition should be reduced to just checking sreqLow. Therefore it's better for the first set of ISAs for this to be kept in. Does that make sense? Tim On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black wrote: I think you missed my point. If the check of TheISA::HasUnalignedMemAcc is redundant, we shouldn't be checking it at all. It's a free, though very small, performance bump, but more significantly it removes a direct dependence on the ISA. Gabe Timothy M. Jones wrote: changeset 3bd51d6ac9ef in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef description: O3CPU: Fix a bug where stores in the cpu where never marked as split. diffstat: src/cpu/o3/lsq_unit.hh | 6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diffs (16 lines): diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh --- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:47:52 2010 +0100 +++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:52:02 2010 +0100 @@ -822,6 +822,12 @@ storeQueue[store_idx].sreqLow = sreqLow; storeQueue[store_idx].sreqHigh = sreqHigh; storeQueue[store_idx].size = sizeof(T); + +// Split stores can only occur in ISAs with unaligned memory accesses. If +// a store request has been split, sreqLow and sreqHigh will be non-null. +if (TheISA::HasUnalignedMemAcc && sreqLow) { +storeQueue[store_idx].isSplit = true; +} assert(sizeof(T) <= sizeof(storeQueue[store_idx].data)); T gData = htog(data); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- Timothy M. Jones http://homepages.inf.ed.ac.uk/tjones1 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
Oh, ok, I see where you're going with that. However, the main idea of having TheISA::HasUnalignedMemAcc was that it is a constant specific to each ISA. Therefore, the compiler should really recognise this and optimise it away wherever it's used. In this case, for ISAs that don't have unaligned memory accesses the whole 'if' block should disappear. For ISAs that do have them then the condition should be reduced to just checking sreqLow. Therefore it's better for the first set of ISAs for this to be kept in. Does that make sense? Tim On Thu, 22 Jul 2010 14:30:56 -0400, Gabe Black wrote: I think you missed my point. If the check of TheISA::HasUnalignedMemAcc is redundant, we shouldn't be checking it at all. It's a free, though very small, performance bump, but more significantly it removes a direct dependence on the ISA. Gabe Timothy M. Jones wrote: changeset 3bd51d6ac9ef in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef description: O3CPU: Fix a bug where stores in the cpu where never marked as split. diffstat: src/cpu/o3/lsq_unit.hh | 6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diffs (16 lines): diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh --- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:47:52 2010 +0100 +++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:52:02 2010 +0100 @@ -822,6 +822,12 @@ storeQueue[store_idx].sreqLow = sreqLow; storeQueue[store_idx].sreqHigh = sreqHigh; storeQueue[store_idx].size = sizeof(T); + +// Split stores can only occur in ISAs with unaligned memory accesses. If +// a store request has been split, sreqLow and sreqHigh will be non-null. +if (TheISA::HasUnalignedMemAcc && sreqLow) { +storeQueue[store_idx].isSplit = true; +} assert(sizeof(T) <= sizeof(storeQueue[store_idx].data)); T gData = htog(data); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- Timothy M. Jones http://homepages.inf.ed.ac.uk/tjones1 The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
I think you missed my point. If the check of TheISA::HasUnalignedMemAcc is redundant, we shouldn't be checking it at all. It's a free, though very small, performance bump, but more significantly it removes a direct dependence on the ISA. Gabe Timothy M. Jones wrote: > changeset 3bd51d6ac9ef in /z/repo/m5 > details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef > description: > O3CPU: Fix a bug where stores in the cpu where never marked as split. > > diffstat: > > src/cpu/o3/lsq_unit.hh | 6 ++ > 1 files changed, 6 insertions(+), 0 deletions(-) > > diffs (16 lines): > > diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh > --- a/src/cpu/o3/lsq_unit.hh Thu Jul 22 18:47:52 2010 +0100 > +++ b/src/cpu/o3/lsq_unit.hh Thu Jul 22 18:52:02 2010 +0100 > @@ -822,6 +822,12 @@ > storeQueue[store_idx].sreqLow = sreqLow; > storeQueue[store_idx].sreqHigh = sreqHigh; > storeQueue[store_idx].size = sizeof(T); > + > +// Split stores can only occur in ISAs with unaligned memory accesses. > If > +// a store request has been split, sreqLow and sreqHigh will be non-null. > +if (TheISA::HasUnalignedMemAcc && sreqLow) { > +storeQueue[store_idx].isSplit = true; > +} > assert(sizeof(T) <= sizeof(storeQueue[store_idx].data)); > > T gData = htog(data); > ___ > m5-dev mailing list > m5-dev@m5sim.org > http://m5sim.org/mailman/listinfo/m5-dev > ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] changeset in m5: O3CPU: Fix a bug where stores in the cpu where ...
changeset 3bd51d6ac9ef in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=3bd51d6ac9ef description: O3CPU: Fix a bug where stores in the cpu where never marked as split. diffstat: src/cpu/o3/lsq_unit.hh | 6 ++ 1 files changed, 6 insertions(+), 0 deletions(-) diffs (16 lines): diff -r 02b471d9d400 -r 3bd51d6ac9ef src/cpu/o3/lsq_unit.hh --- a/src/cpu/o3/lsq_unit.hhThu Jul 22 18:47:52 2010 +0100 +++ b/src/cpu/o3/lsq_unit.hhThu Jul 22 18:52:02 2010 +0100 @@ -822,6 +822,12 @@ storeQueue[store_idx].sreqLow = sreqLow; storeQueue[store_idx].sreqHigh = sreqHigh; storeQueue[store_idx].size = sizeof(T); + +// Split stores can only occur in ISAs with unaligned memory accesses. If +// a store request has been split, sreqLow and sreqHigh will be non-null. +if (TheISA::HasUnalignedMemAcc && sreqLow) { +storeQueue[store_idx].isSplit = true; +} assert(sizeof(T) <= sizeof(storeQueue[store_idx].data)); T gData = htog(data); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev