[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

Jeffrey A. Law changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #7 from Jeffrey A. Law ---
This just isn't something we're going to do. Sorry hpa.
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

Jorn Wolfgang Rennecke changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amylaar at gcc dot gnu.org

--- Comment #6 from Jorn Wolfgang Rennecke ---
(In reply to H. Peter Anvin from comment #5)
> 2. It seems like it almost would require an implementation-specific
> performance model. Now, one can validly argue that by setting the cost of
> unimplemented instructions to a (near-)infinite value such instructions
> should never be generated even if they are "enabled". That might also be a
> possible avenue for achieving this.

Yes, that makes it possible to implement the interface without actually
having a dedicated mask table. However, you still have the headache of how
to get code generation to use this effectively. A lot of code generation
strategies are basically canned solutions that a skilled assembler
programmer has devised; you can theoretically use the superoptimizer to find
linear sequences for arbitrary instruction sets, but the compilation time
cost and the limit to linear sequences make this impractical.

Therefore, as you want to co-develop architecture and software, you likely
also have to hack the compiler to make effective use of your architecture.

FWIW, 'infinite' cost seems unnecessarily high, considering you could make
your assembler replace missing instructions with function calls, and these
functions can get linked from a library. So you have a finite cost per call
for the call site size (static instruction count) & time (dynamic
instruction count), and a one-time size cost per object for each function
used. Such a library and assembler modification could be prepared for
specific extensions that you want to deconstruct, and then used flexibly.
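
The cost argument above can be illustrated with a small sketch. This is not how GCC's cost model is implemented; it only shows the idea that an unimplemented instruction gets an infinite native cost, while a library call keeps the cost finite. All cost numbers, table names, and the `cheapest` helper are assumptions made up for the illustration.

```python
import math

# All numbers are hypothetical cost units chosen for illustration only.
NATIVE_COST = {"andn": 1, "clz": 1, "cpop": 1}     # single instruction
LIBCALL_COST = {"clz": 12, "cpop": 20}             # call overhead + routine
INLINE_COST = {"andn": 2, "clz": 30, "cpop": 25}   # open-coded sequence


def cheapest(op, implemented):
    """Return (strategy, cost) with the lowest cost for operation `op`.

    `implemented` is the set of instructions the hardware actually has.
    A missing instruction is modelled as an infinite native cost, so it
    is never selected, but a library call can still be a finite fallback.
    """
    candidates = {
        "native": NATIVE_COST[op] if op in implemented else math.inf,
        "libcall": LIBCALL_COST.get(op, math.inf),
        "inline": INLINE_COST.get(op, math.inf),
    }
    strategy = min(candidates, key=candidates.get)
    return strategy, candidates[strategy]
```

With only `andn` implemented, `cheapest("clz", {"andn"})` picks the library call in this toy model, because the native form costs infinity and the open-coded sequence is assumed to be more expensive than the call.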
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #5 from H. Peter Anvin ---
I don't think source code modifications are a huge problem, but at this
point they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that
firmware/microcode might be *more* expensive than logic. Although memory
arrays are, of course, very dense, they are still extremely general and
RISC-V isn't a very sparse instruction set.

2. It seems like it almost would require an implementation-specific
performance model. Now, one can validly argue that by setting the cost of
unimplemented instructions to a (near-)infinite value such instructions
should never be generated even if they are "enabled". That might also be a
possible avenue for achieving this.

As far as an explosion of subsets, yes, this is really what this means.
Bloating a tiny on-chip control processor both in area and timing to
implement instructions that never actually appear in the code is at best
painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to
subset the Zbb subset. It is worth noting that there are overlaps between
the Zb* and Zbk* subsets, but the individual intersection sets do not have
their own names.

The Zbb instruction set is particularly noxious (and this is indeed an ISA
definition problem), because it implements multiple things that are, from an
implementation point of view, completely separate and require separate code
paths in the ALU:

§ 1.2.1 Logical with negate
  - Minimal cost; in fact in some implementations it might have zero or
    even negative cost due to decoder simplification.
  - Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
  - Requires dedicated logic.
  - ctz and clz have very different uses.
  - Typically clz and ctz will not be able to share logic, either,
    requiring *two* dedicated units.

§ 1.2.3 Count population
  - Requires dedicated logic.
  - May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
  - May be cheap or expensive, depending on whether an existing comparator
    can be leveraged.
  - Quite possibly free or almost free if the AMO instruction set is
    already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension

§ 1.2.6 Bitwise rotation
  - May be very cheap or quite expensive, depending on the implementation
    of the shift instructions.

§ 1.2.7 OR combine
  - Requires dedicated logic.
  - Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
  - Requires dedicated logic.
  - These, and some other instructions, are special cases of a bit swap
    extension proposed in the original bitmanip proposal, which was not
    included even as a separate set.
  - Virtually useless in control processors that do not need to interface
    with cross-endian data.

These 8 groups really ought to be given separate names.

Is this going to happen again? Quite likely.

It seems unlikely, as you say, that the public ISA would be chopped to
pieces to support every single use case. It really comes down to: out of
multiple suboptimal cases (forced hardware bloat, custom subsets, extremely
fine grained public subsets, vendor-hacked trees that lag behind and/or
diverge from upstream), which option has the least amount of badness?
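
As a rough sketch of what "separate names" for the eight groups could look like as data, the table below maps each group to its RV32 Zbb instruction mnemonics (RV64 adds `*w` variants for some of them, omitted here). The group labels and the `select_groups` helper are hypothetical; no such names exist in the ISA or in GCC.

```python
# Hypothetical labels for the eight Zbb groups listed above.
# Only the RV32 mnemonics are shown.
ZBB_GROUPS = {
    "logical-with-negate": ["andn", "orn", "xnor"],
    "count-leading-trailing-zeros": ["clz", "ctz"],
    "count-population": ["cpop"],
    "integer-min-max": ["max", "maxu", "min", "minu"],
    "sign-zero-extension": ["sext.b", "sext.h", "zext.h"],
    "bitwise-rotation": ["rol", "ror", "rori"],
    "or-combine": ["orc.b"],
    "byte-reverse": ["rev8"],
}


def select_groups(names):
    """Return the instructions enabled by picking the given groups, in order."""
    enabled = []
    for name in names:
        if name not in ZBB_GROUPS:
            raise ValueError(f"unknown Zbb group: {name!r}")
        enabled.extend(ZBB_GROUPS[name])
    return enabled
```

For example, a control processor that only wants the cheap groups might call `select_groups(["logical-with-negate", "integer-min-max"])` and leave out the groups that need dedicated logic.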
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

palmer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palmer at gcc dot gnu.org

--- Comment #4 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #3)
> (In reply to H. Peter Anvin from comment #2)
> > Named subsets are, inherently, designed to make sense toward mass-produced
> > products where the hardware and software are designed (mostly)
> > independently. However, what I mean by "very deep embedded use" is
> > hardware and software being co-designed.
> >
> > The RISC-V ISA policy is that those are considered vendor-specific subsets
> > and are to be given an X* name; however, gcc obviously needs to be able to
> > understand the meaning of this X* name. At this point there is no way to
> > do that without changing the source code in nontrivial ways.
> >
> > Regardless of whether it is done in source code or at runtime,
> > implementing a fine-grained, preferably table-driven, approach to subsets
> > in gcc would make it very simple for a hardware implementor to define
> > their custom X-subsets without a lot of surgery to the code, *and* would
> > make it possible to take it one step further and allow custom (or newly
> > defined! - there have been multiple instances already of new subsets of
> > existing instructions defined a posteriori) instruction subsets to be
> > defined in a configuration file.
>
> I 100% disagree here. If you do this there would be a huge explosion of
> what is and is not considered a subset. THIS is why it should be defined
> at the ISA level instead. Why just CTZ for ZBB; what next, just bseti or
> bexti of ZBS?
>
> Defining the specific set during your development is really different from
> a production compiler. GCC should aim for production compiler quality
> even for highly embedded targets.

IMO adding some config file for custom subsets is going to make more
headaches than it fixes. For a while we had args like "-mno-div", but that's
kind of hacky and we eventually ended up with Zmmul to handle it -- having
an external config file controlling this would expose a lot of interface
surface we don't have a sane way to test.

If vendors want a custom subset then they can make one; it'll just be called
"X${vendor}${subset}". We've already got a few forks/subsets floating
around; look at the T-Head and Ventana stuff. For a few instructions it's
pretty mechanical, aside from fixing whatever fallout comes from splitting
off the subset.

We do currently require (IIRC we still didn't write this down) some amount
of public commitment to hardware availability to take that code, but if
that's the problem we should try and figure something out. It's certainly a
pain for vendors to keep in-development trees around, but we're trading that
off with upstream pain -- I've found these sorts of subsets drift around
until the HW actually ships, so we don't want to end up stuck keeping around
subsets that didn't ship.

Vendors also have the option of just implementing all the instructions (via
some trap or microcode or whatever), thus turning this into a performance
problem. That sort of just trades one problem for another, but we've got
some examples of this as well (SiFive traps on a bunch of stuff, for
example).
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #3 from Andrew Pinski ---
(In reply to H. Peter Anvin from comment #2)
> Named subsets are, inherently, designed to make sense toward mass-produced
> products where the hardware and software are designed (mostly)
> independently. However, what I mean by "very deep embedded use" is
> hardware and software being co-designed.
>
> The RISC-V ISA policy is that those are considered vendor-specific subsets
> and are to be given an X* name; however, gcc obviously needs to be able to
> understand the meaning of this X* name. At this point there is no way to
> do that without changing the source code in nontrivial ways.
>
> Regardless of whether it is done in source code or at runtime, implementing
> a fine-grained, preferably table-driven, approach to subsets in gcc would
> make it very simple for a hardware implementor to define their custom
> X-subsets without a lot of surgery to the code, *and* would make it
> possible to take it one step further and allow custom (or newly defined! -
> there have been multiple instances already of new subsets of existing
> instructions defined a posteriori) instruction subsets to be defined in a
> configuration file.

I 100% disagree here. If you do this there would be a huge explosion of
what is and is not considered a subset. THIS is why it should be defined at
the ISA level instead. Why just CTZ for ZBB; what next, just bseti or bexti
of ZBS?

Defining the specific set during your development is really different from a
production compiler. GCC should aim for production compiler quality even
for highly embedded targets.
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #2 from H. Peter Anvin ---
Named subsets are, inherently, designed to make sense toward mass-produced
products where the hardware and software are designed (mostly)
independently. However, what I mean by "very deep embedded use" is hardware
and software being co-designed.

The RISC-V ISA policy is that those are considered vendor-specific subsets
and are to be given an X* name; however, gcc obviously needs to be able to
understand the meaning of this X* name. At this point there is no way to do
that without changing the source code in nontrivial ways.

Regardless of whether it is done in source code or at runtime, implementing
a fine-grained, preferably table-driven, approach to subsets in gcc would
make it very simple for a hardware implementor to define their custom
X-subsets without a lot of surgery to the code, *and* would make it possible
to take it one step further and allow custom (or newly defined! - there have
been multiple instances already of new subsets of existing instructions
defined a posteriori) instruction subsets to be defined in a configuration
file.
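
To make the configuration-file idea concrete, here is a minimal sketch of a table-driven subset definition. Everything in it is hypothetical: GCC has no such file, and the `name = insn insn ...` syntax, the `xacme*` names, the `KNOWN_INSTRUCTIONS` table, and both helper functions are invented for illustration. The only rule taken from the discussion is that vendor subsets carry an X-prefixed name.

```python
# Hypothetical table of instructions the toolchain knows how to emit.
KNOWN_INSTRUCTIONS = {"andn", "orn", "xnor", "clz", "ctz", "cpop",
                      "bseti", "bexti"}


def parse_subsets(text, known=KNOWN_INSTRUCTIONS):
    """Parse lines of the (made-up) form 'xname = insn insn ...'.

    Returns a dict mapping each lower-cased subset name to a frozenset
    of instruction mnemonics. '#' starts a comment.
    """
    subsets = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, rest = line.partition("=")
        name = name.strip().lower()
        # Vendor-specific subsets must use an X* name.
        if not sep or len(name) < 2 or not name.startswith("x"):
            raise ValueError(f"not a valid X-subset definition: {raw!r}")
        insns = rest.split()
        unknown = [i for i in insns if i not in known]
        if unknown:
            raise ValueError(f"unknown instructions in {name}: {unknown}")
        subsets[name] = frozenset(insns)
    return subsets


def enabled_instructions(extensions, subsets):
    """Union of the instructions enabled by the named custom subsets."""
    enabled = set()
    for ext in extensions:
        enabled |= subsets.get(ext.lower(), frozenset())
    return enabled
```

A definition such as `xacmectl = andn orn ctz` would then enable exactly those three instructions when `xacmectl` is requested, without touching the rest of Zbb.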
[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #1 from Andrew Pinski ---
This sounds more like something which should be designed at the ISA level,
and since RISC-V is an open source ISA, it should be discussed at that level
... There are already extensions which are designed this way too, e.g.
Zmmul, which is a subset of the M extension.