[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2024-09-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

Jeffrey A. Law  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |law at gcc dot gnu.org
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #7 from Jeffrey A. Law  ---
This just isn't something we're going to do.  Sorry hpa.

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-15 Thread amylaar at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

Jorn Wolfgang Rennecke  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amylaar at gcc dot gnu.org

--- Comment #6 from Jorn Wolfgang Rennecke  ---
(In reply to H. Peter Anvin from comment #5)

> 2. It seems like it almost would require an implementation-specific
> performance model. Now, one can validly argue that by setting the cost of
> unimplemented instructions to a (near-)infinite value such instructions
> should never be generated even if they are "enabled". That might also be a
> possible avenue for achieving this.

Yes, that makes it possible to implement the interface without actually having
a dedicated mask table.  However, you still have the headache of how to get
code generation to use this effectively.  A lot of code generation strategies
are basically canned solutions that a skilled assembler programmer has devised;
you can theoretically use the superoptimizer to find linear sequences for
arbitrary instruction sets, but the compilation time cost and the limit to
linear sequences make this impractical.
Therefore, as you want to co-develop architecture and software, you likely also
have to hack the compiler to make effective use of your architecture.
FWIW, 'infinite' cost seems unnecessarily high, considering you could make your
assembler replace missing instructions with function calls, and these functions
can get linked from a library.  So you have a finite cost per-call for the call
site size (static instruction count) & time (dynamic instruction count), and a
one-time size cost per-object for each function used.  Such a library and
assembler modification could be prepared for specific extensions that you want
to deconstruct, and then used flexibly.
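
As a rough illustration of the library approach described above, a missing count-trailing-zeros instruction could be backed by a small routine that the assembler calls in its place. This is only a sketch: the name `soft_ctz32` is hypothetical and not part of any existing library, and returning 32 for a zero input is an assumption chosen to match what the Zbb `ctz` instruction produces when XLEN is 32.

```c
#include <stdint.h>

/* Hypothetical software stand-in for a 32-bit ctz instruction.
   Returns the number of trailing zero bits, or 32 when x == 0. */
unsigned soft_ctz32(uint32_t x)
{
    unsigned n = 0;

    if (x == 0)
        return 32;
    /* Binary search for the lowest set bit, halving the window each step. */
    if ((x & 0x0000FFFFu) == 0) { n += 16; x >>= 16; }
    if ((x & 0x000000FFu) == 0) { n += 8;  x >>= 8; }
    if ((x & 0x0000000Fu) == 0) { n += 4;  x >>= 4; }
    if ((x & 0x00000003u) == 0) { n += 2;  x >>= 2; }
    if ((x & 0x00000001u) == 0) { n += 1; }
    return n;
}
```

In the terms used above, the per-call cost is the call sequence at each site, and the one-time size cost is this function being linked in once per object that uses it.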

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #5 from H. Peter Anvin  ---
I don't think source code modifications are a huge problem, but at this point
they require tracking down each individual bit.

As far as trapping implementations are concerned:

1. In deeply embedded implementations, it is entirely possible that
firmware/microcode might be *more* expensive than logic. Although memory arrays
are, of course, very dense, they are still extremely general and RISC-V isn't a
very sparse instruction set.

2. It seems like it almost would require an implementation-specific performance
model. Now, one can validly argue that by setting the cost of unimplemented
instructions to a (near-)infinite value such instructions should never be
generated even if they are "enabled". That might also be a possible avenue for
achieving this.
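
The cost-based idea in point 2 might be sketched as follows. This is an illustration only and does not reflect how GCC's cost hooks are actually structured; `struct op_cost`, `COST_UNIMPLEMENTED`, and `use_dedicated_insn` are hypothetical names, and the cost units are arbitrary.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical per-implementation cost entry for one operation. */
struct op_cost {
    int insn_cost;      /* cost of the dedicated instruction */
    int fallback_cost;  /* cost of an equivalent base-ISA sequence */
};

/* Marker for "not implemented in this hardware": effectively infinite. */
#define COST_UNIMPLEMENTED INT_MAX

/* Return true if the dedicated instruction is strictly cheaper than the
   fallback, so an unimplemented instruction is never chosen. */
bool use_dedicated_insn(const struct op_cost *c)
{
    return c->insn_cost < c->fallback_cost;
}
```

With such a table, an instruction that is nominally "enabled" but carries `COST_UNIMPLEMENTED` would lose to any finite fallback sequence.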

As far as an explosion of subsets, yes, this is really what this means.
Bloating a tiny on-chip control processor both in area and timing to implement
instructions that never actually appear in the code is at best painful.

That being said, I do intend to submit a proposal to the RISC-V ISA folks to
subset the Zbb subset. It is worth noting that there are overlaps between the
Zb* and Zbk* subsets, but the individual intersection sets do not have their
own names.

The Zbb instruction set is particularly noxious (and this is indeed an ISA
definition problem), because it implements multiple things that are, from an
implementation point of view, completely separate and require separate code
paths in the ALU:

§ 1.2.1 Logical with negate
- Minimal cost; in fact in some implementations it might have zero or
even negative cost due to decoder simplification.
- Extremely common in embedded operations.

§ 1.2.2 Count leading/trailing zero bits
- Requires dedicated logic.
- ctz and clz have very different uses.
- Typically clz and ctz will not be able to share logic, either,
requiring *two* dedicated units.

§ 1.2.3 Count population
- Requires dedicated logic.
- May be useless depending on what the processor needs.

§ 1.2.4 Integer minimum/maximum
- May be cheap or expensive, depending on whether an existing comparator
can be leveraged.
- Quite possibly free or almost free if the AMO instruction set is
already supported in its entirety, as that requires max/min already.

§ 1.2.5 Sign- and zero-extension
§ 1.2.6 Bitwise rotation
- May be very cheap or quite expensive, depending on the implementation
of the shift instructions.

§ 1.2.7 OR combine
- Requires dedicated logic.
- Virtually useless in control processors that do not process text.

§ 1.2.8 Byte-reverse
- Requires dedicated logic.
- These, and some other instructions, are special cases of a bit swap
extension proposed in the original bitmanip proposal, which was not included
even as a separate set.
- Virtually useless in control processors that do not need to
interface with cross-endian data.


These 8 groups really ought to be given separate names.
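
To make the grouping concrete, the eight groups above could be represented as separate bits in a mask, with each mnemonic mapped to its group. This is a sketch under stated assumptions: the `ZBB_*` group names and both functions are hypothetical (no such names exist in the ISA or in gcc), and only the RV32 mnemonics are listed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical names for the eight Zbb groups listed above. */
enum zbb_group {
    ZBB_LOGIC_NEG = 1u << 0,  /* 1.2.1 logical with negate */
    ZBB_CLZ_CTZ   = 1u << 1,  /* 1.2.2 count leading/trailing zeros */
    ZBB_CPOP      = 1u << 2,  /* 1.2.3 count population */
    ZBB_MINMAX    = 1u << 3,  /* 1.2.4 integer minimum/maximum */
    ZBB_EXTEND    = 1u << 4,  /* 1.2.5 sign- and zero-extension */
    ZBB_ROTATE    = 1u << 5,  /* 1.2.6 bitwise rotation */
    ZBB_ORC       = 1u << 6,  /* 1.2.7 OR combine */
    ZBB_REV8      = 1u << 7   /* 1.2.8 byte-reverse */
};

static const struct { const char *mnemonic; unsigned group; } zbb_table[] = {
    { "andn", ZBB_LOGIC_NEG }, { "orn", ZBB_LOGIC_NEG }, { "xnor", ZBB_LOGIC_NEG },
    { "clz", ZBB_CLZ_CTZ }, { "ctz", ZBB_CLZ_CTZ },
    { "cpop", ZBB_CPOP },
    { "min", ZBB_MINMAX }, { "minu", ZBB_MINMAX },
    { "max", ZBB_MINMAX }, { "maxu", ZBB_MINMAX },
    { "sext.b", ZBB_EXTEND }, { "sext.h", ZBB_EXTEND }, { "zext.h", ZBB_EXTEND },
    { "rol", ZBB_ROTATE }, { "ror", ZBB_ROTATE }, { "rori", ZBB_ROTATE },
    { "orc.b", ZBB_ORC },
    { "rev8", ZBB_REV8 },
};

/* Return the group bit for a mnemonic, or 0 if it is not in the table. */
unsigned zbb_group_of(const char *mnemonic)
{
    for (size_t i = 0; i < sizeof zbb_table / sizeof zbb_table[0]; i++)
        if (strcmp(zbb_table[i].mnemonic, mnemonic) == 0)
            return zbb_table[i].group;
    return 0;
}

/* Return nonzero if the mnemonic belongs to one of the enabled groups. */
int zbb_insn_enabled(const char *mnemonic, unsigned enabled_mask)
{
    return (zbb_group_of(mnemonic) & enabled_mask) != 0;
}
```

A control processor that wants only the cheap groups might then use a mask such as `ZBB_LOGIC_NEG | ZBB_MINMAX`.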

Is this going to happen again? Quite likely.

As you say, chopping the public ISA to pieces to support every single use case
seems unlikely.

It really comes down to: out of multiple suboptimal cases (forced hardware
bloat, custom subsets, extremely fine grained public subsets, vendor-hacked
trees that lag behind and/or diverge from upstream), which option is the
least bad?

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

palmer at gcc dot gnu.org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |palmer at gcc dot gnu.org

--- Comment #4 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #3)
> (In reply to H. Peter Anvin from comment #2)
> > Named subsets are, inherently, designed to make sense toward mass-produced
> > products where the hardware and software are designed (mostly)
> > independently. However, what I mean with "very deep embedded use" is
> > hardware and software being co-designed.
> > 
> > The RISC-V ISA policy is that those are considered vendor-specific subsets
> > and are to be given an X* name; however, gcc obviously needs to be able to
> > understand the meaning of this X* name. At this point there is no way to do
> > so without changing the source code in nontrivial ways.
> > 
> > Regardless of whether it is done in source code or at runtime, implementing
> > a fine-grained, preferably table-driven, approach to subsets in gcc would
> > make it very simple for a hardware implementor to define their custom
> > X-subsets without a lot of surgery to the code, *and* it makes it possible
> > to take it one step further and allow custom (or newly defined! - there
> > have been multiple instances already of new subsets of existing instructions
> > defined a posteriori) instruction subsets to be defined in a configuration
> > file.
> 
> I 100% disagree here, because if you do this there would be a huge
> explosion of what is and is not considered a subset. THIS is why it should
> be defined at the ISA level instead. Why just CTZ for ZBB? What next, just
> bseti or bexti of ZBS?
> 
> Defining the specific set during your development is really different from
> a production compiler. GCC should aim for production compiler quality
> even for highly embedded targets.

IMO adding some config file for custom subsets is going to make more headaches
than it fixes.  For a while we had args like "-mno-div", but that's kind of
hacky and we eventually ended up with Zmmul to handle it -- having an external
config file controlling this would expose a lot of interface surface we don't
have a sane way to test.

If vendors want a custom subset then they can make one; it'll just be called
"X${vendor}${subset}".  We've already got a few forks/subsets floating around;
look at the T-Head and Ventana stuff.  For a few instructions it's pretty
mechanical, aside from fixing whatever fallout comes from splitting off the
subset.

We do currently require (IIRC we still haven't written this down) some amount of
public commitment to hardware availability to take that code, but if that's the
problem we should try and figure something out.  It's certainly a pain for
vendors to keep in-development trees around, but we're trading that off with
upstream pain -- I've found these sorts of subsets drift around until the HW
actually ships, so we don't want to end up stuck keeping around subsets that
didn't ship.

Vendors also have the option of just implementing all the instructions (via
some trap or microcode or whatever), thus turning this into a performance
problem.  That sort of just trades one problem for another, but we've got some
examples of this as well (SiFive traps on a bunch of stuff, for example).

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #3 from Andrew Pinski  ---
(In reply to H. Peter Anvin from comment #2)
> Named subsets are, inherently, designed to make sense toward mass-produced
> products where the hardware and software are designed (mostly)
> independently. However, what I mean with "very deep embedded use" is
> hardware and software being co-designed.
> 
> The RISC-V ISA policy is that those are considered vendor-specific subsets
> and are to be given an X* name; however, gcc obviously needs to be able to
> understand the meaning of this X* name. At this point there is no way to do
> so without changing the source code in nontrivial ways.
> 
> Regardless of whether it is done in source code or at runtime, implementing
> a fine-grained, preferably table-driven, approach to subsets in gcc would
> make it very simple for a hardware implementor to define their custom
> X-subsets without a lot of surgery to the code, *and* it makes it possible
> to take it one step further and allow custom (or newly defined! - there
> have been multiple instances already of new subsets of existing instructions
> defined a posteriori) instruction subsets to be defined in a configuration
> file.

I 100% disagree here, because if you do this there would be a huge explosion
of what is and is not considered a subset. THIS is why it should be defined at
the ISA level instead. Why just CTZ for ZBB? What next, just bseti or bexti of
ZBS?

Defining the specific set during your development is really different from a
production compiler. GCC should aim for production compiler quality even
for highly embedded targets.

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread hpa at zytor dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #2 from H. Peter Anvin  ---
Named subsets are, inherently, designed to make sense toward mass-produced
products where the hardware and software are designed (mostly) independently.
However, what I mean with "very deep embedded use" is hardware and software
being co-designed.

The RISC-V ISA policy is that those are considered vendor-specific subsets and
are to be given an X* name; however, gcc obviously needs to be able to
understand the meaning of this X* name. At this point there is no way to do
so without changing the source code in nontrivial ways.

Regardless of whether it is done in source code or at runtime, implementing a
fine-grained, preferably table-driven, approach to subsets in gcc would make it
very simple for a hardware implementor to define their custom X-subsets
without a lot of surgery to the code, *and* it makes it possible to take it one
step further and allow custom (or newly defined! - there have been multiple
instances already of new subsets of existing instructions defined a posteriori)
instruction subsets to be defined in a configuration file.
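
As one possible shape for such a configuration-driven table, a vendor subset could be described by a single line naming the subset and the mnemonics it enables. Everything here is an assumption for illustration: the line format `xname: insn insn ...`, the example name `xacmebits`, the size limits, and the functions `parse_subset_line` and `subset_has_insn` are all hypothetical and do not correspond to anything gcc accepts today.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_INSNS 16
#define MAX_NAME  32

/* One vendor-defined subset: a name plus the mnemonics it enables. */
struct custom_subset {
    char name[MAX_NAME];
    char insns[MAX_INSNS][MAX_NAME];
    int  count;
};

/* Parse a line of the hypothetical form "xname: insn insn ...".
   Returns false if the line is malformed, too long, or lists nothing. */
bool parse_subset_line(const char *line, struct custom_subset *out)
{
    char buf[256];
    char *colon;
    char *tok;

    if (strlen(line) >= sizeof buf)
        return false;
    strcpy(buf, line);

    colon = strchr(buf, ':');
    if (colon == NULL || colon == buf || (size_t)(colon - buf) >= MAX_NAME)
        return false;
    *colon = '\0';
    strcpy(out->name, buf);
    out->count = 0;

    /* Split the remainder on whitespace, one mnemonic per token. */
    for (tok = strtok(colon + 1, " \t\n"); tok != NULL;
         tok = strtok(NULL, " \t\n")) {
        if (out->count == MAX_INSNS || strlen(tok) >= MAX_NAME)
            return false;
        strcpy(out->insns[out->count++], tok);
    }
    return out->count > 0;
}

/* Return true if the subset enables the given mnemonic. */
bool subset_has_insn(const struct custom_subset *s, const char *mnemonic)
{
    for (int i = 0; i < s->count; i++)
        if (strcmp(s->insns[i], mnemonic) == 0)
            return true;
    return false;
}
```

For example, parsing `"xacmebits: andn orn ctz"` would yield a subset named `xacmebits` for which `subset_has_insn` accepts `ctz` and rejects `cpop`.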

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

--- Comment #1 from Andrew Pinski  ---
This sounds more like something which should be designed at the ISA level and
since RISC-V is an open source ISA, it should be discussed at that level ...

There are already extensions which are designed this way too. E.g. Zmmul which
is a subset of the M extension.
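
A small illustration of what the Zmmul split means for compiled code, assuming a toolchain recent enough to accept `zmmul` in `-march` (for example something like `-march=rv32i_zmmul -mabi=ilp32`; the exact flags are an assumption here). The function names are made up for the example.

```c
#include <stdint.h>

/* With Zmmul but not the full M extension, a multiply like this can be
   compiled to a hardware multiply instruction. */
int32_t scale(int32_t x, int32_t factor)
{
    return x * factor;
}

/* Division is not part of Zmmul, so the compiler is expected to fall back
   to a library routine here instead of a div instruction. */
int32_t average(int32_t sum, int32_t count)
{
    return sum / count;
}
```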