Re: [PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space

2023-09-15 Thread Ken Matsui via Gcc-patches
On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers  wrote:
>
> On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:
>
> > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> > index 545f0f4d9eb..eed6deaf0f8 100644
> > --- a/gcc/c/c-parser.h
> > +++ b/gcc/c/c-parser.h
> > @@ -51,14 +51,14 @@ enum c_id_kind {
> >  /* A single C token after string literal concatenation and conversion
> > of preprocessing tokens to tokens.  */
> >  struct GTY (()) c_token {
> > +  /* If this token is a keyword, this value indicates which keyword.
> > + Otherwise, this value is RID_MAX.  */
> > +  ENUM_BITFIELD (rid) keyword : 16;
> >/* The kind of token.  */
> >ENUM_BITFIELD (cpp_ttype) type : 8;
> >/* If this token is a CPP_NAME, this value indicates whether also
> >   declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
> >ENUM_BITFIELD (c_id_kind) id_kind : 8;
> > -  /* If this token is a keyword, this value indicates which keyword.
> > - Otherwise, this value is RID_MAX.  */
> > -  ENUM_BITFIELD (rid) keyword : 8;
> >/* If this token is a CPP_PRAGMA, this indicates the pragma that
> >   was seen.  Otherwise it is PRAGMA_NONE.  */
> >ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>
> If you want to optimize layout, I'd expect flags to move so it can share
> the same 32-bit unit as the pragma_kind bit-field (not sure if any changes
> should be made to the declaration of flags to maximise the chance of such
> sharing across different host bit-field ABIs).
>
> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 6cbb9a8e031..3c3c482c6ce 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
> >  /* A C++ token.  */
> >
> >  struct GTY (()) cp_token {
> > -  /* The kind of token.  */
> > -  enum cpp_ttype type : 8;
> >/* If this token is a keyword, this value indicates which keyword.
> >   Otherwise, this value is RID_MAX.  */
> > -  enum rid keyword : 8;
> > +  enum rid keyword : 16;
> > +  /* The kind of token.  */
> > +  enum cpp_ttype type : 8;
> >/* Token flags.  */
> >unsigned char flags;
> >/* True if this token is from a context where it is implicitly extern 
> > "C" */
>
> You're missing an update to the "3 unused bits." comment further down.
>
> > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
> >unsigned int directive_index : 7;  /* If is_directive,
> >  then index into directive table.
> >  Otherwise, a NODE_OPERATOR.  */
> > -  unsigned int rid_code : 8; /* Rid code - for front ends.  */
> > +  unsigned int rid_code : 16;/* Rid code - for front ends. 
> >  */
> >unsigned int flags : 9;/* CPP flags.  */
> >ENUM_BITFIELD(node_type) type : 2; /* CPP node type.  */
>
> You're missing an update to the "5 bits spare." comment further down.
>
> Do you have any figures for the effects on compilation time or memory
> usage from the increase in size of these structures?
>

Here is the benchmark result:

https://github.com/ken-matsui/gsoc23/tree/main/reports/gcc-build

I can see regression for compilation time but not for memory.

* Time: +0.950995%
* Memory: No difference proven at 95.0% confidence

Sincerely,
Ken Matsui

> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space

2023-09-14 Thread Ken Matsui via Gcc-patches
On Thu, Sep 14, 2023 at 10:54 AM Joseph Myers  wrote:
>
> On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:
>
> > diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> > index 545f0f4d9eb..eed6deaf0f8 100644
> > --- a/gcc/c/c-parser.h
> > +++ b/gcc/c/c-parser.h
> > @@ -51,14 +51,14 @@ enum c_id_kind {
> >  /* A single C token after string literal concatenation and conversion
> > of preprocessing tokens to tokens.  */
> >  struct GTY (()) c_token {
> > +  /* If this token is a keyword, this value indicates which keyword.
> > + Otherwise, this value is RID_MAX.  */
> > +  ENUM_BITFIELD (rid) keyword : 16;
> >/* The kind of token.  */
> >ENUM_BITFIELD (cpp_ttype) type : 8;
> >/* If this token is a CPP_NAME, this value indicates whether also
> >   declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
> >ENUM_BITFIELD (c_id_kind) id_kind : 8;
> > -  /* If this token is a keyword, this value indicates which keyword.
> > - Otherwise, this value is RID_MAX.  */
> > -  ENUM_BITFIELD (rid) keyword : 8;
> >/* If this token is a CPP_PRAGMA, this indicates the pragma that
> >   was seen.  Otherwise it is PRAGMA_NONE.  */
> >ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
>
> If you want to optimize layout, I'd expect flags to move so it can share
> the same 32-bit unit as the pragma_kind bit-field (not sure if any changes
> should be made to the declaration of flags to maximise the chance of such
> sharing across different host bit-field ABIs).
>

Thank you for your review!

I did not make this change aggressively, but we can do the following
to minimize the fragmentation:

struct GTY (()) c_token {
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  unsigned char flags; /* 1 byte */
}

Supposing a pointer size is 8 bytes and int is 4 bytes, the struct
size would be 24 bytes. The internal fragmentation would be 0 bytes,
and the external fragmentation is 6 bytes since the overall struct
alignment requirement is $K_{max} = 8$ from the pointer.

Here is the original struct before making keyword 16-bit. The overall
struct alignment requirement is $K_{max} = 8$ from the pointer. This
struct size would be 24 bytes since the internal fragmentation is 4
bytes (after location), and the external fragmentation is 3 bytes.

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 8; /* 1 byte */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  unsigned char flags; /* 1 byte */
}

If we keep the original order with the 16-bit keyword, the struct size
would be 32 bytes (my current implementation as well, I will update
this patch).

struct GTY (()) c_token {
  ENUM_BITFIELD (cpp_ttype) type : 8; /* 1 byte */
  ENUM_BITFIELD (c_id_kind) id_kind : 8; /* 1 byte */
  ENUM_BITFIELD (rid) keyword : 16; /* 2 bytes */
  ENUM_BITFIELD (pragma_kind) pragma_kind : 8; /* 1 byte */
  location_t location; /* unsigned int, at least 2 bytes, 4 bytes as usual */
  tree value; /* pointer, depends, but 4 or 8 bytes as usual */
  unsigned char flags; /* 1 byte */
}

Likewise, the overall struct alignment requirement is $K_{max} = 8$
from the pointer. The internal fragmentation would be 7 bytes (3 bytes
after pragma_kind + 4 bytes after location), and the external
fragmentation would be 7 bytes.

I think optimizing the size is worth doing unless this breaks GCC.

> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> > index 6cbb9a8e031..3c3c482c6ce 100644
> > --- a/gcc/cp/parser.h
> > +++ b/gcc/cp/parser.h
> > @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
> >  /* A C++ token.  */
> >
> >  struct GTY (()) cp_token {
> > -  /* The kind of token.  */
> > -  enum cpp_ttype type : 8;
> >/* If this token is a keyword, this value indicates which keyword.
> >   Otherwise, this value is RID_MAX.  */
> > -  enum rid keyword : 8;
> > +  enum rid keyword : 16;
> > +  /* The kind of token.  */
> > +  enum cpp_ttype type : 8;
> >/* Token flags.  */
> >unsigned char flags;
> >/* True if this token is from a context where it is implicitly extern 
> > "C" */
>
> You're missing an update to the "3 unused bits." comment further down.
>
> > @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
> >unsigned int directive_index : 7;  /* If is_directive,
> >  then index into directive table.
> >  Otherwise, a NODE_OPERATOR.  */
> > -  unsigned int rid_code : 8; 

Re: [PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space

2023-09-14 Thread Joseph Myers
On Wed, 13 Sep 2023, Ken Matsui via Gcc-patches wrote:

> diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
> index 545f0f4d9eb..eed6deaf0f8 100644
> --- a/gcc/c/c-parser.h
> +++ b/gcc/c/c-parser.h
> @@ -51,14 +51,14 @@ enum c_id_kind {
>  /* A single C token after string literal concatenation and conversion
> of preprocessing tokens to tokens.  */
>  struct GTY (()) c_token {
> +  /* If this token is a keyword, this value indicates which keyword.
> + Otherwise, this value is RID_MAX.  */
> +  ENUM_BITFIELD (rid) keyword : 16;
>/* The kind of token.  */
>ENUM_BITFIELD (cpp_ttype) type : 8;
>/* If this token is a CPP_NAME, this value indicates whether also
>   declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
>ENUM_BITFIELD (c_id_kind) id_kind : 8;
> -  /* If this token is a keyword, this value indicates which keyword.
> - Otherwise, this value is RID_MAX.  */
> -  ENUM_BITFIELD (rid) keyword : 8;
>/* If this token is a CPP_PRAGMA, this indicates the pragma that
>   was seen.  Otherwise it is PRAGMA_NONE.  */
>ENUM_BITFIELD (pragma_kind) pragma_kind : 8;

If you want to optimize layout, I'd expect flags to move so it can share 
the same 32-bit unit as the pragma_kind bit-field (not sure if any changes 
should be made to the declaration of flags to maximise the chance of such 
sharing across different host bit-field ABIs).

> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
> index 6cbb9a8e031..3c3c482c6ce 100644
> --- a/gcc/cp/parser.h
> +++ b/gcc/cp/parser.h
> @@ -40,11 +40,11 @@ struct GTY(()) tree_check {
>  /* A C++ token.  */
>  
>  struct GTY (()) cp_token {
> -  /* The kind of token.  */
> -  enum cpp_ttype type : 8;
>/* If this token is a keyword, this value indicates which keyword.
>   Otherwise, this value is RID_MAX.  */
> -  enum rid keyword : 8;
> +  enum rid keyword : 16;
> +  /* The kind of token.  */
> +  enum cpp_ttype type : 8;
>/* Token flags.  */
>unsigned char flags;
>/* True if this token is from a context where it is implicitly extern "C" 
> */

You're missing an update to the "3 unused bits." comment further down.

> @@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
>unsigned int directive_index : 7;  /* If is_directive,
>  then index into directive table.
>  Otherwise, a NODE_OPERATOR.  */
> -  unsigned int rid_code : 8; /* Rid code - for front ends.  */
> +  unsigned int rid_code : 16;/* Rid code - for front ends.  
> */
>unsigned int flags : 9;/* CPP flags.  */
>ENUM_BITFIELD(node_type) type : 2; /* CPP node type.  */

You're missing an update to the "5 bits spare." comment further down.

Do you have any figures for the effects on compilation time or memory 
usage from the increase in size of these structures?

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v11 16/40] c, c++: Use 16 bits for all use of enum rid for more keyword space

2023-09-14 Thread Ken Matsui via Gcc-patches
Now that RID_MAX has reached 255, we need to update the bit sizes of every
use of the enum rid from 8 to 16 to support more keywords.

gcc/c-family/ChangeLog:

* c-indentation.h (struct token_indent_info): Make keyword 16 bits
and move this upward to minimize memory fragmentation.

gcc/c/ChangeLog:

* c-parser.cc (c_parse_init): Handle RID_MAX not to exceed the max
value of 16 bits.
* c-parser.h (struct c_token): Make keyword 16 bits and move this
upward to minimize memory fragmentation.

gcc/cp/ChangeLog:

* parser.h (struct cp_token): Make keyword 16 bits and move this
upward to minimize memory fragmentation.
(struct cp_lexer): Likewise, for saved_keyword.

libcpp/ChangeLog:

* include/cpplib.h (struct cpp_hashnode): Make rid_code 16 bits.

Signed-off-by: Ken Matsui 
---
 gcc/c-family/c-indentation.h | 2 +-
 gcc/c/c-parser.cc| 6 +++---
 gcc/c/c-parser.h | 6 +++---
 gcc/cp/parser.h  | 8 
 libcpp/include/cpplib.h  | 2 +-
 5 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/gcc/c-family/c-indentation.h b/gcc/c-family/c-indentation.h
index c0e07bf49f1..1ce8753813a 100644
--- a/gcc/c-family/c-indentation.h
+++ b/gcc/c-family/c-indentation.h
@@ -25,8 +25,8 @@ along with GCC; see the file COPYING3.  If not see
 struct token_indent_info
 {
   location_t location;
+  ENUM_BITFIELD (rid) keyword : 16;
   ENUM_BITFIELD (cpp_ttype) type : 8;
-  ENUM_BITFIELD (rid) keyword : 8;
 };
 
 /* Extract token information from TOKEN, which ought to either be a
diff --git a/gcc/c/c-parser.cc b/gcc/c/c-parser.cc
index b9a1b75ca43..2086f253923 100644
--- a/gcc/c/c-parser.cc
+++ b/gcc/c/c-parser.cc
@@ -115,9 +115,9 @@ c_parse_init (void)
   tree id;
   int mask = 0;
 
-  /* Make sure RID_MAX hasn't grown past the 8 bits used to hold the keyword in
- the c_token structure.  */
-  gcc_assert (RID_MAX <= 255);
+  /* Make sure RID_MAX hasn't grown past the 16 bits used to hold the keyword
+ in the c_token structure.  */
+  gcc_assert (RID_MAX <= 65535);
 
   mask |= D_CXXONLY;
   if (!flag_isoc99)
diff --git a/gcc/c/c-parser.h b/gcc/c/c-parser.h
index 545f0f4d9eb..eed6deaf0f8 100644
--- a/gcc/c/c-parser.h
+++ b/gcc/c/c-parser.h
@@ -51,14 +51,14 @@ enum c_id_kind {
 /* A single C token after string literal concatenation and conversion
of preprocessing tokens to tokens.  */
 struct GTY (()) c_token {
+  /* If this token is a keyword, this value indicates which keyword.
+ Otherwise, this value is RID_MAX.  */
+  ENUM_BITFIELD (rid) keyword : 16;
   /* The kind of token.  */
   ENUM_BITFIELD (cpp_ttype) type : 8;
   /* If this token is a CPP_NAME, this value indicates whether also
  declared as some kind of type.  Otherwise, it is C_ID_NONE.  */
   ENUM_BITFIELD (c_id_kind) id_kind : 8;
-  /* If this token is a keyword, this value indicates which keyword.
- Otherwise, this value is RID_MAX.  */
-  ENUM_BITFIELD (rid) keyword : 8;
   /* If this token is a CPP_PRAGMA, this indicates the pragma that
  was seen.  Otherwise it is PRAGMA_NONE.  */
   ENUM_BITFIELD (pragma_kind) pragma_kind : 8;
diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
index 6cbb9a8e031..3c3c482c6ce 100644
--- a/gcc/cp/parser.h
+++ b/gcc/cp/parser.h
@@ -40,11 +40,11 @@ struct GTY(()) tree_check {
 /* A C++ token.  */
 
 struct GTY (()) cp_token {
-  /* The kind of token.  */
-  enum cpp_ttype type : 8;
   /* If this token is a keyword, this value indicates which keyword.
  Otherwise, this value is RID_MAX.  */
-  enum rid keyword : 8;
+  enum rid keyword : 16;
+  /* The kind of token.  */
+  enum cpp_ttype type : 8;
   /* Token flags.  */
   unsigned char flags;
   /* True if this token is from a context where it is implicitly extern "C" */
@@ -101,8 +101,8 @@ struct GTY (()) cp_lexer {
   vec GTY ((skip)) saved_tokens;
 
   /* Saved pieces of end token we replaced with the eof token.  */
+  enum rid saved_keyword : 16;
   enum cpp_ttype saved_type : 8;
-  enum rid saved_keyword : 8;
 
   /* The next lexer in a linked list of lexers.  */
   struct cp_lexer *next;
diff --git a/libcpp/include/cpplib.h b/libcpp/include/cpplib.h
index fcdaf082b09..b93899cd364 100644
--- a/libcpp/include/cpplib.h
+++ b/libcpp/include/cpplib.h
@@ -988,7 +988,7 @@ struct GTY(()) cpp_hashnode {
   unsigned int directive_index : 7;/* If is_directive,
   then index into directive table.
   Otherwise, a NODE_OPERATOR.  */
-  unsigned int rid_code : 8;   /* Rid code - for front ends.  */
+  unsigned int rid_code : 16;  /* Rid code - for front ends.  */
   unsigned int flags : 9;  /* CPP flags.  */
   ENUM_BITFIELD(node_type) type : 2;   /* CPP node type.  */
 
-- 
2.42.0