Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
Em Mon, 26 Oct 2020 20:55:35 -0700 Joe Perches escreveu: > On Mon, 2020-10-26 at 08:03 +0100, Mauro Carvalho Chehab wrote: > [] > > Well, this can help: > > my $typedef_type = qr { ((?:\w+\s+){1,}) }x; > > unbounded captures are generally bad, I suggest a limit like {1,5} Ok. 5 is likely too low, if "*" starts to be counted as part of the type. Maybe 8 would be ok. > > > if ($x =~ /typedef\s+((?:\w+\s+){1,})\(\*?\s*(\w\S+)\s*\)\s*\((.*)\);/ > > || > > $x =~ /typedef\s+((?:\w+\s+){1,})\s*\*?(\w\S+)\s*\s*\((.*)\);/) { > [] > > Fix the regex in order to accept composite types when > > defining a typedef for a function pointer. > [] > > diff --git a/scripts/kernel-doc b/scripts/kernel-doc > [] > > @@ -1438,13 +1438,14 @@ sub dump_typedef($$) { > > $x =~ s@/\*.*?\*/@@gos;# strip comments. > > > > > > # Parse function prototypes > > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > - $x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) { > > +if ($x =~ /typedef\s+((?:\w+\s+){1,})\(\*?\s*(\w\S+)\s*\)\s*\((.*)\);/ > > || > > + $x =~ /typedef\s+((?:\w+\s+){1,})\s*\*?(\w\S+)\s*\s*\((.*)\);/) { > > This typedef does not allow * returns like > > const unsigned char *(*string)(args...); > or > unsigned char *const(*fn)(args...); > or > void *(*alloc)(args...); Supporting those shouldn't be hard. See enclosed. > > (not to mention the truly unusual stuff like the typedefs in > tools/testing/selftests/bpf/progs/btf_dump_test_case_syntax.c) > > typedef void (* (*signal_t)(int, void (*)(int)))(int); > typedef char * (*fn_ptr_arr1_t[10])(int **); > typedef char * (* const (* const fn_ptr_arr2_t[5])())(char * (*)(int)); Parsing those using a single regex, though, is a lot more complex. The logic would likely require some loop or a real lexical analyzer in order to properly parse it. In the specific case of userspace tools (and, in special, selftests), it is probably not worth the effort to add support for C expressions that only exists there, as those won't likely gain kernel-doc entries for their source code to become part of the Kernel documentation. Thanks, Mauro [PATH] scripts: kernel-doc: fix typedef parsing The include/linux/genalloc.h file defined this typedef: typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned long size,unsigned long start,unsigned int nr,void *data, struct gen_pool *pool, unsigned long start_addr); Because it has a type composite of two words (unsigned long), the parser gets the typedef name wrong: .. c:macro:: long **Typedef**: Allocation callback function type definition Fix the regex in order to accept composite types when defining a typedef for a function pointer. Signed-off-by: Mauro Carvalho Chehab diff --git a/scripts/kernel-doc b/scripts/kernel-doc index 99cd8418ff8a..f699cf05d409 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1431,20 +1431,25 @@ sub dump_enum($$) { } } +my $typedef_type = qr { ((?:\s+[\w\*]+){1,8})\s* }x; +my $typedef_ident = qr { \*?\s*(\w\S+)\s* }x; +my $typedef_args = qr { \s*\((.*)\); }x; + +my $typedef1 = qr { typedef$typedef_type\($typedef_ident\)$typedef_args }x; +my $typedef2 = qr { typedef$typedef_type$typedef_ident$typedef_args }x; + sub dump_typedef($$) { my $x = shift; my $file = shift; $x =~ s@/\*.*?\*/@@gos;# strip comments. -# Parse function prototypes -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || - $x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) { - - # Function typedefs +# Parse function typedef prototypes +if ($x =~ $typedef1 || $x =~ $typedef2) { $return_type = $1; $declaration_name = $2; my $args = $3; + $return_type =~ s/^\s+//; create_parameterlist($args, ',', $file, $declaration_name);
Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
On Mon, 2020-10-26 at 08:03 +0100, Mauro Carvalho Chehab wrote: [] > Well, this can help: > my $typedef_type = qr { ((?:\w+\s+){1,}) }x; unbounded captures are generally bad, I suggest a limit like {1,5} > if ($x =~ /typedef\s+((?:\w+\s+){1,})\(\*?\s*(\w\S+)\s*\)\s*\((.*)\);/ || > $x =~ /typedef\s+((?:\w+\s+){1,})\s*\*?(\w\S+)\s*\s*\((.*)\);/) { [] > Fix the regex in order to accept composite types when > defining a typedef for a function pointer. [] > diff --git a/scripts/kernel-doc b/scripts/kernel-doc [] > @@ -1438,13 +1438,14 @@ sub dump_typedef($$) { > $x =~ s@/\*.*?\*/@@gos; # strip comments. > > > # Parse function prototypes > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > - $x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) { > +if ($x =~ /typedef\s+((?:\w+\s+){1,})\(\*?\s*(\w\S+)\s*\)\s*\((.*)\);/ || > + $x =~ /typedef\s+((?:\w+\s+){1,})\s*\*?(\w\S+)\s*\s*\((.*)\);/) { This typedef does not allow * returns like const unsigned char *(*string)(args...); or unsigned char *const(*fn)(args...); or void *(*alloc)(args...); (not to mention the truly unusual stuff like the typedefs in tools/testing/selftests/bpf/progs/btf_dump_test_case_syntax.c) typedef void (* (*signal_t)(int, void (*)(int)))(int); typedef char * (*fn_ptr_arr1_t[10])(int **); typedef char * (* const (* const fn_ptr_arr2_t[5])())(char * (*)(int));
Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
Em Fri, 23 Oct 2020 11:01:35 -0700 Joe Perches escreveu: > On Fri, 2020-10-23 at 11:22 -0600, Jonathan Corbet wrote: > > On Fri, 23 Oct 2020 18:32:48 +0200 > > Mauro Carvalho Chehab wrote: > > > > > The include/linux/genalloc.h file defined this typedef: > > > > > > typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned > > > long size,unsigned long start,unsigned int nr,void *data, struct gen_pool > > > *pool, unsigned long start_addr); > [] > > > diff --git a/scripts/kernel-doc b/scripts/kernel-doc > [] > > > # Parse function prototypes > > > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > > +if ($x =~ /typedef\s+(\w+\s*){1,}\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > > > > > > I sure wish we could find a way to make all these regexes more > > understandable and maintainable. Reviewing a change like this is ... fun. > > Perhaps using some of the checkpatch regex definitions like: > > $Type > $Ident > $balanced_parens > > would help improve readability. Well, this can help: my $typedef_type = qr { ((?:\w+\s+){1,}) }x; my $typedef_ident = qr { \*?\s*(\w\S+)\s* }x; my $typedef_args = qr { \s*\((.*)\); }x; my $typedef1 = qr { typedef\s+$typedef_type\($typedef_ident\)$typedef_args }x; my $typedef2 = qr { typedef\s+$typedef_type$typedef_ident$typedef_args }x; # Parse function typedef prototypes if ($x =~ $typedef1 || $x =~ $typedef2) { ... But, IMHO, this is as complicated as before, and makes harder to test the regex outside kernel_doc (like using regex101). A good thing is that it is easier to see the difference between the two typedef regexes. I'll place such optimization on a separate patch. This way, it should be easier to decide later if this is worth or not. Also, if we're willing to take such direction, it could make sense to use the same regexes for matching type, identifier and arguments inside the functions parser. > And the regex above doesn't quite work for spacing after typedef. > The regex should allow space between the open parenthesis and the * > > typedef ( * ) (args...); > > And this regex does not find typedefs that use another typedef as > like: > > arch/s390/include/asm/debug.h:typedef int (debug_header_proc_t) (debug_info_t > *id, True. I guess that, in order to properly handle it, we should use this: if ($x =~ /typedef\s+((?:\w+\s+){1,})\(\*?\s*(\w\S+)\s*\)\s*\((.*)\);/ || $x =~ /typedef\s+((?:\w+\s+){1,})\s*\*?(\w\S+)\s*\s*\((.*)\);/) { The first check should now parse everything properly: https://regex101.com/r/bPTm18/5 And the second regex should also get multi-word types when parenthesis is not used, like: typedef unsigned int debug_header_proc_t (debug_info_t *id, struct debug_view *view, int area,debug_entry_t *entry, char *out_buf); typedef unsigned int *debug_header_proc_t (debug_info_t *id, struct debug_view *view, int area,debug_entry_t *entry, char *out_buf); https://regex101.com/r/Y56X1X/1 Thanks, Mauro [PATCH] scripts: kernel-doc: fix typedef parsing The include/linux/genalloc.h file defined this typedef: typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned long size,unsigned long start,unsigned int nr,void *data, struct gen_pool *pool, unsigned long start_addr); Because it has a type composite of two words (unsigned long), the parser gets the typedef name wrong: .. c:macro:: long **Typedef**: Allocation callback function type definition Fix the regex in order to accept composite types when defining a typedef for a function pointer. Signed-off-by: Mauro Carvalho Chehab diff --git a/scripts/kernel-doc b/scripts/kernel-doc index 99cd8418ff8a..54832618eea0 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1438,13 +1438,14 @@ sub dump_typedef($$) { $x =~ s@/\*.*?\*/@@gos;# strip comments. # Parse function prototypes -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || - $x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) { +if ($x =~ /typedef\s+((?:\w+\s+){1,})\(\*?\s*(\w\S+)\s*\)\s*\((.*)\);/ || + $x =~ /typedef\s+((?:\w+\s+){1,})\s*\*?(\w\S+)\s*\s*\((.*)\);/) { # Function typedefs $return_type = $1; $declaration_name = $2; my $args = $3; + $return_type =~ s/\s+$//; create_parameterlist($args, ',', $file, $declaration_name);
Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
Em Fri, 23 Oct 2020 11:22:26 -0600 Jonathan Corbet escreveu: > On Fri, 23 Oct 2020 18:32:48 +0200 > Mauro Carvalho Chehab wrote: > > > The include/linux/genalloc.h file defined this typedef: > > > > typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned > > long size,unsigned long start,unsigned int nr,void *data, struct gen_pool > > *pool, unsigned long start_addr); > > > > Because it has a type composite of two words (unsigned long), > > the parser gets the typedef name wrong: > > > > .. c:macro:: long > > > >**Typedef**: Allocation callback function type definition > > > > Fix the regex in order to accept composite types when > > defining a typedef for a function pointer. > > > > Signed-off-by: Mauro Carvalho Chehab > > --- > > scripts/kernel-doc | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/scripts/kernel-doc b/scripts/kernel-doc > > index 99cd8418ff8a..311d213ee74d 100755 > > --- a/scripts/kernel-doc > > +++ b/scripts/kernel-doc > > @@ -1438,7 +1438,7 @@ sub dump_typedef($$) { > > $x =~ s@/\*.*?\*/@@gos;# strip comments. > > > > # Parse function prototypes > > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > +if ($x =~ /typedef\s+(\w+\s*){1,}\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > I sure wish we could find a way to make all these regexes more > understandable and maintainable. Reviewing a change like this is ... fun. > > Anyway, it seems to work, but it does now include trailing whitespace in > the type portion. So, for example, from include/linux/xarray.h: > > typedef void (*xa_update_node_t)(struct xa_node *node); > > The type is parsed as "void " where it was "void" before. The only ill > effect I can see is that some non-breaking spaces get inserted into the > HTML output, but perhaps it's worth stripping off that trailing space > anyway? Yeah, this is one of the issues. There's another one, tough. While the above regex recognizes the typedef identifier, it only gets the last word of "unsigned long", in the case of something like: typedef unsigned long (*genpool_algo_t)(unsigned long *map); Here, we have no option but to use a hidden group, e. g. using this regex: typedef\s+((?:\w+\s*){1,})\(\*\s*(\w\S+)\s*\)\s*\((.*)\); I'm enclosing a second version with the above. Yeah, reviewing it is even funnier, but regex101 can be used to double-check what the regex is doing: https://regex101.com/r/bPTm18/2 Thanks, Mauro [PATCH] scripts: kernel-doc: fix typedef parsing The include/linux/genalloc.h file defined this typedef: typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned long size,unsigned long start,unsigned int nr,void *data, struct gen_pool *pool, unsigned long start_addr); Because it has a type composite of two words (unsigned long), the parser gets the typedef name wrong: .. c:macro:: long **Typedef**: Allocation callback function type definition Fix the regex in order to accept composite types when defining a typedef for a function pointer. Signed-off-by: Mauro Carvalho Chehab diff --git a/scripts/kernel-doc b/scripts/kernel-doc index 99cd8418ff8a..b37f3cf8a331 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1438,13 +1438,14 @@ sub dump_typedef($$) { $x =~ s@/\*.*?\*/@@gos;# strip comments. # Parse function prototypes -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || +if ($x =~ /typedef\s+((?:\w+\s*){1,})\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || $x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) { # Function typedefs $return_type = $1; $declaration_name = $2; my $args = $3; + $return_type =~ s/\s+$//; create_parameterlist($args, ',', $file, $declaration_name);
Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
Em Fri, 23 Oct 2020 11:22:26 -0600 Jonathan Corbet escreveu: > On Fri, 23 Oct 2020 18:32:48 +0200 > Mauro Carvalho Chehab wrote: > > > The include/linux/genalloc.h file defined this typedef: > > > > typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned > > long size,unsigned long start,unsigned int nr,void *data, struct gen_pool > > *pool, unsigned long start_addr); > > > > Because it has a type composite of two words (unsigned long), > > the parser gets the typedef name wrong: > > > > .. c:macro:: long > > > >**Typedef**: Allocation callback function type definition > > > > Fix the regex in order to accept composite types when > > defining a typedef for a function pointer. > > > > Signed-off-by: Mauro Carvalho Chehab > > --- > > scripts/kernel-doc | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/scripts/kernel-doc b/scripts/kernel-doc > > index 99cd8418ff8a..311d213ee74d 100755 > > --- a/scripts/kernel-doc > > +++ b/scripts/kernel-doc > > @@ -1438,7 +1438,7 @@ sub dump_typedef($$) { > > $x =~ s@/\*.*?\*/@@gos;# strip comments. > > > > # Parse function prototypes > > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > +if ($x =~ /typedef\s+(\w+\s*){1,}\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > I sure wish we could find a way to make all these regexes more > understandable and maintainable. Reviewing a change like this is ... fun. Yeah. Regexes can be take a while to check. Btw, there's a site that is really cool to check things: https://regex101.com/ Unfortunately, it doesn't support Perl flavor. So, you may still need to double-check if Perl will handle the regex the same way[1]. [1] One of the differences I found is with regards to match repetitions https://perldoc.perl.org/perlrequick#Matching-repetitions This works on both Python and Perl: (foo){0,2} But this only works on Python: (foo){,2} > > Anyway, it seems to work, but it does now include trailing whitespace in > the type portion. So, for example, from include/linux/xarray.h: > > typedef void (*xa_update_node_t)(struct xa_node *node); > > The type is parsed as "void " where it was "void" before. The only ill > effect I can see is that some non-breaking spaces get inserted into the > HTML output, but perhaps it's worth stripping off that trailing space > anyway? Ok, I'll work on a second version addressing it. Thanks, Mauro
Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
On Fri, 2020-10-23 at 11:22 -0600, Jonathan Corbet wrote: > On Fri, 23 Oct 2020 18:32:48 +0200 > Mauro Carvalho Chehab wrote: > > > The include/linux/genalloc.h file defined this typedef: > > > > typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned > > long size,unsigned long start,unsigned int nr,void *data, struct gen_pool > > *pool, unsigned long start_addr); [] > > diff --git a/scripts/kernel-doc b/scripts/kernel-doc [] > > # Parse function prototypes > > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > +if ($x =~ /typedef\s+(\w+\s*){1,}\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > > I sure wish we could find a way to make all these regexes more > understandable and maintainable. Reviewing a change like this is ... fun. Perhaps using some of the checkpatch regex definitions like: $Type $Ident $balanced_parens would help improve readability. And the regex above doesn't quite work for spacing after typedef. The regex should allow space between the open parenthesis and the * typedef ( * ) (args...); And this regex does not find typedefs that use another typedef as like: arch/s390/include/asm/debug.h:typedef int (debug_header_proc_t) (debug_info_t *id,
Re: [PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
On Fri, 23 Oct 2020 18:32:48 +0200 Mauro Carvalho Chehab wrote: > The include/linux/genalloc.h file defined this typedef: > > typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned > long size,unsigned long start,unsigned int nr,void *data, struct gen_pool > *pool, unsigned long start_addr); > > Because it has a type composite of two words (unsigned long), > the parser gets the typedef name wrong: > > .. c:macro:: long > >**Typedef**: Allocation callback function type definition > > Fix the regex in order to accept composite types when > defining a typedef for a function pointer. > > Signed-off-by: Mauro Carvalho Chehab > --- > scripts/kernel-doc | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/scripts/kernel-doc b/scripts/kernel-doc > index 99cd8418ff8a..311d213ee74d 100755 > --- a/scripts/kernel-doc > +++ b/scripts/kernel-doc > @@ -1438,7 +1438,7 @@ sub dump_typedef($$) { > $x =~ s@/\*.*?\*/@@gos; # strip comments. > > # Parse function prototypes > -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || > +if ($x =~ /typedef\s+(\w+\s*){1,}\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || I sure wish we could find a way to make all these regexes more understandable and maintainable. Reviewing a change like this is ... fun. Anyway, it seems to work, but it does now include trailing whitespace in the type portion. So, for example, from include/linux/xarray.h: typedef void (*xa_update_node_t)(struct xa_node *node); The type is parsed as "void " where it was "void" before. The only ill effect I can see is that some non-breaking spaces get inserted into the HTML output, but perhaps it's worth stripping off that trailing space anyway? Thanks, jon
[PATCH v3 01/56] scripts: kernel-doc: fix typedef parsing
The include/linux/genalloc.h file defined this typedef: typedef unsigned long (*genpool_algo_t)(unsigned long *map,unsigned long size,unsigned long start,unsigned int nr,void *data, struct gen_pool *pool, unsigned long start_addr); Because it has a type composite of two words (unsigned long), the parser gets the typedef name wrong: .. c:macro:: long **Typedef**: Allocation callback function type definition Fix the regex in order to accept composite types when defining a typedef for a function pointer. Signed-off-by: Mauro Carvalho Chehab --- scripts/kernel-doc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/scripts/kernel-doc b/scripts/kernel-doc index 99cd8418ff8a..311d213ee74d 100755 --- a/scripts/kernel-doc +++ b/scripts/kernel-doc @@ -1438,7 +1438,7 @@ sub dump_typedef($$) { $x =~ s@/\*.*?\*/@@gos;# strip comments. # Parse function prototypes -if ($x =~ /typedef\s+(\w+)\s*\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || +if ($x =~ /typedef\s+(\w+\s*){1,}\(\*\s*(\w\S+)\s*\)\s*\((.*)\);/ || $x =~ /typedef\s+(\w+)\s*(\w\S+)\s*\s*\((.*)\);/) { # Function typedefs -- 2.26.2