Hi all,

I am working on a C code parser. One of my requirements is to parse C source and header files and calculate the lines of code. Though there are tools to do so, all of them have the same problem: when the argument list of a function definition or declaration is spread over several lines, they count each of those lines separately rather than counting the construct as a single line.

For example:

- 1 -

    /* Definition */
    int Test (
        int x,
        int y )
    {
    }

or

    /* Declaration */
    int Test(
        int x,
        int y );

These are counted as 5 and 3 lines respectively, rather than 3 lines and 1 line. That is, they should be treated as if they had been written like this:

- 2 -

    /* Definition */
    int Test ( int x, int y)
    {
    }

    /* Declaration */
    int Test( int x, int y );

To fix this I plan to make some modifications to my Perl tool. I wish to use Parse::RecDescent to parse the input file, identify such constructs, and then have the Perl script convert each multi-line construct into a single-line construct, so that if construct - 1 - is given as input to the script, the output is - 2 -.

I found a script by Damian Conway, Helmut Jarausch and Teodor Zlatanov which uses Parse::RecDescent to separate comments from C code (also attached with the mail). The grammar used is:

    C_code  : m{(
                  [^"/]+              # one or more non-delimiters
                  (                   # then (optionally)...
                   /                  # a potential comment delimiter
                   [^*/]              # which is not an actual delimiter
                  )?                  #
                )+                    # all repeated once or more
               }x
                    { $Code .= $item[1] }

    comment : m{\s*                   # optional whitespace
                //                    # comment delimiter
                [^\n]*                # anything except a newline
                \n                    # then a newline
               }x
                    { $Code .= "\n"; $Comments .= $item[1] }

            | m{\s*                   # optional whitespace
                /\*                   # comment opener
                (?:[^*]+|\*(?!/))*    # anything except */
                \*/                   # comment closer
                ([ \t]*)?             # trailing blanks or tabs
               }x
                    { $Code .= " "; $Comments .= $item[1] }

I want to use the same methodology, but rather than separating the comments from the C code, I want a grammar that identifies such constructs and, wherever one is found, converts it into the required output.

Could someone please help me with a grammar that can identify these constructs, and with a way to convert them into a single line?

Thanks in advance,

Regards,
Rahul Jain
HCL Technologies Ltd.
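The conversion from construct - 1 - to construct - 2 - can be tried out without a full grammar. What follows is only a minimal sketch in plain Perl (5.10 or later, for recursive patterns), not a Parse::RecDescent solution. The name `join_param_lists` is hypothetical, and the code assumes that comments and string literals have already been stripped, so that every parenthesis it sees is a real one. It folds any parenthesised list that follows an identifier and is itself followed by `;` or `{`, which means a function call statement split across lines would be folded too; telling calls from declarations would need a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the attached script.
# Assumption: comments and string literals were removed beforehand.
sub join_param_lists {
    my ($code) = @_;
    $code =~ s{
        \b
        (                                 # group 1: name and trailing space
            (?! (?: if | while | for | switch | return | sizeof ) \b )
            [A-Za-z_] \w* \s*
        )
        ( \( (?: [^()]++ | (?2) )*+ \) )  # group 2: balanced parentheses
        (?= \s* [;\x7B] )                 # next is a semicolon or opening brace
    }{
        my $decl = $1 . $2;
        $decl =~ s/\s*\n\s*/ /g;          # fold each line break into one space
        $decl;
    }gex;
    return $code;
}

my $definition  = "int Test (\n    int x,\n    int y )\n{\n}\n";
my $declaration = "int Test(\n    int x,\n    int y );\n";

print join_param_lists($definition);
print join_param_lists($declaration);
```

With the two inputs above, the definition comes out as three lines and the declaration as one. Control statements such as a multi-line `if (...)` are deliberately left alone by the keyword check.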
#! /usr/bin/perl -w

# stat-comments.pl by Teodor Zlatanov, t...@iglou.com
# March 26, 2000

# A script to evaluate the readability of comments
# embedded in C++.  Utilizes code from demo-decomment.pl,
# which is included with the Parse::RecDescent module.
# Uses the Lingua::EN::Fathom module to evaluate text
# readability.

# ORIGINAL BY Helmut Jarausch
# EXTENDED BY Damian Conway AND Helmut Jarausch
# POLISHED BY Teodor Zlatanov

use strict;
use Parse::RecDescent;
use Lingua::EN::Fathom;

use vars qw/ $Grammar /;

my $parser = new Parse::RecDescent $Grammar  or  die "invalid grammar";

undef $/;
my $text = @ARGV ? <> : <DATA>;

my $parts = $parser->program($text) or die "malformed C program";

# only work with comments of length > 0
die "No comments found in input" unless length $parts->{comments};

# convert every comment mark to a period, so separate comments are
# separate sentences, if well-formed.  Lingua::EN::Fathom is quite
# good at figuring out what sentences are valid, so an extra period
# in the text won't affect the overall counts.
$parts->{comments} =~ s#//#. #g;
$parts->{comments} =~ s#/\*#. #g;
$parts->{comments} =~ s#\*/#. #g;

# we can now evaluate the comments (stored in $parts->{comments})
my $fathom = new Lingua::EN::Fathom;
$fathom->analyse_block($parts->{comments});

# voila, the readability report!
print($fathom->report);

BEGIN
{ $Grammar=<<'EOF';

program : <rulevar: local $WithinComment=0>
program : <rulevar: local $Comments = "">   /this shouldn't be here :-/
program : <reject>
program : <reject>                          /with prejudice/
program : <rulevar: local $Code = "">
program : <rulevar: local @Strings>

program : <skip:''> part(s)
                { { code=>$Code, comments=>$Comments, strings=>[@Strings]} }

part    : comment
        | C_code
        | string

C_code  : m{(
              [^"/]+              # one or more non-delimiters
              (                   # then (optionally)...
               /                  # a potential comment delimiter
               [^*/]              # which is not an actual delimiter
              )?                  #
            )+                    # all repeated once or more
           }x
                { $Code .= $item[1] }

string  : m{"                     # a leading delimiter
            ((                    # zero or more...
              \\.                 # escaped anything
              |                   # or
              [^"]                # anything but a delimiter
             )*
            )
            "}x
                { $Code .= $item[1]; push @Strings, $1 }

comment : m{\s*                   # optional whitespace
            //                    # comment delimiter
            [^\n]*                # anything except a newline
            \n                    # then a newline
           }x
                { $Code .= "\n"; $Comments .= $item[1] }

        | m{\s*                   # optional whitespace
            /\*                   # comment opener
            (?:[^*]+|\*(?!/))*    # anything except */
            \*/                   # comment closer
            ([ \t]*)?             # trailing blanks or tabs
           }x
                { $Code .= " "; $Comments .= $item[1] }

EOF
}
__DATA__
program test; // for decomment

// using Parse::RecDescent

/* We should raise the indices quite a bit with this text section,
   because it will actually include sentences and structure.  See,
   the problem with most C/C++ programs is that they use comments
   that are very short and convey little information. */

int main()
{
/* this should
   be removed
*/
  char *cp1 = "";
  char *cp2 = "cp2";
  int i;  // a counter
          // remove this line altogehter
  int k;
      int more_indented;  // keep indentation
      int l;  /* a loop
             variable */
      // should be completely removed

  char *str = "/* ceci n'est pas un commentaire */";
  return 0;
}
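The script above returns the comment-free source in `$parts->{code}`: each `//` comment is replaced by a newline and each `/* ... */` comment by a single space. A line count could be taken over that string. The following is only a minimal sketch, assuming that a "line of code" means any line with at least one non-whitespace character; `count_loc` is a hypothetical name and not part of the script.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the attached script.
# Assumption: a line of code is any line that holds at least one
# non-whitespace character once comments have been stripped.
sub count_loc {
    my ($code) = @_;
    return scalar grep { /\S/ } split /\n/, $code;
}

# Stand-in for $parts->{code}; the blank third line is not counted.
my $code = "int main()\n{\n\n    return 0;\n}\n";
print count_loc($code), "\n";    # prints 4
```

Multi-line argument lists would still need to be folded into one line before counting for the totals to match the 3-line and 1-line figures wanted in the question.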