Re: a simple formatter
Hi all, I believe, I solved (most of?) the problems in astyle. I am attaching the new version of the formatter.pl I test it only using the astyle version 1.21. This version does simpler hooks than the previous versions. The input filter: 1) Converts patterns unsigned int aparam:1; to unsigned int aparam__FORASTYLE__1; 2) Scans for #endif/#else/#if patterns and adds at input the line: //__ASTYLECOMMENT__\n The comment is important, just a simple newline will not work. Example 1: #endif /*A comment*/ { f++; converted to: #endif /*A comment*/ //__ASTYLECOMMENT__ { f++; Example 2: /* #endif A comment*/ { f++; converted to: /* #endif A comment*/ //__ASTYLECOMMENT__ { f++; The output filter just replaces the __FORASTYLE__ with a : and removes the line //__ASTYLECOMMENT__ Although I looked carefully without finding any bad formated code maybe there are still exists problems. Also I am attaching a simple script I used to check the formated code. This scripts removes any space, tabs and newlines from the original file and the formated file, and compare their md5 signatures.This script requires the tr and md5sum utilities. I test it only on a linux system. This script can not detect bad formated code but maybe can detect bad modifications (eg source code blocks removal) on source code. Running this script I found 5 files with different md5 signatures but all of these files had formating modifications similar to the following: From if (it) /*A comment*/ { ... } To if (it) { /*A comment*/ ... } Regards, Christos formater.pl Description: Perl program md5checker.sh Description: application/shellscript
Re: a simple formatter
On Thu, 2008-02-21 at 20:01 +0200, Tsantilas Christos wrote: I believe, I solved (most of?) the problems in astyle. I am attaching the new version of the formatter.pl I test it only using the astyle version 1.21. Christos, Thanks a lot for polishing this! Any objections to committing Christos' astyle wrapper and then applying it to HEAD and selected branches? If there are no objections, let's set a date for this. How about March, after 3.0.2 is released, with a final 7-day warning? Or is it better to reformat before releasing 3.0.2 so that it is easier to backport future fixes? Also I am attaching a simple script I used to check the formated code. This scripts removes any space, tabs and newlines from the original file and the formated file, and compare their md5 signatures.This script requires the tr and md5sum utilities. I test it only on a linux system. This script can not detect bad formated code but maybe can detect bad modifications (eg source code blocks removal) on source code. Running this script I found 5 files with different md5 signatures but all of these files had formating modifications similar to the following: From if (it) /*A comment*/ { ... } To if (it) { /*A comment*/ ... } Ha! Who would have thought that such transpositions are possible? I guess we need to strip not just whitespace, but comments as well (which is more difficult) for the MD5 test to work 100%. I would not spend time on that now. Thank you, Alex.
Re: a simple formatter
On Wed, 2008-02-13 at 20:35 +0200, Tsantilas Christos wrote: Alex Rousskov wrote: On Wed, 2008-02-13 at 13:52 +1300, Amos Jeffries wrote: Or is it possible to omit the src/tools.cc src/stat.cc files from astyle until they can be cleaned manually to work better. I bet there are other files with problems. I doubt Christos verified each and every source file. I was not found other problematic files but yes it is possible that there are other problematic files too. It is very difficult to examine all the code. Looking the diffs it is possible I am loosing thinks... Did you do the whitespace-free-MD5 test we talked about before? Thank you, Alex.
Re: a simple formatter
Alex Rousskov wrote: On Wed, 2008-02-13 at 20:35 +0200, Tsantilas Christos wrote: Alex Rousskov wrote: On Wed, 2008-02-13 at 13:52 +1300, Amos Jeffries wrote: Or is it possible to omit the src/tools.cc src/stat.cc files from astyle until they can be cleaned manually to work better. I bet there are other files with problems. I doubt Christos verified each and every source file. I was not found other problematic files but yes it is possible that there are other problematic files too. It is very difficult to examine all the code. Looking the diffs it is possible I am loosing thinks... Did you do the whitespace-free-MD5 test we talked about before? No I did not. This test can not detect most of the errors astyle does. Do you think will help?
Re: a simple formatter
On Fri, 2008-02-08 at 23:26 +0200, Tsantilas Christos wrote: Alex Rousskov wrote: Changes in the current code are only bad for outstanding patches and forgotten branches, right? If everybody applies the same formatting to all their branches and HEAD, we should not have many conflicts, right? I don't know, possibly it is OK. Or just format the HEAD, merge HEAD to branches (this will take some hours but...) and then apply formating to the branch. After the first time all will be OK. with CVS you will be much better off formatting all branches simultaneously, then merging. -Rob -- GPG key available at: http://www.robertcollins.net/keys.txt. signature.asc Description: This is a digitally signed message part
Re: a simple formatter
Tsantilas Christos wrote: On Fri, 2008-02-08 at 16:10 +1300, Amos Jeffries wrote: The same() operations used in those tests may be different from a same_MD5s() operation. Try inserting an empty line in a middle of a function and recompile. You will get a different MD5. Stripping the executable does not help in my tests.. Yes this is true. I tried exactly the same tests to evaluate the formatter, but they did not work. I believed that striping the code will solve the problem but nothing... In general I was not able to find a safe way to test it. I am just looking the diffs searching for bad formated code ... (Also I must note here that the bazaar repository was very-very helpful here, branches/revert/diff etc on local repositories are really fast, but OK it is an other story :-) ) ... Stripping empty lines fron sources does not help if the formatter moves brackets and such, changing the number of lines. Does our formatter do that? Yes it does. For example the code: void some_function(){ } convert to: void some_function() { } that would be desired behaviour though, yes? While I am away, the attached file is the set of test cases I have seen enough in the Squid-3 source to note as having bad syntax somehow (probably astyle previously). Along with a few I threw in just because the developer doc page described the syntax specially. If anyone has other please speak up and we'll get the lot testing properly. Amos -- Please use Squid 2.6STABLE17+ or 3.0STABLE1+ There are serious security advisories out on all earlier releases. /* A code-formating tool is wanted fro squid3 However teh squid sources use a number of syntax constructs which it must be capable of handling. This file is a dead-code file containing demos of those syntax which are known to be formatted badly at times It is to be used as the desired control to test formattign output against. */ /// structures with bit-fields struct t1 { unsigned int a:1; unsigned int b:1; unsigned int c:1; }; /// code-blocks inside definition protection #if SOMETHING { ; } #endif /// regular function declarations void main(int argc, char *argv[]) { ; } /// template class explicit Instantiation for some compilers template class ACLStrategisedHttpHeader*; /// global template Instances with macro definition template cbdata_type Listint::CBDATA_List;
Re: a simple formatter
Tested. I have made a (small) file of the structures I've noticed past formatting errors on. With a manual fix-up to what I think it should look like in readable C++ under the published squid formatting description. The output of your script does this: .. It is the --break-blocks option of the astyle. I think it can removed. Also, I noticed that the formatter will accept _anything_ given to it as a filename and create the files. Thats rather nasty when non-valid filename sequences are entered. ie --help OK fixed :-). Now accepts only files with .cc,.h,.cci and .c extensions and ignore all other files. I am re-sending the fixed script . In this version also I removed the --break-blocks astyle option. Formated code looks better without it. -- Christos #!/usr/bin/perl # # Author: Tsantilas Christos # email: [EMAIL PROTECTED] # # Distributed under the terms of the GNU General Public License as published # by the Free Software Foundation; either version 2 of the License, or # (at your option) any later version. # # The ldap_manager library is distributed in the hope that it will be useful, # but WITHOUT ANY WARRANTY; without even the implied warranty of # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU # Library General Public License for more details. # # You should have received a copy of the GNU General Public License # along with this program; if not, write to the Free Software # Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA # # See LICENSE or http://www.gnu.org/licenses/gpl.html for details . # use IPC::Open2; $ASTYLE_BIN=/usr/bin/astyle; $ASTYLE_ARGS =--mode=c -s4 -O -l; #$ASTYLE_ARGS=--mode=c -s4 -O --break-blocks -l; #$ASTYLE_BIN=/usr/local/src/astyle-1.21/bin/astyle; if(! -e $ASTYLE_BIN){ print \nFile $ASTYLE_BIN not found\n; print Please fix the ASTYLE_BIN variable in this script!\n\n; exit -1; } $ASTYLE_BIN=$ASTYLE_BIN. .$ASTYLE_ARGS; $INDENT = ; $out = shift @ARGV; #read options, currently no options available while($out eq || $out =~ /^-\w+$/){ if($out eq -h) { usage($0); exit 0; } else { usage($0); exit -1; } } while($out){ if( $out !~ /\.cc$|\.cci$|\.h$|\.c$/) { print Unknown suffix for file $out, ignoring\n; $out = shift @ARGV; next; } $in= $out.astylebak; my($new_in) = $in; my($i) = 0; while(-e $new_in) { $new_in=$in...$i; $i++; } $in=$new_in; rename($out, $in); local (*FROM_ASTYLE, *TO_ASTYLE); $pid_style=open2(\*FROM_ASTYLE, \*TO_ASTYLE, $ASTYLE_BIN); if(!$pid_style){ print An error while open2\n; exit -1; } if($pid=fork()){ #do parrent staf close(FROM_ASTYLE); if(!open(IN, $in)){ print Can not open input file: $in\n; exit -1; } my($line) = ''; while(IN){ $line=$line.$_; if(input_filter(\$line)==0){ next; } print TO_ASTYLE $line; $line = ''; } if($line){ print TO_ASTYLE $line; } close(TO_ASTYLE); waitpid($pid,0); } else{ # child staf close(TO_ASTYLE); if(!open(OUT,$out)){ print Can't open output file: $out\n; exit -1; } my($line)=''; while(FROM_ASTYLE){ $line = $line.$_; if(output_filter(\$line)==0){ next; } print OUT $line; $line = ''; } if($line){ print OUT $line; } close(OUT); exit 0; } $out = shift @ARGV; } sub input_filter{ my($line)[EMAIL PROTECTED]; #if we have integer declaration, get it all before processing it.. if($$line =~/.*\s*int\s+.*/ ){ if(index($$line,;) == -1){ return 0; } if($$line =~ /(.*)\s*int\s+([^:]*):\s*(\w+)\s*\;(.*)/s){ # print .$$line.($1)\n; $$line= $1. int .$2.__FORASTYLE__.$3.;.$4; # print -.$$line.\n; } return 1; } return 1; } sub output_filter{ my($line)[EMAIL PROTECTED]; if($$line =~ /^\s*$/){ return 1; } #if line is not preprocessor directive keep indentation if($$line !~ /^(\s*)\#/){ $$line =~ /^(\s*)./; $INDENT = $1; # here we can handle only the case we have a one line C++/C command. # but looks enougth.. } # The #preprocessor_directive { case if($$line =~ /^(\s*)\#(.*){\s*$/ ){ $$line=$1.#.$2.\n.$INDENT.{\n; } # The { #preprocessor directive case if($$line =~ /^(\s*)\{(.*)#(if|else|endif)(.*)$/ ){ # print From :.$$line.\n; $$line= $1.{.$2.\n#.$3.$4.\n; # print --.$$line.\n; } # The unsigned int:1; case . $$line =~ s/__FORASTYLE__/:/; return 1; } sub usage{ my($name)[EMAIL PROTECTED]; print Usage:\n $name file1 file2 file3 \n; }
Re: a simple formatter
Tested. I have made a (small) file of the structures I've noticed past formatting errors on. With a manual fix-up to what I think it should look like in readable C++ under the published squid formatting description. The output of your script does this: .. It is the --break-blocks option of the astyle. I think it can removed. Also, I noticed that the formatter will accept _anything_ given to it as a filename and create the files. Thats rather nasty when non-valid filename sequences are entered. ie --help OK fixed :-). Now accepts only files with .cc,.h,.cci and .c extensions and ignore all other files. I am re-sending the fixed script . In this version also I removed the --break-blocks astyle option. Formated code looks better without it. Excellent. That dropit it down to: --- astyle-test-cases.cc2008-02-08 12:37:14.0 +1300 +++ astyle-test-cases.cc-original 2008-02-07 12:13:07.0 +1300 @@ -10,9 +10,9 @@ /// structures with bit-fields struct t1 { -unsigned int a:1; -unsigned int b:1; -unsigned int c:1; +unsigned int a:1; +unsigned int b:1; +unsigned int c:1; }; I suppose we could live with that. Though I think it may be a problem with the formatter.pl final replacement. Amos
Re: a simple formatter
On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote: Do we need something like that? Any comments/suggestions? Any testers? I believe we do and I appreciate you working on it! Please try to fix the remaining problems Amos pointed out. Also, can you apply it to the entire Squid tree, remove all whitespace, and calculate md5, comparing that with the virgin whitespace-less tree? The MD5s should be the same, right? This is not a perfect check because spaces within strings/etc are stripped and not checked, but it is a pretty good one. Thank you, Alex.
Re: a simple formatter
On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote: Do we need something like that? Any comments/suggestions? Any testers? I believe we do and I appreciate you working on it! Please try to fix the remaining problems Amos pointed out. Also, can you apply it to the entire Squid tree, remove all whitespace, and calculate md5, comparing that with the virgin whitespace-less tree? The MD5s should be the same, right? Oooh. Nice test. This is not a perfect check because spaces within strings/etc are stripped and not checked, but it is a pretty good one. It will also miss the #if 0 { block problem. Though my unit-test for that will catch it any any other whitsspace-exact case we care to write the test for. We may need a compile test for any other similar ones that have not had test-cases written. Just have to figure out the best way to have the test file built into the codebase for automatic testing. Amos
Re: a simple formatter
On Fri, 2008-02-08 at 14:31 +1300, Amos Jeffries wrote: On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote: Do we need something like that? Any comments/suggestions? Any testers? I believe we do and I appreciate you working on it! Please try to fix the remaining problems Amos pointed out. Also, can you apply it to the entire Squid tree, remove all whitespace, and calculate md5, comparing that with the virgin whitespace-less tree? The MD5s should be the same, right? Oooh. Nice test. This is not a perfect check because spaces within strings/etc are stripped and not checked, but it is a pretty good one. It will also miss the #if 0 { block problem. Are there free C++ source code obfuscation programs? If they are guaranteed to generate the same source code regardless of formatting, then applying them would catch even more bugs. Too bad compilers produce different output for each execution due to timestamps and such. Perhaps there is a way to avoid that and compare md5s of squid executables? Cheers, Alex. P.S. A basic test file would be good to have anyway, so thank you for bootstrapping that.
Re: a simple formatter
On Fri, 2008-02-08 at 14:31 +1300, Amos Jeffries wrote: On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote: Do we need something like that? Any comments/suggestions? Any testers? I believe we do and I appreciate you working on it! Please try to fix the remaining problems Amos pointed out. Also, can you apply it to the entire Squid tree, remove all whitespace, and calculate md5, comparing that with the virgin whitespace-less tree? The MD5s should be the same, right? Oooh. Nice test. This is not a perfect check because spaces within strings/etc are stripped and not checked, but it is a pretty good one. It will also miss the #if 0 { block problem. Are there free C++ source code obfuscation programs? If they are guaranteed to generate the same source code regardless of formatting, then applying them would catch even more bugs. Too bad compilers produce different output for each execution due to timestamps and such. What do you mean by this? A standard entry-level compiler test is that it produces the same output from the same input every time. Beginner students are taught that in the bootstrapping lessons: If it compiles its own code and produces a different binary, do it again until it stops or breaks. If it breaks you started with buggy code and still have work to do. Perhaps there is a way to avoid that and compare md5s of squid executables? Cheers, Alex. P.S. A basic test file would be good to have anyway, so thank you for bootstrapping that. I thought it would be helpful if you and Christos decided on an astyle formatter. Turned out right. Amos
a simple formatter
Hi all, I wrote a small perl script which fixes (some of) the astyle problems. Before use it, you must adjust the $ASTYLE_BIN variable at the beggining of the formatter.pl file. I am running it using the following command: # find . -name *.cc -exec formater.pl \{\} \; but it can take multiple files as arguments. I used it with astyle versions 1.18 and 1.21. I did not do extensive tests, so I am sure it contains bugs :-). Please use it with caution. I tried to implement Alex suggestions. For the problem discussion look at the thread: http://www.squid-cache.org/mail-archive/squid-dev/200801/0019.html This script starts the astyle and attach one filter in its input and one in its output, using pipes. The input filter search for unsigned int X:1; cases and convert them to unsigned int X__FORASTYLE__1;. The output filter search for unsigned int X__FORASTYLE__1; patterns and convert them back to the original text. Also it search for the cases #preprocessor directive { or { #preprocessor_directive and fix them. Do we need something like that? Any comments/suggestions? Any testers? Regards, Christos formater.pl Description: Perl program
Re: a simple formatter
Hi all, I wrote a small perl script which fixes (some of) the astyle problems. Before use it, you must adjust the $ASTYLE_BIN variable at the beggining of the formatter.pl file. I am running it using the following command: # find . -name *.cc -exec formater.pl \{\} \; but it can take multiple files as arguments. I used it with astyle versions 1.18 and 1.21. I did not do extensive tests, so I am sure it contains bugs :-). Please use it with caution. I tried to implement Alex suggestions. For the problem discussion look at the thread: http://www.squid-cache.org/mail-archive/squid-dev/200801/0019.html This script starts the astyle and attach one filter in its input and one in its output, using pipes. The input filter search for unsigned int X:1; cases and convert them to unsigned int X__FORASTYLE__1;. The output filter search for unsigned int X__FORASTYLE__1; patterns and convert them back to the original text. Also it search for the cases #preprocessor directive { or { #preprocessor_directive and fix them. Do we need something like that? Any comments/suggestions? Any testers? Regards, Christos Tested. I have made a (small) file of the structures I've noticed past formatting errors on. With a manual fix-up to what I think it should look like in readable C++ under the published squid formatting description. The output of your script does this: --- astyle-test-cases.cc-original 2008-02-07 12:13:07.0 +1300 +++ astyle-test-cases.cc2008-02-07 12:19:14.0 +1300 @@ -9,10 +9,11 @@ */ /// structures with bit-fields + struct t1 { -unsigned int a:1; -unsigned int b:1; -unsigned int c:1; +unsigned int a:1; +unsigned int b:1; +unsigned int c:1; }; /// code-blocks inside definition protection @@ -20,6 +21,7 @@ { ; } + #endif /// regular function declarations @@ -30,6 +32,7 @@ } /// template class explicit Instantiation for some compilers + template class ACLStrategisedHttpHeader*; /// global template Instances with macro definition NP: breaking the auto-docs away from the documented object is not that good a thing to do. Amos