Re: a simple formatter

2008-02-21 Thread Tsantilas Christos
Hi all,
  I believe, I solved (most of?) the problems in astyle.
I am attaching the new version of the formatter.pl
I test it only using the astyle version 1.21.

This version does simpler hooks than the previous versions.
The input filter:
 1) Converts patterns
unsigned int aparam:1;
  to
unsigned int aparam__FORASTYLE__1;

 2) Scans for #endif/#else/#if patterns and adds at input the line:
  //__ASTYLECOMMENT__\n
   The comment is important, just a simple newline will not work.

   Example 1:
   #endif /*A comment*/
   {
   f++;

 converted to:

   #endif /*A comment*/
   //__ASTYLECOMMENT__
   {
   f++;

   Example 2:
   /* #endif A comment*/
   {
   f++;

 converted to:

   /* #endif A comment*/
   //__ASTYLECOMMENT__
   {
   f++;

The output filter just replaces the __FORASTYLE__ with a  :  and
removes the line //__ASTYLECOMMENT__

Although I looked carefully without finding any bad formated code maybe
there are still exists problems.

 Also I am attaching a simple script I used to check the formated code.
This scripts removes any space, tabs and newlines from the original file
and the formated file, and compare their md5 signatures.This script
requires the tr and md5sum utilities. I test it only on a linux system.

This script can not detect bad formated code but maybe can detect bad
modifications (eg source code blocks removal) on source code.

Running this script I found 5 files with different md5 signatures but
all of these files had formating modifications similar to the following:
From
if (it) /*A comment*/
{
 ...
}
To
if (it) { /*A comment*/
 ...
}

Regards,
   Christos


formater.pl
Description: Perl program


md5checker.sh
Description: application/shellscript


Re: a simple formatter

2008-02-21 Thread Alex Rousskov
On Thu, 2008-02-21 at 20:01 +0200, Tsantilas Christos wrote:

   I believe, I solved (most of?) the problems in astyle.
 I am attaching the new version of the formatter.pl
 I test it only using the astyle version 1.21.

Christos,

Thanks a lot for polishing this!

Any objections to committing Christos' astyle wrapper and then
applying it to HEAD and selected branches? If there are no objections,
let's set a date for this. How about March, after 3.0.2 is released,
with a final 7-day warning?

Or is it better to reformat before releasing 3.0.2 so that it is easier
to backport future fixes?

 Also I am attaching a simple script I used to check the formated code.
 This scripts removes any space, tabs and newlines from the original file
 and the formated file, and compare their md5 signatures.This script
 requires the tr and md5sum utilities. I test it only on a linux system.
 
 This script can not detect bad formated code but maybe can detect bad
 modifications (eg source code blocks removal) on source code.
 
 Running this script I found 5 files with different md5 signatures but
 all of these files had formating modifications similar to the following:
 From
 if (it) /*A comment*/
 {
  ...
 }
 To
 if (it) { /*A comment*/
  ...
 }

Ha! Who would have thought that such transpositions are possible? I
guess we need to strip not just whitespace, but comments as well (which
is more difficult) for the MD5 test to work 100%. I would not spend time
on that now.

Thank you,

Alex.




Re: a simple formatter

2008-02-13 Thread Alex Rousskov
On Wed, 2008-02-13 at 20:35 +0200, Tsantilas Christos wrote:
 Alex Rousskov wrote:
  On Wed, 2008-02-13 at 13:52 +1300, Amos Jeffries wrote:
  
  Or is it possible to omit the src/tools.cc src/stat.cc files from astyle
  until they can be cleaned manually to work better.
  
  I bet there are other files with problems. I doubt Christos verified
  each and every source file.
  
 I was not found other problematic files but yes it is possible that
 there are other problematic files too. It is very difficult to examine
 all the code. Looking the diffs it is possible I am loosing thinks...

Did you do the whitespace-free-MD5 test we talked about before?

Thank you,

Alex.




Re: a simple formatter

2008-02-13 Thread Tsantilas Christos
Alex Rousskov wrote:
 On Wed, 2008-02-13 at 20:35 +0200, Tsantilas Christos wrote:
 Alex Rousskov wrote:
 On Wed, 2008-02-13 at 13:52 +1300, Amos Jeffries wrote:

 Or is it possible to omit the src/tools.cc src/stat.cc files from astyle
 until they can be cleaned manually to work better.
 I bet there are other files with problems. I doubt Christos verified
 each and every source file.

 I was not found other problematic files but yes it is possible that
 there are other problematic files too. It is very difficult to examine
 all the code. Looking the diffs it is possible I am loosing thinks...
 
 Did you do the whitespace-free-MD5 test we talked about before?


No I did not. This test can not detect most of the errors astyle does.
Do you think will help?



Re: a simple formatter

2008-02-11 Thread Robert Collins

On Fri, 2008-02-08 at 23:26 +0200, Tsantilas Christos wrote:
 Alex Rousskov wrote:
  
  Changes in the current code are only bad for outstanding patches and
  forgotten branches, right? If everybody applies the same formatting to
  all their branches and HEAD, we should not have many conflicts, right?
 
 I don't know, possibly it is OK.
 Or just format the HEAD, merge HEAD to branches (this will take some
 hours but...) and then apply formating to the branch. After the first
 time all will be OK.

with CVS you will be much better off formatting all branches
simultaneously, then merging.

-Rob
-- 
GPG key available at: http://www.robertcollins.net/keys.txt.


signature.asc
Description: This is a digitally signed message part


Re: a simple formatter

2008-02-08 Thread Amos Jeffries

Tsantilas Christos wrote:

On Fri, 2008-02-08 at 16:10 +1300, Amos Jeffries wrote:

The same() operations used in those tests may be different from a
same_MD5s() operation.

Try inserting an empty line in a middle of a function and recompile. You
will get a different MD5. Stripping the executable does not help in my
tests..


Yes this is true. I tried exactly  the same tests  to evaluate  the
formatter, but they did not work.  I believed that striping the code will
solve the problem but nothing...
In general I was not able to find a safe way to test it. I am just looking
the diffs searching for  bad formated code ...
(Also I must note here that the bazaar repository was very-very  helpful
here, branches/revert/diff etc on local repositories are really fast, but
OK it is an other story :-) )


... Stripping empty lines fron sources does not help if

the formatter

moves brackets and such, changing the number of lines. Does our
formatter do that?


Yes it does. For example the code:
  void some_function(){
  }
convert to:
  void some_function()
 {
  }


that would be desired behaviour though, yes?

While I am away, the attached file is the set of test cases I have seen 
enough in the Squid-3 source to note as having bad syntax somehow 
(probably astyle previously). Along with a few I threw in just because 
the developer doc page described the syntax specially.


If anyone has other please speak up and we'll get the lot testing properly.

Amos
--
Please use Squid 2.6STABLE17+ or 3.0STABLE1+
There are serious security advisories out on all earlier releases.
/*
  A code-formating tool is wanted fro squid3

  However teh squid sources use a number of syntax constructs which it must be 
capable of handling.

  This file is a dead-code file containing demos of those syntax which are 
known to be formatted badly at times

  It is to be used as the desired control to test formattign output against.
*/

/// structures with bit-fields
struct t1 {
unsigned int a:1;
unsigned int b:1;
unsigned int c:1;
};

/// code-blocks inside definition protection
#if SOMETHING
{
;
}
#endif

/// regular function declarations
void
main(int argc, char *argv[])
{
;
}

/// template class explicit Instantiation for some compilers
template class ACLStrategisedHttpHeader*;

/// global template Instances with macro definition
template cbdata_type Listint::CBDATA_List;



Re: a simple formatter

2008-02-07 Thread Tsantilas Christos

 Tested.

 I have made a (small) file of the structures I've noticed past formatting
 errors on. With a manual fix-up to what I think it should look like in
 readable C++ under the published squid formatting description.

 The output of your script does this:
 ..

It is the --break-blocks option of the astyle. I think it can removed.

 Also, I noticed that the formatter will accept _anything_ given to it as a
 filename and create the files. Thats rather nasty when non-valid filename
 sequences are entered. ie --help


OK fixed :-).  Now accepts only files with .cc,.h,.cci and .c extensions
and ignore all other files.
I am re-sending the fixed script . In this version also I removed the
--break-blocks astyle option. Formated code looks better without it.

--
  Christos

#!/usr/bin/perl
#
# Author: Tsantilas Christos
# email:  [EMAIL PROTECTED]
#
# Distributed under the terms of the GNU General Public License as published
# by the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# The ldap_manager library is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
# Library General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
#
# See LICENSE or http://www.gnu.org/licenses/gpl.html for details .
#


use IPC::Open2;

$ASTYLE_BIN=/usr/bin/astyle;
$ASTYLE_ARGS =--mode=c -s4 -O -l;

#$ASTYLE_ARGS=--mode=c -s4 -O --break-blocks -l;
#$ASTYLE_BIN=/usr/local/src/astyle-1.21/bin/astyle;



if(! -e $ASTYLE_BIN){
print \nFile $ASTYLE_BIN not found\n;
print Please fix the ASTYLE_BIN variable in this script!\n\n;
exit -1;
}
$ASTYLE_BIN=$ASTYLE_BIN. .$ASTYLE_ARGS;

$INDENT = ;

$out = shift @ARGV;
#read options, currently no options available
while($out eq  ||  $out =~ /^-\w+$/){
   if($out eq -h) {
usage($0);
exit 0;
   }
   else {
   usage($0);
   exit -1;
   }
}


while($out){

if( $out !~ /\.cc$|\.cci$|\.h$|\.c$/) {
 print Unknown suffix for file $out, ignoring\n;
 $out = shift @ARGV;
 next;
}

$in= $out.astylebak;
my($new_in) = $in;
my($i) = 0;
while(-e $new_in) {
	$new_in=$in...$i;
	$i++;
}
$in=$new_in;
rename($out, $in);

local (*FROM_ASTYLE, *TO_ASTYLE);
$pid_style=open2(\*FROM_ASTYLE, \*TO_ASTYLE, $ASTYLE_BIN);

if(!$pid_style){
	print An error while open2\n;
	exit -1;
}


if($pid=fork()){
	#do parrent staf
	close(FROM_ASTYLE);
	
	if(!open(IN, $in)){
	print Can not open input file: $in\n;
	exit -1;
	}
	my($line) = '';
	while(IN){
	$line=$line.$_;
	if(input_filter(\$line)==0){
		next;
	}
	print TO_ASTYLE $line;
	$line = '';
	}
	if($line){
	print TO_ASTYLE $line;
	}
	close(TO_ASTYLE);
	waitpid($pid,0);
}
else{
	# child staf
	close(TO_ASTYLE);
	
	if(!open(OUT,$out)){
	print Can't open output file: $out\n;
	exit -1;
	}
	my($line)='';
	while(FROM_ASTYLE){
	$line = $line.$_;
	if(output_filter(\$line)==0){
		next;
	}
	print OUT $line;
	$line = '';
	}
	if($line){
	print OUT $line;
	}
	close(OUT);
	exit 0;
}

$out = shift @ARGV;
}


sub input_filter{
my($line)[EMAIL PROTECTED];
 #if we have integer declaration, get it all before processing it..
if($$line =~/.*\s*int\s+.*/  ){ 
	if(index($$line,;) == -1){
	return 0;
	}
	if($$line =~ /(.*)\s*int\s+([^:]*):\s*(\w+)\s*\;(.*)/s){
#	print  .$$line.($1)\n;
	$$line= $1. int .$2.__FORASTYLE__.$3.;.$4;
#	print -.$$line.\n;
	}
	return 1;
}

return 1;
}

sub output_filter{
my($line)[EMAIL PROTECTED];

if($$line =~ /^\s*$/){
	return 1;
}

#if line is not preprocessor directive keep indentation
if($$line !~  /^(\s*)\#/){
	$$line =~ /^(\s*)./;
	$INDENT = $1;
# here we can handle only the case we have a one line C++/C command.
# but looks enougth..
}

#   The  #preprocessor_directive { case
 if($$line =~ /^(\s*)\#(.*){\s*$/ ){
	$$line=$1.#.$2.\n.$INDENT.{\n;
}
	
#   The  { #preprocessor directive case
if($$line =~ /^(\s*)\{(.*)#(if|else|endif)(.*)$/ ){
#  print From :.$$line.\n;
  $$line= $1.{.$2.\n#.$3.$4.\n;
#  print --.$$line.\n;
}

   # The unsigned int:1; case .
$$line =~ s/__FORASTYLE__/:/;

return 1;
}

sub usage{
my($name)[EMAIL PROTECTED];
print Usage:\n   $name file1 file2 file3 \n;
}

Re: a simple formatter

2008-02-07 Thread Amos Jeffries

 Tested.

 I have made a (small) file of the structures I've noticed past
 formatting
 errors on. With a manual fix-up to what I think it should look like in
 readable C++ under the published squid formatting description.

 The output of your script does this:
 ..

 It is the --break-blocks option of the astyle. I think it can removed.

 Also, I noticed that the formatter will accept _anything_ given to it as
 a
 filename and create the files. Thats rather nasty when non-valid
 filename
 sequences are entered. ie --help


 OK fixed :-).  Now accepts only files with .cc,.h,.cci and .c extensions
 and ignore all other files.
 I am re-sending the fixed script . In this version also I removed the
 --break-blocks astyle option. Formated code looks better without it.


Excellent. That dropit it down to:

--- astyle-test-cases.cc2008-02-08 12:37:14.0 +1300
+++ astyle-test-cases.cc-original   2008-02-07 12:13:07.0 +1300
@@ -10,9 +10,9 @@

 /// structures with bit-fields
 struct t1 {
-unsigned  int a:1;
-unsigned  int b:1;
-unsigned  int c:1;
+unsigned int a:1;
+unsigned int b:1;
+unsigned int c:1;
 };


I suppose we could live with that. Though I think it may be a problem with
the formatter.pl final replacement.

Amos



Re: a simple formatter

2008-02-07 Thread Alex Rousskov
On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
 Do we need something like that? Any comments/suggestions? Any testers?

I believe we do and I appreciate you working on it! Please try to fix
the remaining problems Amos pointed out.

Also, can you apply it to the entire Squid tree, remove all whitespace,
and calculate md5, comparing that with the virgin whitespace-less tree?
The MD5s should be the same, right?

This is not a perfect check because spaces within strings/etc are
stripped and not checked, but it is a pretty good one.

Thank you,

Alex.




Re: a simple formatter

2008-02-07 Thread Amos Jeffries
 On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
 Do we need something like that? Any comments/suggestions? Any testers?

 I believe we do and I appreciate you working on it! Please try to fix
 the remaining problems Amos pointed out.

 Also, can you apply it to the entire Squid tree, remove all whitespace,
 and calculate md5, comparing that with the virgin whitespace-less tree?
 The MD5s should be the same, right?

Oooh. Nice test.


 This is not a perfect check because spaces within strings/etc are
 stripped and not checked, but it is a pretty good one.

It will also miss the #if 0 { block problem.
Though my unit-test for that will catch it any any other whitsspace-exact
case we care to write the test for.

We may need a compile test for any other similar ones that have not had
test-cases written.

Just have to figure out the best way to have the test file built into the
codebase for automatic testing.

Amos



Re: a simple formatter

2008-02-07 Thread Alex Rousskov
On Fri, 2008-02-08 at 14:31 +1300, Amos Jeffries wrote:
  On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
  Do we need something like that? Any comments/suggestions? Any testers?
 
  I believe we do and I appreciate you working on it! Please try to fix
  the remaining problems Amos pointed out.
 
  Also, can you apply it to the entire Squid tree, remove all whitespace,
  and calculate md5, comparing that with the virgin whitespace-less tree?
  The MD5s should be the same, right?
 
 Oooh. Nice test.
 
 
  This is not a perfect check because spaces within strings/etc are
  stripped and not checked, but it is a pretty good one.
 
 It will also miss the #if 0 { block problem.

Are there free C++ source code obfuscation programs? If they are
guaranteed to generate the same source code regardless of formatting,
then applying them would catch even more bugs.

Too bad compilers produce different output for each execution due to
timestamps and such. Perhaps there is a way to avoid that and compare
md5s of squid executables?

Cheers,

Alex.
P.S. A basic test file would be good to have anyway, so thank you for
bootstrapping that.



Re: a simple formatter

2008-02-07 Thread Amos Jeffries
 On Fri, 2008-02-08 at 14:31 +1300, Amos Jeffries wrote:
  On Wed, 2008-02-06 at 21:26 +0200, Tsantilas Christos wrote:
  Do we need something like that? Any comments/suggestions? Any
 testers?
 
  I believe we do and I appreciate you working on it! Please try to fix
  the remaining problems Amos pointed out.
 
  Also, can you apply it to the entire Squid tree, remove all
 whitespace,
  and calculate md5, comparing that with the virgin whitespace-less
 tree?
  The MD5s should be the same, right?

 Oooh. Nice test.

 
  This is not a perfect check because spaces within strings/etc are
  stripped and not checked, but it is a pretty good one.

 It will also miss the #if 0 { block problem.

 Are there free C++ source code obfuscation programs? If they are
 guaranteed to generate the same source code regardless of formatting,
 then applying them would catch even more bugs.

 Too bad compilers produce different output for each execution due to
 timestamps and such.

What do you mean by this? A standard entry-level compiler test is that it
produces the same output from the same input every time.

Beginner students are taught that in the bootstrapping lessons: If it
compiles its own code and produces a different binary, do it again until
it stops or breaks. If it breaks you started with buggy code and still
have work to do.

 Perhaps there is a way to avoid that and compare
 md5s of squid executables?

 Cheers,

 Alex.
 P.S. A basic test file would be good to have anyway, so thank you for
 bootstrapping that.

I thought it would be helpful if you and Christos decided on an astyle
formatter. Turned out right.

Amos



a simple formatter

2008-02-06 Thread Tsantilas Christos
Hi all,

I wrote a small perl script which fixes (some of) the astyle problems.

Before use it, you must adjust the $ASTYLE_BIN variable at the beggining
of the formatter.pl file.

I am running it using the following command:
  # find . -name *.cc  -exec formater.pl \{\} \;
but it can take multiple files as arguments.

I used it with astyle versions 1.18 and 1.21. I did not do extensive
tests, so I am sure it contains bugs :-).  Please use it with caution.

I tried to implement Alex suggestions.  For the problem discussion look
at the thread:
http://www.squid-cache.org/mail-archive/squid-dev/200801/0019.html

This script starts the astyle and attach one filter in its input and one
in its output, using pipes.

The input filter search for unsigned int X:1; cases and convert
them to  unsigned int X__FORASTYLE__1;.

The output filter search for unsigned int X__FORASTYLE__1;
patterns and convert them back to the original text.
Also it search for the cases #preprocessor directive { or {
#preprocessor_directive and fix them.


Do we need something like that? Any comments/suggestions? Any testers?

Regards,
Christos



formater.pl
Description: Perl program


Re: a simple formatter

2008-02-06 Thread Amos Jeffries
 Hi all,

 I wrote a small perl script which fixes (some of) the astyle problems.

 Before use it, you must adjust the $ASTYLE_BIN variable at the beggining
 of the formatter.pl file.

 I am running it using the following command:
   # find . -name *.cc  -exec formater.pl \{\} \;
 but it can take multiple files as arguments.

 I used it with astyle versions 1.18 and 1.21. I did not do extensive
 tests, so I am sure it contains bugs :-).  Please use it with caution.

 I tried to implement Alex suggestions.  For the problem discussion look
 at the thread:
 http://www.squid-cache.org/mail-archive/squid-dev/200801/0019.html

 This script starts the astyle and attach one filter in its input and one
 in its output, using pipes.

 The input filter search for unsigned int X:1; cases and convert
 them to  unsigned int X__FORASTYLE__1;.

 The output filter search for unsigned int X__FORASTYLE__1;
 patterns and convert them back to the original text.
 Also it search for the cases #preprocessor directive { or {
 #preprocessor_directive and fix them.


 Do we need something like that? Any comments/suggestions? Any testers?

 Regards,
 Christos


Tested.

I have made a (small) file of the structures I've noticed past formatting
errors on. With a manual fix-up to what I think it should look like in
readable C++ under the published squid formatting description.

The output of your script does this:
--- astyle-test-cases.cc-original   2008-02-07 12:13:07.0 +1300
+++ astyle-test-cases.cc2008-02-07 12:19:14.0 +1300
@@ -9,10 +9,11 @@
 */

 /// structures with bit-fields
+
 struct t1 {
-unsigned int a:1;
-unsigned int b:1;
-unsigned int c:1;
+unsigned  int a:1;
+unsigned  int b:1;
+unsigned  int c:1;
 };

 /// code-blocks inside definition protection
@@ -20,6 +21,7 @@
 {
 ;
 }
+
 #endif

 /// regular function declarations
@@ -30,6 +32,7 @@
 }

 /// template class explicit Instantiation for some compilers
+
 template class ACLStrategisedHttpHeader*;

 /// global template Instances with macro definition





NP: breaking the auto-docs away from the documented object is not that
good a thing to do.

Amos