On Tue, Jun 21, 2011 at 08:27:45PM -0400, Andrew Dunstan wrote:
> 
> Attached is a WIP possible replacement for pgindent. Instead of a
> shell script invoking a mishmash of awk and sed, some of which is
> pretty impenetrable, it uses a single engine (perl) to do all the
> pre and post indent processing. Of course, if your regex-fu and
> perl-fu are not up to scratch this too might be impenetrable, but
> all but a couple of the recipes are reduced to single lines, and I'd
> argue that they are all at least as comprehensible as what they
> replace.
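Here is a minimal Perl sketch of the single-engine idea. It is not part of the attached script. It chains two substitutions copied from pre_indent() with one copied from post_indent(). In the real script, indent runs between those two stages. The helper names pre_fix and post_fix are hypothetical.

```perl
use strict;
use warnings;

# Pre-indent: strip trailing horizontal whitespace, then turn whole-line
# // comments into /* */ comments (regexes copied from pre_indent()).
sub pre_fix
{
    my $source = shift;
    $source =~ s/\h+$//gm;
    $source =~ s!^(\h*)//(.*)$!$1/* $2 */!gm;
    return $source;
}

# Post-indent: collapse trailing blank lines to a single newline
# (regex copied from post_indent()).
sub post_fix
{
    my $source = shift;
    $source =~ s!\n+\z!\n!;
    return $source;
}

my $before = "int x;   \n// a note\n\n\n";
my $after  = post_fix(pre_fix($before));
print $after;
```

The // conversion keeps everything after the slashes, including the leading space. The result is therefore "/*  a note */", with two spaces after the opener.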
> 
> Attached also is a diff file showing what it does differently from
> the existing script. I think that these are all things where the new
> script is more correct than the existing script. Most of the changes
> fall into two categories:
> 
>    * places where the existing script fails to combine the function
>      return type and the function name on a single line in function
>      prototypes.
>    * places where unwanted blank lines are removed by the new script
>      but not by the existing script.
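The first category can be reproduced in isolation. This minimal Perl sketch wraps the prototype-joining substitution from the attached post_indent() in a hypothetical helper, join_prototypes(), and applies it to sample declarations. It assumes the input has already been through indent, which leaves the return type on its own line.

```perl
use strict;
use warnings;

my $ident   = qr/[a-zA-Z_][a-zA-Z_0-9]*/;
my $comment = qr!/\*.*\*/!;

# Pull a prototype's function name up onto its return-type line.
# Only declarations ending in ");" match, so function definitions
# (whose argument list is followed by a brace) are left alone.
sub join_prototypes
{
    my $source = shift;
    $source =~ s!
        (\n$ident[^(\n]*)\n          # return type, e.g. static void
        (
            $ident\(\n?              # function name and open paren
            (.*,(\h*$comment)?\n)*   # argument lines before the last one
            .*\);(\h*$comment)?$     # final line, ending in a semicolon
        )
    !$1 . (substr($1, -1, 1) eq '*' ? '' : ' ') . $2!gmxe;
    return $source;
}

print join_prototypes("\nstatic void\nfoo(int a);\n");
print join_prototypes("\nstatic char *\nbar(int a,\n    int b);\n");
```

No space is inserted after a trailing '*'. That keeps pointer return types in the "char *bar(" style.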
> 
> Features include:
> 
>    * command line compatibility with the existing script, so you can do:
>      find ../../.. -name '*.[ch]' -type f -print |
>        egrep -v -f exclude_file_patterns |
>        xargs -n100 ./pgindent.pl typedefs.list
>    * a new way of doing the same thing much more nicely:
>      ./pgindent.pl --search-base=../../.. --typedefs=typedefs.list \
>        --excludes=exclude_file_patterns
>    * only passes relevant typedefs to indent, not the whole huge list
>    * should in principle be runnable on Windows, unlike existing script
>      (I haven't tested yet)
>    * no semantic tab literals; tabs are only generated using \t and
>      tested for using \t, \h or \s as appropriate. This makes debugging
>      the script much less frustrating. If something looks like a space
>      it should be a space.
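The attached script, as posted, writes the whole typedefs list to a temporary file once. It drops only a fixed set of entries (FD_SET, date, interval, timestamp, ANY). This minimal Perl sketch shows the per-file filtering described in the third bullet above. The helper relevant_typedefs() is hypothetical, and treating "relevant" as "appears as an identifier in the file" is an assumption.

```perl
use strict;
use warnings;

# Keep only the typedef names that occur as identifiers in $source.
# A typedef name that never appears in a file should have no effect
# on how indent formats that file.
sub relevant_typedefs
{
    my ($source, @typedefs) = @_;
    my %used = map { $_ => 1 } ($source =~ /\b([A-Za-z_][A-Za-z_0-9]*)/g);
    return grep { $used{$_} } @typedefs;
}

my $source = "static List *\nmake_list(Node *n)\n{\n    return NIL;\n}\n";
my @wanted = relevant_typedefs($source, qw(List Node Relation));
print "@wanted\n";
```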
> 
> In one case I used perl's extended regex mode to comment a fairly
> hairy regex. This should probably be done a bit more, maybe for all
> of them.
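The following minimal Perl sketch shows what the extended-mode treatment looks like for one of the one-line regexes. It rewrites the 'else' comment pattern from the attached pre_indent() with the /x modifier and checks that both forms accept the same sample lines. The sample lines are made up for illustration.

```perl
use strict;
use warnings;

# One-line form of the 'else' comment regex from pre_indent().
my $terse = qr!(\}|\h)else\h*(/\*)(.*\*/)\h*$!m;

# The same pattern in extended mode: whitespace in the pattern is
# ignored and "#" starts a comment, so each piece can be explained.
my $commented = qr!
    (\}|\h)    # closing brace or whitespace before the keyword
    else
    \h*        # optional horizontal whitespace
    (/\*)      # comment opener
    (.*\*/)    # comment text through the comment closer
    \h*$       # only whitespace until end of line
!mx;

my @samples = ("} else /* only option */", "\telse\t/* fallback */  ", "else x;");
my @results;
foreach my $line (@samples)
{
    my $t = ($line =~ $terse)     ? 1 : 0;
    my $c = ($line =~ $commented) ? 1 : 0;
    push(@results, "$t$c");
}
print "@results\n";
```

Under /x, a literal space or '#' in the pattern must be escaped or written as a class such as \h.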
> 
> If anybody is so inclined, this could be used as a basis for
> removing the use of bsd indent altogether, as has been suggested
> before, as well as external entab/detab.
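The "XXX Ideally we'd implement entab/detab in pure perl" comment in the attached script points the same way. A pure-Perl detab could start like this minimal sketch. It assumes plain tab stops every N columns and counts each character as one column. It does not reproduce any of entab's other options, such as those the script passes with -qc. The helper names are hypothetical.

```perl
use strict;
use warnings;

# Expand tabs in one line to spaces, assuming tab stops every $ts columns.
sub detab_line
{
    my ($line, $ts) = @_;
    my $out = '';
    foreach my $ch (split //, $line)
    {
        if ($ch eq "\t")
        {
            # pad to the next multiple of $ts
            $out .= ' ' x ($ts - length($out) % $ts);
        }
        else
        {
            $out .= $ch;
        }
    }
    return $out;
}

# Apply detab_line() to every line of a source string, keeping newlines.
sub detab_source
{
    my ($source, $ts) = @_;
    return join("\n", map { detab_line($_, $ts) } split(/\n/, $source, -1));
}

print detab_source("a\tb\n\tx\n", 4);
```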

Thirteen months after Andrew posted this WIP, I have restructured and
tested this code, and it is now ready to replace the pgindent shell
script as pgindent.pl, attached.

I have tested this version by re-running the 9.1 and 9.2 pgindent runs
and comparing the output, and it is just like Andrew said --- it is the
same, except for the two improvements he mentioned.

A Perl version of pgindent has several advantages:

*  more portable: less dependent on variations between platforms' utility commands
*  able to run on Windows, assuming someone makes entab and
   pg_bsd_indent Windows binaries
*  easier to extend to fix more of pgindent's limitations
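The shell pipeline in Andrew's first example depends on find, egrep and xargs. The attached script instead gathers files with File::Find and applies the exclusion regexes itself, in its main section and process_exclude(). This is a simplified, self-contained sketch of that approach. The find_sources() helper and the demo tree are hypothetical.

```perl
use strict;
use warnings;
use File::Find;
use File::Temp qw(tempdir);

# Collect *.c and *.h files under $base, then drop any path matching one
# of the exclusion regexes: a pure-Perl stand-in for
# find ... | egrep -v -f exclude_file_patterns
sub find_sources
{
    my ($base, @exclude) = @_;
    my @files;
    find(sub { push(@files, $File::Find::name) if -f $_ && /\.[ch]\z/ },
        $base);
    foreach my $rgx (map { qr/$_/ } @exclude)
    {
        @files = grep { $_ !~ $rgx } @files;
    }
    return sort @files;
}

# Demo on a throwaway tree: two sources, one non-source, one excluded.
my $base = tempdir(CLEANUP => 1);
mkdir("$base/skip") or die "mkdir: $!";
foreach my $name ("a.c", "b.h", "c.txt", "skip/d.c")
{
    open(my $fh, '>', "$base/$name") or die "creating $name: $!";
    close($fh);
}
my @found = map { my $p = $_; $p =~ s!^\Q$base\E/!!; $p }
  find_sources($base, '/skip/');
print "@found\n";
```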

I will add documentation about the arguments.

Many thanks to Andrew for his fine work on this.  Any objections?

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + It's impossible for everything to be true. +
#!/usr/bin/perl

use strict;
use warnings;

use Cwd qw(abs_path getcwd);
use File::Find;
use File::Spec;    # devnull() is called as a class method, not imported
use File::Temp;
use IO::Handle;
use Getopt::Long;
use Readonly;

# Update for pg_bsd_indent version
Readonly my $INDENT_VERSION => "1.1";
Readonly my $devnull        => File::Spec->devnull;

# Common indent settings
my $indent_opts =
  "-bad -bap -bc -bl -d0 -cdb -nce -nfc1 -di12 -i4 -l79 -lp -nip -npro -bbb";

# indent-dependent settings
my $extra_opts = "";

my ($typedefs_file, $code_base, $excludes, $indent, $build);

my %options = (
        "typedefs=s"  => \$typedefs_file,
        "code-base=s" => \$code_base,
        "excludes=s"  => \$excludes,
        "indent=s"    => \$indent,
        "build"       => \$build,);
GetOptions(%options) || die "bad command line";

run_build($code_base) if ($build);

# command line option wins, then first non-option arg,
# then environment (which is how --build sets it),
# then locations based on current dir, then default location
$typedefs_file ||= shift if @ARGV && $ARGV[0] !~ /\.[ch]$/;
$typedefs_file ||= $ENV{PGTYPEDEFS};

# build mode sets PGINDENT and PGENTAB
$indent ||= $ENV{PGINDENT} || $ENV{INDENT} || "pg_bsd_indent";
my $entab = $ENV{PGENTAB} || "entab";

# no non-option arguments given, so do everything in the current directory
$code_base ||= '.' unless @ARGV;

# if it's the base of a postgres tree, we will exclude the files
# postgres wants excluded
$excludes ||= "$code_base/src/tools/pgindent/exclude_file_patterns"
  if $code_base && -f "$code_base/src/tools/pgindent/exclude_file_patterns";

# globals
my @files;
my $filtered_typedefs_fh;


sub check_indent
{
        system("$entab < $devnull");
        if ($?)
        {
                print STDERR
"Go to the src/tools/entab directory and do 'make' and 'make install'.\n",
                  "This will put the 'entab' command in your path.\n",
                  "Then run $0 again.\n";
                exit 1;
        }

        system("$indent -? < $devnull > $devnull 2>&1");
        if ($? >> 8 != 1)
        {
                print STDERR
                  "You do not appear to have 'indent' installed on your system.\n";
                exit 1;
        }

        if (`$indent -V` !~ m/ $INDENT_VERSION$/)
        {
                print STDERR
"You do not appear to have $indent version $INDENT_VERSION installed on your system.\n";
                exit 1;
        }

        system("$indent -gnu < $devnull > $devnull 2>&1");
        if ($? == 0)
        {
                print STDERR
                  "You appear to have GNU indent rather than BSD indent.\n",
                  "See the pgindent/README file for a description of its problems.\n";
                $extra_opts = "-cdb -bli0 -npcs -cli4 -sc";
        }
        else
        {
                $extra_opts = "-cli1";
        }
}


sub load_typedefs
{
        # try fairly hard to find the typedefs file if it's not set

        foreach my $try ('.', 'src/tools/pgindent', '/usr/local/etc')
        {
                $typedefs_file ||= "$try/typedefs.list"
                  if (-f "$try/typedefs.list");
        }

        # try to find typedefs by moving up directory levels
        my $tdtry = "..";
        foreach (1 .. 5)
        {
                $typedefs_file ||= "$tdtry/src/tools/pgindent/typedefs.list"
                  if (-f "$tdtry/src/tools/pgindent/typedefs.list");
                $tdtry = "$tdtry/..";
        }
        die "no typedefs file" unless $typedefs_file && -f $typedefs_file;

        open(my $typedefs_fh, '<', $typedefs_file)
          || die "opening $typedefs_file: $!";
        my @typedefs = <$typedefs_fh>;
        close($typedefs_fh);

        # remove certain entries
        @typedefs =
          grep { !m/^(FD_SET|date|interval|timestamp|ANY)\n?$/ } @typedefs;

        # write filtered typedefs
        my $filter_typedefs_fh = new File::Temp(TEMPLATE => "pgtypedefXXXXX");
        print $filter_typedefs_fh @typedefs;
        $filter_typedefs_fh->close();

        # temp file remains because we return a file handle reference
        return $filter_typedefs_fh;
}


sub process_exclude
{
        if ($excludes && @files)
        {
                open(my $eh, '<', $excludes) || die "opening $excludes: $!";
                while (my $line = <$eh>)
                {
                        chomp $line;
                        my $rgx;
                        eval " \$rgx = qr!$line!;";
                        @files = grep { $_ !~ /$rgx/ } @files if $rgx;
                }
                close($eh);
        }
}


sub read_source
{
        my $source_filename = shift;
        my $source;

        open(my $src_fd, '<', $source_filename)
          || die "opening $source_filename: $!";
        local ($/) = undef;
        $source = <$src_fd>;
        close($src_fd);

        return $source;
}


sub write_source
{
        my $source          = shift;
        my $source_filename = shift;

        open(my $src_fh, '>', $source_filename)
          || die "opening $source_filename: $!";
        print $src_fh $source;
        close($src_fh);
}


sub pre_indent
{
        my $source = shift;

        # remove trailing whitespace
        $source =~ s/\h+$//gm;

        ## Comments

        # Convert // comments to /* */
        $source =~ s!^(\h*)//(.*)$!$1/* $2 */!gm;

        # 'else' followed by a single-line comment, followed by
        # a brace on the next line confuses BSD indent, so we push
        # the comment down to the next line, then later pull it
        # back up again.  Add space before _PGMV or indent will add
        # it for us.
        # AMD: A symptom of not getting this right is that you see errors like:
        # FILE: ../../../src/backend/rewrite/rewriteHandler.c
        # Error@2259:
        # Stuff missing from end of file
        $source =~ s!(\}|\h)else\h*(/\*)(.*\*/)\h*$!$1else\n    $2 _PGMV$3!gm;

        # Indent multi-line after-'else' comment so BSD indent will move it
        # properly. We already moved down single-line comments above.
        # Check for '*' to make sure we are not in a single-line comment that
        # has other text on the line.
        $source =~ s!(\}|\h)else\h*(/\*[^*]*)\h*$!$1else\n    $2!gm;

        # Mark some comments for special treatment later
        $source =~ s!/\* +---!/*---X_X!g;

        ## Other

        # Work around bug where function that defines no local variables
        # misindents switch() case lines and line after #else.  Do not do
        # for struct/enum.
        my @srclines = split(/\n/, $source);
        foreach my $lno (1 .. $#srclines)
        {
                my $l2 = $srclines[$lno];

                # Line is only a single open brace in column 0
                next unless $l2 =~ /^\{\h*$/;

                # previous line has a closing paren
                next unless $srclines[ $lno - 1 ] =~ /\)/;

                # previous line was struct, etc.
                next
                  if $srclines[ $lno - 1 ] =~
                          m!=|^(struct|enum|\h*typedef|extern\h+"C")!;

                $srclines[$lno] = "$l2\nint pgindent_func_no_var_fix;";
        }
        $source = join("\n", @srclines) . "\n";    # make sure there's a final \n

        # Prevent indenting of code in 'extern "C"' blocks.
        # we replace the braces with comments which we'll reverse later
        my $extern_c_start = '/* Open extern "C" */';
        my $extern_c_stop  = '/* Close extern "C" */';
        $source =~
s!(^#ifdef\h+__cplusplus.*\nextern\h+"C"\h*\n)\{\h*$!$1$extern_c_start!gm;
        $source =~ s!(^#ifdef\h+__cplusplus.*\n)\}\h*$!$1$extern_c_stop!gm;

        return $source;
}


sub post_indent
{
        my $source          = shift;
        my $source_filename = shift;

        # put back braces for extern "C"
        $source =~ s!^/\* Open extern "C" \*/$!{!gm;
        $source =~ s!^/\* Close extern "C" \*/$!}!gm;

        ## Comments

        # remove special comment marker
        $source =~ s!/\*---X_X!/* ---!g;

        # Pull up single-line comment after 'else' that was pulled down above
        $source =~ s!else\n\h+/\* _PGMV!else\t/*!g;

        # Indent single-line after-'else' comment by only one tab.
        $source =~ s!(\}|\h)else\h+(/\*.*\*/)\h*$!$1else\t$2!gm;

        # Add tab before comments with no whitespace before them (on a tab stop)
        $source =~ s!(\S)(/\*.*\*/)$!$1\t$2!gm;

        # Remove blank line between opening brace and block comment.
        $source =~ s!(\t*\{\n)\n(\h+/\*)$!$1$2!gm;

        # cpp conditionals

        # Reduce whitespace between #endif and comments to one tab
        $source =~ s!^\#endif\h+/\*!#endif   /*!gm;

        # Remove blank line(s) before #else, #elif, and #endif
        $source =~ s!\n\n+(\#else|\#elif|\#endif)!\n$1!g;

        # Add blank line before #endif if it is the last line in the file
        $source =~ s!\n(#endif.*)\n\z!\n\n$1\n!;

        ## Functions

        # Work around misindenting of function with no variables defined.
        $source =~ s!^\h*int\h+pgindent_func_no_var_fix;\h*\n{1,2}!!gm;

        # Use a single space before '*' in function return types
        $source =~ s!^([A-Za-z_]\S*)\h+\*$!$1 *!gm;

        #  Move prototype names to the same line as return type.  Useful
        # for ctags.  Indent should do this, but it does not.  It formats
        # prototypes just like real functions.

        my $ident   = qr/[a-zA-Z_][a-zA-Z_0-9]*/;
        my $comment = qr!/\*.*\*/!;

        $source =~ s!
                        (\n$ident[^(\n]*)\n                  # e.g. static void
                        (
                                $ident\(\n?                      # func_name( 
                                (.*,(\h*$comment)?\n)*           # args b4 final ln
                                .*\);(\h*$comment)?$             # final line
                        )
                !$1 . (substr($1,-1,1) eq '*' ? '' : ' ') . $2!gmxe;

        ## Other

        # Remove too much indenting after closing brace.
        $source =~ s!^\}\t\h+!}\t!gm;

        # Workaround indent bug that places excessive space before 'static'.
        $source =~ s!^static\h+!static !gm;

        # Remove leading whitespace from typedefs
        $source =~ s!^\h+typedef enum!typedef enum!gm
          if $source_filename =~ m/libpq-(fe|events)\.h$/;

        # Remove trailing blank lines
        $source =~ s!\n+\z!\n!;

        return $source;
}


sub run_indent
{
        my $source        = shift;
        my $error_message = shift;

        my $cmd = "$indent $indent_opts $extra_opts -U" .
                        $filtered_typedefs_fh->filename;

        my $tmp_fh = new File::Temp(TEMPLATE => "pgsrcXXXXX");
        my $filename = $tmp_fh->filename;
        print $tmp_fh $source;
        $tmp_fh->close();

        $$error_message = `$cmd $filename 2>&1`;

        return "" if ($? || length($$error_message) > 0);

        unlink "$filename.BAK";

        open(my $src_out, '<', $filename) || die "opening $filename: $!";
        local ($/) = undef;
        $source = <$src_out>;
        close($src_out);

        return $source;

}

# XXX Ideally we'd implement entab/detab in pure perl.

sub detab
{
        my $source = shift;

        my $tmp_fh = new File::Temp(TEMPLATE => "pgdetXXXXX");
        print $tmp_fh $source;
        $tmp_fh->close();

        open(my $entab_fh, '-|', "$entab -d -t4 -qc " . $tmp_fh->filename)
          || die "running $entab: $!";
        local ($/) = undef;
        $source = <$entab_fh>;
        close($entab_fh);

        return $source;
}


sub entab
{
        my $source = shift;

        my $tmp_fh = new File::Temp(TEMPLATE => "pgentXXXXX");
        print $tmp_fh $source;
        $tmp_fh->close();

        open(my $entab_fh, '-|',
                "$entab -d -t8 -qc " . $tmp_fh->filename . " | $entab -t4 -qc")
          || die "running $entab: $!";
        local ($/) = undef;
        $source = <$entab_fh>;
        close($entab_fh);

        return $source;
}


# for development diagnostics
sub diff
{
        my $pre   = shift;
        my $post  = shift;
        my $flags = shift || "";

        print STDERR "running diff\n";

        my $pre_fh  = new File::Temp(TEMPLATE => "pgdiffbXXXXX");
        my $post_fh = new File::Temp(TEMPLATE => "pgdiffaXXXXX");

        print $pre_fh $pre;
        print $post_fh $post;

        $pre_fh->close();
        $post_fh->close();

        system("diff $flags " . $pre_fh->filename . " " . 
                                $post_fh->filename . " >&2");
}


sub run_build
{
        eval "use LWP::Simple;";
        die "--build requires LWP::Simple: $@" if $@;

        my $code_base = shift || '.';
        my $save_dir = getcwd();

        # look for the code root
        foreach (1 .. 5)
        {
                last if -d "$code_base/src/tools/pgindent";
                $code_base = "$code_base/..";
        }

        die "no src/tools/pgindent directory in $code_base"
          unless -d "$code_base/src/tools/pgindent";

        chdir "$code_base/src/tools/pgindent";

        my $rv = getstore("http://buildfarm.postgresql.org/cgi-bin/typedefs.pl",
                "tmp_typedefs.list");

        die "fetching typedefs.list" unless is_success($rv);

        $ENV{PGTYPEDEFS} = abs_path('tmp_typedefs.list');

        $rv =
          getstore("ftp://ftp.postgresql.org/pub/dev/indent.netbsd.patched.tgz",
                "indent.netbsd.patched.tgz");

        die "fetching indent.netbsd.patched.tgz" unless is_success($rv);

        # XXX add error checking here

        mkdir "bsdindent";
        chdir "bsdindent";
        system("tar -z -xf ../indent.netbsd.patched.tgz");
        system("make > $devnull 2>&1");

        $ENV{PGINDENT} = abs_path('indent');

        chdir "../../entab";

        system("make > $devnull 2>&1");

        $ENV{PGENTAB} = abs_path('entab');

        chdir $save_dir;

}


sub build_clean
{
        my $code_base = shift || '.';

        # look for the code root
        foreach (1 .. 5)
        {
                last if -d "$code_base/src/tools/pgindent";
                $code_base = "$code_base/..";
        }

        die "no src/tools/pgindent directory in $code_base"
          unless -d "$code_base/src/tools/pgindent";

        chdir "$code_base";

        system("rm -rf src/tools/pgindent/bsdindent");
        system("git clean -q -f src/tools/entab src/tools/pgindent");
}


# main

# get the list of files under code base, if it's set
File::Find::find(
        {   wanted => sub {
                        my ($dev, $ino, $mode, $nlink, $uid, $gid);
                        (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))
                          && -f _
                          && /^.*\.[ch]\z/s
                          && push(@files, $File::Find::name);
                  }
        },
        $code_base) if $code_base;

process_exclude();

$filtered_typedefs_fh = load_typedefs();

check_indent();

# make sure we process any non-option arguments.
push(@files, @ARGV);

foreach my $source_filename (@files)
{
        my $source        = read_source($source_filename);
        my $error_message = '';

        $source = pre_indent($source);
        $source = detab($source);

        # Protect backslashes in DATA() and wrapping in CATALOG()
        $source =~ s!^((DATA|CATALOG)\(.*)$!/*$1*/!gm;

        $source = run_indent($source, \$error_message);
        if ($source eq "")
        {
                print STDERR "Failure in $source_filename: " . $error_message . "\n";
                next;
        }

        # Restore DATA/CATALOG lines; must be done here so tab alignment is preserved
        $source =~ s!^/\*((DATA|CATALOG)\(.*)\*/$!$1!gm;
        $source = entab($source);

        $source = post_indent($source, $source_filename);

        write_source($source, $source_filename);
}

build_clean($code_base) if $build;
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers