Re: [HACKERS] Radix tree for character conversion

Kyotaro HORIGUCHI Tue, 08 Nov 2016 03:23:16 -0800

Hello, this is a revised patch that applies on top of the previous
patch.

The diffs for the map files are enormous and not useful for
discussion, so they aren't included here (but they can be regenerated).


This still doesn't remove the three .txt/.xml files, since doing so
would heavily bloat the patch. I plan to remove them in the final
version. All authority files, including the ones to be removed, are
downloaded automatically by the Makefile in this patch.

At Tue, 08 Nov 2016 10:43:56 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<horiguchi.kyot...@lab.ntt.co.jp> wrote in 
<20161108.104356.265607041.horiguchi.kyot...@lab.ntt.co.jp>
> https://www.postgresql.org/docs/devel/static/install-requirements.html
> 
> | Perl 5.8 or later is needed to build from a Git checkout, or if
> | you changed the input files for any of the build steps that use
> | Perl scripts. If building on Windows you will need Perl in any
> | case. Perl is also required to run some test suites.
> 
> So we should assume Perl 5.8 (released in 2002!) at build
> time. In practice, RedHat 6.4 has 5.10, my environment (CentOS
> 7.2) has 5.16, and the official documentation is at 5.24, as is
> ActivePerl. Given this, we should use syntax that is supported
> as of 5.8 and not obsolete as of 5.24, and otherwise follow the
> latest conventions. But not OO. (But I can't squeeze a concrete
> syntax set out of these requirements :( )
... (let's forget about this for now.)

In the end, the attached patch incorporates most (virtually all) of
Daniel's suggestions, plus some modifications made by pgperltidy.

> In addition to this, I'll remove the existing authority files and
> modify the radix tree generator so that it can read plain map files
> in the next patch.

So I think the attached scripts are now in a reasonably modern shape.
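For readers less familiar with the idioms involved, here is a minimal sketch of the style the patch converts the scripts to. The "read_pairs" helper and the two-line sample input are hypothetical and not part of convutils.pm. "use strict" forces every variable to be declared with "my", and a three-argument open() on a lexical filehandle replaces the old bareword FILE handles. Both features have existed since Perl 5.6, so they fit within the 5.8 baseline mentioned above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not part of convutils.pm) written in the style the
# patch adopts: "use strict", lexically scoped "my" variables, and a
# three-argument open() on a lexical filehandle instead of a bareword FILE.
sub read_pairs
{
	my ($fname) = @_;
	my @r;

	open(my $in, '<', $fname) || die "cannot open $fname: $!";
	while (my $line = <$in>)
	{
		next if ($line =~ /^#/);    # skip comment lines
		next unless ($line =~ /^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)/);
		push @r, { code => hex($1), ucs => hex($2) };
	}
	close($in);
	return \@r;
}

# Usage: write a small sample file and read it back.
my ($tmp, $tmpname) = tempfile(UNLINK => 1);
print $tmp "# sample map\n0x8140\t0x3000\t# IDEOGRAPHIC SPACE\n";
close($tmp);

foreach my $p (@{ read_pairs($tmpname) })
{
	printf "0x%04x -> U+%04X\n", $p->{code}, $p->{ucs};
}
```

A side benefit of the lexical handle is that it is closed automatically when "$in" goes out of scope, even if the explicit close() were forgotten.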

At Tue, 08 Nov 2016 11:02:58 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
<horiguchi.kyot...@lab.ntt.co.jp> wrote in 
<20161108.110258.59832499.horiguchi.kyot...@lab.ntt.co.jp>
> Hmm.  For some reason perl-mode in my Emacs produces
> incomprehensible indentation, and I corrected it by hand, so the
> style of this patch is quite likely incompatible with the defined
> style. Anyway, it is better if pgindent produces a smaller patch,
> so I'll try it.

pgperltidy has been applied to the attached patches. Several regions,
such as the lists of additional characters, are marked so that
perltidy leaves them untouched.
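The markers are perltidy's format-skipping comments, "#<<<" and "#>>>": lines between them are passed through as written. The sketch below is a reduced copy of the EUC_KR extra-character list from the patch, wrapped the same way; the printing loop outside the markers is only there to make the block runnable and would be reformatted normally.

```perl
use strict;
use warnings;

# perltidy's format-skipping markers: lines between "#<<<" and "#>>>" are
# left unchanged, so hand-aligned tables like this one keep their layout
# after a pgperltidy run.
#<<< do not let perltidy touch this
my @extra = (
	{direction => 'both', ucs => 0x20AC, code => 0xa2e6, comment => '# EURO SIGN'},
	{direction => 'both', ucs => 0x00AE, code => 0xa2e7, comment => '# REGISTERED SIGN'},
	);
#>>>

# Code outside the markers is reformatted as usual.
foreach my $e (@extra)
{
	printf "U+%04X <-> 0x%04x %s\n", $e->{ucs}, $e->{code}, $e->{comment};
}
```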

One open question is what 'make distclean' and 'make
maintainer-clean' should leave behind. The former should remove the
authority *.TXT files, since they shouldn't be in the source archive.
On the other hand, it is more convenient if the latter leaves them in
place. This seems somewhat strange, but I can't come up with better
behavior for now.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

>From 02a19deb3eb74069936ced0dbea4693322ad9ec8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Tue, 8 Nov 2016 18:22:36 +0900
Subject: [PATCH 1/2] Edit perl scripts into modern style

Some of the existing and added scripts are written in an obsolete style.
This patch rewrites them in modern style, following Daniel Gustafsson's suggestions.
---
 src/backend/utils/mb/Unicode/UCS_to_BIG5.pl        |  31 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl      |  33 +-
 .../utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl        |  66 ++-
 src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl      |  36 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl      |   9 +-
 src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl      |  20 +-
 src/backend/utils/mb/Unicode/UCS_to_GB18030.pl     |  29 +-
 src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl       |   7 +-
 .../utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl      |  92 ++--
 src/backend/utils/mb/Unicode/UCS_to_SJIS.pl        |  27 +-
 src/backend/utils/mb/Unicode/UCS_to_UHC.pl         |  35 +-
 src/backend/utils/mb/Unicode/UCS_to_most.pl        |  14 +-
 src/backend/utils/mb/Unicode/convutils.pm          | 579 +++++++++++----------
 src/backend/utils/mb/Unicode/make_mapchecker.pl    |  18 +-
 14 files changed, 529 insertions(+), 467 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl b/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
index 7b14817..0412723 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_BIG5.pl
@@ -24,10 +24,10 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
-
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Load BIG5.TXT
 my $all = &read_source("BIG5.TXT");
@@ -35,9 +35,10 @@ my $all = &read_source("BIG5.TXT");
 # Load CP950.TXT
 my $cp950txt = &read_source("CP950.TXT");
 
-foreach my $i (@$cp950txt) {
+foreach my $i (@$cp950txt)
+{
 	my $code = $i->{code};
-	my $ucs = $i->{ucs};
+	my $ucs  = $i->{ucs};
 
 	# Pick only the ETEN extended characters in the range 0xf9d6 - 0xf9dc
 	# from CP950.TXT
@@ -46,20 +47,22 @@ foreach my $i (@$cp950txt) {
 		&& $code >= 0xf9d6
 		&& $code <= 0xf9dc)
 	{
-		push @$all, {code => $code,
-					 ucs => $ucs,
-					 comment => $i->{comment},
-					 direction => "both"};
+		push @$all,
+		  { code      => $code,
+			ucs       => $ucs,
+			comment   => $i->{comment},
+			direction => "both" };
 	}
 }
 
-foreach my $i (@$all) {
+foreach my $i (@$all)
+{
 	my $code = $i->{code};
-	my $ucs = $i->{ucs};
+	my $ucs  = $i->{ucs};
 
-	# BIG5.TXT maps several BIG5 characters to U+FFFD. The UTF-8 to BIG5 mapping can
-	# contain only one of them. XXX: Doesn't really make sense to include any of them,
-	# but for historical reasons, we map the first one of them.
+# BIG5.TXT maps several BIG5 characters to U+FFFD. The UTF-8 to BIG5 mapping can
+# contain only one of them. XXX: Doesn't really make sense to include any of them,
+# but for historical reasons, we map the first one of them.
 	if ($i->{ucs} == 0xFFFD && $i->{code} != 0xA15A)
 	{
 		$i->{direction} = "to_unicode";
@@ -67,6 +70,6 @@ foreach my $i (@$all) {
 }
 
 # Output
-print_tables("BIG5", $all);
+print_tables($this_script, "BIG5", $all);
 print_radix_trees($this_script, "BIG5", $all);
 
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
index 8c6039f..76ad502 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
@@ -13,34 +13,36 @@
 # where the "u" field is the Unicode code point in hex,
 # and the "b" field is the hex byte sequence for GB18030
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Read the input
 
-$in_file = "gb-18030-2000.xml";
+my $in_file = "gb-18030-2000.xml";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while (<FILE>)
+while (<$in>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	$u = $1;
-	$c = $2;
+	my ($u, $c) = ($1, $2);
 	$c =~ s/ //g;
-	$ucs  = hex($u);
-	$code = hex($c);
+	my $ucs  = hex($u);
+	my $code = hex($c);
 
 	# The GB-18030 character set, which we use as the source, contains
 	# a lot of extra characters on top of the GB2312 character set that
 	# EUC_CN encodes. Filter out those extra characters.
+
 	next if (($code & 0xFF) < 0xA1);
+#<<< do not let perltidy touch this
 	next if (!($code >= 0xA100 && $code <= 0xA9FF ||
 			   $code >= 0xB000 && $code <= 0xF7FF));
-
+#>>>
 	next if ($code >= 0xA2A1 && $code <= 0xA2B0);
 	next if ($code >= 0xA2E3 && $code <= 0xA2E4);
 	next if ($code >= 0xA2EF && $code <= 0xA2F0);
@@ -67,13 +69,12 @@ while (<FILE>)
 		$ucs = 0x2015;
 	}
 
-	push @mapping, {
-		ucs => $ucs,
-		code => $code,
-		direction => 'both'
-	}
+	push @mapping,
+	  { ucs       => $ucs,
+		code      => $code,
+		direction => 'both' };
 }
-close(FILE);
+close($in);
 
-print_tables("EUC_CN", \@mapping);
+print_tables($this_script, "EUC_CN", \@mapping);
 print_radix_trees($this_script, "EUC_CN", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
index 1b4e99f..a0f61e7 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JIS_2004.pl
@@ -7,56 +7,54 @@
 # Generate UTF-8 <--> EUC_JIS_2004 code conversion tables from
 # "euc-jis-2004-std.txt" (http://x0213.org)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # first generate UTF-8 --> EUC_JIS_2004 table
 
-$in_file = "euc-jis-2004-std.txt";
+my $in_file = "euc-jis-2004-std.txt";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @all;
 
-while ($line = <FILE>)
+while (my $line = <$in>)
 {
 	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
 	{
-		$c              = $1;
-		$u1             = $2;
-		$u2             = $3;
-		$rest           = "U+" . $u1 . "+" . $u2 . $4;
-		$code           = hex($c);
-		$ucs1           = hex($u1);
-		$ucs2           = hex($u2);
-
-		push @all, { direction => 'both',
-					 ucs => $ucs1,
-					 ucs_second => $ucs2,
-					 code => $code,
-					 comment => $rest };
-		next;
+		# combined characters
+		my ($c, $u1, $u2) = ($1, $2, $3);
+		my $rest = "U+" . $u1 . "+" . $u2 . $4;
+		my $code = hex($c);
+		my $ucs1 = hex($u1);
+		my $ucs2 = hex($u2);
+
+		push @all,
+		  { direction  => 'both',
+			ucs        => $ucs1,
+			ucs_second => $ucs2,
+			code       => $code,
+			comment    => $rest };
 	}
 	elsif ($line =~ /^0x(.*)[ \t]*U\+(.*)[ \t]*#(.*)$/)
 	{
-		$c    = $1;
-		$u    = $2;
-		$rest = "U+" . $u . $3;
+		# non-combined characters
+		my ($c, $u, $rest) = ($1, $2, "U+" . $2 . $3);
+		my $ucs  = hex($u);
+		my $code = hex($c);
+
+		next if ($code < 0x80 && $ucs < 0x80);
+
+		push @all,
+		  { direction => 'both',
+			ucs       => $ucs,
+			code      => $code,
+			comment   => $rest };
 	}
-	else
-	{
-		next;
-	}
-
-	$ucs  = hex($u);
-	$code = hex($c);
-
-	next if ($code < 0x80 && $ucs < 0x80);
-
-	push @all, { direction => 'both', ucs => $ucs, code => $code, comment => $rest };
 }
-close(FILE);
+close($in);
 
-print_tables("EUC_JIS_2004", \@all, 1);
+print_tables($this_script, "EUC_JIS_2004", \@all, 1);
 print_radix_trees($this_script, "EUC_JIS_2004", \@all);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
index 0f3fedc..7f2d228 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_JP.pl
@@ -21,7 +21,8 @@ my $jis0212 = &read_source("JIS0212.TXT");
 
 my @mapping;
 
-foreach my $i (@$jis0212) {
+foreach my $i (@$jis0212)
+{
 	# We have a different mapping for this in the EUC_JP to UTF-8 direction.
 	if ($i->{code} == 0x2243)
 	{
@@ -48,13 +49,14 @@ foreach my $i (@$jis0212) {
 # Load CP932.TXT.
 my $ct932 = &read_source("CP932.TXT");
 
-foreach my $i (@$ct932) {
+foreach my $i (@$ct932)
+{
 	my $sjis = $i->{code};
 
 	# We have a different mapping for this in the EUC_JP to UTF-8 direction.
-	if ($sjis == 0xeefa ||
-		$sjis == 0xeefb ||
-		$sjis == 0xeefc)
+	if (   $sjis == 0xeefa
+		|| $sjis == 0xeefb
+		|| $sjis == 0xeefc)
 	{
 		next;
 	}
@@ -63,9 +65,10 @@ foreach my $i (@$ct932) {
 	{
 		my $jis = &sjis2jis($sjis);
 
-		$i->{code} = $jis | ($jis < 0x100 ? 0x8e00 :
-							 ($sjis >= 0xeffd  ? 0x8f8080 : 0x8080));
-
+#<<< do not let perltidy touch this
+		$i->{code} = $jis | ($jis < 0x100 ? 0x8e00:
+							 ($sjis >= 0xeffd ? 0x8f8080 : 0x8080));
+#>>>
 		# Remember the SJIS code for later.
 		$i->{sjis} = $sjis;
 
@@ -73,13 +76,14 @@ foreach my $i (@$ct932) {
 	}
 }
 
-foreach my $i (@mapping) {
+foreach my $i (@mapping)
+{
 	my $sjis = $i->{sjis};
 
 	# These SJIS characters are excluded completely.
-	if ($sjis >= 0xed00 && $sjis <= 0xeef9 ||
-		$sjis >= 0xfa54 && $sjis <= 0xfa56 ||
-		$sjis >= 0xfa58 && $sjis <= 0xfc4b)
+	if (   $sjis >= 0xed00 && $sjis <= 0xeef9
+		|| $sjis >= 0xfa54 && $sjis <= 0xfa56
+		|| $sjis >= 0xfa58 && $sjis <= 0xfc4b)
 	{
 		$i->{direction} = "none";
 		next;
@@ -92,6 +96,7 @@ foreach my $i (@mapping) {
 		next;
 	}
 
+#<<< do not let perltidy touch this
 	if ($sjis == 0x8790 || $sjis == 0x8791 || $sjis == 0x8792 ||
 		$sjis == 0x8795 || $sjis == 0x8796 || $sjis == 0x8797 ||
 		$sjis == 0x879a || $sjis == 0x879b || $sjis == 0x879c ||
@@ -192,8 +197,9 @@ push @mapping, (
 	 {direction => 'to_unicode', ucs => 0x2121, code => 0x8ff4ad, comment => '# TELEPHONE SIGN'},
 	 {direction => 'to_unicode', ucs => 0x3231, code => 0x8ff4ab, comment => '# PARENTHESIZED IDEOGRAPH STOCK'}
 	);
+#>>>
 
-print_tables("EUC_JP", \@mapping);
+print_tables($this_script, "EUC_JP", \@mapping);
 print_radix_trees($this_script, "EUC_JP", \@mapping);
 
 
@@ -217,12 +223,12 @@ sub sjis2jis
 	if ($pos >= 114 * 0x5e && $pos <= 115 * 0x5e + 0x1b)
 	{
 		# This region (115-ku) is out of range of JIS code but for
-		# convenient to generate code in EUC CODESET 3, move this to
+		# convenience to generate code in EUC CODESET 3, move this to
 		# seemingly duplicate region (83-84-ku).
 		$pos = $pos - ((31 * 0x5e) + 12);
 
 		# after 85-ku 82-ten needs to be moved 2 codepoints
-		$pos = $pos - 2 if ($pos >= 84 * 0x5c + 82)
+		$pos = $pos - 2 if ($pos >= 84 * 0x5c + 82);
 	}
 
 	my $hi2 = $pos / 0x5e;
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
index a00d25c..040c5fe 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_KR.pl
@@ -16,9 +16,10 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Load the source file.
 
@@ -30,11 +31,13 @@ foreach my $i (@$mapping)
 }
 
 # Some extra characters that are not in KSX1001.TXT
-push @$mapping, (
+#<<< do not let perltidy touch this
+push @$mapping,(
 	{direction => 'both', ucs => 0x20AC, code => 0xa2e6, comment => '# EURO SIGN'},
 	{direction => 'both', ucs => 0x00AE, code => 0xa2e7, comment => '# REGISTERED SIGN'},
 	{direction => 'both', ucs => 0x327E, code => 0xa2e8, comment => '# CIRCLED HANGUL IEUNG U'}
 	);
+#>>>
 
-print_tables("EUC_KR", $mapping);
+print_tables($this_script, "EUC_KR", $mapping);
 print_radix_trees($this_script, "EUC_KR", $mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
index 995657e..046a8a3 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_EUC_TW.pl
@@ -17,9 +17,10 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 my $mapping = &read_source("CNS11643.TXT");
 
@@ -27,8 +28,8 @@ my @extras;
 
 foreach my $i (@$mapping)
 {
-	my $ucs = $i->{ucs};
-	my $code = $i->{code};
+	my $ucs      = $i->{ucs};
+	my $code     = $i->{code};
 	my $origcode = $i->{code};
 
 	my $plane = ($code & 0x1f0000) >> 16;
@@ -51,16 +52,15 @@ foreach my $i (@$mapping)
 	# Some codes are mapped twice in the EUC_TW to UTF-8 table.
 	if ($origcode >= 0x12121 && $origcode <= 0x20000)
 	{
-		push @extras, {
-			ucs => $i->{ucs},
-			code => ($i->{code} + 0x8ea10000),
-			rest => $i->{rest},
-			direction => 'to_unicode'
-		}
+		push @extras,
+		  { ucs       => $i->{ucs},
+			code      => ($i->{code} + 0x8ea10000),
+			rest      => $i->{rest},
+			direction => 'to_unicode' };
 	}
 }
 
 push @$mapping, @extras;
 
-print_tables("EUC_TW", $mapping);
+print_tables($this_script, "EUC_TW", $mapping);
 print_radix_trees($this_script, "EUC_TW", $mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
index aaa8302..61e31af 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
@@ -13,36 +13,35 @@
 # where the "u" field is the Unicode code point in hex,
 # and the "b" field is the hex byte sequence for GB18030
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Read the input
 
-$in_file = "gb-18030-2000.xml";
+my $in_file = "gb-18030-2000.xml";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while (<FILE>)
+while (<$in>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	$u = $1;
-	$c = $2;
+	my ($u, $c) = ($1, $2);
 	$c =~ s/ //g;
-	$ucs  = hex($u);
-	$code = hex($c);
+	my $ucs  = hex($u);
+	my $code = hex($c);
 	if ($code >= 0x80 && $ucs >= 0x0080)
 	{
-		push @mapping, {
-			ucs => $ucs,
-			code => $code,
-			direction => 'both'
-		}
+		push @mapping,
+		  { ucs       => $ucs,
+			code      => $code,
+			direction => 'both' };
 	}
 }
-close(FILE);
+close($in);
 
-print_tables("GB18030", \@mapping);
+print_tables($this_script, "GB18030", \@mapping);
 print_radix_trees($this_script, "GB18030", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl b/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
index 50735eb..b60858e 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_JOHAB.pl
@@ -15,20 +15,23 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Load the source file.
 
 my $mapping = &read_source("JOHAB.TXT");
 
 # Some extra characters that are not in JOHAB.TXT
+#<<< do not let perltidy touch this
 push @$mapping, (
 	{direction => 'both', ucs => 0x20AC, code => 0xd9e6, comment => '# EURO SIGN'},
 	{direction => 'both', ucs => 0x00AE, code => 0xd9e7, comment => '# REGISTERED SIGN'},
 	{direction => 'both', ucs => 0x327E, code => 0xd9e8, comment => '# CIRCLED HANGUL IEUNG U'}
 	);
+#>>>
 
-print_tables("JOHAB", $mapping);
+print_tables($this_script, "JOHAB", $mapping);
 print_radix_trees($this_script, "JOHAB", $mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
index a9641e4..2e66e63 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SHIFT_JIS_2004.pl
@@ -7,78 +7,68 @@
 # Generate UTF-8 <--> SHIFT_JIS_2004 code conversion tables from
 # "sjis-0213-2004-std.txt" (http://x0213.org)
 
+use strict;
 require "convutils.pm";
 
 # first generate UTF-8 --> SHIFT_JIS_2004 table
 
-$this_script = $0;
+my $this_script = $0;
 
-$in_file = "sjis-0213-2004-std.txt";
+my $in_file = "sjis-0213-2004-std.txt";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while ($line = <FILE>)
+while (my $line = <$in>)
 {
 	if ($line =~ /^0x(.*)[ \t]*U\+(.*)\+(.*)[ \t]*#(.*)$/)
 	{
-		$c              = $1;
-		$u1             = $2;
-		$u2             = $3;
-		$rest           = "U+" . $u1 . "+" . $u2 . $4;
-		$code           = hex($c);
-		$ucs1           = hex($u1);
-		$ucs2           = hex($u2);
+		my ($c, $u1, $u2) = ($1, $2, $3);
+		my $rest = "U+" . $u1 . "+" . $u2 . $4;
+		my $code = hex($c);
+		my $ucs1 = hex($u1);
+		my $ucs2 = hex($u2);
 
-		push @mapping, {
-			code => $code,
-			ucs => $ucs1,
+		push @mapping,
+		  { code       => $code,
+			ucs        => $ucs1,
 			ucs_second => $ucs2,
-			comment => $rest,
-			direction => 'both'
-		};
+			comment    => $rest,
+			direction  => 'both' };
 		next;
 	}
 	elsif ($line =~ /^0x(.*)[ \t]*U\+(.*)[ \t]*#(.*)$/)
 	{
-		$c    = $1;
-		$u    = $2;
-		$rest = "U+" . $u . $3;
-	}
-	else
-	{
-		next;
-	}
+		my ($c, $u, $rest) = ($1, $2, "U+" . $2 . $3);
+		my ($ucs, $code) = (hex($u), hex($c));
+		my $direction;
 
-	$ucs  = hex($u);
-	$code = hex($c);
+		if ($code < 0x80 && $ucs < 0x80)
+		{
+			next;
+		}
+		elsif ($code < 0x80)
+		{
+			$direction = 'from_unicode';
+		}
+		elsif ($ucs < 0x80)
+		{
+			$direction = 'to_unicode';
+		}
+		else
+		{
+			$direction = 'both';
+		}
 
-	if ($code < 0x80 && $ucs < 0x80)
-	{
-		next;
+		push @mapping,
+		  { code      => $code,
+			ucs       => $ucs,
+			comment   => $rest,
+			direction => $direction };
 	}
-	elsif ($code < 0x80)
-	{
-		$direction = 'from_unicode';
-	}
-	elsif ($ucs < 0x80)
-	{
-		$direction = 'to_unicode';
-	}
-	else
-	{
-		$direction = 'both';
-	}
-
-	push @mapping, {
-		code => $code,
-		ucs => $ucs,
-		comment => $rest,
-		direction => $direction
-	};
 }
-close(FILE);
+close($in);
 
-print_tables("SHIFT_JIS_2004", \@mapping, 1);
+print_tables($this_script, "SHIFT_JIS_2004", \@mapping, 1);
 print_radix_trees($this_script, "SHIFT_JIS_2004", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
index 410fc54..9a1cc52 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl
@@ -18,35 +18,38 @@ my $this_script = $0;
 my $mapping = read_source("CP932.TXT");
 
 # Drop these SJIS codes from the source for UTF8=>SJIS conversion
-my @reject_sjis =(
+#<<< do not let perltidy touch this
+my @reject_sjis = (
 	0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782,
-	0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797,
+	0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792,	0x8795..0x8797,
 	0x879a..0x879c
-);
+	);
+#>>>
 
 foreach my $i (@$mapping)
 {
 	my $code = $i->{code};
-	my $ucs = $i->{ucs};
+	my $ucs  = $i->{ucs};
 
-	if (grep {$code == $_} @reject_sjis)
+	if (grep { $code == $_ } @reject_sjis)
 	{
 		$i->{direction} = "to_unicode";
 	}
 }
 
 # Add these UTF8->SJIS pairs to the table.
+#<<< do not let perltidy touch this
 push @$mapping, (
-	{direction => "from_unicode", ucs => 0x00a2,   code => 0x8191, comment => '# CENT SIGN'},
-	{direction => "from_unicode", ucs => 0x00a3,   code => 0x8192, comment => '# POUND SIGN'},
-	{direction => "from_unicode", ucs => 0x00a5,   code => 0x5c,   comment => '# YEN SIGN'},
-	{direction => "from_unicode", ucs => 0x00ac,   code => 0x81ca, comment => '# NOT SIGN'},
+	{direction => "from_unicode", ucs => 0x00a2, code => 0x8191, comment => '# CENT SIGN'},
+	{direction => "from_unicode", ucs => 0x00a3, code => 0x8192, comment => '# POUND SIGN'},
+	{direction => "from_unicode", ucs => 0x00a5, code => 0x5c,   comment => '# YEN SIGN'},
+	{direction => "from_unicode", ucs => 0x00ac, code => 0x81ca, comment => '# NOT SIGN'},
 	{direction => "from_unicode", ucs => 0x2016, code => 0x8161, comment => '# DOUBLE VERTICAL LINE'},
 	{direction => "from_unicode", ucs => 0x203e, code => 0x7e,   comment => '# OVERLINE'},
 	{direction => "from_unicode", ucs => 0x2212, code => 0x817c, comment => '# MINUS SIGN'},
 	{direction => "from_unicode", ucs => 0x301c, code => 0x8160, comment => '# WAVE DASH'}
-);
+	);
+#>>>
 
-
-print_tables("SJIS", $mapping);
+print_tables($this_script, "SJIS", $mapping);
 print_radix_trees($this_script, "SJIS", $mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
index 6f61df4..d297d9a 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_UHC.pl
@@ -13,42 +13,45 @@
 # where the "u" field is the Unicode code point in hex,
 # and the "b" field is the hex byte sequence for UHC
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
 # Read the input
 
-$in_file = "windows-949-2000.xml";
+my $in_file = "windows-949-2000.xml";
 
-open(FILE, $in_file) || die("cannot open $in_file");
+open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
-while (<FILE>)
+while (<$in>)
 {
 	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	$u = $1;
-	$c = $2;
+	my ($u, $c) = ($1, $2);
 	$c =~ s/ //g;
-	$ucs  = hex($u);
-	$code = hex($c);
+	my $ucs  = hex($u);
+	my $code = hex($c);
 
 	next if ($code == 0x0080 || $code == 0x00FF);
 
 	if ($code >= 0x80 && $ucs >= 0x0080)
 	{
-		push @mapping, {
-			ucs => $ucs,
-			code => $code,
-			direction => 'both'
-		}
+		push @mapping,
+		  { ucs       => $ucs,
+			code      => $code,
+			direction => 'both' };
 	}
 }
-close(FILE);
+close($in);
 
 # One extra character that's not in the source file.
-push @mapping, { direction => 'both', code => 0xa2e8, ucs => 0x327e, comment => 'CIRCLED HANGUL IEUNG U' };
+push @mapping,
+  { direction => 'both',
+	code      => 0xa2e8,
+	ucs       => 0x327e,
+	comment   => 'CIRCLED HANGUL IEUNG U' };
 
-print_tables("UHC", \@mapping);
+print_tables($this_script, "UHC", \@mapping);
 print_radix_trees($this_script, "UHC", \@mapping);
diff --git a/src/backend/utils/mb/Unicode/UCS_to_most.pl b/src/backend/utils/mb/Unicode/UCS_to_most.pl
index 631214e..9503076 100755
--- a/src/backend/utils/mb/Unicode/UCS_to_most.pl
+++ b/src/backend/utils/mb/Unicode/UCS_to_most.pl
@@ -15,11 +15,12 @@
 #		 UCS-2 code in hex
 #		 # and Unicode name (not used in this script)
 
+use strict;
 require "convutils.pm";
 
-$this_script = $0;
+my $this_script = $0;
 
-%filename = (
+my %filename = (
 	'WIN866'     => 'CP866.TXT',
 	'WIN874'     => 'CP874.TXT',
 	'WIN1250'    => 'CP1250.TXT',
@@ -48,12 +49,13 @@ $this_script = $0;
 	'KOI8U'      => 'KOI8-U.TXT',
 	'GBK'        => 'CP936.TXT');
 
-@charsets = keys(%filename);
-@charsets = @ARGV if scalar(@ARGV);
-foreach $charset (@charsets)
+# make maps for all encodings if not specified
+my @charsets = (scalar(@ARGV) > 0) ? @ARGV : keys(%filename);
+
+foreach my $charset (@charsets)
 {
 	my $mapping = &read_source($filename{$charset});
 
-	print_tables($charset, $mapping);
+	print_tables($this_script, $charset, $mapping);
 	print_radix_trees($this_script, $charset, $mapping);
 }
diff --git a/src/backend/utils/mb/Unicode/convutils.pm b/src/backend/utils/mb/Unicode/convutils.pm
index 35ba423..c2a1565 100644
--- a/src/backend/utils/mb/Unicode/convutils.pm
+++ b/src/backend/utils/mb/Unicode/convutils.pm
@@ -3,13 +3,15 @@
 #
 # src/backend/utils/mb/Unicode/convutils.pm
 
+use strict;
+
 #######################################################################
 # convert UCS-4 to UTF-8
 #
 sub ucs2utf
 {
-	local ($ucs) = @_;
-	local $utf;
+	my ($ucs) = @_;
+	my $utf;
 
 	if ($ucs <= 0x007f)
 	{
@@ -44,29 +46,33 @@ sub read_source
 	my ($fname) = @_;
 	my @r;
 
-	open(my $in, $fname) || die("cannot open $fname");
+	open(my $in, '<', $fname) || die("cannot open $fname");
 
 	while (<$in>)
 	{
 		next if (/^#/);
 		chop;
 
-		next if (/^$/); # Ignore empty lines
+		next if (/^$/);    # Ignore empty lines
 
 		next if (/^0x([0-9A-F]+)\s+(#.*)$/);
 
 		# Skip the first column for JIS0208.TXT
+		#<<< do not let perltidy touch this
 		if (!/^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)\s+(?:0x([0-9A-Fa-f]+)\s+)?(#.*)$/)
 		{
 			print STDERR "READ ERROR at line $. in $fname: $_\n";
 			exit;
 		}
-		my $out = {f => $fname, l => $.,
-				   code => hex($1),
-				   ucs => hex($2),
-				   comment => $4,
-				   direction => "both"
-				};
+		#>>>
+
+		my $out = {
+			f         => $fname,
+			l         => $.,
+			code      => hex($1),
+			ucs       => hex($2),
+			comment   => $4,
+			direction => "both" };
 
 		# Ignore pure ASCII mappings. PostgreSQL character conversion code
 		# never even passes these to the conversion code.
@@ -83,6 +89,7 @@ sub read_source
 # print_tables : output mapping tables
 #
 # Arguments:
+#  this_script - name of the calling script, recorded in the output header
 #  charset - string name of the character set.
 #  table   - mapping table (see format below)
 #  verbose - if 1, output comment on each line,
@@ -104,7 +111,7 @@ sub read_source
 #
 sub print_tables
 {
-	my ($charset, $table, $verbose) = @_;
+	my ($this_script, $charset, $table, $verbose) = @_;
 
 	# Build an array with only the to-UTF8 direction mappings
 	my @to_unicode;
@@ -116,167 +123,185 @@ sub print_tables
 	{
 		if (defined $i->{ucs_second})
 		{
-			my $entry = {utf8 => ucs2utf($i->{ucs}),
-						 utf8_second => ucs2utf($i->{ucs_second}),
-						 code => $i->{code},
-						 comment => $i->{comment},
-						 f => $i->{f}, l => $i->{l}};
+			my $entry = {
+				utf8        => ucs2utf($i->{ucs}),
+				utf8_second => ucs2utf($i->{ucs_second}),
+				code        => $i->{code},
+				comment     => $i->{comment},
+				f           => $i->{f},
+				l           => $i->{l} };
 			if ($i->{direction} eq "both" || $i->{direction} eq "to_unicode")
 			{
 				push @to_unicode_combined, $entry;
 			}
-			if ($i->{direction} eq "both" || $i->{direction} eq "from_unicode")
+			if (   $i->{direction} eq "both"
+				|| $i->{direction} eq "from_unicode")
 			{
 				push @from_unicode_combined, $entry;
 			}
 		}
 		else
 		{
-			my $entry = {utf8 => ucs2utf($i->{ucs}),
-						 code => $i->{code},
-						 comment => $i->{comment},
-						 f => $i->{f}, l => $i->{l}};
+			my $entry = {
+				utf8    => ucs2utf($i->{ucs}),
+				code    => $i->{code},
+				comment => $i->{comment},
+				f       => $i->{f},
+				l       => $i->{l} };
 			if ($i->{direction} eq "both" || $i->{direction} eq "to_unicode")
 			{
 				push @to_unicode, $entry;
 			}
-			if ($i->{direction} eq "both" || $i->{direction} eq "from_unicode")
+			if (   $i->{direction} eq "both"
+				|| $i->{direction} eq "from_unicode")
 			{
 				push @from_unicode, $entry;
 			}
 		}
 	}
 
-	print_to_utf8_map($charset, \@to_unicode, $verbose);
-	print_to_utf8_combined_map($charset, \@to_unicode_combined, $verbose) if (scalar @to_unicode_combined > 0);
-	print_from_utf8_map($charset, \@from_unicode, $verbose);
-	print_from_utf8_combined_map($charset, \@from_unicode_combined, $verbose) if (scalar @from_unicode_combined > 0);
+	print_to_utf8_map($this_script, $charset, \@to_unicode, $verbose);
+	if (scalar @to_unicode_combined > 0)
+	{
+		print_to_utf8_combined_map($this_script, $charset,
+			\@to_unicode_combined, $verbose);
+	}
+	print_from_utf8_map($this_script, $charset, \@from_unicode, $verbose);
+	if (scalar @from_unicode_combined > 0)
+	{
+		print_from_utf8_combined_map($this_script, $charset,
+			\@from_unicode_combined, $verbose);
+	}
 }
 
 sub print_from_utf8_map
 {
-	my ($charset, $table, $verbose) = @_;
+	my ($this_script, $charset, $table, $verbose) = @_;
 
 	my $last_comment = "";
 
 	my $fname = lc("utf8_to_${charset}.map");
 	print "- Writing UTF8=>${charset} conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
-	printf($out "/* src/backend/utils/mb/Unicode/$fname */\n\n".
-		   "static const pg_utf_to_local ULmap${charset}[ %d ] = {",
-		   scalar(@$table));
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
+	printf $out "/* src/backend/utils/mb/Unicode/$fname */\n"
+	  . "/* This file is generated by $this_script */\n\n"
+	  . "static const pg_utf_to_local ULmap${charset}[ %d ] = {",
+	  scalar(@$table);
 	my $first = 1;
-	foreach my $i (sort {$a->{utf8} <=> $b->{utf8}} @$table)
-    {
-		print($out ",") if (!$first);
+	foreach my $i (sort { $a->{utf8} <=> $b->{utf8} } @$table)
+	{
+		print $out "," if (!$first);
 		$first = 0;
-		print($out "\t/* $last_comment */") if ($verbose);
+		print $out "\t/* $last_comment */" if ($verbose);
 
-		printf($out "\n  {0x%04x, 0x%04x}", $i->{utf8}, $i->{code});
+		printf $out "\n  {0x%04x, 0x%04x}", $i->{utf8}, $i->{code};
 		if ($verbose >= 2)
 		{
 			$last_comment =
-				sprintf("%s:%d %s", $i->{f}, $i->{l}, $i->{comment});
+			  sprintf("%s:%d %s", $i->{f}, $i->{l}, $i->{comment});
 		}
 		else
 		{
 			$last_comment = $i->{comment};
 		}
 	}
-	print($out "\t/* $last_comment */") if ($verbose);
+	print $out "\t/* $last_comment */" if ($verbose);
 	print $out "\n};\n";
 	close($out);
 }
 
 sub print_from_utf8_combined_map
 {
-	my ($charset, $table, $verbose) = @_;
+	my ($this_script, $charset, $table, $verbose) = @_;
 
 	my $last_comment = "";
 
 	my $fname = lc("utf8_to_${charset}_combined.map");
 	print "- Writing UTF8=>${charset} conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
-	printf($out "/* src/backend/utils/mb/Unicode/$fname */\n\n".
-		   "static const pg_utf_to_local_combined ULmap${charset}_combined[ %d ] = {",
-		   scalar(@$table));
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
+	printf $out "/* src/backend/utils/mb/Unicode/$fname */\n"
+	  . "/* This file is generated by $this_script */\n\n"
+	  . "static const pg_utf_to_local_combined ULmap${charset}_combined[ %d ] = {",
+	  scalar(@$table);
 	my $first = 1;
-	foreach my $i (sort {$a->{utf8} <=> $b->{utf8}} @$table)
-    {
-		print($out ",") if (!$first);
+	foreach my $i (sort { $a->{utf8} <=> $b->{utf8} } @$table)
+	{
+		print $out "," if (!$first);
 		$first = 0;
-		print($out "\t/* $last_comment */") if ($verbose);
+		print $out "\t/* $last_comment */" if ($verbose);
 
-		printf($out "\n  {0x%08x, 0x%08x, 0x%04x}",
-			   $i->{utf8}, $i->{utf8_second}, $i->{code});
+		printf $out "\n  {0x%08x, 0x%08x, 0x%04x}",
+		  $i->{utf8}, $i->{utf8_second}, $i->{code};
 		$last_comment = $i->{comment};
 	}
-	print($out "\t/* $last_comment */") if ($verbose);
+	print $out "\t/* $last_comment */" if ($verbose);
 	print $out "\n};\n";
 	close($out);
 }
 
 sub print_to_utf8_map
 {
-	my ($charset, $table, $verbose) = @_;
+	my ($this_script, $charset, $table, $verbose) = @_;
 
 	my $last_comment = "";
 
 	my $fname = lc("${charset}_to_utf8.map");
 
 	print "- Writing ${charset}=>UTF8 conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
-	printf($out "/* src/backend/utils/mb/Unicode/${fname} */\n\n".
-		   "static const pg_local_to_utf LUmap${charset}[ %d ] = {",
-		   scalar(@$table));
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
+	printf $out "/* src/backend/utils/mb/Unicode/$fname */\n"
+	  . "/* This file is generated by $this_script */\n\n"
+	  . "static const pg_local_to_utf LUmap${charset}[ %d ] = {",
+	  scalar(@$table);
 	my $first = 1;
-	foreach my $i (sort {$a->{code} <=> $b->{code}} @$table)
-    {
-		print($out ",") if (!$first);
+	foreach my $i (sort { $a->{code} <=> $b->{code} } @$table)
+	{
+		print $out "," if (!$first);
 		$first = 0;
-		print($out "\t/* $last_comment */") if ($verbose);
+		print $out "\t/* $last_comment */" if ($verbose);
 
-		printf($out "\n  {0x%04x, 0x%x}", $i->{code}, $i->{utf8});
+		printf $out "\n  {0x%04x, 0x%x}", $i->{code}, $i->{utf8};
 		if ($verbose >= 2)
 		{
 			$last_comment =
-				sprintf("%s:%d %s", $i->{f}, $i->{l}, $i->{comment});
+			  sprintf("%s:%d %s", $i->{f}, $i->{l}, $i->{comment});
 		}
 		else
 		{
 			$last_comment = $i->{comment};
 		}
 	}
-	print($out "\t/* $last_comment */") if ($verbose);
+	print $out "\t/* $last_comment */" if ($verbose);
 	print $out "\n};\n";
 	close($out);
 }
 
 sub print_to_utf8_combined_map
 {
-	my ($charset, $table, $verbose) = @_;
+	my ($this_script, $charset, $table, $verbose) = @_;
 
 	my $last_comment = "";
 
 	my $fname = lc("${charset}_to_utf8_combined.map");
 
 	print "- Writing ${charset}=>UTF8 conversion table: $fname\n";
-	open(my $out, "> $fname") || die "cannot open output file : $fname\n";
-	printf($out "/* src/backend/utils/mb/Unicode/${fname} */\n\n".
-		   "static const pg_local_to_utf_combined LUmap${charset}_combined[ %d ] = {",
-		   scalar(@$table));
+	open(my $out, '>', $fname) || die "cannot open output file : $fname\n";
+	printf $out "/* src/backend/utils/mb/Unicode/$fname */\n"
+	  . "/* This file is generated by $this_script */\n\n"
+	  . "static const pg_local_to_utf_combined LUmap${charset}_combined[ %d ] = {",
+	  scalar(@$table);
 	my $first = 1;
-	foreach my $i (sort {$a->{code} <=> $b->{code}} @$table)
-    {
-		print($out ",") if (!$first);
+	foreach my $i (sort { $a->{code} <=> $b->{code} } @$table)
+	{
+		print $out "," if (!$first);
 		$first = 0;
-		print($out "\t/* $last_comment */") if ($verbose);
+		print $out "\t/* $last_comment */" if ($verbose);
 
-		printf($out "\n  {0x%04x, 0x%08x, 0x%08x}",
-			   $i->{code}, $i->{utf8}, $i->{utf8_second});
+		printf $out "\n  {0x%04x, 0x%08x, 0x%08x}",
+		  $i->{code}, $i->{utf8}, $i->{utf8_second};
 		$last_comment = $i->{comment};
 	}
-	print($out "\t/* $last_comment */") if ($verbose);
+	print $out "\t/* $last_comment */" if ($verbose);
 	print $out "\n};\n";
 	close($out);
 }
@@ -285,29 +310,30 @@ sub print_to_utf8_combined_map
 # RADIX TREE STUFF
 
 # C struct type names : see wchar.h
-my $radix_type = "pg_mb_radix_tree";
+my $radix_type      = "pg_mb_radix_tree";
 my $radix_node_type = "pg_mb_radix_index";
 
 #########################################
-# load_chartable(<map file name>)
+# read_maptable(<map file name>)
 #
-# extract data from map files and returns a character table.
+# extracts data from a map file and returns a character map table.
 # returns a reference to a hash <in code> => <out code>
-sub load_maptable
+sub read_maptable
 {
-	my($fname) = @_;
+	my ($fname) = @_;
 	my %c;
 
-	open(my $in, $fname) || die("cannot open $fname");
+	open(my $in, '<', $fname) || die("cannot open $fname");
 
-	while(<$in>)
+	while (<$in>)
 	{
 		if (/^[ \t]*{0x([0-9a-f]+), *0x([0-9a-f]+)},?/)
 		{
-			$c{hex($1)} = hex($2);
+			$c{ hex($1) } = hex($2);
 		}
 	}
 
+	close($in);
 	return \%c;
 }
 
@@ -383,121 +409,124 @@ sub generate_index
 	my ($c) = @_;
 	my (%csegs, %b2idx, %b3idx1, %b3idx2, %b4idx1, %b4idx2, %b4idx3);
 	my @all_tables =
-		(\%csegs, \%b2idx, \%b3idx1, \%b3idx2, \%b4idx1, \%b4idx2, \%b4idx3);
+	  (\%csegs, \%b2idx, \%b3idx1, \%b3idx2, \%b4idx1, \%b4idx2, \%b4idx3);
 	my $si;

 	# initialize attributes of index tables
-	$csegs{attr} = {name=> "csegs", chartbl => 1, segmented => 1,
-					is32bit => 0, has0page => 0};
-	$b2idx{attr} = {name => "b2idx", segmented => 0, nextidx => \%csegs};
-	$b3idx1{attr} = {name => "b3idx1", segmented => 0, nextidx => \%b3idx2};
-	$b3idx2{attr} = {name => "b3idx2", segmented => 1, nextidx => \%csegs};
-	$b4idx1{attr} = {name => "b4idx1", segmented => 0, nextidx => \%b4idx2};
-	$b4idx2{attr} = {name => "b4idx2", segmented => 1, nextidx => \%b4idx3};
-	$b4idx3{attr} = {name => "b4idx3", segmented => 1, nextidx => \%csegs};
+	$csegs{attr} = {
+		name      => "csegs",
+		chartbl   => 1,
+		segmented => 1,
+		is32bit   => 0,
+		has0page  => 0 };
+	$b2idx{attr}  = { name => "b2idx",  segmented => 0, nextidx => \%csegs };
+	$b3idx1{attr} = { name => "b3idx1", segmented => 0, nextidx => \%b3idx2 };
+	$b3idx2{attr} = { name => "b3idx2", segmented => 1, nextidx => \%csegs };
+	$b4idx1{attr} = { name => "b4idx1", segmented => 0, nextidx => \%b4idx2 };
+	$b4idx2{attr} = { name => "b4idx2", segmented => 1, nextidx => \%b4idx3 };
+	$b4idx3{attr} = { name => "b4idx3", segmented => 1, nextidx => \%csegs };
 
 	foreach my $in (keys %$c)
 	{
 		if ($in < 0x100)
 		{
 			my $b1 = $in;
+
 			# 1 byte code doesn't have index. the first segment #0 of
 			# character table stores them
 			$csegs{attr}{has0page} = 1;
-			$si = {segid => 0, off => $in, label => "1byte-", char => $c->{$in}};
+			$si = {
+				segid => 0,
+				off   => $in,
+				label => "1byte-",
+				char  => $c->{$in} };
 		}
 		elsif ($in < 0x10000)
 		{
 			# 2-byte code index consists of just one flat table
-			my $b1 = $in >> 8;
-			my $b2 = $in & 0xff;
+			my $b1     = $in >> 8;
+			my $b2     = $in & 0xff;
 			my $csegid = $in >> 8;
 
-			if (! defined $b2idx{i}{$b1})
+			if (!defined $b2idx{i}{$b1})
 			{
 				&set_min_max($b2idx{attr}, $b1);
 				$b2idx{i}{$b1}{segid} = $csegid;
 			}
 			$si = {
 				segid => $csegid,
-				off => $b2,
+				off   => $b2,
 				label => sprintf("%02x", $b1),
-				char => $c->{$in}
-			};
+				char  => $c->{$in} };
 		}
 		elsif ($in < 0x1000000)
 		{
 			# 3-byte code index consists of one flat table and one
 			# segmented table
-			my $b1 = $in >> 16;
-			my $b2 = ($in >> 8) & 0xff;
-			my $b3 = $in & 0xff;
-			my $l1id = $in >> 16;
+			my $b1     = $in >> 16;
+			my $b2     = ($in >> 8) & 0xff;
+			my $b3     = $in & 0xff;
+			my $l1id   = $in >> 16;
 			my $csegid = $in >> 8;
 
-			if (! defined $b3idx1{i}{$b1})
+			if (!defined $b3idx1{i}{$b1})
 			{
 				&set_min_max($b3idx1{attr}, $b1);
 				$b3idx1{i}{$b1}{segid} = $l1id;
 			}
-			if (! defined $b3idx2{i}{$l1id}{d}{$b2})
+			if (!defined $b3idx2{i}{$l1id}{d}{$b2})
 			{
 				&set_min_max($b3idx2{attr}, $b2);
 				$b3idx2{i}{$l1id}{label} = sprintf("%02x", $b1);
 				$b3idx2{i}{$l1id}{d}{$b2} = {
 					segid => $csegid,
-					label => sprintf("%02x%02x", $b1, $b2)
-				}
+					label => sprintf("%02x%02x", $b1, $b2) };
 			}
 
 			$si = {
 				segid => $csegid,
 				off   => $b3,
 				label => sprintf("%02x%02x", $b1, $b2),
-				char  => $c->{$in}
-			};
+				char  => $c->{$in} };
 		}
 		elsif ($in < 0x100000000)
 		{
 			# 4-byte code index consists of one flat table, and two
 			# segmented tables
-			my $b1 = $in >> 24;
-			my $b2 = ($in >> 16) & 0xff;
-			my $b3 = ($in >> 8) & 0xff;
-			my $b4 = $in & 0xff;
-			my $l1id = $in >> 24;
-			my $l2id = $in >> 16;
+			my $b1     = $in >> 24;
+			my $b2     = ($in >> 16) & 0xff;
+			my $b3     = ($in >> 8) & 0xff;
+			my $b4     = $in & 0xff;
+			my $l1id   = $in >> 24;
+			my $l2id   = $in >> 16;
 			my $csegid = $in >> 8;
 
-			if (! defined $b4idx1{i}{$b1})
+			if (!defined $b4idx1{i}{$b1})
 			{
 				&set_min_max($b4idx1{attr}, $b1);
 				$b4idx1{i}{$b1}{segid} = $l1id;
 			}
 
-			if (! defined $b4idx2{i}{$l1id}{d}{$b2})
+			if (!defined $b4idx2{i}{$l1id}{d}{$b2})
 			{
 				&set_min_max($b4idx2{attr}, $b2);
 				$b4idx2{i}{$l1id}{d}{$b2} = {
 					segid => $l2id,
-					label => sprintf("%02x", $b1)
-				}
+					label => sprintf("%02x", $b1) };
 			}
-			if (! defined $b4idx3{i}{$l2id}{d}{$b3})
+			if (!defined $b4idx3{i}{$l2id}{d}{$b3})
 			{
 				&set_min_max($b4idx3{attr}, $b3);
 				$b4idx3{i}{$l2id}{d}{$b3} = {
 					segid => $csegid,
-					label => sprintf("%02x%02x", $b1, $b2)
-				}
+					label => sprintf("%02x%02x", $b1, $b2) };
 			}
 
 			$si = {
 				segid => $csegid,
 				off   => $b4,
 				label => sprintf("%02x%02x%02x", $b1, $b2, $b3),
-				char  => $c->{$in}
-			};
+				char  => $c->{$in} };
 		}
 		else
 		{
@@ -505,8 +538,8 @@ sub generate_index
 		}
 
 		&set_min_max($csegs{attr}, $si->{off});
-		$csegs{i}{$si->{segid}}{d}{$si->{off}} = $si->{char};
-		$csegs{i}{$si->{segid}}{label} = $si->{label};
+		$csegs{i}{ $si->{segid} }{d}{ $si->{off} } = $si->{char};
+		$csegs{i}{ $si->{segid} }{label} = $si->{label};
 		$csegs{attr}{is32bit} = 1 if ($si->{char} >= 0x10000);
 		&update_width($csegs{attr}, $si->{char});
 		if ($si->{char} >= 0x100000000)
@@ -518,22 +551,23 @@ sub generate_index
-	# calcualte segment attributes
+	# calculate segment attributes
 	foreach my $t (@all_tables)
 	{
-		next if (! defined $t->{i} || ! $t->{attr}{segmented});
+		next if (!defined $t->{i} || !$t->{attr}{segmented});
 
 		# segments are to be aligned in the numerical order of segment id
-		my @keylist = sort {$a <=> $b} keys $t->{i};
+		my @keylist = sort { $a <=> $b } keys %{ $t->{i} };
 		next if ($#keylist < 0);
-		my $offset = 1;
+		my $offset  = 1;
 		my $segsize = $t->{attr}{max} - $t->{attr}{min} + 1;
 
 		for my $k (@keylist)
 		{
 			my $seg = $t->{i}{$k};
-			$seg->{lower} = $t->{attr}{min};
-			$seg->{upper} = $t->{attr}{max};
+			$seg->{lower}  = $t->{attr}{min};
+			$seg->{upper}  = $t->{attr}{max};
 			$seg->{offset} = $offset;
 			$offset += $segsize;
 		}
+
 		# overlapping successive zeros between segments
 		&overlap_segments($t);
 	}
@@ -544,12 +578,13 @@ sub generate_index
 		&make_index_link($t, $t->{attr}{nextidx});
 	}
 
-	return { name_prefix => "",
-			 csegs => \%csegs,
-			 b2idx => [\%b2idx],
-			 b3idx => [\%b3idx1, \%b3idx2],
-			 b4idx => [\%b4idx1, \%b4idx2, \%b4idx3],
-			 all => \@all_tables};
+	return {
+		name_prefix => "",
+		csegs       => \%csegs,
+		b2idx       => [ \%b2idx ],
+		b3idx       => [ \%b3idx1, \%b3idx2 ],
+		b4idx       => [ \%b4idx1, \%b4idx2, \%b4idx3 ],
+		all         => \@all_tables };
 }
 
 
@@ -559,8 +594,8 @@ sub set_min_max
 {
 	my ($a, $v) = @_;
 
-	$a->{min} = $v if (! defined $a->{min} || $v < $a->{min});
-	$a->{max} = $v if (! defined $a->{max} || $v > $a->{max});
+	$a->{min} = $v if (!defined $a->{min} || $v < $a->{min});
+	$a->{max} = $v if (!defined $a->{max} || $v > $a->{max});
 }
 
 #########################################
@@ -569,9 +604,11 @@ sub update_width
 {
 	my ($a, $v) = @_;
 
-	my $nnibbles =  int((int(log($v)/log(16)) + 1) / 2) * 2;
-	$a->{width} = $nnibbles
-		if (! defined $a->{width} || $nnibbles > $a->{width});
+	my $nnibbles = int((int(log($v) / log(16)) + 1) / 2) * 2;
+	if (!defined $a->{width} || $nnibbles > $a->{width})
+	{
+		$a->{width} = $nnibbles;
+	}
 }
 
 #########################################
@@ -584,11 +621,11 @@ sub overlap_segments
 	my ($h) = @_;
 
 	# don't touch if undefined
-	return if (! defined $h->{i} || !$h->{attr}{segmented});
+	return if (!defined $h->{i} || !$h->{attr}{segmented});
 	my $index = $h->{i};
 	my ($min, $max) = ($h->{attr}{min}, $h->{attr}{max});
 	my ($prev, $first);
-	my @segids = sort {$a <=> $b} keys $index;
+	my @segids = sort { $a <=> $b } keys %$index;
 	return if ($#segids < 1);
 
 	$first = 1;
@@ -599,7 +636,7 @@ sub overlap_segments
 		my $seg = $index->{$segid};
 
-		# smin and smax is range excluded preceeding and trailing zeros
+		# smin and smax bound the range excluding preceding and trailing zeros
-		my @keys = sort {$a <=> $b} keys $seg->{d};
+		my @keys = sort { $a <=> $b } keys %{ $seg->{d} };
 		my $smin = $keys[0];
 		my $smax = $keys[-1];
 
@@ -607,23 +644,23 @@ sub overlap_segments
 		{
 			# first segment doesn't have a preceding segment
 			$seg->{offset} = 1;
-			$seg->{lower} = $min;
-			$seg->{upper} = $smax;
+			$seg->{lower}  = $min;
+			$seg->{upper}  = $smax;
 		}
 		else
 		{
 			# calculate overlap and shift segment location
-			my $prefix		= $smin - $min;
-			my $postfix		= $max  - $smax;
-			my $prevpostfix	= $max - $prev->{upper};
-			my $overlap = $prevpostfix < $prefix ? $prevpostfix : $prefix;
+			my $prefix      = $smin - $min;
+			my $postfix     = $max - $smax;
+			my $prevpostfix = $max - $prev->{upper};
+			my $overlap     = $prevpostfix < $prefix ? $prevpostfix : $prefix;
 
 			$seg->{lower}  = $min + $overlap;
-			$seg->{upper} = $smax;
-			$seg->{offset} =	$prev->{offset} + ($max - $min + 1) - $overlap;
+			$seg->{upper}  = $smax;
+			$seg->{offset} = $prev->{offset} + ($max - $min + 1) - $overlap;
 			$prev->{upper} = $max;
 		}
-		$prev = $seg;
+		$prev  = $seg;
 		$first = 0;
 	}
 
@@ -635,15 +672,15 @@ sub overlap_segments
 #
 # Fills out target pointers in non-leaf index tables.
 #
-# from_table : table to set links
-# to_table   : target table of from_table
+# from_table - table to set links
+# to_table   - target table of from_table
 
 sub make_index_link
 {
 	my ($s, $t) = @_;
-	return if (! defined $s->{i} || ! defined $t->{i});
+	return if (!defined $s->{i} || !defined $t->{i});
 
-	my @tkeys = sort {$a <=> $b} keys $t->{i};
+	my @tkeys = sort { $a <=> $b } keys %{ $t->{i} };
 
 	if ($s->{attr}{segmented})
 	{
@@ -652,10 +689,12 @@ sub make_index_link
-			foreach my $k2 (keys $s->{i}{$k1}{d})
+			foreach my $k2 (keys %{ $s->{i}{$k1}{d} })
 			{
 				my $tsegid = $s->{i}{$k1}{d}{$k2}{segid};
-				if (! defined $tsegid)
+				if (!defined $tsegid)
 				{
-					die sprintf("segid is not set in %s{i}{%x}{d}{%x}{segid}",
-								$s->{attr}{name}, $k1, $k2);
+					die sprintf(
+						"segid is not set in %s{i}{%x}{d}{%x}{segid}",
+						$s->{attr}{name},
+						$k1, $k2);
 				}
 				$s->{i}{$k1}{d}{$k2}{segoffset} = $t->{i}{$tsegid}{offset};
 			}
@@ -666,10 +705,10 @@ sub make_index_link
-		foreach my $k (keys $s->{i})
+		foreach my $k (keys %{ $s->{i} })
 		{
 			my $tsegid = $s->{i}{$k}{segid};
-			if (! defined $tsegid)
+			if (!defined $tsegid)
 			{
 				die sprintf("segid is not set in %s{i}{%x}{segid}",
-							$s->{attr}{name}, $k);
+					$s->{attr}{name}, $k);
 			}
 			$s->{i}{$k}{segoffset} = $t->{i}{$tsegid}{offset};
 		}
@@ -682,16 +721,16 @@ sub make_index_link
 # print_radix_table(hd, table, tblname, width)
 # returns 1 if the table is written
 #
-# hd      : file handle to write
-# table   : ref to an index table
-# tblname : C symbol name for the table
-# width   : width in characters of this table
+# hd      - file handle to write
+# table   - ref to an index table
+# tblname - C symbol name for the table
+# width   - width in characters of this table
 
 sub print_radix_table
 {
-	my($hd, $table, $tblname, $width) = @_;
+	my ($hd, $table, $tblname, $width) = @_;
 
-	return 0 if (! defined $table->{i});
+	return 0 if (!defined $table->{i});
 
 	if ($table->{attr}{chartbl})
 	{
@@ -714,53 +753,54 @@ sub print_radix_table
 # print_chars_table(hd, table, tblname, width)
-# this is usually called via writ_table
+# this is usually called via print_radix_table
 #
-# hd      : file handle to write
-# table   : ref to an index table
-# tblname : C symbol name for the table
-# tblwidth: width in characters of this table
+# hd      - file handle to write
+# table   - ref to an index table
+# tblname - C symbol name for the table
+# width   - width in characters of this table
 
 sub print_chars_table
 {
-	my($hd, $table, $tblname, $width) = @_;
-	my($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
-	my($type) = $table->{attr}{is32bit} ? "uint32" : "uint16";
+	my ($hd, $table, $tblname, $width) = @_;
+	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
+	my ($type) = $table->{attr}{is32bit} ? "uint32" : "uint16";
 
-	printf(OUT "static const %s %s[] =\n{", $type, $tblname);
-	printf(OUT " /* chars content - index range = [%02x, %02x] */", $st, $ed);
+	printf $hd "static const %s %s[] =\n{", $type, $tblname;
+	printf $hd " /* chars content - index range = [%02x, %02x] */", $st, $ed;
 
 	# values in character table are written in fixedwidth
 	# hexadecimals.  calculate the number of columns in a line. 13 is
 	# the length of line header.
 
-	my $colwidth = $table->{attr}{width};
-	my $colseplen = 4; # the length of  ", 0x"
+	my $colwidth     = $table->{attr}{width};
+	my $colseplen    = 4;                       # the length of ", 0x"
 	my $headerlength = 13;
-	my $colnum = int(($width - $headerlength)  / ($colwidth + $colseplen));
+	my $colnum = int(($width - $headerlength) / ($colwidth + $colseplen));
 
 	# round down to multiples of 4. don't bother by too small table width
-	my $colnum = int($colnum / 4) * 4;
+	$colnum = int($colnum / 4) * 4;
-	my $line = "";
+	my $line   = "";
 	my $first0 = 1;
+
 	# output all segments in segment id order
-	foreach my $k (sort {$a <=> $b} keys $table->{i})
+	foreach my $k (sort { $a <=> $b } keys %{ $table->{i} })
 	{
 		my $s = $table->{i}{$k};
 		if (!$first0)
 		{
-			$line =~ s/\s+$//;		# remove trailing space
+			$line =~ s/\s+$//;    # remove trailing space
 			print $hd $line, ",\n";
 			$line = "";
 		}
 		$first0 = 0;
 
 		# write segment header
-		printf($hd "\n  /*** %4sxx - offset 0x%05x ***/",
-			   $s->{label}, $s->{offset});
+		printf $hd "\n  /*** %4sxx - offset 0x%05x ***/",
+		  $s->{label}, $s->{offset};
 
 		# write segment content
 		my $first1 = 1;
 		my ($segstart, $segend) = ($s->{lower}, $s->{upper});
-		my($xpos, $nocomma) = (0, 0);
+		my ($xpos, $nocomma) = (0, 0);
 
 		foreach my $j (($segstart - ($segstart % $colnum)) .. $segend)
 		{
@@ -770,7 +810,7 @@ sub print_chars_table
 			# new line if this is the first time or this line is full
 			if ($xpos >= $colnum || $first1)
 			{
-				$line =~ s/\s+$//;	# remove trailing space
+				$line =~ s/\s+$//;    # remove trailing space
 				print $hd $line, "\n";
 				$line = sprintf("  /* %02x */ ", $j);
 				$xpos = 0;
@@ -808,35 +848,34 @@ sub print_chars_table
 # print_flat_table(hd, table, tblname, width)
-# this is usually called via writ_table
+# this is usually called via print_radix_table
 #
-# hd      : file handle to write
-# table   : ref to an index table
-# tblname : C symbol name for the table
-# width   : width in characters of this table
+# hd      - file handle to write
+# table   - ref to an index table
+# tblname - C symbol name for the table
+# width   - width in characters of this table
 
 sub print_flat_table
 {
-	my($hd, $table, $tblname, $width) = @_;
-	my($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
+	my ($hd, $table, $tblname, $width) = @_;
+	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 
-	print $hd "static const $radix_node_type $tblname =\n{";
-	printf($hd "\n  0x%x, 0x%x, /* table range */\n", $st, $ed);
-	print $hd "  {";
+	print  $hd "static const $radix_node_type $tblname =\n{";
+	printf $hd "\n  0x%x, 0x%x, /* table range */\n", $st, $ed;
+	print  $hd "  {";
 
 	my $first = 1;
-	my $line = "";
+	my $line  = "";
 
 	foreach my $i ($st .. $ed)
 	{
 		$line .= "," if (!$first);
 		my $newitem = sprintf("%d",
-							  defined $table->{i}{$i} ?
-							  $table->{i}{$i}{segoffset} : 0);
+			defined $table->{i}{$i} ? $table->{i}{$i}{segoffset} : 0);
 
 		# flush current line and feed a line if the current line
 		# exceeds a limit
-		if ($first || length($line.$newitem) > $width)
+		if ($first || length($line . $newitem) > $width)
 		{
-			$line =~ s/\s+$//;		# remove trailing space
+			$line =~ s/\s+$//;    # remove trailing space
 			print $hd "$line\n";
 			$line = "    ";
 		}
@@ -857,46 +896,49 @@ sub print_flat_table
 # print_segmented_table(hd, table, tblname, width)
-# this is usually called via writ_table
+# this is usually called via print_radix_table
 #
-# hd      : file handle to write
-# table   : ref to an index table
-# tblname : C symbol name for the table
-# width   : width in characters of this table
+# hd      - file handle to write
+# table   - ref to an index table
+# tblname - C symbol name for the table
+# width   - width in characters of this table
 
 sub print_segmented_table
 {
-	my($hd, $table, $tblname, $width) = @_;
+	my ($hd, $table, $tblname, $width) = @_;
 	my ($st, $ed) = ($table->{attr}{min}, $table->{attr}{max});
 
 	# write the variable definition
 	print $hd "static const $radix_node_type $tblname =\n{";
-	printf($hd "\n  0x%02x, 0x%02x,		/*index range */\n  {",  $st, $ed);
+	printf $hd "\n  0x%02x, 0x%02x,		/* index range */\n  {", $st, $ed;
 
 	my $first0 = 1;
-	foreach my $k (sort {$a <=> $b} keys $table->{i})
+	foreach my $k (sort { $a <=> $b } keys %{ $table->{i} })
 	{
 		print $hd ",\n" if (!$first0);
 		$first0 = 0;
-		printf($hd "\n  /*** %sxxxx - offset 0x%05x ****/",
-			   $table->{i}{$k}{label}, $table->{i}{$k}{offset});
+		printf $hd "\n  /*** %sxxxx - offset 0x%05x ****/",
+		  $table->{i}{$k}{label}, $table->{i}{$k}{offset};
 
 		my $segstart = $table->{i}{$k}{lower};
-		my $segend	 = $table->{i}{$k}{upper};
+		my $segend   = $table->{i}{$k}{upper};
 
-		my $line = "";
-		my $first1 = 1;
+		my $line    = "";
+		my $first1  = 1;
 		my $newitem = "";
 
 		foreach my $j ($segstart .. $segend)
 		{
 			$line .= "," if (!$first1);
-			$newitem = sprintf("%d", $table->{i}{$k}{d}{$j} ?
-							   $table->{i}{$k}{d}{$j}{segoffset} : 0);
+			$newitem = sprintf("%d",
+				  $table->{i}{$k}{d}{$j}
+				? $table->{i}{$k}{d}{$j}{segoffset}
+				: 0);
 
-			if ($first1 || length($line.$newitem) > $width)
+			if ($first1 || length($line . $newitem) > $width)
 			{
 				$line =~ s/\s+$//;
-				print OUT "$line\n";
-				$line = sprintf("  /* %2s%02x */ ", $table->{i}{$k}{label}, $j);
+				print $hd "$line\n";
+				$line =
+				  sprintf("  /* %2s%02x */ ", $table->{i}{$k}{label}, $j);
 			}
 			else
 			{
@@ -919,7 +961,7 @@ sub make_table_refname
 {
 	my ($table, $prefix) = @_;
 
-	return "NULL" if (! defined $table->{i});
+	return "NULL" if (!defined $table->{i});
 	return "&" . $prefix . $table->{attr}{name};
 }
 
@@ -928,23 +970,25 @@ sub make_table_refname
 #
 # write main radix tree table
 #
-# hd         : file handle to write this table
-# tblname    : variable name of this struct
-# trie       : ref to a radix tree
-# name_prefix: prefix for subtables.
+# hd         - file handle to write this table
+# tblname    - variable name of this struct
+# trie       - ref to a radix tree
+# name_prefix - prefix for subtables.
 
 sub print_radix_main
 {
 	my ($hd, $tblname, $trie, $name_prefix) = @_;
-	my $ctblname = $name_prefix.$trie->{csegs}{attr}{name};
+	my $ctblname = $name_prefix . $trie->{csegs}{attr}{name};
 	my ($ctbl16name, $ctbl32name);
 	if ($trie->{csegs}{attr}{is32bit})
 	{
-		$ctbl16name = "NULL";  $ctbl32name = $ctblname;
+		$ctbl16name = "NULL";
+		$ctbl32name = $ctblname;
 	}
 	else
 	{
-		$ctbl16name = $ctblname;  $ctbl32name = "NULL";
+		$ctbl16name = $ctblname;
+		$ctbl32name = "NULL";
 	}
 
 	my $b2iname  = make_table_refname($trie->{b2idx}[0], $name_prefix);
@@ -954,12 +998,13 @@ sub print_radix_main
 	my $b4i2name = make_table_refname($trie->{b4idx}[1], $name_prefix);
 	my $b4i3name = make_table_refname($trie->{b4idx}[2], $name_prefix);
 
+	#<<< do not let perltidy touch this
 	print  $hd "static const $radix_type $tblname =\n{\n";
 	print  $hd "	/* final character table offset and body */\n";
-	printf($hd "	0x%x, 0x%x, %s, %s, %s,\n",
-		   $trie->{csegs}{attr}{min}, $trie->{csegs}{attr}{max},
-		   $trie->{csegs}{attr}{has0page} ? 'true' : 'false',
-		   $ctbl16name, $ctbl32name);
+	printf $hd "	0x%x, 0x%x, %s, %s, %s,\n",
+	  $trie->{csegs}{attr}{min}, $trie->{csegs}{attr}{max},
+	  $trie->{csegs}{attr}{has0page} ? 'true' : 'false',
+	  $ctbl16name, $ctbl32name;
 
 	print  $hd "	/* 2-byte code table */\n";
 	print  $hd "	$b2iname,\n";
@@ -968,6 +1013,7 @@ sub print_radix_main
 	print  $hd "	/* 4-byte code table */\n";
 	print  $hd "	{$b4i1name, $b4i2name, $b4i3name},\n";
 	print  $hd "};\n";
+	#>>>
 }
 
 ######################################################
@@ -975,15 +1021,15 @@ sub print_radix_main
 #     with checking duplicate source code
 #
 # make_charmap(\@charset, $direction)
-# charset     : ref to charset table : see print_tables
-# direction   : conversion direction
+# charset     - ref to charset table : see print_tables
+# direction   - conversion direction
 
 sub make_charmap
 {
 	my ($charset, $direction) = @_;
 
-	die "unacceptable direction : %direction"
-		if ($direction ne "to_unicode" && $direction ne "from_unicode");
+	die "unacceptable direction : $direction"
+	  if ($direction ne "to_unicode" && $direction ne "from_unicode");
 
 	my %charmap;
 	foreach my $c (@$charset)
@@ -994,14 +1040,15 @@ sub make_charmap
 		next if (defined $c->{ucs_second});
 
 		my ($src, $dst) =
-			$direction eq "to_unicode" ?
-			($c->{code}, $c->{ucs}) : ($c->{ucs}, $c->{code});
+		  $direction eq "to_unicode"
+		  ? ($c->{code}, $c->{ucs})
+		  : ($c->{ucs}, $c->{code});
 
-		if (defined $c{$src})
+		if (defined $c->{$src})
 		{
-			printf(STDERR
-				   "Error: duplicate source code: 0x%04x => 0x%04x, 0x%04x\n",
-				   $src, $c{$src}, $dst);
+			printf STDERR
+			  "Error: duplicate source code: 0x%04x => 0x%04x, 0x%04x\n",
+			  $src, $c->{$src}, $dst;
 			exit;
 		}
 		if ($direction eq "to_unicode")
@@ -1010,7 +1057,7 @@ sub make_charmap
 		}
 		else
 		{
-			$charmap{ucs2utf($src)} = $dst;
+			$charmap{ ucs2utf($src) } = $dst;
 		}
 
 	}
@@ -1024,11 +1071,11 @@ sub make_charmap
 #
 # print_radix_map($this_script, $csname, $direction, \%charset, $tblwidth)
 #
-# this_script : the name of the *caller script* of this feature
-# csname      : character set name other than ucs
-# direction   : desired direction "to_unicode" or "from_unicode"
-# charset     : ref to character set array
-# tblwidth    : width in characters of output source file
+# this_script - the name of the *caller script* of this feature
+# csname      - character set name other than ucs
+# direction   - desired direction "to_unicode" or "from_unicode"
+# charset     - ref to character set array
+# tblwidth    - width in characters of output source file
 
 sub print_radix_map
 {
@@ -1037,11 +1084,11 @@ sub print_radix_map
 	my $charmap = &make_charmap($charset, $direction);
 	my $trie = &generate_index($charmap);
 	my $fname =
-		$direction eq "to_unicode" ?
-		lc("${csname}_to_utf8_radix.map") :
-		lc("utf8_to_${csname}_radix.map");
+	  $direction eq "to_unicode"
+	  ? lc("${csname}_to_utf8_radix.map")
+	  : lc("utf8_to_${csname}_radix.map");
 
-	my $tblname =  lc("${csname}_${direction}_tree");
+	my $tblname     = lc("${csname}_${direction}_tree");
 	my $name_prefix = lc("${csname}_${direction}_");
 
 	if ($direction eq "to_unicode")
@@ -1053,22 +1100,23 @@ sub print_radix_map
 		print "- Writing UTF8=>${csname} conversion radix index: $fname\n";
 	}
 
-	open(OUT, "> $fname") || die("cannot open $fname");
+	open(my $out, '>', $fname) || die("cannot open $fname");
 
-	print OUT "/* This file is generated by $this_script */\n\n";
+	print $out "/* src/backend/utils/mb/Unicode/$fname */\n"
+	  . "/* This file is generated by $this_script */\n\n";
 
-	foreach my $t (@{$trie->{all}})
+	foreach my $t (@{ $trie->{all} })
 	{
-		my $table_name = $name_prefix.$t->{attr}{name};
+		my $table_name = $name_prefix . $t->{attr}{name};
 
-		if (&print_radix_table(*OUT, $t, $table_name, $tblwidth))
+		if (&print_radix_table($out, $t, $table_name, $tblwidth))
 		{
-			print OUT "\n";
+			print $out "\n";
 		}
 	}
 
-	&print_radix_main(*OUT, $tblname, $trie, $name_prefix);
-	close(OUT);
+	&print_radix_main($out, $tblname, $trie, $name_prefix);
+	close($out);
 }
 
 
@@ -1077,27 +1125,28 @@ sub print_radix_map
 #
 # print_radix_trees($this_script, $csname, \%charset)
 #
-# this_script : the name of the *caller script* of this feature
-# csname      : character set name other than ucs
-# charset     : ref to character set array
+# this_script - the name of the *caller script* of this feature
+# csname      - character set name other than ucs
+# charset     - ref to character set array
 sub print_radix_trees
 {
 	my ($this_script, $csname, $charset) = @_;
 
 	&print_radix_map($this_script, $csname, "from_unicode", $charset, 78);
-	&print_radix_map($this_script, $csname, "to_unicode", $charset, 78);
+	&print_radix_map($this_script, $csname, "to_unicode",   $charset, 78);
 }
 
 sub dump_charset
 {
 	my ($list, $filt) = @_;
 
-	foreach my $i (@$list) {
+	foreach my $i (@$list)
+	{
 		next if (defined $filt && !&$filt($i));
 		if (!defined $i->{ucs}) { $i->{ucs} = &utf2ucs($i->{utf8}); }
-		printf("ucs=%x, code=%x, direction=%s %s:%d %s\n",
-			   $i->{ucs}, $i->{code}, $i->{direction},
-			   $i->{f}, $i->{l}, $i->{comment});
+		printf "ucs=%x, code=%x, direction=%s %s:%d %s\n",
+		  $i->{ucs}, $i->{code}, $i->{direction},
+		  $i->{f},   $i->{l},    $i->{comment};
 	}
 }
 
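A simplified, self-contained model of the 2-byte part of the layout built by generate_index() and overlap_segments(), which may help when reviewing the generated *_radix.map files. This is only a sketch under assumptions, not code from the patch: build_radix2() and radix2_lookup() are hypothetical names, the 1-byte page and the 3- and 4-byte levels are left out, and it assumes a character lives at slot segoffset + (byte - min) of the concatenated character table, which is how the offsets computed in overlap_segments() appear to be used.

```perl
use strict;
use warnings;

# Simplified model (2-byte codes only, hypothetical names) of b2idx + csegs.
# Assumption: value for low byte b2 sits at slot segoffset + (b2 - min).
sub build_radix2
{
	my ($map) = @_;    # hashref: 2-byte source code => destination code
	my (%seg, $min, $max);

	foreach my $code (keys %$map)
	{
		my ($b1, $b2) = ($code >> 8, $code & 0xff);
		$seg{$b1}{$b2} = $map->{$code};
		$min = $b2 if (!defined $min || $b2 < $min);
		$max = $b2 if (!defined $max || $b2 > $max);
	}

	# slot 0 stays unused so that an index entry of 0 can mean "no segment"
	my @chars = (0);
	my %segoffset;
	my ($prev_offset, $prev_smax);
	foreach my $b1 (sort { $a <=> $b } keys %seg)
	{
		my @lows = sort { $a <=> $b } keys %{ $seg{$b1} };
		my $offset = 1;
		if (defined $prev_offset)
		{
			# share this segment's leading zeros with the previous
			# segment's trailing zeros, following overlap_segments()
			my $prefix      = $lows[0] - $min;
			my $prevpostfix = $max - $prev_smax;
			my $overlap = $prevpostfix < $prefix ? $prevpostfix : $prefix;
			$offset = $prev_offset + ($max - $min + 1) - $overlap;
		}
		$chars[ $offset + $_ - $min ] = $seg{$b1}{$_} foreach @lows;
		$segoffset{$b1} = $offset;
		($prev_offset, $prev_smax) = ($offset, $lows[-1]);
	}
	foreach my $i (0 .. $#chars)
	{
		$chars[$i] = 0 if (!defined $chars[$i]);
	}

	# flat index over the first byte, like b2idx (0 = no segment)
	my @b1s = sort { $a <=> $b } map { $_ + 0 } keys %segoffset;
	my @b2idx =
	  map { exists $segoffset{$_} ? $segoffset{$_} : 0 } $b1s[0] .. $b1s[-1];

	return {
		b1min => $b1s[0],
		b1max => $b1s[-1],
		b2idx => \@b2idx,
		min   => $min,
		max   => $max,
		chars => \@chars };
}

# Returns the mapped code, or 0 when there is no mapping.
sub radix2_lookup
{
	my ($t, $code) = @_;
	my ($b1, $b2) = ($code >> 8, $code & 0xff);

	return 0 if ($b1 < $t->{b1min} || $b1 > $t->{b1max});
	return 0 if ($b2 < $t->{min} || $b2 > $t->{max});
	my $offset = $t->{b2idx}[ $b1 - $t->{b1min} ];
	return 0 if ($offset == 0);
	my $slot = $offset + $b2 - $t->{min};
	return $slot <= $#{ $t->{chars} } ? $t->{chars}[$slot] : 0;
}

# made-up sample mapping, not taken from any authority file
my %map = (0x81a0 => 0x1111, 0x81a1 => 0x2222, 0x82fe => 0x3333, 0x83a0 => 0x4444);
my $tree = build_radix2(\%map);
printf "%d slots, 0x82fe => 0x%04x\n",
  scalar(@{ $tree->{chars} }), radix2_lookup($tree, 0x82fe);
# prints "99 slots, 0x82fe => 0x3333"
```

With this made-up mapping the three segments fit into 99 slots (slot 0 included) instead of the 1 + 3 * 95 = 286 a plain concatenation would need, because each segment's leading zeros share slots with the previous segment's trailing zeros.
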
diff --git a/src/backend/utils/mb/Unicode/make_mapchecker.pl b/src/backend/utils/mb/Unicode/make_mapchecker.pl
index 0e1cbb6..c96457c 100755
--- a/src/backend/utils/mb/Unicode/make_mapchecker.pl
+++ b/src/backend/utils/mb/Unicode/make_mapchecker.pl
@@ -3,7 +3,7 @@
 use strict;
 
 opendir(my $dh, ".") || die "failed to open directory: .";
-my @radixmaps = grep {/_radix\.map$/} readdir($dh);
+my @radixmaps = grep { /_radix\.map$/ } readdir($dh);
 closedir($dh);
 
 my %plainmaps;
@@ -13,7 +13,7 @@ foreach my $rmap (@radixmaps)
 {
 	my $pmap = $rmap;
 	$pmap =~ s/_radix//;
-	if (! -e $pmap)
+	if (!-e $pmap)
 	{
 		die("radix map \"$rmap\" has no corresponding plain map\n");
 	}
@@ -22,15 +22,15 @@ foreach my $rmap (@radixmaps)
 
 # Generate sanity checker source
 my $out;
-open($out, ">map_checker.h") ||
-	die "cannot open file to write: map_checker.c";
+open($out, '>', "map_checker.h")
+  || die "cannot open file to write: map_checker.h";
 foreach my $i (sort @radixmaps)
 {
 	print $out "#include \"$i\"\n";
 	print $out "#include \"$plainmaps{$i}\"\n";
 }
 
-my @mapnames = map {s/\.map//;$_} values %plainmaps;
+my @mapnames = map { s/\.map//; $_ } values %plainmaps;
 
 print $out <<'EOF';
 
@@ -49,12 +49,14 @@ foreach my $m (@mapnames)
 	if ($m =~ /^utf8_to_(.*)$/)
 	{
 		my $e = uc($1);
-		print $out "	{\"$m\", lengthof(ULmap$e), NULL, ULmap$e, &$1_from_unicode_tree}";
+		print $out
+"	{\"$m\", lengthof(ULmap$e), NULL, ULmap$e, &$1_from_unicode_tree}";
 	}
-	elsif  ($m =~ /^(.*)_to_utf8$/)
+	elsif ($m =~ /^(.*)_to_utf8$/)
 	{
 		my $e = uc($1);
-		print $out "	{\"$m\", lengthof(LUmap$e), LUmap$e, NULL, &$1_to_unicode_tree}";
+		print $out
+		  "	{\"$m\", lengthof(LUmap$e), LUmap$e, NULL, &$1_to_unicode_tree}";
 	}
 	else
 	{
-- 
2.9.2

From 6a64c014efdc801f567d5c962c160ca0442c147a Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horiguchi.kyot...@lab.ntt.co.jp>
Date: Tue, 8 Nov 2016 19:20:06 +0900
Subject: [PATCH 2/2] Modify makefile.

Change the Makefile so that 'make distclean' removes the authority files,
since they should not be included in the source archive. Conversely,
'make maintainer-clean' leaves them in place and removes all map files.
This may seem odd, but it follows from the special characteristics of
this directory.
---
 src/backend/utils/mb/Unicode/Makefile | 19 +++++++++----------
 1 file changed, 9 insertions(+), 10 deletions(-)

diff --git a/src/backend/utils/mb/Unicode/Makefile b/src/backend/utils/mb/Unicode/Makefile
index b3f681c..f184f65 100644
--- a/src/backend/utils/mb/Unicode/Makefile
+++ b/src/backend/utils/mb/Unicode/Makefile
@@ -79,7 +79,8 @@ SPECIALTEXTS = BIG5.TXT CNS11643.TXT \
 	CP932.TXT CP950.TXT \
 	JIS0201.TXT JIS0208.TXT JIS0212.TXT SHIFTJIS.TXT \
 	JOHAB.TXT KSX1001.TXT windows-949-2000.xml \
-	euc-jis-2004-std.txt sjis-0213-2004-std.txt
+	euc-jis-2004-std.txt sjis-0213-2004-std.txt \
+	gb-18030-2000.xml
 
 GENERICTEXTS = $(ISO8859TEXTS) $(WINTEXTS) \
 	KOI8-R.TXT KOI8-U.TXT
@@ -135,11 +136,12 @@ euc_jis_2004_to_utf8.map euc_jis_2004_to_utf8_radix.map euc_jis_2004_to_utf8_com
 shift_jis_2004_to_utf8.map shift_jis_2004_to_utf8_radix.map shift_jis_2004_to_utf8_combined.map utf8_to_shift_jis_2004.map utf8_to_shift_jis_2004_radix.map utf8_to_shift_jis_2004_combined.map: UCS_to_SHIFT_JIS_2004.pl sjis-0213-2004-std.txt
 	$(PERL) $<
 
-distclean: clean
-	rm -f $(TEXTS) $(GENERICMAPS) $(SPECIALMAPS)
+distclean:
+	rm -f $(TEXTS) $(GENERICMAPS) $(SPECIALMAPS) $(OBJS) $(BINS) map_checker.h
 
-maintainer-clean: distclean
-	rm -f $(MAPS) $(RADIXMAPS) $(OBJS) $(BINS) map_checker.h
+# maintainer-clean intentionally leaves $(TEXTS)
+maintainer-clean:
+	rm -f $(MAPS) $(RADIXMAPS) $(GENERICMAPS) $(SPECIALMAPS) $(OBJS) $(BINS) map_checker.h
 
 mapcheck: $(MAPS) $(RADIXMAPS) map_checker
 	./map_checker
@@ -150,15 +152,12 @@ DOWNLOAD = wget -O $@ --no-use-server-timestamps
 BIG5.TXT CNS11643.TXT:
 	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/$(@F)
 
-gb-18030-2000.xml:
+gb-18030-2000.xml windows-949-2000.xml:
 	$(DOWNLOAD) http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
 
 euc-jis-2004-std.txt sjis-0213-2004-std.txt:
 	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
 
-gb-18030-2000.xml:
-	$(DOWNLOAD) https://ssl.icu-project.org/repos/icu/data/trunk/charset/data/xml/$(@F)
-
 GB2312.TXT:
 	$(DOWNLOAD) 'http://trac.greenstone.org/browser/trunk/gsdl/unicode/MAPPINGS/EASTASIA/GB/GB2312.TXT?rev=1842&format=txt'
 
@@ -174,7 +173,7 @@ KOI8-R.TXT KOI8-U.TXT:
 $(ISO8859TEXTS):
 	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/ISO8859/$(@F)
 
-$(filter-out CP8%,$(WINTEXTS)):
+$(filter-out CP8%,$(WINTEXTS)) $(filter CP9%, $(SPECIALTEXTS)):
 	$(DOWNLOAD) http://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/$(@F)
 
 $(filter CP8%,$(WINTEXTS)):
-- 
2.9.2

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
