Hi, this is an intermediate report without a patch. At Thu, 26 Jan 2017 21:42:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote in <20170126.214212.111556326.horiguchi.kyot...@lab.ntt.co.jp> > > > 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch > > > > > > Before adding radix tree stuff, applied pgperltidy and inserted > > > format-skipping pragma for the parts where perltidy seems to do > > > too much. > > > > Which version of perltidy did you use? Looking at the archives, the > > perl code is cleaned up with a specific version, v20090616. See > > https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com > > for example on the matter. As perltidy changes over time, this may be > > a sensitive change if done this way. > > Hmm. I will make a confirmation on that.. tomorrow.
My perltidy -v said "v20121207'. Anyway, I gave up to apply perltidy by myself. So I'll just drop 0003 and new 0004 (name changed to 0003) is made immediately on 0002. > > > 0004-Use-radix-tree-for-character-conversion.patch > > > > > > Radix tree body. > > > > Well, here a lot of diffs could have been saved. > > > > > The unattached fifth patch is generated by the following steps. > > > > > > [$(TOP)]$ ./configure > > > [Unicode]$ make > > > [Unicode]$ make distclean > > > [Unicode]$ git add . > > > [Unicode]$ commit > > > === COMMITE MESSSAGE > > > Replace map files with radix tree files. > > > > > > These encodings no longer uses the former map files and uses new radix > > > tree files. > > > === > > > > OK, I can see that working, with 200k of maps generated.. So going > > through the important bits of this jungle.. > > Many thaks for the exploration. > > > +/* > > + * radix tree conversion function - this should be identical to the > > function in > > + * ../conv.c with the same name .. > > This is not nice. Having a duplication like that is a recipe to forget > > about it as this patch introduces a dependency with conv.c and the > > radix tree generation. In the attatched patch, mb/char_conveter.c which contains one inline function is created and it is includ'ed from mb/conv.c and mb/Unicode/map_checker.c. > > Having a .gitignore in Unicode/ would be nice, particularly to avoid > > committing map_checker. I missed this. I added .gitignore to ignore map_checker stuff and authority files and old-style map files. > > A README documenting things may be welcome, or at least comments at > > the top of map_checker.c. Why is map_checker essential? What does it > > do? There is no way to understand that easily, except that it includes > > a "radix tree conversion function", and that it performs sanity checks > > on the radix trees to be sure that they are on a good shape. But as > > this something that one would guess only after looking at your patch > > and the code (at least I will sleep less stupid tonight after reading > > this stuff). > > Okay, I'll do that. The patch has not been provided yet, I'm going to put the following comment just before the main() in map_checker.c. /* * The old-style plain map files were error-resistant due to its * straight-forward way for generation from authority files. In contrast the * radix tree maps are generated by a rather complex calculation and have a * complex, hard-to-confirm format. * * This program runs sanity check of the radix tree maps by confirming all * characters in the plain map files to be converted to the same code by the * corresponding radix tree map. * * All map files are included by map_checker.h that is generated by the script * make_mapchecker.pl as the variable mappairs. * */ I'll do the following things later. > > --- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl > > +++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl > > # Drop these SJIS codes from the source for UTF8=>SJIS conversion > > #<<< do not let perltidy touch this > > -my @reject_sjis =( > > +my @reject_sjis = ( > > 0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782, > > - 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, > > + 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, > > 0x879a..0x879c > > -); > > + ); > > This is not generated, it would be nice to drop the noise from the patch. > > Mmm. I'm not sure how this is generated but I'll care for that. > > > Here is another one: > > - $i->{code} = $jis | ( > > - $jis < 0x100 > > - ? 0x8e00 > > - : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); > > - > > +#<<< do not let perltidy touch this > > + $i->{code} = $jis | ($jis < 0x100 ? 0x8e00: > > + ($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); > > +#>>> > > Ok. Will revert this. > > > if (l == 2) > > { > > - iutf = *utf++ << 8; > > - iutf |= *utf++; > > + b3 = *utf++; > > + b4 = *utf++; > > } > > Ah, OK. This conversion is important so as it performs a minimum of > > bitwise operations. Yes let's keep that. That's pretty cool to get a > > faster operation. > > It is Heikki's work:p > > I'll address them and repost the next version sooner. -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers