Thank you for looking this. At Thu, 26 Jan 2017 16:28:16 +0900, Michael Paquier <michael.paqu...@gmail.com> wrote in <cab7npqrel1fsdbgv4zrvaxy+ukts0wzkamjcnyhx0--ozvp...@mail.gmail.com> > On Tue, Jan 10, 2017 at 8:22 PM, Kyotaro HORIGUCHI > <horiguchi.kyot...@lab.ntt.co.jp> wrote: > > [...patch...] > > Nobody has showed up yet to review this patch, so I am giving it a shot. > > The patch file sizes are scary at first sight, but after having a look: > 36 files changed, 1411 insertions(+), 54398 deletions(-) > Yes that's a surprise, something like git diff --irreversible-delete > would have helped as most of the diffs are just caused by 3 files > being deleted in patch 0004, making 50k lines going to the abyss of > deletion.
Thank you. Good to hear that. I'll try that at the next chance. > > Hello, I found a bug in my portion while rebasing. > > Right, that's 0001. Nice catch. > > > The attached files are the following. This patchset is not > > complete missing changes of map files. The change is tremendously > > large but generatable. > > > > 0001-Add-missing-semicolon.patch > > > > UCS_to_EUC_JP.pl has a line missing teminating semicolon. This > > doesn't harm but surely a syntax error. This patch fixes it. > > This might should be a separate patch. > > This requires a back-patch. This makes me wonder how long this script > has actually not run... > > > 0002-Correct-reference-resolution-syntax.patch > > > > convutils.pm has lines with different syntax of reference > > resolution. This unifies the syntax. > > Yes that looks right to me. Yes, I thoght that the three patches can be back-patched, a kind of bug fix. > I am the best perl guru on this list but > looking around $$var{foo} is bad, ${$var}{foo} is better, and > $var->{foo} is even better. This also generates no diffs when running > make in src/backend/utils/mb/Unicode/. So no objections to that. Thank you for the explanation. I think no '$$'s is left alone. > > 0003-Apply-pgperltidy-on-src-backend-utils-mb-Unicode.patch > > > > Before adding radix tree stuff, applied pgperltidy and inserted > > format-skipping pragma for the parts where perltidy seems to do > > too much. > > Which version of perltidy did you use? Looking at the archives, the > perl code is cleaned up with a specific version, v20090616. See > https://www.postgresql.org/message-id/20151204054322.ga2070...@tornado.leadboat.com > for example on the matter. As perltidy changes over time, this may be > a sensitive change if done this way. Hmm. I will make a confirmation on that.. tomorrow. > > 0004-Use-radix-tree-for-character-conversion.patch > > > > Radix tree body. > > Well, here a lot of diffs could have been saved. > > > The unattached fifth patch is generated by the following steps. > > > > [$(TOP)]$ ./configure > > [Unicode]$ make > > [Unicode]$ make distclean > > [Unicode]$ git add . > > [Unicode]$ commit > > === COMMITE MESSSAGE > > Replace map files with radix tree files. > > > > These encodings no longer uses the former map files and uses new radix > > tree files. > > === > > OK, I can see that working, with 200k of maps generated.. So going > through the important bits of this jungle.. Many thaks for the exploration. > +/* > + * radix tree conversion function - this should be identical to the function > in > + * ../conv.c with the same name > + */ > +static inline uint32 > +pg_mb_radix_conv(const pg_mb_radix_tree *rt, > + int l, > + unsigned char b1, > + unsigned char b2, > + unsigned char b3, > + unsigned char b4) > This is not nice. Having a duplication like that is a recipe to forget > about it as this patch introduces a dependency with conv.c and the > radix tree generation. Mmmmm. I agree to you, but conv.c contains unwanted reference to elog or other sutff of the core. Separating the function in a dedicate source file named such as "../pg_mb_radix_conv.c" will work. If it is not so bad, I'll do that in the next version. > Having a .gitignore in Unicode/ would be nice, particularly to avoid > committing map_checker. > > A README documenting things may be welcome, or at least comments at > the top of map_checker.c. Why is map_checker essential? What does it > do? There is no way to understand that easily, except that it includes > a "radix tree conversion function", and that it performs sanity checks > on the radix trees to be sure that they are on a good shape. But as > this something that one would guess only after looking at your patch > and the code (at least I will sleep less stupid tonight after reading > this stuff). Okay, I'll do that. > --- a/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl > +++ b/src/backend/utils/mb/Unicode/UCS_to_SJIS.pl > # Drop these SJIS codes from the source for UTF8=>SJIS conversion > #<<< do not let perltidy touch this > -my @reject_sjis =( > +my @reject_sjis = ( > 0xed40..0xeefc, 0x8754..0x875d, 0x878a, 0x8782, > - 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, > + 0x8784, 0xfa5b, 0xfa54, 0x8790..0x8792, 0x8795..0x8797, > 0x879a..0x879c > -); > + ); > This is not generated, it would be nice to drop the noise from the patch. Mmm. I'm not sure how this is generated but I'll care for that. > Here is another one: > - $i->{code} = $jis | ( > - $jis < 0x100 > - ? 0x8e00 > - : ($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); > - > +#<<< do not let perltidy touch this > + $i->{code} = $jis | ($jis < 0x100 ? 0x8e00: > + ($sjis >= 0xeffd ? 0x8f8080 : 0x8080)); > +#>>> Ok. Will revert this. > if (l == 2) > { > - iutf = *utf++ << 8; > - iutf |= *utf++; > + b3 = *utf++; > + b4 = *utf++; > } > Ah, OK. This conversion is important so as it performs a minimum of > bitwise operations. Yes let's keep that. That's pretty cool to get a > faster operation. It is Heikki's work:p I'll address them and repost the next version sooner. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers