FYI, the autogenerated source code from the Unicode Character Database
and some related algorithms were updated to the latest official version
of the standard (5.1), released on April 4, 2008.
BTW, I think it would be good to write some documentation explaining how
to use the build-aux/ programs to update the autogenerated Unicode
source code.
What do you think? A README.Unicode file in build-aux/ could contain
those explanations.
I was planning to explain it in the wiki. In any case, updating it is
just a matter of calling `pdf-text-download-and-generate-ucd.sh' with no
arguments. The script will download the required Unicode
Character Database files and generate the source code. What really
needs to be documented is where (in which file) each piece of generated
source should be placed. As soon as I finish the unit tests
for the text module, I will provide a full explanation both in the wiki
and in a README.Unicode as you suggested.
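
Until that README exists, here is a rough sketch of the update step
described above. It assumes the script sits in build-aux/, is executable,
and can be launched from the top of the source tree; none of that is
confirmed in this thread. The `update_ucd' function name is just a
hypothetical wrapper, not something in the repository.

```shell
# Hypothetical wrapper around the generator script (not part of the tree).
# Assumption: run from the top-level source directory, with the script
# located in build-aux/ and marked executable.
update_ucd() {
    script="build-aux/pdf-text-download-and-generate-ucd.sh"

    # Fail early with a clear message if the script is not where we expect.
    if [ ! -x "$script" ]; then
        echo "update_ucd: cannot find executable $script" >&2
        return 1
    fi

    # No arguments: the script itself downloads the UCD files and
    # generates the source code.
    "$script"
}
```

Calling `update_ucd' then runs the generator. Copying each generated
source into the right file is still a manual step and is not covered by
this sketch.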