On 26.08.18 21:00, Heinrich Schuchardt wrote:
> On 08/26/2018 08:22 PM, Alexander Graf wrote:
>>
>>
>> On 11.08.18 17:28, Heinrich Schuchardt wrote:
>>> This patch provides a define to initialize a table that maps lower to
>>> capital letters for Unicode code point 0x0000 - 0xffff.
>>>
>>> Signed-off-by: Heinrich Schuchardt <xypron.g...@gmx.de>
>>> ---
>>>  MAINTAINERS              |    1 +
>>>  include/capitalization.h | 1909 ++++++++++++++++++++++++++++++++++++++
>>>  2 files changed, 1910 insertions(+)
>>>  create mode 100644 include/capitalization.h
>>>
>>> diff --git a/MAINTAINERS b/MAINTAINERS
>>> index a324139471..0a543309f2 100644
>>> --- a/MAINTAINERS
>>> +++ b/MAINTAINERS
>>> @@ -368,6 +368,7 @@ F:	doc/DocBook/efi.tmpl
>>>  F:	doc/README.uefi
>>>  F:	doc/README.iscsi
>>>  F:	Documentation/efi.rst
>>> +F:	include/capitalization.h
>>>  F:	include/efi*
>>>  F:	include/pe.h
>>>  F:	include/asm-generic/pe.h
>>> diff --git a/include/capitalization.h b/include/capitalization.h
>>> new file mode 100644
>>> index 0000000000..50d5108f98
>>> --- /dev/null
>>> +++ b/include/capitalization.h
>>> @@ -0,0 +1,1909 @@
>>> +/* SPDX-License-Identifier: Unicode-DFS-2016 */
>>> +/*
>>> + * Correspondence table for small and capital Unicode letters in the range of
>>> + * 0x0000 - 0xffff based on http://www.unicode.org/Public/UCA/11.0.0/allkeys.txt
>>> + */
>>> +
>>> +struct capitalization_table {
>>> +	u16 upper;
>>> +	u16 lower;
>>> +};
>>> +
>>> +#define UNICODE_CAPITALIZATION_TABLE { \
>>
>> Ugh, that is a *lot* of data. How much does the binary size grow with
>> the table compiled in?
>>
>> Is there any slightly more sophisticated pattern in the table maybe that
>> we could just express as code? Would that turn out smaller maybe?
>
> This is 3792 bytes of data. Unicode capitalization is quite random in
> arranging lower and upper letters.
>
> We could resort to zlib or gzip. But these libraries are not built by
> default.

Yeah, and that only adds more overhead.

> Most urgently we will need the capitalization table for generating and
> checking short FAT filenames, so we could create a configuration switch
> that would reduce this table to codepage 437 or codepage 1250 letters
> depending on the chosen native character set.

I think that's a great idea. There is probably a lot of overlap between
the two, so maybe just make it a config option for "non-latin
upper/lower case conversion".

> In EDK2 I only found code for codepage 1250.

Yeah, I'd be surprised if people really needed more. In fact, how about
you just make the config option default to =n?

Alex
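
For illustration only, the idea discussed above (express the regular
ranges as code and keep a small, configurable table for the rest) could
be sketched roughly as follows. This is not code from the patch:
sketch_toupper(), sketch_table and the SKETCH_FULL_CAPITALIZATION switch
are hypothetical names, the table holds only a handful of pairs, and
uint16_t stands in for U-Boot's u16.

```c
#include <stddef.h>
#include <stdint.h>

/* Same layout as the struct in the patch; uint16_t stands in for u16. */
struct capitalization_table {
	uint16_t upper;
	uint16_t lower;
};

/*
 * Hypothetical switch (not an existing U-Boot config symbol). Left
 * undefined, only the small default table is compiled in.
 */
/* #define SKETCH_FULL_CAPITALIZATION 1 */

/* A few example pairs, sorted by .lower so they can be binary searched. */
static const struct capitalization_table sketch_table[] = {
	{ 0x00C4, 0x00E4 },	/* A/a with diaeresis */
	{ 0x00D6, 0x00F6 },	/* O/o with diaeresis */
	{ 0x00DC, 0x00FC },	/* U/u with diaeresis */
#ifdef SKETCH_FULL_CAPITALIZATION
	{ 0x0391, 0x03B1 },	/* Greek Alpha/alpha */
	{ 0x0410, 0x0430 },	/* Cyrillic A/a */
#endif
};

/* Return the capital letter for c, or c itself if no mapping is known. */
uint16_t sketch_toupper(uint16_t c)
{
	size_t lo = 0;
	size_t hi = sizeof(sketch_table) / sizeof(sketch_table[0]);

	/* The ASCII range is regular, so it needs no table entries. */
	if (c >= 'a' && c <= 'z')
		return (uint16_t)(c - ('a' - 'A'));

	/* Binary search over the half-open range [lo, hi). */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (sketch_table[mid].lower == c)
			return sketch_table[mid].upper;
		if (sketch_table[mid].lower < c)
			lo = mid + 1;
		else
			hi = mid;
	}

	return c;
}
```

With the switch left undefined, sketch_toupper(0x00E4) yields 0x00C4,
while a Greek or Cyrillic letter is returned unchanged. Whether splitting
the real table this way saves enough to matter would have to be measured.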