On Fri, 26 Nov 2010 08:40:42 -0500, Jean-Christophe Deschamps <j...@q-e-d.org> wrote:
> At 14:26 26/11/2010, [Samuel Adam <a...@certifound.com>] wrote:
>
>> N.b., there is a severe bug (pointers calculated based on truncated
>> 16-bit values above plane-0) in a popular Unicode-properties SQLite
>> extension.
>> […]
>
> I believe you refer to Ioannis code.

Yes.

> I found this 16-bit truncation
> and decided to expand that trie to 32-bit in order to support those
> characters correctly.

With due regard to the fact that Mr. Deschamps evidently wrote working
code and I thus far apparently have not, I have a suggestion as to
space/time tradeoffs.

32 bits to cover Unicode’s 21-bit space always irked me. 24 bits won’t
do due to alignment issues, and 16 bits is just too small. However:

(a) 99% of usage in 99% of apps is confined to the Basic Multilingual
    Plane (Plane 0). [Source: The same fundament as from which springs
    the majority of published statistics.]

(b) Modern operating systems typically load executables (including
    libraries) using memory mapping. If RAM is constrained, an
    intelligent virtual memory subsystem will leave any unused tables
    on disk most of the time, only to be faulted-in for the 1% cases.

(c) A code path which uses 16-bit-based tables for the BMP, and only
    invokes a separate path through 32-bit-based tables for Planes
    1–16, will permit *smaller, less-wasteful* tables to be the ones
    kept in RAM for the 99% cases.

(No) thanks to contemporary chip architects, the problem thence becomes
how best to effect these in-practice space savings without unacceptable
time loss (usually in a tight loop) for extra branching. For now, all I
can say is that goto is a smart programmer’s intimate companion.

Unicode properties and characteristically similar data being quite
commonly needed, I suspect such a method would have uses far beyond
SQLite. (Perhaps I should patent it sometime within the next 365 days.
<g>)

Very truly,

Samuel Adam <a...@certifound.com>
763 Montgomery Road
Hillsborough, NJ 08844-1304
United States
http://certifound.com/

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
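
For illustration only, the split proposed in point (c) might look
something like the C sketch below. It is not taken from the extension
under discussion, nor from Mr. Deschamps's fix. Every name in it
(unicode_prop, supp_prop, bmp_index, bmp_data, supp_ranges, the PROP_*
codes) is hypothetical, and the tables are deliberately tiny toys that
cover only a few characters. The BMP path uses 16-bit offsets into a
16-bit data array; Planes 1–16 go through a separate table with 32-bit
entries, searched by bisection, which is one possible choice among
several.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property codes, for illustration only. */
enum { PROP_OTHER = 0, PROP_LETTER = 1, PROP_DIGIT = 2 };

#define BLOCK_SHIFT 4
#define BLOCK_SIZE  (1u << BLOCK_SHIFT)
#define BLOCK_MASK  (BLOCK_SIZE - 1u)

/* Hot path data: 16-bit entries, BMP only. Toy contents. */
static const uint16_t bmp_data[] = {
    /* offset 0: shared default block (everything not listed below) */
    0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
    /* offset 16: U+0030..U+003F, digits 0-9 then punctuation */
    2,2,2,2, 2,2,2,2, 2,2,0,0, 0,0,0,0,
    /* offset 32: U+0040..U+004F, '@' then letters A-O */
    0,1,1,1, 1,1,1,1, 1,1,1,1, 1,1,1,1,
};

/* One 16-bit offset per 16-code-point block; unlisted blocks are 0. */
static const uint16_t bmp_index[0x10000 >> BLOCK_SHIFT] = {
    [0x0030 >> BLOCK_SHIFT] = 16,
    [0x0040 >> BLOCK_SHIFT] = 32,
};

/* Cold path data: 32-bit entries for Planes 1-16, sorted by 'first'. */
struct supp_range { uint32_t first, last, prop; };

static const struct supp_range supp_ranges[] = {
    { 0x10400u, 0x1044Fu, PROP_LETTER },  /* Deseret */
    { 0x1D7CEu, 0x1D7FFu, PROP_DIGIT  },  /* mathematical digits */
};

/* Binary search over the sorted supplementary ranges. */
static unsigned supp_prop(uint32_t cp)
{
    size_t lo = 0, hi = sizeof supp_ranges / sizeof supp_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < supp_ranges[mid].first)
            hi = mid;
        else if (cp > supp_ranges[mid].last)
            lo = mid + 1;
        else
            return supp_ranges[mid].prop;
    }
    return PROP_OTHER;
}

/* One compare on the common path; the 32-bit tables are touched
   only for code points above U+FFFF. */
unsigned unicode_prop(uint32_t cp)
{
    if (cp <= 0xFFFFu)
        return bmp_data[bmp_index[cp >> BLOCK_SHIFT] + (cp & BLOCK_MASK)];
    if (cp <= 0x10FFFFu)
        return supp_prop(cp);
    return PROP_OTHER;  /* outside Unicode's code space */
}
```

Because the two sets of tables are separate const objects, a text that
never leaves the BMP never reads supp_ranges, which is the situation
point (b) relies on. Whether the one extra branch is acceptable in a
tight loop would have to be measured.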