On Fri, 26 Nov 2010 08:40:42 -0500, Jean-Christophe Deschamps  
<j...@q-e-d.org> wrote:

> At 14:26 26/11/2010, [Samuel Adam <a...@certifound.com>] wrote:
>
>> N.b., there is a severe bug (pointers calculated based on truncated
>> 16-bit values above plane-0) in a popular Unicode-properties SQLite
>> extension.
>> […]
>
> I believe you refer to Ioannis's code.

Yes.

> I found this 16-bit truncation
> and decided to expand that trie to 32-bit in order to support those
> characters correctly.

With due regard to the fact that Mr. Deschamps evidently wrote working  
code and I thus far apparently have not, I have a suggestion as to  
space/time tradeoffs.

32 bits to cover Unicode’s 21-bit space always irked me.  24 bits won’t do  
due to alignment issues, and 16 bits is just too small.  However:

        (a) 99% of usage in 99% of apps is confined to the Basic Multilingual  
Plane (Plane 0).  [Source: The same fundament as from which springs the  
majority of published statistics.]

        (b) Modern operating systems typically load executables (including  
libraries) using memory mapping.  If RAM is constrained, an intelligent  
virtual memory subsystem will leave any unused tables on disk most of the  
time, only to be faulted-in for the 1% cases.

        (c) A code path which uses 16-bit-based tables for the BMP, and only  
invokes a separate path through 32-bit-based tables for Planes 1–16, will  
permit *smaller, less-wasteful* tables to be the ones kept in RAM for the  
99% cases.
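
To make (c) concrete, here is a minimal sketch in C of the split I have in mind. It is an illustration under stated assumptions, not code from any existing extension: the names (cp_props, bmp_stage1, bmp_stage2, astral_ranges, toy_tables_init) and the property bits are hypothetical, the tables hold toy data only, and a small sorted range table stands in for whatever 32-bit-based structure a real implementation would use for Planes 1–16.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical property bits, for illustration only. */
enum { PROP_ALPHA = 1u << 0, PROP_UPPER = 1u << 1 };

/* Tier 1 (hot): BMP only, two-stage table with 16-bit entries.
 * bmp_stage1 maps the high byte of a code point to a block number;
 * bmp_stage2 holds one 256-entry block of property bits per number.
 * Toy data: block 0 is "no properties", block 1 covers U+0000..U+00FF. */
static uint8_t  bmp_stage1[256];
static uint16_t bmp_stage2[2][256];

/* Tier 2 (cold): Planes 1-16, keyed by full 32-bit code points.
 * Assumption: a sorted range table; toy data covers Deseret letters. */
typedef struct { uint32_t first, last; uint16_t props; } astral_range;

static const astral_range astral_ranges[] = {
    { 0x10400, 0x10427, PROP_ALPHA | PROP_UPPER }, /* Deseret capitals */
    { 0x10428, 0x1044F, PROP_ALPHA },              /* Deseret smalls   */
};

/* Fill the toy BMP tables (a real build would generate them statically). */
static void toy_tables_init(void)
{
    uint32_t c;
    bmp_stage1[0x00] = 1; /* all other high bytes stay on block 0 */
    for (c = 'A'; c <= 'Z'; c++) bmp_stage2[1][c] = PROP_ALPHA | PROP_UPPER;
    for (c = 'a'; c <= 'z'; c++) bmp_stage2[1][c] = PROP_ALPHA;
}

/* Cold path: binary search over the sorted, non-overlapping ranges. */
static uint16_t astral_props(uint32_t cp)
{
    size_t lo = 0, hi = sizeof astral_ranges / sizeof astral_ranges[0];
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (cp < astral_ranges[mid].first)     hi = mid;
        else if (cp > astral_ranges[mid].last) lo = mid + 1;
        else return astral_ranges[mid].props;
    }
    return 0;
}

/* Single entry point: one compare decides which tier is touched. */
static inline uint16_t cp_props(uint32_t cp)
{
    if (cp <= 0xFFFF)   /* BMP: small 16-bit tables only */
        return bmp_stage2[bmp_stage1[cp >> 8]][cp & 0xFF];
    if (cp <= 0x10FFFF) /* Planes 1-16: separate 32-bit path */
        return astral_props(cp);
    return 0;           /* outside the Unicode code space */
}
```

After toy_tables_init(), cp_props('A') takes only the BMP path and cp_props(0x10400) takes only the astral path, so pages holding astral_ranges need never be faulted in by a caller that stays within Plane 0. The cost is the one extra compare per lookup, which is the branching overhead discussed next.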

(No) thanks to contemporary chip architects, the problem thence becomes  
how best to effect these in-practice space savings without unacceptable  
time loss (usually in a tight loop) for extra branching.  For now, all I  
can say is that goto is a smart programmer’s intimate companion.

Unicode properties and characteristically similar data being quite  
commonly needed, I suspect such a method would have uses far beyond  
SQLite.  (Perhaps I should patent it sometime within the next 365 days.  
<g>)

Very truly,

Samuel Adam <a...@certifound.com>
763 Montgomery Road
Hillsborough, NJ  08844-1304
United States
http://certifound.com/
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users
