On Feb 12, 2014, at 5:08 PM, Tito Ciuro wrote:

> This is taken verbatim from the "Getting Started with CouchDB" book, page 49:

Hm, I have not seen that book. But I agree that the general documentation 
situation is not good. At least the online docs are better than they used to be.

> [...] If we want to restrict it to those starting with Apricot, we can use 
> the UTF-8 sorting to our advantage. If we add the UTF-8 character 007F to 
> ‘Apricot’, the range will only include recipes with the title starting with 
> Apricot, even if the document ID contains other characters.

I see what they're getting at. It's the same trick as adding a "z" as a suffix 
(endkey="apricotz") so that the range stops before the first key that doesn't 
start with "apricot", except that they intend \u007F as a sort of "super-z" 
that sorts greater than anything else.
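For illustration, here is a minimal Python sketch of the "z" suffix trick. It assumes plain code-point ordering (which is how Python compares str values), not the ICU collation CouchDB actually uses, and the titles are made up.

```python
# Python orders str values by Unicode code point. That is NOT CouchDB's ICU
# collation; it is used here only to illustrate the range logic.

def in_key_range(key: str, startkey: str, endkey: str) -> bool:
    """Return True if key falls in the inclusive range [startkey, endkey]."""
    return startkey <= key <= endkey

titles = ["apple pie", "apricot", "apricot tart", "apricots",
          "apricotzest", "banana bread"]

# The "z" suffix trick: everything from "apricot" up to "apricotz".
matches = [t for t in titles if in_key_range(t, "apricot", "apricotz")]
print(matches)  # ['apricot', 'apricot tart', 'apricots']
```

Note that "apricotzest" falls outside the range, which is exactly the gap a "super-z" suffix is meant to close.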

But that's wrong, because CouchDB doesn't use UTF-8 sorting; it uses Unicode 
sorting. From the wiki: "Comparison of strings is done using ICU which 
implements the Unicode Collation Algorithm…" [1] So a \u007F character isn't a 
particularly high value: by code point it's greater than any other ASCII 
character but lower than everything else, including accented Latin letters.
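A small Python check of where U+007F falls in code-point order. This only approximates the reasoning above: Python compares strings by code point, while CouchDB's ICU collation assigns its own weights. The key "Apricoté" is a made-up example.

```python
# Code-point view of where U+007F lands. CouchDB's ICU collation does not
# use code-point order, so treat this strictly as an illustration.
DEL = "\u007f"

# Greater than every other ASCII character (U+0000..U+007E)...
above_ascii = all(chr(c) < DEL for c in range(0x7F))
# ...but lower than non-ASCII characters such as accented Latin letters.
below_latin = all(DEL < ch for ch in "éñüø")

endkey = "Apricot" + DEL
print(above_ascii, below_latin)  # True True
print("Apricot Tart" <= endkey)  # True
print("Apricoté" <= endkey)      # False: 'é' (U+00E9) sorts after U+007F
```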

On the same page [2] the wiki suggests using the character \ufff0 as a suffix 
for this purpose. That sounds more reasonable, although the details depend on 
whether the collation is really done on true Unicode code points or on UTF-16 
code units. If the former, \ufff0 isn't at the top of the range, and characters 
outside the Basic Multilingual Plane, such as emoji, will sort after it. (In 
UTF-16 those characters are encoded as surrogate pairs in the range 
0xD800-0xDFFF, which sort below 0xFFF0.)
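To see why the UTF-16 question matters, this Python snippet compares U+FFF0 with an emoji (U+1F600, chosen arbitrarily), once by code point and once by big-endian UTF-16 bytes, which order the same way as UTF-16 code units.

```python
# Compare U+FFF0 with an emoji (U+1F600) two ways:
# by code point, and by UTF-16 code units (big-endian bytes).
SUFFIX = "\ufff0"
EMOJI = "\U0001F600"

by_code_point = EMOJI > SUFFIX
by_utf16 = EMOJI.encode("utf-16-be") > SUFFIX.encode("utf-16-be")

print(hex(ord(EMOJI)), EMOJI.encode("utf-16-be").hex())  # 0x1f600 d83dde00
print(by_code_point, by_utf16)  # True False
```

By code point the emoji lands above the \ufff0 suffix (and would escape the range); in UTF-16 its leading surrogate 0xD83D sorts below 0xFFF0.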

I hope you see that this is simply a trick of string range comparisons, not a 
special CouchDB feature; you could use the same trick in a SQL query (if the 
database were sufficiently Unicode-savvy).
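As a sketch of the same range trick in SQL, here it is using Python's built-in sqlite3 module with an in-memory table of hypothetical titles. SQLite's default BINARY collation compares UTF-8 bytes, which is code-point order, so this shows the range mechanics rather than ICU-style collation.

```python
import sqlite3

# In-memory table with a few hypothetical recipe titles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (title TEXT)")
conn.executemany(
    "INSERT INTO recipes (title) VALUES (?)",
    [("Apple Pie",), ("Apricot Jam",), ("Apricots in Syrup",), ("Banana Bread",)],
)

# Same prefix-range trick: start at the prefix, end at prefix + U+FFF0.
# BINARY collation compares UTF-8 bytes (code-point order), so U+FFF0 sorts
# after every character with a smaller code point.
prefix = "Apricot"
rows = conn.execute(
    "SELECT title FROM recipes WHERE title >= ? AND title <= ? ORDER BY title",
    (prefix, prefix + "\ufff0"),
).fetchall()
print([r[0] for r in rows])  # ['Apricot Jam', 'Apricots in Syrup']
```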

> Let’s see that in action:
> http://127.0.0.1:5984/recipes/_design/simple/_view/by_title?startkey=%22Apricot%22&endkey=%22Apricot%007F%22

This looks like a typo: they must have meant %7F but wrote %007F instead. As 
written, %007F decodes to a NUL character (%00) followed by the literal text "7F".
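Python's urllib.parse makes the typo easy to see. This sketch decodes both spellings of the endkey and re-encodes the presumably intended one:

```python
from urllib.parse import quote, unquote

# The book's endkey: %00 decodes to a NUL character, followed by the literal "7F".
book_endkey = unquote("%22Apricot%007F%22")
# The presumably intended endkey: %7F decodes to U+007F itself.
fixed_endkey = unquote("%22Apricot%7F%22")

print(repr(book_endkey))   # '"Apricot\x007F"'
print(repr(fixed_endkey))  # '"Apricot\x7f"'

# Going the other way: percent-encoding the JSON string "Apricot\x7f".
print(quote('"Apricot\x7f"'))  # %22Apricot%7F%22
```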

—Jens

[1] http://wiki.apache.org/couchdb/View_collation#Collation_Specification
[2] http://wiki.apache.org/couchdb/View_collation#String_Ranges
