On Feb 12, 2014, at 5:08 PM, Tito Ciuro <[email protected]> wrote:

> This is taken verbatim from the "Getting Started with CouchDB" book, page 49:

Hm, I have not seen that book. But I agree that the general documentation situation is not good. At least the online docs are better than they used to be.

> [...] If we want to restrict it to those starting with Apricot, we can use
> the UTF-8 sorting to our advantage. If we add the UTF-8 character 007F to
> ‘Apricot’, the range will only include recipes with the title starting with
> Apricot, even if the document ID contains other characters.

I see what they're getting at: it's the same trick as adding a "z" as a suffix (endkey="apricotz") to stop at a key that starts with anything beyond "apricot", except that they're intending \u007F as a sort of "super-z" that sorts greater than anything else.

But that's wrong, because CouchDB doesn't use UTF-8 sorting, it uses Unicode sorting. From the wiki: "Comparison of strings is done using ICU which implements the Unicode Collation Algorithm…" [1] So a \u007f character isn't a particularly high value; it's greater than any ASCII character but lower than anything else, including other non-English Roman characters.

On the same page [2] the wiki suggests using the character \ufff0 as a suffix for this purpose. That sounds more reasonable, although the details depend on whether the collation is really being done on true Unicode code points or a UTF-16 encoding. If the former, \ufff0 isn't at the top of the range and things like emoji will sort after it.

I hope you see that this is simply a trick of string range comparisons, not a special CouchDB feature. You could use the same trick in a SQL query (if the database were sufficiently Unicode-savvy).

> Let’s see that in action:
> http://127.0.0.1:5984/recipes/_design/simple/_view/by_title?startkey=%22Apricot
> %22&endkey=%22Apricot%007F%22

This looks like a typo: they must have meant %7F but wrote %007F instead.

--Jens

[1] http://wiki.apache.org/couchdb/View_collation#Collation_Specification
[2] http://wiki.apache.org/couchdb/View_collation#String_Ranges
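
To make the range trick concrete, here is a minimal Python sketch. It assumes plain code point comparison, which is what Python's string operators do and is NOT the ICU collation CouchDB uses, so it only illustrates the idea of a suffix sentinel, not CouchDB's actual ordering. The names prefix_range, in_range and the sample titles are hypothetical.

```python
from urllib.parse import quote


def prefix_range(prefix, sentinel="\ufff0"):
    """Return (startkey, endkey) bounding the strings that begin with prefix."""
    return prefix, prefix + sentinel


def in_range(key, startkey, endkey):
    # Plain code point comparison, standing in for the view's collation.
    # This is an assumption for illustration; CouchDB collates with ICU.
    return startkey <= key <= endkey


# Hypothetical titles: ASCII, an accented letter, and an emoji after the prefix.
titles = [
    "Apple Pie",
    "Apricot",
    "Apricot Jam",
    "Apricot\u00e9",       # 'é' (U+00E9) follows the prefix
    "Apricot\U0001F351",   # peach emoji (U+1F351) follows the prefix
    "Banana Bread",
]

start, end_7f = prefix_range("Apricot", "\u007f")
_, end_fff0 = prefix_range("Apricot")

# \u007f only covers ASCII continuations of the prefix.
with_7f = [t for t in titles if in_range(t, start, end_7f)]

# \ufff0 also covers 'é', but under code point order the emoji still sorts after it.
with_fff0 = [t for t in titles if in_range(t, start, end_fff0)]

# Percent-encoding the JSON string "Apricot\u007f" for a query parameter.
encoded_7f = quote('"' + end_7f + '"')

print(with_7f)
print(with_fff0)
print(encoded_7f)
```

Under these assumptions the \u007f range misses the accented title, the \ufff0 range picks it up but still misses the emoji one, and the encoded end key comes out with %7F rather than %007F.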
