Re: Document ID naming: random UUIDs or structured?

Alexander Harm Tue, 05 Jan 2016 01:10:40 -0800

Hello Ronan,

my two cents:


I tend to incorporate the type and possible parent into my id, so in your case 
that would look like

case_1234
finding_1234_f2ac2351
finding_1234_aa928399
note_1234_22933cf5
measure_1234_928dca87

However, I tend to normalise the type and all "ids" to a fixed length, e.g.
case_1234
fndg_1234_f2ac2351
fndg_1234_aa928399
note_1234_22933cf5
msre_1234_928dca87

That enables me to pull an overview of all cases with _all_docs:
startkey case_
endkey case_\uffff
and then access all details by type
startkey fndg_1234_
endkey fndg_1234_\uffff
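
The two ranges above can be tried out without a running server. This is only a minimal sketch: it assumes doc ids sort by plain code point, which should match CouchDB's ordering for ASCII-only ids, and it uses an in-memory sorted list in place of _all_docs. The helper name prefix_range and the ids case_5678 and fndg_5678_0b1c2d3e are made up for illustration.

```python
from bisect import bisect_left, bisect_right

HIGH = "\uffff"  # sorts after any character expected in these ids

def prefix_range(sorted_ids, prefix):
    """Emulate startkey=prefix, endkey=prefix + '\\uffff' (inclusive)."""
    lo = bisect_left(sorted_ids, prefix)
    hi = bisect_right(sorted_ids, prefix + HIGH)
    return sorted_ids[lo:hi]

# Stand-in for the id index; the *_5678 entries are hypothetical.
ids = sorted([
    "case_1234", "case_5678",
    "fndg_1234_f2ac2351", "fndg_1234_aa928399", "fndg_5678_0b1c2d3e",
    "note_1234_22933cf5", "msre_1234_928dca87",
])

all_cases = prefix_range(ids, "case_")            # overview of all cases
findings_1234 = prefix_range(ids, "fndg_1234_")   # details of one case, by type

print(all_cases)
print(findings_1234)
```

Because the type comes first, each range is one contiguous slice of the sorted ids, so no view is needed for these two lookups.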

That works pretty well for my use case (querying all cases and details only 
when needed). By adding the type to the start I make sure the docs are stored 
in order (your 3.1 c). Whether or not to use a UUID depends. In the example of a 
people directory, each person has a unique incremental UUID:
person_<person-uuid>
and the telephone numbers could be keyed by a shortened type and a label instead of another UUID:
telphn_<person-uuid>_home
telphn_<person-uuid>_work
telphn_<person-uuid>_fax
telphn_<person-uuid>_mobile
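
A scheme like this is easier to live with if ids are built and parsed in exactly one place. The following is a rough sketch, not anything CouchDB provides: make_id and split_id are hypothetical helpers, and a random uuid4 hex string stands in for the person UUID. It only works as long as no part of an id contains the "_" separator.

```python
import uuid

SEP = "_"

def make_id(doc_type, parent, suffix=None):
    """Join type, parent id and optional suffix into one doc id."""
    parts = [doc_type, parent] + ([suffix] if suffix is not None else [])
    for part in parts:
        if not part or SEP in part:
            raise ValueError(f"invalid id part: {part!r}")
    return SEP.join(parts)

def split_id(doc_id):
    """Inverse of make_id; suffix is None when the id has two parts."""
    doc_type, parent, *rest = doc_id.split(SEP, 2)
    return {"type": doc_type, "parent": parent,
            "suffix": rest[0] if rest else None}

# Assumption: a random UUID in hex form (32 characters, no underscores).
person_uuid = uuid.uuid4().hex
person_id = make_id("person", person_uuid)
home_id = make_id("telphn", person_uuid, "home")

print(person_id)
print(split_id(home_id))
```

A single split_id like this would replace the repetitive-but-slightly-different variants mentioned in your point 3.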

If there is a chance of conflicts I would always go for a UUID.

Regards,

Alexander





> On 24. Dec. 2015, at 20:05, Ronan Jouchet <[email protected]> 
> wrote:
> 
> Hi.
> 
> I'm coming back on an already much debated subject, with a few questions I 
> couldn't find answers for.
> 
> I started working on a new system backed by CouchDB, and am questioning our 
> choice to use "meaningful"/structured IDs (as opposed to UUIDs). Our data 
> revolves around documents called "cases", which can relate to various 
> documents, like notes, findings, measures. So we build IDs looking like:
> - 1234_case
> - 1234_finding_f2ac2351
> - 1234_finding_aa928399
> - 1234_note_22933cf5
> - 1234_measure_928dca87
> 
> Colleagues say they initially went for UUIDs, then moved on to a meaningful 
> scheme for guess-ability, which enabled easier replication, as well as a few 
> views referencing IDs (thanks to knowledge of the naming structure), which 
> expand to full documents with include_docs=true.
> 
> On my side, as a NoSQL freshman and without the project history, I can't help 
> wanting to move back to UUIDs, because:
> 
> 1. As we're leaning heavily on the *naming* of our documents, I have the 
> feeling we're hiding from ourselves that we're not properly structuring our 
> data in a way that is view-friendly. Feels like it's going to come back and 
> bite us later on.
> 
> 2. As we are adding logic, we're starting to see unwieldy IDs 
> (hash1_thing1_hash2_thing2_hash3_thing3_hash4)
> 
> 3. Currently, the information contained in the ID (in the above example: 
> caseId, type, hash) is *only* there. So to "extract" this information we 
> have repetitive-but-slightly-different "splitId" functions that extract and 
> type these ids (for example: "1234_finding_f2ac2351" -> 
> {"caseId": 1234, "type": "finding", "contentId": "f2ac2351"}), which is 
> painful.
> 
>   3.1. The obvious solution would be to repeat {caseId, type, hash} as document 
> properties. Then I can use them without having to call splitId(doc._id). But 
> then there's duplicated data, which will have to be updated jointly. Is it a 
> problem or is it just the time for me to learn to stop worrying and not care 
> about this kind of minor duplication in NoSQL land?
> 
> Then, looking at what the internet says (see references below),
> 
> a. Both [PDB] and [DC] say non-uuid IDs are convenient for bare-bones 
> _all_docs querying (e.g. for "all of Bob Dylan's albums released between 1964 
> and 1965", just {startkey: 'album_dylan_1964_', endkey: 
> 'album_dylan_1965_\uffff'}).
> True, but how often will I be able to use such simple queries? I feel like 
> I'm going to need views anyway.
> 
> b. Both [PDB] and [DC] say that a structured ID naming means usable indexes 
> "for free", taking no additional space compared to a solution with random 
> UUIDs complemented with views.
>  - Also, both note that using UUIDs (thus, needing views) means failing to 
> use the built-anyway index on _id. True.
>  - [DC] goes as far as saying that "getting rid of as many views (relying on 
> _all_docs instead) as you can is a worthwhile goal". Is this a shared opinion?
> 
> c. [INOI] and [GUIDE] note that incremental IDs will yield better performance 
> on bulk document inserts. Okay.
> 
> d. [SO] proposes to "use UUIDs unless you have a good reason not to", and 
> recommends basing your choice on "Cost of changing ID vs. How likely the ID 
> is to change" (if the ID is likely to change a lot, use a UUID to force 
> yourself not to rely on it).
> 
> What do you think? What do you use in your own projects?
> 
> Thanks for your help, thanks for CouchDB, and happy end-of-year :)
> 
> References ----
> 
> [PDB] (section "Use and abuse your doc IDs") 
> http://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html
> 
> [DC] 
> http://davidcaylor.com/2012/05/26/can-i-see-your-id-please-the-importance-of-couchdb-record-ids/
> 
> [GUIDE] http://guide.couchdb.org/draft/performance.html#bulk
> 
> [INOI] http://blog.inoi.fi/2010/11/impact-of-document-ids-on-performance.html
> 
> [SO] 
> http://stackoverflow.com/questions/1963632/what-is-best-practice-when-creating-document-ids-in-couchdb/1964947#1964947
> 
> -- 
> Ronan
