Document ID naming: random UUIDs or structured?

Ronan Jouchet Thu, 24 Dec 2015 11:06:06 -0800

Hi.

I'm coming back to an already much-debated subject, with a few questions I couldn't find answers to.

I started working on a new system backed by CouchDB, and am questioning our choice to use "meaningful"/structured IDs (as opposed to UUIDs). Our data revolves around documents called "cases", which can relate to various documents, like notes, findings, measures. So we build IDs looking like:

 - 1234_case
 - 1234_finding_f2ac2351
 - 1234_finding_aa928399
 - 1234_note_22933cf5
 - 1234_measure_928dca87

Colleagues say they initially went for UUIDs, then moved on to a meaningful scheme for guessability, which enabled easier replication, as well as a few views that reference IDs (thanks to knowledge of the naming structure) and expand to full documents with include_docs=true.

On my side, as a NoSQL freshman and without the project history, I can't help wanting to move back to UUIDs, because:

1. As we're leaning heavily on the *naming* of our documents, I have the feeling we're hiding from ourselves that we're not properly structuring our data in a way that is view-friendly. It feels like it's going to come back and bite us later on.

2. As we are adding logic, we're starting to see unwieldy IDs (hash1_thing1_hash2_thing2_hash3_thing3_hash4).

3. The information contained in the ID (in the above example: caseId, type, hash) currently lives *only* there. So to "extract" this information we have repetitive-but-slightly-different "splitId" functions that extract and type these IDs (for example: "1234_finding_f2ac2351" -> {"caseId": 1234, "type": "finding", "contentId": "f2ac2351"}), which is painful.

3.1. The obvious solution is to repeat {caseId, type, hash} as document properties. Then I can use them without having to call splitId(doc._id). But then there's duplicated data, which will have to be updated jointly. Is that a problem, or is it just time for me to learn to stop worrying and not care about this kind of minor duplication in NoSQL land?
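To make points 3 and 3.1 concrete, here is a minimal sketch in Python (our actual code differs; the names ParsedId, split_id and with_id_fields are made up for illustration). It assumes every ID has the shape <caseId>_<type>[_<contentId>] with a numeric caseId, as in the examples above, and shows a single typed parser plus a helper that copies the ID components into document properties so the two cannot drift apart.

```python
from typing import NamedTuple, Optional


class ParsedId(NamedTuple):
    case_id: int
    doc_type: str
    content_id: Optional[str]  # None for the case document itself


def split_id(doc_id: str) -> ParsedId:
    """Parse '<caseId>_<type>[_<contentId>]' into typed fields."""
    parts = doc_id.split("_")
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected ID shape: {doc_id!r}")
    case_id = int(parts[0])  # raises ValueError if caseId is not numeric
    content_id = parts[2] if len(parts) == 3 else None
    return ParsedId(case_id, parts[1], content_id)


def with_id_fields(doc: dict) -> dict:
    """Return a copy of doc with the ID components repeated as properties.

    The properties are always derived from _id, so the duplicated data
    is written in one place only.
    """
    parsed = split_id(doc["_id"])
    return {
        **doc,
        "caseId": parsed.case_id,
        "type": parsed.doc_type,
        "contentId": parsed.content_id,
    }
```

With this, split_id("1234_finding_f2ac2351") gives ParsedId(case_id=1234, doc_type='finding', content_id='f2ac2351'). It does not handle the longer IDs from point 2, which is part of my worry.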


Then, looking at what the internet says (see references below),

a. Both [PDB] and [DC] say non-UUID IDs are convenient for bare-bones _all_docs querying (e.g. for "all of Bob Dylan's albums released between 1964 and 1965", just {startkey: 'album_dylan_1964_', endkey: 'album_dylan_1965_\uffff'}). True, but how often will I be able to use such simple queries? I feel like I'm going to need views anyway.

b. Both [PDB] and [DC] say that a structured ID naming means usable indexes "for free", taking no additional space compared to a solution with random UUIDs complemented with views.
   - Also, both note that using UUIDs (thus, needing views) means failing to use the built-anyway index on _id. True.
   - [DC] goes as far as saying that "getting rid of as many views (relying on _all_docs instead) as you can is a worthwhile goal". Is this a shared opinion?

c. [INOI] and [GUIDE] note that incremental IDs will yield better performance on bulk document inserts. Okay.

d. [SO] proposes to "use UUIDs unless you have a good reason not to", and recommends basing your choice on "Cost of changing ID vs. How likely the ID is to change" (if the ID is likely to change a lot, use a UUID to force yourself not to rely on it).
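As a rough illustration of the range-query idea in (a), here is a small Python sketch that mimics an _all_docs key range over a sorted list of IDs. It is only an approximation under stated assumptions: it does not talk to CouchDB, it uses plain string comparison where CouchDB uses its own collation rules, the function name all_docs_range is invented, and the album IDs are hypothetical ones following the album_<artist>_<year>_<title> pattern.

```python
from bisect import bisect_left, bisect_right


def all_docs_range(sorted_ids, startkey, endkey):
    """Return the IDs with startkey <= id <= endkey (ascending order).

    sorted_ids must already be sorted; the lookup is two binary
    searches, which is the point of reusing the index on _id.
    """
    lo = bisect_left(sorted_ids, startkey)
    hi = bisect_right(sorted_ids, endkey)
    return sorted_ids[lo:hi]


# Hypothetical IDs following the album_<artist>_<year>_<title> scheme.
ids = sorted([
    "album_dylan_1963_freewheelin",
    "album_dylan_1964_another_side",
    "album_dylan_1965_highway_61_revisited",
    "album_dylan_1966_blonde_on_blonde",
])

# startkey is the lower bound; '\uffff' sorts after any title suffix,
# so the upper bound covers every ID starting with 'album_dylan_1965_'.
result = all_docs_range(ids, "album_dylan_1964_", "album_dylan_1965_\uffff")
print(result)
```

The query only works because year comes before title in the ID; anything that needs a different ordering (say, by title across artists) would still need a view, which is what I suspect will happen to us.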


What do you think? What do you use in your own projects?

Thanks for your help, thanks for CouchDB, and happy end-of-year :)

References ----

[PDB] (section "Use and abuse your doc IDs") http://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html

[DC] http://davidcaylor.com/2012/05/26/can-i-see-your-id-please-the-importance-of-couchdb-record-ids/

[GUIDE] http://guide.couchdb.org/draft/performance.html#bulk

[INOI] http://blog.inoi.fi/2010/11/impact-of-document-ids-on-performance.html

[SO] http://stackoverflow.com/questions/1963632/what-is-best-practice-when-creating-document-ids-in-couchdb/1964947#1964947


--
Ronan
