Hello Ronan, my two cents:
I tend to incorporate the type and possible parent into my id, so in your case
that would look like

  case_1234
  finding_1234_f2ac2351
  finding_1234_aa928399
  note_1234_22933cf5
  measure_1234_928dca87

However, I tend to normalise the type and all "ids" into a fixed length, e.g.

  case_1234
  fndg_1234_f2ac2351
  fndg_1234_aa928399
  note_1234_22933cf5
  msre_1234_928dca87

That enables me to pull an overview of all cases with all docs

  startkey case_
  endkey case_\uffff

and then access all details by type

  startkey fndg_1234_
  endkey fndg_1234_\uffff

That works pretty well for my use case (querying all cases and details only
when needed).

By adding the type to the start I make sure the docs are stored in order (your
3.1 c).

Whether or not to use UUID depends.

In the example of a people directory each person has a unique incremental UUID:

  person_<person-uuid>

the telephone number could be shortened to the type

  telphn_<person-uuid>_home
  telphn_<person-uuid>_work
  telphn_<person-uuid>_fax
  telphn_<person-uuid>_mobile

If there is a chance of conflicts I would always go for a UUID.

Regards,
Alexander

> On 24. Dec. 2015, at 20:05, Ronan Jouchet <[email protected]>
> wrote:
>
> Hi.
>
> I'm coming back on an already much debated subject, with a few questions I
> couldn't find answers for.
>
> I started working on a new system backed by CouchDB, and am questioning our
> choice to use "meaningful"/structured IDs (as opposed to UUIDs). Our data
> revolves around documents called "cases", which can relate to various
> documents, like notes, findings, measures. So we build IDs looking like:
> - 1234_case
> - 1234_finding_f2ac2351
> - 1234_finding_aa928399
> - 1234_note_22933cf5
> - 1234_measure_928dca87
>
> Colleagues say they initially went for UUIDs, then moved on to a meaningful
> scheme for guess-ability, which enabled easier replication, as well as a few
> views referencing IDs (thanks to knowledge of the naming structure), which
> expand to full documents with include_docs=true.
>
> On my side, as a NoSQL freshman and without the project history, I can't help
> wanting to move back to UUIDs, because:
>
> 1. As we're leaning heavily on the *naming* of our documents, I have the
> feeling we're hiding from ourselves that we're not properly structuring our
> data in a way that is view-friendly. Feels like it's going to come back and
> bite us later on.
>
> 2. As we are adding logic, we're starting to see unwieldy IDs
> (hash1_thing1_hash2_thing2_hash3_thing3_hash4)
>
> 3. Currently, the information contained in the ID (in the above example:
> caseId, type, hash) is *only* here. So to "extract" this
> information we have repetitive-but-slightly-different "splitId" functions
> that extract and type these ids (for example: "1234_finding_f2ac2351" ->
> {"caseId": 1234, "type": "finding", "contentId": "f2ac2351"}), which is
> painful.
>
> 3.1. The obvious solution is to repeat {caseId, type, hash} as document
> properties. Then I can use them without having to call splitId(doc._id). But
> then there's duplicated data, which will have to be updated jointly. Is it a
> problem or is it just the time for me to learn to stop worrying and not care
> about this kind of minor duplication in NoSQL land?
>
> Then, looking at what the internet says (see references below),
>
> a. Both [PDB] and [DC] say non-uuid IDs are convenient for bare-bones
> _all_docs querying (e.g. for "all of Bob Dylan's albums released between 1964
> and 1965", just {startkey: 'album_dylan_1964_', endkey:
> 'album_dylan_1965_\uffff'}).
> True, but how often will I be able to use such simple queries? I feel like
> I'm going to need views anyway.
>
> b. Both [PDB] and [DC] say that a structured ID naming means usable indexes
> "for free", taking no additional space compared to a solution with random
> UUIDs complemented with views.
> - Also, both note that using UUIDs (thus, needing views) means failing to
> use the built-anyway index on _id. True.
> - [DC] goes as far as saying that "getting rid of as many views (relying on
> _all_docs instead) as you can is a worthwhile goal". Is this a shared opinion?
>
> c. [INOI] and [GUIDE] note that incremental IDs will yield better performance
> on bulk document inserts. Okay.
>
> d. [SO] proposes to "use UUIDs unless you have a good reason not to", and
> recommends to base your choice on "Cost of changing ID vs. How likely the ID
> is to change" (if the ID is likely to change a lot, use a UUID to force
> yourself to not rely on it).
>
> What do you think? What do you use in your own projects?
>
> Thanks for your help, thanks for CouchDB, and happy end-of-year :)
>
> References ----
>
> [PDB] (section "Use and abuse your doc IDs")
> http://pouchdb.com/2014/05/01/secondary-indexes-have-landed-in-pouchdb.html
>
> [DC]
> http://davidcaylor.com/2012/05/26/can-i-see-your-id-please-the-importance-of-couchdb-record-ids/
>
> [GUIDE] http://guide.couchdb.org/draft/performance.html#bulk
>
> [INOI] http://blog.inoi.fi/2010/11/impact-of-document-ids-on-performance.html
>
> [SO]
> http://stackoverflow.com/questions/1963632/what-is-best-practice-when-creating-document-ids-in-couchdb/1964947#1964947
>
> --
> Ronan
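
For illustration, here is a minimal Python sketch of the type-prefixed id scheme and the startkey/endkey range lookups discussed in this thread. It is only a sketch under assumptions: the helper names (make_id, key_range, scan, split_id) are hypothetical, the sorted in-memory list merely stands in for the database's index on _id, and Python's code point string ordering is assumed to match the database's key ordering for these lowercase ASCII ids. The 5678 ids are made-up sample data.

```python
from bisect import bisect_left, bisect_right

HIGH = "\uffff"  # sorts after every character used in these ids


def make_id(type_code, case_id, suffix=None):
    """Build an id such as 'fndg_1234_f2ac2351' (hypothetical helper)."""
    parts = [type_code, str(case_id)]
    if suffix is not None:
        parts.append(suffix)
    return "_".join(parts)


def key_range(prefix):
    """Return the (startkey, endkey) pair that covers every id with this prefix."""
    return prefix, prefix + HIGH


def scan(sorted_ids, prefix):
    """Range scan over a sorted id list, standing in for an index lookup."""
    start, end = key_range(prefix)
    lo = bisect_left(sorted_ids, start)
    hi = bisect_right(sorted_ids, end)
    return sorted_ids[lo:hi]


def split_id(doc_id):
    """Recover the parts encoded in an id; one shared parser instead of many."""
    type_code, case_id, *rest = doc_id.split("_")
    return {
        "type": type_code,
        "caseId": int(case_id),
        "contentId": rest[0] if rest else None,
    }


# Sample ids from the thread, plus a made-up second case.
ids = sorted([
    make_id("case", 1234),
    make_id("fndg", 1234, "f2ac2351"),
    make_id("fndg", 1234, "aa928399"),
    make_id("note", 1234, "22933cf5"),
    make_id("msre", 1234, "928dca87"),
    make_id("case", 5678),
    make_id("fndg", 5678, "01ab23cd"),
])

all_cases = scan(ids, "case_")               # overview of every case
case_findings = scan(ids, "fndg_1234_")      # findings of case 1234 only
parsed = split_id("fndg_1234_f2ac2351")
```

Because every type code has the same width and comes first, all documents of one type sit next to each other in key order, and appending the case id narrows the range to a single case without needing a view.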
