Hi all I have what amounts to a 'schema' design question, I guess. It's for storing and displaying session logs, and groups of such logs. I've asked about this on IRC, though it was a while back, and I never managed to implement a solution due to other things cropping up. So here are my thoughts all together:
I have two main entities: A session. This has a title, a summary, some other misc items of metadata (participant list, date, tags, etc), and the session text itself (~80k of text). For display purposes, I want to display one session per page, with previous and next links that span session group boundaries. Ideally they can have the session title in the url but without any numbers relating to the order, so that a link to a session is a link to that session, regardless of if the order has been rearranged. This allows easy linking to sessions, though I suppose redirects could be used to achieve a similar result. A session group. This has a title, a summary, (perhaps some other misc metadata) and has an ordered list of sessions. Each session group will contain somewhere between 50 and a 100 sessions when complete. For display purposes, I want to display a list of links to session + session summaries, in order. In addition, there might be subgroups of sessions, each given a little title and summary. While theoretically session groups might be reordered, this is unlikely to happen often, if at all. On the other hand, moving sessions between groups should be relatively easy. Sessions are generally added in order, but there will be occasional out of order additions. I can't be sure when a new group is started the old one will be closed for additions, either. There are a few thoughts I had on how to store this: First, store it as per the suggestion from the couchdb book for storing lists. Give each session group and session (and subgroup, if they become 'first class') a number representing the order. On the face of it, this seems quite easy. However, it seems to involve a lot of duplicated information that needs to be kept in sync if views are to be easy to emit. Under this schema, each group has its order number. If have a subgroup, each subgroup needs to have its own order number. It also needs to have the id and order of the group, so it can emit [group_order, subgroup_order]. Each session needs the group id, order, subgroup id, order, so it can emit [group_order, subgroup_order, session_order]. With this schema, adding almost anything except at the 'end' seems to involve a lot of renumbering. If not at insert time, then when it comes to the time to 're-balance' the floating point order numbers, to make them equally spaced once more. However, this format does make outputting group display pages very easy, via range queries over a view which just emits the necessary summary information. A second suggestion (made by jchris, iirc) is to store an array of session log ids in the group document. These can then be emitted with an order number based on the index in the array, to provide the ordered sessions required for previous and next links. Reordering sessions becomes very easy, as the order in the array can just be switched around. There's also probably no need for subgroups under this 'schema', there can just be subgruop heading entries in the array too. Sessions can easily be inserted out of order, and moved between groups. To create the group display pages, though, this setup seems to get more complex. Since the group display page needs information from both the group and the sessions, I'd have to query the view with 'include_docs=true' to get all that information in one query. This would involve retrieving all the session information (which could easily be several mbs) to display a page which is perhaps 5kb at most, which seems unnecessary. Thanks for reading if you got this far. Am I imagining problems which arn't there with either of these solutions? Have I missed a solution which will avoid these problems? Which ones have people on the list implemented? Regards Jon
