On Tue, 28 Apr 2020, at 07:06, Andrea Brancatelli wrote:
> Hello Robert,
>
> I see your point and mostly understand it. The plan was not to "use"
> this secondary database as an active one, but as a passively replicated
> database from a main instance, so performances of the secondary database
> weren't a big priority - the idea is to keep the whole "journal" of the
> main database.

Hi Andrea,

I've spent some time recently dealing with a 1.7.x era database that made this decision in the past, and weird things start to happen when you have a lot of versions. Best not to go against the grain.

I have a couple of suggestions for dealing with this that haven't already come up. Both assume that you will likely not need the data on a regular basis.

1. Use Kafka or similar for storing a record. Stream-oriented storage with potentially repeating IDs is what these systems specialise in. A small listener on the changes feed on top of couch, one that also handles attachments, does what you need. This is obviously rather one-way.

2. Same couchdb listener, but you move the _id of the doc into a different field, or prepend a new time-ordered id to it. There are many choices here for the new _id, but you want one that will sort correctly for your needs, for example time-ordered uuids, called "flake ids". You use the latter part of the doc id to store your original _id, and the initial part ensures that "events" naturally sort by time. That allows you to reconstruct the _changes feed if needed, and you can provide a view that splits the _id to give you a per-doc view as well.

Both flake & uuid formats are possible here, but you must validate that the _id works both in JavaScript for couch and in whatever language you choose to implement your listener in.

Boundary[1] has a great write-up & yeller[2] too, incl the relevant papers[3]. Search for "flake id" in your preferred language. Craig's writeup in his Erlang one is really helpful too[4], and the IETF RFC[5] has a more formal spec of other uuid schemes.

The proposed UUID "v6" format[6], still in draft, will have time-ordered uuid capabilities. I haven't checked either of these implementations[7][8].

Maybe flake, or v6 uuids when finalised, would be a useful addition to CouchDB.

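To make suggestion 2 a bit more concrete, here is a minimal Python sketch of one possible id scheme. It is not a real flake implementation. The prefix layout (48-bit millisecond timestamp plus a 16-bit sequence, as fixed-width lowercase hex), the ":" separator, and the names make_event_id, split_event_id and history_by_doc are all assumptions for illustration. There is no worker id, so it is only reasonable with a single listener.

```python
import time
from collections import defaultdict

SEP = ":"  # assumed separator between the time-ordered prefix and the original _id


def make_event_id(original_id, now_ms=None, seq=0):
    """Return a new _id: time-ordered hex prefix, separator, original _id."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if not 0 <= now_ms < 2**48 or not 0 <= seq < 2**16:
        raise ValueError("timestamp or sequence out of range")
    # Fixed width matters: zero-padded hex compares as strings in time order.
    prefix = f"{now_ms:012x}{seq:04x}"
    return f"{prefix}{SEP}{original_id}"


def split_event_id(event_id):
    """Return (timestamp_ms, seq, original_id) for an id built above."""
    # The prefix is pure hex, so the first separator always ends it,
    # even when the original _id itself contains the separator.
    prefix, original_id = event_id.split(SEP, 1)
    return int(prefix[:12], 16), int(prefix[12:], 16), original_id


def history_by_doc(event_ids):
    """Group event ids by original _id, oldest first (stand-in for the per-doc view)."""
    history = defaultdict(list)
    for event_id in sorted(event_ids):
        _, _, original_id = split_event_id(event_id)
        history[original_id].append(event_id)
    return dict(history)
```

Sorting the generated ids as plain strings gives the time order, and history_by_doc does in Python roughly what a view keyed on the split _id would do inside couch. You would still want to check that the ids collate the same way in couch's views as they do in your listener's language.
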
A+
Dave

[1]: https://archive.is/2015.07.08-082503/http://www.boundary.com/blog/2012/01/flake-a-decentralized-k-ordered-unique-id-generator-in-erlang/
[2]: http://yellerapp.com/posts/2015-02-09-flake-ids.html
[3]: https://www.researchgate.net/publication/262154069_Roughly_sorting_sequential_and_parallel_approach
[4]: https://gitlab.com/zxq9/zuuid
[5]: https://tools.ietf.org/html/rfc4122
[6]: https://tools.ietf.org/html/draft-peabody-dispatch-new-uuid-format-00
[7]: https://github.com/boundary/flake
[8]: https://github.com/s-yadav/FlakeId
