On Tue, Mar 25, 2014 at 11:41 AM, Suraj Kumar <[email protected]> wrote:
> I constantly see "one DB per user" being proposed as a solution. But I'm
> not entirely convinced whether this will truly work for a large scale
> setup.
>
> The reason people choose CouchDB is for high scale use where one could
> potentially end up with a million users. Then, what good is a database that
> only relies on the underlying filesystem to do the job of index keeping? If
> there are a million "*.couch" files under var/lib/couchdb/, I'd expect the
> performance to be very poor / unpredictable since it now depends on the
> underlying file system's logic. How can this be partitioned?
>
> What is the "right" way to handle million users with need for isolated
> documents within each DB? How will replication solutions cope in
> distributing these million databases? 2 million replicating connections
> between two servers doesn't sound right.

This would be fixed with the BigCouch merge, where information about
databases is stored inside a special database. As for plain CouchDB, you
may use the / (slash) character in database names to group databases at
the filesystem layer. Say you name user databases with some hash, like
00007fda. You can then insert a slash in the middle of the name to
produce the following structure:

0000/
---- 4ea7.couch
---- 7fda.couch
---- dd47.couch
0016/
---- bbc1.couch

and so on. This reduces the negative filesystem behavior caused by having
a million files inside a single directory.

--
,,,^..^,,,
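
For illustration, here is a minimal Python sketch of the naming scheme
described above. The helper names (sharded_db_name, db_url), the 4-character
prefix length and the localhost URL are assumptions made for the example,
not anything CouchDB prescribes. Two things to keep in mind: the slash has
to be sent as %2F in the request path, and CouchDB expects database names
to begin with a lowercase letter, so a real deployment would probably need
a letter in front of a hash like the one used here.

```python
from urllib.parse import quote


def sharded_db_name(name: str, prefix_len: int = 4) -> str:
    """Insert a slash after the first `prefix_len` characters.

    Hypothetical helper: "00007fda" becomes "0000/7fda", so the file
    ends up as 0000/7fda.couch instead of 00007fda.couch.
    """
    if len(name) <= prefix_len:
        raise ValueError("name is too short to split into prefix and suffix")
    return f"{name[:prefix_len]}/{name[prefix_len:]}"


def db_url(server: str, db_name: str) -> str:
    """Build the database URL, percent-encoding the slash as %2F."""
    # safe="" makes quote() encode "/" as well; by default it is left alone.
    return f"{server.rstrip('/')}/{quote(db_name, safe='')}"


if __name__ == "__main__":
    name = sharded_db_name("00007fda")
    print(name)
    # The server address is an assumption; adjust it for your setup.
    print(db_url("http://localhost:5984", name))
```

A shorter prefix gives fewer, fuller directories; a longer one gives more
directories with fewer files each. Which split works best depends on the
filesystem and on how the hashes are distributed.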
