Hi Dave,
Special thanks for your suggestion on the initial bulk upload. Point [2]
explains why I always had to compact immediately afterwards; doing so
reduced disk space usage ten-fold.
(And the subject change is so that I and others can maybe find this
advice again in the future.)
Kevin
On 11/6/2012 2:15 AM, Dave Cottlehuber wrote:
On 5 November 2012 19:22, Kevin Burton <[email protected]> wrote:
[SNIP]
Hi Kevin,
[SNIP]
If you're initially bulk uploading data, I would do 3 things
differently to what you're currently doing.
1. assign UUIDs myself
This is the only enforced unique indexed attribute in a DB, so use it
well. Put something you want in it. It's basically free text ** within
reason.
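As a sketch of what "assign UUIDs myself" might look like: build document IDs out of something meaningful and sortable rather than random hex. The "order:" prefix, the epoch-seconds timestamp, and the zero-padded counter below are illustrative choices of mine, not anything CouchDB requires.

```python
import time
import itertools

# Sketch: self-assigned, human-readable, sortable document IDs.
# The prefix/timestamp/counter scheme is an assumption for illustration.
_counter = itertools.count()

def make_doc_id(kind: str) -> str:
    """Return an ID like 'order:1352200000-000042' that is readable
    and sorts in creation order."""
    return f"{kind}:{int(time.time())}-{next(_counter):06d}"

ids = [make_doc_id("order") for _ in range(3)]
print(ids)
```

Because the timestamp is fixed-width and the counter is zero-padded, plain string sorting of these IDs matches creation order, which sets up point 2 below.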
2. insert them in sorted UUID order
CouchDB is a database and sorting matters. Couch uses a B~tree ** and
so if you insert randomly you spend a lot of time forcing the re-write
of intermediate nodes for no gain. As Couch is an append-only
datastore this means several things:
- wasted space until you compact
- slower insert performance as you have multiple writes instead of one
http://horicky.blogspot.co.at/2008/10/couchdb-implementation.html
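A minimal sketch of acting on this advice: if your batch arrives in arbitrary order, sort it by `_id` before uploading so writes walk the B-tree in key order instead of jumping around it. The field names here are my own illustration.

```python
import random

# Sketch: a batch of docs with self-assigned _ids, received out of order.
docs = [{"_id": f"item:{n:08d}", "value": n} for n in range(10)]
random.shuffle(docs)               # what an unsorted feed looks like

# One in-memory sort before inserting restores sorted-_id order.
docs.sort(key=lambda d: d["_id"])

print([d["_id"] for d in docs])
```

The zero-padded numeric part matters: without fixed-width keys, `item:10` would sort before `item:9` lexicographically.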
3. try inserting the first few docs by hand with curl, and read up on
the _bulk_docs API; it is much, much faster.
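For reference, a `_bulk_docs` request body is a JSON object with a `docs` array, POSTed to the database. A sketch follows; the host, port, and database name (`mydb`) are assumptions for a default local install, and the actual network call is left commented out.

```python
import json
import urllib.request

# Sketch: build a _bulk_docs payload. The wire format is a JSON object
# with a "docs" array; host/port/db name below are assumptions.
docs = [{"_id": f"doc-{n:04d}", "n": n} for n in range(3)]
body = json.dumps({"docs": docs}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:5984/mydb/_bulk_docs",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment against a running CouchDB

# Roughly equivalent curl:
#   curl -X POST http://localhost:5984/mydb/_bulk_docs \
#        -H 'Content-Type: application/json' -d '{"docs":[...]}'
```

One round trip per batch, instead of one PUT per document, is where the speedup comes from.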
Re your drivers: there are several, but I personally don't use any of
them. The more popular ones (based on my dodgy recollection) are
listed here: http://wiki.apache.org/couchdb/Related_Projects --
hopefully some of the other Windows folk will pipe up.
A+
Dave
** handwavey