You can see progress in /_active_tasks or in Futon; it should show the update sequence number. The four-year-old page you link to might have been accurate in 2008, but I don't think it's true now. You should expect view building to slow down along an O(log n) curve, as befits a B+tree. The indexing speeds you see with very few documents are unrealistically high because everything fits in the disk cache. If you were to graph it, you would see a very high peak that quickly softens into a curve.
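As a rough illustration of polling /_active_tasks from a script, here is a minimal Python sketch. It assumes the task fields documented for CouchDB 1.2 and later ("type": "indexer", "changes_done", "total_changes", "progress"); older releases, as far as I know, report a free-form status string instead. The server URL is also an assumption, and the endpoint may require admin credentials on your setup.

```python
import json
from urllib.request import urlopen


def fetch_active_tasks(server="http://localhost:5984"):
    """GET /_active_tasks and return the decoded JSON list.

    The default server URL is an assumption; adjust it for your install.
    """
    with urlopen(server + "/_active_tasks") as resp:
        return json.loads(resp.read().decode("utf-8"))


def indexer_progress(tasks, database=None):
    """Keep only view-indexer tasks and summarise how far each has got.

    Field names follow the CouchDB 1.2+ task format (an assumption if
    you run an older release).
    """
    rows = []
    for task in tasks:
        if task.get("type") != "indexer":
            continue
        if database is not None and task.get("database") != database:
            continue
        rows.append({
            "database": task.get("database"),
            "design_document": task.get("design_document"),
            "changes_done": task.get("changes_done"),
            "total_changes": task.get("total_changes"),
            "progress": task.get("progress"),
        })
    return rows
```

Calling `indexer_progress(fetch_active_tasks())` every few seconds while the view builds should show `changes_done` climbing towards `total_changes`.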
B.

On 13 March 2012 23:08, Daniel Gonzalez <[email protected]> wrote:
> Hi,
>
> I have no reduce on the view, and that is my only view.
> I *am* doing bulk inserts (1000 documents), and after each bulk insert, I
> access the view. (my assumption is that this will be faster than accessing
> the view once at the end of inserting the 3 million documents)
>
> I know that I will get here very varying numbers, but: what is the expected
> view indexing time for the view that I posted and for an amount of 3
> million documents?
>
> How can I monitor view creation? (how many documents have been already
> indexed)
>
> I got the idea that "bulk insert + view access + repeat" was faster than
> "full insert + view access" here:
> http://iamseanmurphy.com/2008/09/08/couchdb-view-generation/
>
> Thanks,
> Daniel
>
> On Tue, Mar 13, 2012 at 11:58 PM, Robert Newson <[email protected]> wrote:
>
>> The view build is already batched. In my opinion your strategy A can
>> only ever be slower or the same speed as B.
>>
>> Try inserting the docs using _bulk_docs, it'll go much faster. I'd
>> fill the database up and hit the view at the end for the fastest build
>> time, but I'd still expect it to take a while to build the view the first
>> time.
>>
>> Do you have a reduce on the view? Are there other views in the same
>> design document?
>>
>> B.
>>
>> On 13 March 2012 22:45, Daniel Gonzalez <[email protected]> wrote:
>> > Hi,
>> >
>> > I am creating a database with lots of documents (3 million).
>> > I have a view in the database:
>> >
>> > function(doc) {
>> >     if (doc.PORTED_NUMBER) emit(doc.PORTED_NUMBER, doc.RECEIVING_OPERATOR);
>> > }
>> >
>> > To speed up view creation, I am doing the following (Strategy A)
>> >
>> > 1. Define view
>> > 2. Insert 1000 documents
>> > 3. Access the view
>> > 4. Goto 2
>> >
>> > And I repeat this process until all documents have been inserted.
>> >
>> > I have read that this is faster than my previous strategy (Strategy B,
>> > obsolete):
>> >
>> > 1. Insert all documents
>> > 2. Define view
>> > 3. Access view
>> >
>> > My problem is that, in my current Strategy A, step 3 is taking longer and
>> > longer. Currently I have around 300 thousand documents inserted and view
>> > access is taking around 120s.
>> > The evolution of the delay in view access has been:
>> >
>> > 2012-03-13 23:01:40,405 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:03:29,589 - __main__ - INFO - - View ready, ellapsed 109
>> > 2012-03-13 23:03:32,945 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:05:31,699 - __main__ - INFO - - View ready, ellapsed 118
>> > 2012-03-13 23:05:35,106 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:07:28,392 - __main__ - INFO - - View ready, ellapsed 113
>> > 2012-03-13 23:07:31,663 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:09:26,929 - __main__ - INFO - - View ready, ellapsed 115
>> > 2012-03-13 23:09:30,572 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:11:27,490 - __main__ - INFO - - View ready, ellapsed 116
>> > 2012-03-13 23:11:30,784 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:13:21,575 - __main__ - INFO - - View ready, ellapsed 110
>> > 2012-03-13 23:13:24,937 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:15:23,519 - __main__ - INFO - - View ready, ellapsed 118
>> > 2012-03-13 23:15:26,836 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> > 2012-03-13 23:17:23,036 - __main__ - INFO - - View ready, ellapsed 116
>> > 2012-03-13 23:17:26,310 - __main__ - INFO - - BulkSend requested= 1000 ok= 1000 errors= 0
>> >
>> > It started with around 1s, and it is increasing more or less
>> > monotonically.
>> > It is already running since 7 hours ago, and only 300000 documents have
>> > been imported and indexed.
>> > If everything continues like this (I do not know what kind of mathematical
>> > function this is following, but for me it seems like an exponential
>> > function), importing the 3 million of documents is going to take forever.
>> >
>> > Is there a way to speed this up?
>> >
>> > Thanks!
>> > Daniel
>>
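
For reference, the "fill the database up with _bulk_docs and hit the view at the end" approach from the quoted thread could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the database URL, view path and batch size are hypothetical, and the `limit=0` query is just one way to trigger the build without pulling rows back.

```python
import json
from urllib.request import Request, urlopen


def batches(docs, size=1000):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def bulk_docs_request(db_url, batch):
    """Build (but do not send) a POST to <db_url>/_bulk_docs for one batch."""
    body = json.dumps({"docs": batch}).encode("utf-8")
    return Request(
        db_url + "/_bulk_docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def load_then_index(db_url, docs, view_path, size=1000):
    """Insert everything first, then query the view once to build it.

    db_url and view_path are placeholders, for example
    "http://localhost:5984/numbers" and "/_design/ported/_view/by_number".
    """
    for batch in batches(docs, size):
        urlopen(bulk_docs_request(db_url, batch)).close()
    # limit=0 returns no rows, but the request still waits for the index
    # to catch up with the database before it answers.
    urlopen(db_url + view_path + "?limit=0").close()
```

With this layout the view is queried exactly once, after the last batch, rather than after every 1000 documents.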
