Hi,

If your documents contain many "rows", it's probably better to have each "row" as a separate document and collate them with views. If you use attachments, you can't (currently) build an index on the data in an attachment, IIRC. You'll want to test with a subset of the data and get a reasonable expectation of how it'll behave as the data grows before making a final decision.

Cheers,
Simon
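Simon's row-per-document layout can be sketched with a tiny example. This is only an illustration, not the actual schema: the field names (metric, year, locale, value) and the document below are invented, and emit() is stubbed so the map function can be exercised outside CouchDB.

```javascript
// A hypothetical "row" document, as one CSV line might look after
// splitting the large per-metric documents (field names are invented):
var rowDoc = {
  _id: "obesity-rate:2010:TN",
  metric: "obesity-rate",
  year: 2010,
  locale: "TN",
  value: 31.9
};

// Minimal stand-in for CouchDB's emit(), so the map function can be
// run outside the view server:
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Map function keyed on [metric, year]: all rows for one metric and
// year sort together, so a single range query collates them.
function map(doc) {
  if (doc.metric && doc.year !== undefined) {
    emit([doc.metric, doc.year], doc.value);
  }
}

map(rowDoc);
```

Queried with startkey/endkey on [metric, year], a view like this returns every row for one metric without the view server ever having to serialize a multi-megabyte document.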
On Monday, 13 February 2012 at 16:03, mike iannacone wrote:
> Thanks for the response. Looking around a bit more, it does seem like
> our documents are larger than most people are using. Is there any
> general guideline or rule of thumb as to how large documents should be?
>
> For some background, this is full of public health metrics and related
> data, which we're compiling from several different sources. Each
> document basically corresponds to one metric from one source. Many of
> these were imported from CSV files, so mapping one CSV file to one
> document made some sense for us. The documents each contain various
> metadata (the source, the years, possibly some statistical info, etc.)
> and then a list of individual data objects. It might make sense to
> split this up, so that each document only contains the metadata and
> has an attachment with the actual data. Does that sound like a good
> approach, or am I on the wrong track with that?
>
> Mike
>
> On Mon, Feb 13, 2012 at 9:36 AM, Steve Foulkes <[email protected]> wrote:
> > Hi,
> >
> > On 2/10/12 8:58 PM, mike iannacone wrote:
> > >
> > > Hi, I've been running into some rather strange errors when running my
> > > view code in certain cases. It seems to run fine until the size of
> > > the database grows beyond a certain point, at which point I get
> > > timeouts. The confusing part is that the size where it starts
> > > failing is quite low: around 1773 documents, totaling 402 MB.
> > >
> > > Environment:
> > > This is my development server, running CouchDB 1.1.1, built using the
> > > build-couchdb tool as the wiki recommended, on a completely new Ubuntu
> > > install. (I reinstalled it a few hours ago, thinking it might be some
> > > kind of environment problem.)
> > >
> > > Overall process shown in the logs:
> > >
> > > * Load a subset of documents, and confirm the views work.
> > >
> > > * Load most of the remaining documents; views work.
> > > (This was done from the Futon client, running on another machine.
> > > It sees the connection time out, but the view index builds OK anyway, and
> > > completes a few minutes after the client has given up. When the
> > > client requests the view afterwards, it works fine, and fast now that
> > > the index is done.)
> > >
> > > * Upload another 18 documents (the largest ones, ranging from 10 MB to
> > > 22 MB); the view failed with "OS Process timed out."
> > >
> > > The log of everything described up to this point is included.
> >
> > The large documents are the problem. The view process is taking too long to
> > process them and is timing out. You can increase the timeout in the
> > configuration, which is accessible from Futon; it's under "couchdb" and
> > called "os_process_timeout".
> >
> > Steve
> >
> > > This seems strange, as it gave this error only now, when it took so
> > > long previously. At any rate, I increased the os_process_timeout
> > > value to 10 minutes and attempted it again, and it still timed out
> > > after only a few seconds. (This is shown in the second log file,
> > > although it is essentially the same as the first.)
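The os_process_timeout setting Steve mentions can also be set in local.ini rather than through Futon. The value is in milliseconds (the default is 5000), so a 10-minute timeout would look like this; a sketch of the config entry, not a recommendation of the value itself:

```ini
[couchdb]
; View server timeout in milliseconds (default 5000).
; 600000 ms = 10 minutes.
os_process_timeout = 600000
```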
> > >
> > > The actual view functions are shown in the log, but for convenience they
> > > are:
> > >
> > > indicator_summary map function:
> > >
> > > function(doc) {
> > >   if (doc.Data) {
> > >     var temp = {};
> > >     temp.Name = doc.Name;
> > >     temp.Description = doc.Description;
> > >     temp.Sources = doc.Sources;
> > >     temp.SourceURL = doc.SourceURL;
> > >     temp.Years = doc.Years;
> > >     temp.National = doc.National;
> > >     temp.LocaleLevels = doc.LocaleLevels;
> > >     temp.Demographics = doc.Demographics;
> > >     temp.Unit = doc.Unit;
> > >     temp.UnitLabel = doc.UnitLabel;
> > >     temp.DataType = doc.DataType;
> > >     temp.Category = doc.Category;
> > >     temp.TopCorrelated = doc.TopCorrelated;
> > >     emit(doc.Name, temp);
> > >   }
> > > }
> > >
> > > indicator_detail map function:
> > >
> > > function(doc) {
> > >   if (doc.Data && doc.Years) {
> > >     for (var i = 0; i < doc.Years.length; i++) {
> > >       for (var j = 0; j < doc.LocaleLevels.length; j++) {
> > >         var temp = {};
> > >         temp.Name = doc.Name;
> > >         temp.Description = doc.Description;
> > >         temp.Sources = doc.Sources;
> > >         temp.SourceURL = doc.SourceURL;
> > >         /* for (var k = 0; k < doc.National.length; k++) {
> > >           if (doc.National[k][doc.Years[i]]) {
> > >             temp.National = doc.National[k][doc.Years[i]];
> > >           }
> > >         } */
> > >         temp.Demographics = doc.Demographics;
> > >         temp.Unit = doc.Unit;
> > >         temp.UnitLabel = doc.UnitLabel;
> > >         temp.DataType = doc.DataType;
> > >         temp.Category = doc.Category;
> > >         temp.Data = doc.Data;
> > >         temp.TopCorrelated = doc.TopCorrelated;
> > >         emit([doc.Name, doc.Years[i]], temp);
> > >       }
> > >     }
> > >   }
> > > }
> > >
> > > Besides this, I've tried replicating to a second machine, and on that
> > > one adjusting several values, with no real progress: increased Erlang
> > > heartbeat timeout, increased Erlang heap size, increased SpiderMonkey
> > > stack size. These all either made no difference or caused other
> > > errors.
> > > I admit I was kind of guessing when changing those, so it's
> > > entirely possible that I was completely on the wrong track with them.
> > > At any rate, the logs I included (and the current state of that dev
> > > machine) are with everything set to its default values, except for the
> > > 10-minute os_process_timeout value I mentioned above.
> > >
> > > Any help would be fantastic, as I'm completely out of ideas at this
> > > point. I'd of course be glad to provide any additional info that
> > > might be useful to you.
> > >
> > > Thanks!
> > > Mike
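One thing that stands out in the view functions quoted above is that they copy a dozen fields (including the whole Data array, in indicator_detail) into every emitted value, so each large document has to be serialized through the view server and the index itself grows huge. A common CouchDB pattern is to emit null and have clients fetch the document at query time with ?include_docs=true. A minimal sketch of that variant of indicator_summary, keeping the original field names and stubbing emit() so it can run standalone:

```javascript
// Stub of CouchDB's emit() so the map function runs outside the server:
var rows = [];
function emit(key, value) {
  rows.push({ key: key, value: value });
}

// Slimmed-down variant of the indicator_summary map: the value is null,
// so the index stores only the keys; clients pass ?include_docs=true to
// get the full document back with each row.
function map(doc) {
  if (doc.Data) {
    emit(doc.Name, null);
  }
}

// Exercise it on a minimal document shaped like the ones described
// in the thread (the document itself is made up):
map({ Name: "example-metric", Data: [1, 2, 3] });
```

Whether this helps with the timeout itself would need testing, since the map function still has to parse each large document, but it at least keeps the index from duplicating the documents' contents.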
