Hi Francesco, yes I think so too, let's keep that ticket up to date with notes.
On 11 June 2012 12:19, FRANCESCO FURIANI <[email protected]> wrote:
> Hi Dave,
>
> thanks for all the hints.
>
> We tried on an old Linux machine with CouchDB 1.0.1 (Erlang R14B02/5.8.3) and
> the views are working with more than 3 JSONs (we're trying to import more, to
> see what the limit is). It seems like a CouchDB 1.2.0 issue.
>
> I'll do further testing.
>
>
> Regards,
> Francesco
> ________________________________________
> From: Dave Cottlehuber [[email protected]]
> Sent: Monday, 4 June 2012 19:48
> To: [email protected]
> Subject: Re: Problems with CouchDB 1.2.0 views on large JSON documents
>
> On 4 June 2012 21:03, Francesco Furiani <[email protected]> wrote:
>> Hi,
>>
>> I run a CouchDB server (v1.2.0) on a Mac (Intel architecture, 8 GB of RAM,
>> OS X version 10.6.8) installed with brew.
>>
>> The server itself is used as storage for big JSONs (example:
>> https://raw.github.com/cvdlab-bio/webpdb/develop/docs/jsons/2LGB-pretty-print.json
>> ) for a small university project.
>>
>> When we load more than 3 of these JSONs, none of the map functions (which we
>> created to retrieve documents beyond a simple get by id) work.
>> A typical map is:
>>
>> function(doc){if(doc.TITLE.title.match('.*INSULIN.*') !== null) emit(doc.ID, doc);}
>>
>> but even a
>>
>> function(doc){emit(doc.ID, doc.ID)}
>>
>> ceases to work,
>>
>> while when there are just 2 or 3 JSONs in the database they work just fine.
>> I tried increasing the stack for couchjs (1 GB now; going over 1 GB doesn't
>> seem to work), increasing the limit on open files (4096), and increasing the
>> timeout for processes, but in the end I don't get any results, only an
>> (Error: os_process_error {exit_status,0}) from the db.
>>
>> Is the JSON we provide too big for Couch? Do we need to redesign the map to
>> remove parts of the JSON? Is this a known bug (I haven't found anything on
>> the net)?
>>
>> Any clue that might help me?
>>
>> Thanks for the help,
>> Francesco
>>
>
> Hi Francesco,
>
> CouchDB stores JSON in a native Erlang format on disk. Retrieving it
> (whether to process it in a JS map/reduce view, or to send it through to an
> HTTP client) requires transforming it into JSON text format. For big
> docs, this can take a while, or, when piped into couchjs, even break.
> A couple of other people have reported this type of issue
> recently on the ML.
>
> You could avoid this by using Erlang views**, or you could check whether
> you see the same issue in 1.1.1, which has a different (slower) JSON
> parsing tool.
>
> Could you open a JIRA ticket for this issue please, seeing as you have
> a nice sample doc to share?
>
> Some general points:
> Typically you can replace emit(doc.ID, doc) with emit(doc.ID, null) in your view.
> You can always use ?include_docs=true to return the full documents in
> your query.
> The id of any doc emitted is available "for free", so you don't need
> the duplication.
> This will make your view smaller by orders of magnitude.
>
> ** Erlang views run inside the Erlang VM without a trusted sandbox, so rm
> -rf and worse are all possible. But they're likely faster, have fewer of the
> limitations noted above, and come with less documentation too.
> YMMV; don't forget to wear a seatbelt, and never _ever_ run with
> scissors.
>
> A+
> Dave
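For reference, a minimal sketch of the leaner view discussed above: emit only the doc's _id as the key with a null value, then query with ?include_docs=true to get the full document back. The field names (TITLE.title, a PDB-style _id) are assumptions based on the sample doc in the thread, and emit() is stubbed here only so the sketch runs outside CouchDB, where the query server normally provides it.

```javascript
// Stub of CouchDB's emit(): collect rows so we can see what the view produces.
const rows = [];
function emit(key, value) { rows.push({ key: key, value: value }); }

// The map function itself -- this is what would go in the design document.
// Guarding against docs that lack TITLE avoids throwing on other doc shapes.
function map(doc) {
  if (doc.TITLE && doc.TITLE.title && /INSULIN/.test(doc.TITLE.title)) {
    emit(doc._id, null); // no doc in the value: the id comes back "for free"
  }
}

// Example document shaped like the PDB-style JSONs in the thread (assumed shape).
map({ _id: "2LGB", TITLE: { title: "GLUCAGON AND INSULIN COMPLEX" } });
console.log(rows); // → [ { key: '2LGB', value: null } ]
```

Querying the view with ?include_docs=true then returns each matching document alongside its row, without the view index having to store a full copy of every doc.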
