On Thu, Dec 1, 2011 at 3:49 AM, Rob Crowell <[email protected]> wrote:
> I suppose it would be possible to make multiple queries, using
> startkey and endkey to pull out the ranges.
>
> 1. Sort the "bad" tags: (BROKEN_IMAGE, OFFENSIVE_IMAGE)
> 2. For each bad tag, request documents:
>    i. Query 1:
>       startkey = []
>       endkey = ["BROKEN_IMAGE"]
>
>    ii. Query 2:
>       startkey = ["BROKEN_IMAGE", {}]
>       endkey = ["OFFENSIVE_IMAGE"]
>
>    iii. Query 3:
>       startkey = ["OFFENSIVE_IMAGE", {}]
>       endkey = [{}]
>
> Requires making N+1 queries, which for a fairly small list wouldn't be too
> bad.

If you have a view of docs matching a condition, you can find docs
*not* matching that condition efficiently: make simultaneous queries to
_all_docs and your view. Both will be sorted by doc id. Iterate through
both at the same time (no need to store them in memory), spotting
ids listed in _all_docs but not in your view.

I wrote this up here: http://stackoverflow.com/a/6210422/2938

Notes:

* If rows have identical keys, CouchDB sorts them by doc id. You can
  emit any value for the rows; what's important here is row.id.
* You can generalize the technique to perform multiple "NOT" queries
  simultaneously.
* This is a situation where concurrent or event-driven languages like
  JavaScript or Erlang shine.
* I'm pretty sure that in practice, the "NOT" queries add zero cost to
  the query. It always takes the same time to complete: the time to
  fetch _all_docs.

I do not know if this technique has a name. If it doesn't, may I
propose: "The Thai Massage."

>
> On Wed, Nov 30, 2011 at 3:10 PM, Rob Crowell <[email protected]> wrote:
>> Hey everyone, view question here.
>>
>> I've got couch records that represent images. They may have any
>> number of tags (from zero to hundreds). However, while there are
>> thousands of tags in the dataset, there are only a couple that are
>> considered "bad" (BROKEN_IMAGE, BLANK_IMAGE, etc.) Here's an example
>> document:
>>
>> {
>>   _id: ...,
>>   url: "http://example.org/whatever.png",
>>   tags: ["OUTDOORS", "BEACH", "RED_DRESS"]
>> }
>>
>> I wrote a view to emit documents that don't have these "bad" tags by
>> hard-coding the list of bad tags and checking every tag against this
>> list. If none of the tags are bad, then emit the document.
>>
>> However, a user may also specify tags that he doesn't like
>> (OFFENSIVE_IMAGE, DENVER_BRONCOS, whatever). Is there any good way to
>> build a view around this idea ("show me all documents that don't have
>> a set of tags") short of defining a custom view (with their own "bad"
>> tags list) for every user?
>>
>> I could do this filtering client-side of course, but if I wanted to
>> generate an exhaustive list of matching documents (for a report or
>> something similar) then it would be a lot of work. I'm stumped at the
>> moment. Thanks for any suggestions!
>>
>

--
Iris Couch
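
A minimal sketch of the merge described in the reply above, written in Python rather than the JavaScript or Erlang it mentions. It is an illustration under stated assumptions, not the code from the linked write-up: the function name `ids_not_in_view` and the sample ids are hypothetical, the two lists stand in for the row.id values streamed from _all_docs and from the view, and both streams are assumed to arrive sorted by doc id in an order that matches Python's string comparison.

```python
from typing import Iterable, Iterator


def ids_not_in_view(all_ids: Iterable[str], view_ids: Iterable[str]) -> Iterator[str]:
    """Yield ids that appear in all_ids but not in view_ids.

    Assumption: both inputs are sorted by doc id in the same order.
    Only one id from each stream is held at a time.
    """
    view_iter = iter(view_ids)
    current = next(view_iter, None)
    for doc_id in all_ids:
        # Advance the view stream past ids that sort before doc_id.
        # This also skips repeated ids if a doc was emitted more than once.
        while current is not None and current < doc_id:
            current = next(view_iter, None)
        if current != doc_id:
            yield doc_id


# Hypothetical sample data standing in for the two sorted responses.
all_docs = ["img1", "img2", "img3", "img4", "img5"]
bad_view = ["img2", "img2", "img4"]  # docs carrying at least one "bad" tag

clean = list(ids_not_in_view(all_docs, bad_view))
print(clean)
```

Because the function only consumes iterators, the same loop should work if the two lists are replaced by generators that read rows from the HTTP responses as they arrive.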
