Hi,

I'm trying to figure out the best way to implement a query for "overlapping segments".

The specific use case involves (biological) genomic data, which is naturally represented by a triple of the form [Chromosome, Start, End]. As a concrete example, the index [1, 123456, 135789] represents the segment on chromosome 1 that extends from base position 123456 through (and including) base position 135789. The segments/documents in CouchDB came from analyzing a set of cell line DNA data to determine segments where the copy number changes.

A typical query against this database (from a biologist's point of view) would be to ask what happens to these cell lines in the region of a specific gene. I can easily convert gene names to their positions in the human genome, so this translates to a query asking for all segments that overlap with the region that defines the gene. For example, I might want to find all segments that overlap [1, 130000, 140000]. The example above should be returned as part of the results of this query.

The pseudocode for the query I have in mind is something like
   if (doc.Chromosome == query.Chromosome) {
      if (doc.Start <= query.End && doc.End >= query.Start) {
         // show me this document
      }
   }
The actual view at present is much simpler, basically consisting of
   if (doc.Start) {
      emit([doc.Chromosome, doc.Start, doc.End], other-relevant-stuff)
   }
with the idea being that the query parameters should be able to find the desired segments.
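
For reference, the overlap test from the pseudocode can be written as a small runnable JavaScript function. This is only a sketch: the helper name `overlaps` and the plain objects are made up for illustration, the field names follow the documents (Chromosome, Start, End), and both ends of a segment are treated as inclusive.

```javascript
// Sketch of the overlap test; `overlaps` is a hypothetical helper name.
// Segments are closed intervals: Start and End are both included.
function overlaps(doc, query) {
  return doc.Chromosome === query.Chromosome &&
         doc.Start <= query.End &&
         doc.End >= query.Start;
}

// The example segment and the example gene region from above.
var segment = { Chromosome: 1, Start: 123456, End: 135789 };
var gene = { Chromosome: 1, Start: 130000, End: 140000 };
console.log(overlaps(segment, gene));
```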

The problem I have is that I cannot see a reasonable way to use the startkey and endkey parameters to identify these kinds of overlaps. Am I missing something, or is there a way within the CouchDB API to do what I want?
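
To make the difficulty concrete, here is a toy simulation in plain JavaScript (no CouchDB involved, and the rows are made-up data apart from the example segment). As far as I can tell, a key range on [Chromosome, Start, End] can only bound Chromosome and Start, roughly startkey=[1] and endkey=[1, query.End, {}], so the condition on End would still have to be checked after the rows come back.

```javascript
// Toy stand-in for view keys sorted as [Chromosome, Start, End].
// Only the second row comes from the example above; the rest are invented.
var rows = [
  [1, 100, 200],
  [1, 123456, 135789],
  [1, 150000, 160000],
  [2, 130000, 131000]
];
var query = { Chromosome: 1, Start: 130000, End: 140000 };

// What a startkey/endkey range on this key can express:
// same chromosome, and Start no later than the end of the query region.
var inKeyRange = rows.filter(function (key) {
  return key[0] === query.Chromosome && key[1] <= query.End;
});

// The End >= query.Start condition is not covered by the key range,
// so it is applied here as a second pass over the returned rows.
var hits = inKeyRange.filter(function (key) {
  return key[2] >= query.Start;
});

console.log(inKeyRange.length, hits.length);
```

The first filter keeps every segment on the chromosome that starts before the end of the gene, which for a gene far along the chromosome could be most of the segments on it.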

(One might note that the database arising from 175 cell lines contains about 300,000 documents, and that I expect the results of most queries to contain only about 175 rows (one per cell line). This may constrain the kinds of tricks one can expect to do with additional views or with emitting more stuff.)

Thanks,
   Kevin
