I've made good use of CouchDB-Lucene in the past, but haven't had a chance to play around with ElasticSearch.
Another alternative would be to schedule a background process to create a summary document for each month's data.

On Fri, Mar 18, 2011 at 8:41 AM, Justin Walgran <[email protected]> wrote:
> Thanks for the suggestion, Zach. The problem I'm running into is that
> there are too many results to sort quickly in a list function or on
> the client.
>
> It is looking more and more like hooking up some flavor of Lucene may
> be the only way to solve this problem.
>
> Does anyone have recommendations on using ElasticSearch vs. CouchDB-Lucene?
>
> Justin
>
> On Thu, Mar 17, 2011 at 5:23 PM, Zachary Zolton
> <[email protected]> wrote:
>> Justin,
>>
>> Depending on your intended usage, it may be acceptable to just use the
>> view to filter by the desired month and then perform your sort in
>> client-side code. Alternatively, you could do the sorting server-side
>> in a _list function, but this may put quite a burden on your CouchDB
>> server if you're making a high volume of these queries.
>>
>> Also, CouchDB-Lucene is very capable of querying ranges in one field
>> while sorting on an additional field.
>>
>>
>> Cheers,
>>
>> Zach
>>
>> On Thu, Mar 17, 2011 at 3:34 PM, Justin Walgran <[email protected]> wrote:
>>> I'm sorry, I oversimplified my problem statement. Your solution is
>>> correct if I only need to select by month. Unfortunately I also need
>>> to support an arbitrary inspection date range for filtering results.
>>> February 6th to March 14th for example. This is where the trouble
>>> creeps in.
>>>
>>> Justin
>>>
>>> On Thu, Mar 17, 2011 at 4:29 PM, Keith Gable <[email protected]>
>>> wrote:
>>>> Then simply emit the name before the day of the month. Then, it'll
>>>> sort by name then day of month.
>>>>
>>>> On Thu, Mar 17, 2011 at 3:17 PM, Justin Walgran <[email protected]>
>>>> wrote:
>>>>> Thanks for the thoughtful reply, Keith.
>>>>>
>>>>> Assume these input docs:
>>>>>
>>>>> { "inspection_date": "2011-03-01", "homeowner_name": "Bob" }
>>>>>
>>>>> { "inspection_date": "2011-03-02", "homeowner_name": "Keith" }
>>>>>
>>>>> { "inspection_date": "2011-03-03", "homeowner_name": "Alice" }
>>>>>
>>>>> The key output from
>>>>> by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>> would be:
>>>>>
>>>>> [2011,3,1,"Bob"]
>>>>> [2011,3,2,"Keith"]
>>>>> [2011,3,3,"Alice"]
>>>>>
>>>>> Which is not sorted by home owner name. That's the gotcha.
>>>>>
>>>>>
>>>>> Justin
>>>>>
>>>>> On Thu, Mar 17, 2011 at 2:13 PM, Keith Gable <[email protected]>
>>>>> wrote:
>>>>>> Uh. This sounds simple?
>>>>>>
>>>>>> view: by_home_owner_name:
>>>>>>   if (doc.home_owner_name) { emit(doc.home_owner_name, 1); }
>>>>>>
>>>>>> view: by_inspection_date:
>>>>>>   if (doc.inspection_date) {
>>>>>>     var d = new Date(doc.inspection_date);
>>>>>>     emit ([ d.getFullYear(), d.getMonth() + 1, d.getDate() ], 1);
>>>>>>   }
>>>>>>
>>>>>> To look for all of my inspections:
>>>>>> ...by_home_owner_name?key=Keith Gable
>>>>>>
>>>>>> To get all of the inspections for today:
>>>>>> ...by_inspection_date?reduce=false&key=[2011,3,17]
>>>>>>
>>>>>> To get all of the inspections for this month:
>>>>>> ...by_inspection_date?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>
>>>>>>
>>>>>> Combining the two:
>>>>>>
>>>>>> view: by_inspection_date_and_homeowner_name:
>>>>>>   if (doc.inspection_date && doc.homeowner_name) {
>>>>>>     var d = new Date(doc.inspection_date);
>>>>>>     emit ([ d.getFullYear(), d.getMonth() + 1, d.getDate(),
>>>>>>             doc.homeowner_name ], 1);
>>>>>>   }
>>>>>>
>>>>>> ...by_inspection_date_and_homeowner_name?reduce=false&startkey=[2011,3,0]&endkey=[2011,3,{}]
>>>>>>
>>>>>> Will result in:
>>>>>> [2011,3,1,"Alice"]
>>>>>> [2011,3,1,"Bob"]
>>>>>> [2011,3,2,"Keith"]
>>>>>>
>>>>>>
>>>>>> Does any of that not do what you want?
>>>>>>
>>>>>> On Thu, Mar 17, 2011 at 12:33 PM, Justin Walgran <[email protected]>
>>>>>> wrote:
>>>>>>> Assume a CouchDB storing and indexing housing inspection records. Each
>>>>>>> inspection document has two important fields.
>>>>>>>
>>>>>>> - Home owner name
>>>>>>> - Inspection date
>>>>>>>
>>>>>>> There are about 15,000 inspection documents generated per month.
>>>>>>>
>>>>>>> I need to quickly retrieve a list of inspections for January, sorted
>>>>>>> by home owner name.
>>>>>>>
>>>>>>> The issue I am running into is the fact that the size of the result
>>>>>>> set requires paging the data using limit and startkey. This would
>>>>>>> require that the view key be the inspection date, which means the
>>>>>>> results cannot be sorted by home owner name. The size of the data
>>>>>>> means that pulling it all down to the client and sorting in the
>>>>>>> browser is not performant.
>>>>>>>
>>>>>>> Is there a clever way to solve this problem?
>>>>>>>
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Justin
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Keith Gable
>>>>>> A+ Certified Professional
>>>>>> Network+ Certified Professional
>>>>>> Web Developer
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Keith Gable
>>>> A+ Certified Professional
>>>> Network+ Certified Professional
>>>> Web Developer
>>>>
>>>
>>
>
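
For what it's worth, here is a rough sketch of the summary-document idea from the top of this message. It is plain JavaScript and untested against a real CouchDB. The function name buildMonthlySummaries, the "summary-YYYY-MM" _id scheme and the "type" field are made up for illustration; the field names inspection_date and homeowner_name are taken from the sample docs quoted above. The background process would fetch the inspection docs however it likes and write the returned documents back.

```javascript
// Hypothetical helper (not part of CouchDB): groups inspection docs by
// month and sorts each month's entries by homeowner name, then by date.
function buildMonthlySummaries(docs) {
  var byMonth = {};

  docs.forEach(function (doc) {
    // Skip anything that is not a complete inspection record.
    if (!doc.inspection_date || !doc.homeowner_name) { return; }

    // "2011-03-01" -> "2011-03"; assumes ISO-style date strings.
    var month = doc.inspection_date.slice(0, 7);
    if (!byMonth[month]) { byMonth[month] = []; }
    byMonth[month].push({
      homeowner_name: doc.homeowner_name,
      inspection_date: doc.inspection_date
    });
  });

  // One summary document per month, months in ascending order.
  return Object.keys(byMonth).sort().map(function (month) {
    var inspections = byMonth[month].sort(function (a, b) {
      if (a.homeowner_name < b.homeowner_name) { return -1; }
      if (a.homeowner_name > b.homeowner_name) { return 1; }
      if (a.inspection_date < b.inspection_date) { return -1; }
      if (a.inspection_date > b.inspection_date) { return 1; }
      return 0;
    });
    return {
      _id: "summary-" + month,   // assumed naming scheme
      type: "monthly_summary",   // assumed marker field
      month: month,
      inspections: inspections
    };
  });
}
```

With roughly 15,000 inspections a month a single summary document could get large, so splitting each month into several chunks may be necessary. Note also that this only helps whole-month queries; it does not by itself cover the arbitrary date ranges Justin mentioned.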
