Chris, thanks for all this info! I'll think about these things again and then come back to you...
Cheers,
Martin

On Tue, 2007-06-26 at 23:22 -0700, Chris Hostetter wrote:
> : my documents (products) have a price field, and I want to have
> : a "dynamically" calculated range facet for that in the response.
>
> FYI: there have been some previous discussions on this topic...
>
> http://www.nabble.com/blahblah-t2387813.html#a6799060
> http://www.nabble.com/faceted-browsing-t1363854.html#a3753053
>
> : AFAICS I do not have the possibility to specify range queries in my
> : application, as I do not have a clue what's the lowest and highest
> : price in the search result and what are "good" ranges according
> : to the (statistical) distribution of prices in the search result.
>
> as mentioned in one of those threads, it's *really* hard to get the
> statistical sampling to the point where it's both balanced and user
> friendly. writing code specifically for price ranges in dollars lets you
> make some assumptions that give you "nice" ranges (rounding
> to one significant digit less than the max, doing log-based ranges, etc.)
> that wouldn't really apply if you were trying to implement a truly
> generic dynamic range generator.
>
> one thing to keep in mind: it's typically not a good idea to have the
> constraint set of a facet change just because some other constraint was
> added to the query -- individual constraints might disappear because
> they no longer apply, but it can be very disconcerting to a user
> when options change on them... if i search on "ipod" a statistical
> analysis of prices might yield facet ranges of $1-20, $20-60, $60-120,
> $120-200 ... if i then click on "accessories" the statistics might skew
> cheaper, so the new ranges are $1-20, $20-30, $30-40, $40-70 ... and now
> i'm a frustrated user, because i really wanted to use the range $20-60
> (that just happens to be my budget) and you offered it to me and then you
> took it away ... i have to undo my selection of "accessories", then click
> $20-60, and then click accessories again to get what i want ... not very nice.
>
> : So if it would be possible to go over each item in the search result
> : I could check the price field and define my ranges for the specific
> : query on solr side and return the price ranges as a facet.
>
> : Otherwise, what would be a good starting point to plug in such
> : functionality into solr?
>
> if you really want to do statistical distributions, one way to avoid doing
> all of this work on the client side (and needing to pull back all of the
> prices from all of the matches) would be to write a custom request handler
> that subclasses whichever one you currently use and does this computation
> on the server side -- where it has lower level access to the data and
> doesn't need to stream it over the wire. FieldCache in particular would
> come in handy.
>
> it occurs to me that even though there may not be a way to dynamically
> create facet ranges that can apply usefully on any numeric field, we could
> add generic support to the request handlers for optionally fetching some
> basic statistics about a DocSet for clients that want them (either for
> building ranges, or for any other purpose).
>
> min, max, mean, median, mode, midrange ... those should all be easy to
> compute using the ValueSource from the field type (it would be nice if
> FieldTypes had some way of indicating which DocValues function can best
> manage the field type, but we can always assume float or have an option
> for dictating it ... people might want a float mean for an int field
> anyway)
>
> i suppose even stddev could be computed fairly easily ... there's a
> formula for that that works well in a single pass over a bunch of values,
> right?
>
> -Hoss
>
-- 
Martin Grotzke
http://www.javakaffee.de/blog/
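One possible take on the "nice", log-based price ranges mentioned in the thread is to draw boundaries from the fixed 1-2-5 sequence (1, 2, 5, 10, 20, 50, ...) up to the maximum price in the result set. This is a minimal Java sketch, not Hoss's code and not Solr API: `NicePriceRanges`, `boundaries` and `rangeQueries` are hypothetical names, and it assumes the max price is already known (e.g. from a `sort=price desc&rows=1` query). Because the boundaries come from a fixed sequence rather than from the current price distribution, narrowing the query (the "accessories" click) can only trim ranges off the top; an existing range such as $20-50 never moves, which avoids the disappearing-range problem described above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Solr API): "nice" log-scale price range boundaries
 * taken from the fixed 1-2-5 sequence 1, 2, 5, 10, 20, 50, 100, ...
 */
public class NicePriceRanges {

    // Multipliers used within each power of ten.
    private static final int[] STEPS = {1, 2, 5};

    /**
     * Returns ascending boundaries 0, 1, 2, 5, 10, ... ending at the first
     * boundary >= maxPrice. Consecutive pairs form the facet ranges.
     */
    public static List<Long> boundaries(double maxPrice) {
        // Rejects NaN, non-positive values, and values that could overflow a long.
        if (!(maxPrice > 0) || maxPrice > 1e18) {
            throw new IllegalArgumentException("maxPrice must be in (0, 1e18]: " + maxPrice);
        }
        List<Long> result = new ArrayList<>();
        result.add(0L);
        for (long decade = 1; ; decade *= 10) {
            for (int step : STEPS) {
                long boundary = step * decade;
                result.add(boundary);
                if (boundary >= maxPrice) {
                    return result;
                }
            }
        }
    }

    /**
     * Turns boundaries into range queries such as price:[20 TO 50], e.g. for
     * use as facet.query values. [a TO b] is inclusive at both ends, so a price
     * lying exactly on a boundary is counted in both adjacent ranges.
     */
    public static List<String> rangeQueries(String field, List<Long> bounds) {
        List<String> queries = new ArrayList<>();
        for (int i = 0; i + 1 < bounds.size(); i++) {
            queries.add(field + ":[" + bounds.get(i) + " TO " + bounds.get(i + 1) + "]");
        }
        return queries;
    }

    public static void main(String[] args) {
        List<Long> bounds = boundaries(187.0);
        System.out.println(bounds); // prints [0, 1, 2, 5, 10, 20, 50, 100, 200]
        System.out.println(rangeQueries("price", bounds));
    }
}
```

The trade-off is that ranges are not balanced by count (most "ipod" hits may land in one bucket), which is exactly the balance-versus-friendliness tension Hoss describes; this sketch picks stability over balance.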
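On the single-pass stddev question: Welford's online algorithm updates a running mean and a running sum of squared deviations per value, so min, max, mean, midrange and stddev all fall out of one pass. The sketch below assumes the per-document values are available as a `float[]` indexed by doc id (similar to what Lucene's FieldCache returns for a float field) and uses an `int[]` of matching doc ids as a stand-in for a DocSet; `DocSetStats` and its methods are hypothetical names, not Solr or Lucene API. Median and mode are left out because they need the values sorted or counted, not just a constant-size accumulator.

```java
/**
 * Single-pass summary statistics for the values of a set of matching documents.
 * Hypothetical sketch: the float[] stands in for a per-document value array
 * (like what Lucene's FieldCache returns), the int[] stands in for a DocSet.
 */
public class DocSetStats {
    private long count;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the current mean

    /** Welford's online update: folds one value into all running statistics. */
    public void add(double x) {
        count++;
        min = Math.min(min, x);
        max = Math.max(max, x);
        double delta = x - mean;  // deviation from the old mean
        mean += delta / count;
        m2 += delta * (x - mean); // old-mean deviation times new-mean deviation
    }

    public long count() { return count; }
    public double min() { return min; }
    public double max() { return max; }
    public double mean() { return count > 0 ? mean : Double.NaN; }
    public double midrange() { return (min + max) / 2.0; }

    /** Population standard deviation (divide by n); use m2 / (count - 1) for the sample version. */
    public double stddev() { return count > 0 ? Math.sqrt(m2 / count) : Double.NaN; }

    /** One pass over the matching doc ids, looking each value up by doc id. */
    public static DocSetStats compute(float[] valuesByDoc, int[] matchingDocs) {
        DocSetStats stats = new DocSetStats();
        for (int doc : matchingDocs) {
            stats.add(valuesByDoc[doc]);
        }
        return stats;
    }

    public static void main(String[] args) {
        // Values indexed by doc id; doc 8 (price 100) does not match the query.
        float[] prices = {2f, 4f, 4f, 4f, 5f, 5f, 7f, 9f, 100f};
        int[] hits = {0, 1, 2, 3, 4, 5, 6, 7};
        DocSetStats s = compute(prices, hits);
        System.out.printf("count=%d min=%.2f max=%.2f mean=%.2f midrange=%.2f stddev=%.2f%n",
                s.count(), s.min(), s.max(), s.mean(), s.midrange(), s.stddev());
    }
}
```

The textbook single-pass alternative, accumulating the sum and the sum of squares and computing E[x^2] - E[x]^2 at the end, also needs only one pass but can lose most of its precision to cancellation when the values are large relative to their spread; Welford's update avoids that at the cost of one division per value.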