Charlie,

I hear you!  I'm looking for that same functionality. This problem is bigger 
than it looks.

Your single-dimension example is a good starting point.  It makes sense that 
when the user asks for all widgets priced between $0 and $100 he gets that 
information in facets.
You have a couple of choices:
1. Give him 5 equal price ranges, 1-20, 21-40, 41-60, 61-80, and 81-100, even 
if most of the widgets fall into the 41-60 facet.
2. Give him 5 groups, but make sure each group contains the same number of 
widgets, so the ranges might be 1-38, 39-46, 47-52, 53-60, and 61-70.
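To make the difference between those two choices concrete, here is a minimal 
Python sketch that computes both kinds of ranges. The function names and the 
sample prices are made up for illustration; nothing here is a Solr API.

```python
from typing import List, Tuple

Range = Tuple[float, float]


def equal_width_ranges(lo: float, hi: float, n: int) -> List[Range]:
    """Choice 1: split [lo, hi] into n ranges of identical width."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]


def equal_count_ranges(prices: List[float], n: int) -> List[Range]:
    """Choice 2: split the sorted prices into n groups of (nearly) equal
    size and report each group's lowest and highest price."""
    ordered = sorted(prices)
    size, extra = divmod(len(ordered), n)
    ranges: List[Range] = []
    start = 0
    for i in range(n):
        # The first `extra` groups each take one leftover widget.
        end = start + size + (1 if i < extra else 0)
        group = ordered[start:end]
        if group:
            ranges.append((group[0], group[-1]))
        start = end
    return ranges


prices = [12, 38, 41, 44, 45, 47, 50, 52, 55, 58, 60, 70]
print(equal_width_ranges(0, 100, 5))
print(equal_count_ranges(prices, 4))  # [(12, 41), (44, 47), (50, 55), (58, 70)]
```

With these 12 sample prices split 4 ways, every equal-count group holds exactly 
3 widgets, while the equal-width ranges ignore where the widgets actually sit.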

But the option you want is actually neither of these.  I believe you would want 
natural groupings of widgets that are near one another in price, without the 
constraint of fixed-width intervals, and without the constraint of equal-count 
ranges of widget prices.  If widgets aren't near one another, they don't belong 
in the same group, and if a widget's price is near two different groups of 
widgets, then the algorithm needs to decide which group to assign it to, or 
whether to merge both groups near it into one.
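One simple way to get natural groupings (not necessarily what Solr or anyone 
else uses) is 1-D single-linkage clustering: sort the prices and start a new 
group wherever the gap between neighbors exceeds a threshold. A minimal 
sketch, assuming a hypothetical hand-picked `max_gap` parameter:

```python
from typing import List


def gap_clusters(prices: List[float], max_gap: float) -> List[List[float]]:
    """1-D single-linkage clustering: walk the sorted prices and start a new
    group whenever the gap to the previous price exceeds max_gap."""
    groups: List[List[float]] = []
    for p in sorted(prices):
        if groups and p - groups[-1][-1] <= max_gap:
            groups[-1].append(p)  # close to its neighbor: same group
        else:
            groups.append([p])    # gap too large: open a new group
    return groups


print(gap_clusters([3, 5, 6, 40, 42, 45, 47, 90], max_gap=5))
# [[3, 5, 6], [40, 42, 45, 47], [90]]
```

Single linkage settles the "widget between two groups" question by always 
merging: a widget within `max_gap` of both neighboring groups chains them into 
one. Methods such as Jenks natural breaks optimization instead fix the number 
of groups and minimize the variance within each group.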

The geo-clustering example from your links is impressive.  If you choose 1000 
markers from the drop-down menu...
http://gmaps-utility-library.googlecode.com/svn/trunk/markerclusterer/1.0/examples/speed_test_example.html
...it shows between 30 and 40 markers per zoom level.  At some point those 1000 
lat/lon pairs have to be sent from the server to my browser, for processing by 
JavaScript.  But what if you match 15,000 documents?  That's easy to do in 
Solr.  There's a limit to how many lat/lon pairs you can send across the web 
to client-side JavaScript.  Solr needs to handle the clustering on the server, 
and send out only enough lat/lon pairs to draw a visually consumable map.
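A rough sketch of what "range facets in two dimensions" could look like on the 
server: snap each matching point to a fixed grid cell and return one count plus 
centroid per non-empty cell, so thousands of matches collapse to a handful of 
pairs. This is plain Python for illustration, not Solr code; the cell size in 
degrees (`cell_deg`) is an assumption, and a real version would presumably 
derive it from the map's zoom level.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees


def grid_facets(points: List[Point], cell_deg: float) -> List[Tuple[int, Point]]:
    """Snap each point to a cell_deg x cell_deg grid cell and return one
    (count, centroid) pair per non-empty cell, largest cells first."""
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for lat, lon in points:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((lat, lon))

    facets: List[Tuple[int, Point]] = []
    for members in cells.values():
        n = len(members)
        # Naive mean; wrong for cells that straddle the 180th meridian.
        centroid = (sum(p[0] for p in members) / n, sum(p[1] for p in members) / n)
        facets.append((n, centroid))
    facets.sort(key=lambda f: f[0], reverse=True)
    return facets


print(grid_facets([(59.1, 10.1), (59.2, 10.3), (60.7, 11.9)], cell_deg=1.0))
```

If each document were indexed with a precomputed cell id per zoom level (a 
hypothetical field), ordinary field faceting on that field would return the 
per-cell counts, though not the centroids.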

So I think the MarkerClusterer algorithm needs to be implemented within Solr, 
so that Solr will send out 30-40 'documents', or 30-40 of some other object 
that allows drilldown to all the documents in a plotted point.
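Here is a sketch of a greedy, MarkerClusterer-style pass: each point joins the 
first existing cluster whose seed point is within `radius_deg` of it in both 
latitude and longitude, otherwise it seeds a new cluster. This is my own 
simplified approximation, not a port of the MarkerClusterer source (which works 
in screen pixels rather than degrees), and `Cluster` and `greedy_cluster` are 
hypothetical names. Each cluster keeps a bounding box so the client can drill 
down into it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Cluster:
    center: Tuple[float, float]  # (lat, lon) of the seed point
    count: int = 0
    min_lat: float = 90.0
    max_lat: float = -90.0
    min_lon: float = 180.0
    max_lon: float = -180.0

    def add(self, lat: float, lon: float) -> None:
        """Count the point and grow the bounding box used for drill-down."""
        self.count += 1
        self.min_lat, self.max_lat = min(self.min_lat, lat), max(self.max_lat, lat)
        self.min_lon, self.max_lon = min(self.min_lon, lon), max(self.max_lon, lon)


def greedy_cluster(points: List[Tuple[float, float]], radius_deg: float) -> List[Cluster]:
    """Each point joins the first cluster whose seed is within radius_deg in
    both lat and lon; otherwise it seeds a new cluster."""
    clusters: List[Cluster] = []
    for lat, lon in points:
        for c in clusters:
            if abs(lat - c.center[0]) <= radius_deg and abs(lon - c.center[1]) <= radius_deg:
                c.add(lat, lon)
                break
        else:  # no existing cluster was close enough
            seed = Cluster(center=(lat, lon))
            seed.add(lat, lon)
            clusters.append(seed)
    return clusters


pts = [(59.0, 10.0), (59.1, 10.1), (61.0, 12.0), (59.05, 9.95)]
for c in greedy_cluster(pts, radius_deg=0.5):
    print(c.count, c.center, (c.min_lat, c.max_lat, c.min_lon, c.max_lon))
```

A cluster's bounding box could then become a drill-down query such as 
`lat:[59.0 TO 59.1] AND lon:[9.95 TO 10.1]`, assuming hypothetical numeric 
`lat` and `lon` fields in the schema.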

Am I on track with what you are asking for?

Joe



On Sep 15, 2010, at 3:22 AM, Charlie DeTar wrote:

> On 09/14/2010 07:48 PM, Dennis Gearon wrote:
>> You are probably not talking about clusters in the physical structure of 
>> data on the disk, right?
>> 
>> What do YOU mean by clusters if not?
> 
> I mean basically "range facets", where the ranges are 2-dimensional
> distances between documents that have indexed latitudes and longitudes.
> 
> An example of what I mean:
> 
> http://googlegeodevelopers.blogspot.com/2009/04/markerclusterer-solution-to-too-many.html
> http://gmaps-utility-library.googlecode.com/svn/trunk/markerclusterer/1.0/examples/simple_example.html
> 
> If you zoom in (or, in an analogy with searching, specify a bounding box
> within which to look for documents), the grouped points become
> individual points.
> 
> This is basically the same idea as "Show me the widgets between $0 and
> $100", and then narrowing further from $50 to $60.  But instead of just
> a single float or int, it's a distance calculation to a 2D point.
> 
> The mapping stuff is all out of scope for Solr, but indexing of
> documents in such a way that I could get counts of documents in various
> geographic ranges seems useful to anyone interested in providing a
> browsing/searching interface to a large corpus of geographic data.
> 
> Does that explain it?  The previous thread which discusses this is here:
> 
> http://search.lucidimagination.com/search/document/16d0dbc4ac0a7540/geographic_clustering#6c1bba9a39df5f1b
> 
> best,
> Charlie

Joe Chesak
j...@easyconnect.no