Savvas-Andreas Moysidis wrote:
> In my understanding sorting on a field for which analysis has yielded
> multiple terms just doesn't make sense..
> If you have document#1 with a field A which has the terms Epsilon, Alpha,
> and document#2 with field A which has the terms Beta, Delta and request
> an ascending sort on field A what order should you get and why?

In the couple of use cases where I've been asked for it, either...
(a) returning each document only the first time it appeared
     document 1 [for alpha] followed by document 2[beta]
(b) or returning them with duplicates
     doc1[alpha], doc2[beta], doc2[delta], doc1[epsilon]
... would have been an OK user experience.
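
To make (a) and (b) concrete, here is a rough sketch in plain
Python, entirely outside of Solr. The function names and the
dict-of-lists layout are made up for illustration and are not
Lucene or Solr API; terms are lowercased as an analyzer might.

```python
def sort_first_appearance(docs):
    """Option (a): each document once, placed by its smallest term."""
    pairs = sorted(
        (min(term.lower() for term in terms), doc_id)
        for doc_id, terms in docs.items()
        if terms  # documents with no terms are simply skipped here
    )
    return [(doc_id, term) for term, doc_id in pairs]


def sort_with_duplicates(docs):
    """Option (b): one entry per (document, term) pair, ordered by term."""
    pairs = sorted(
        (term.lower(), doc_id)
        for doc_id, terms in docs.items()
        for term in terms
    )
    return [(doc_id, term) for term, doc_id in pairs]


# The two documents from the question above.
docs = {"doc1": ["Epsilon", "Alpha"], "doc2": ["Beta", "Delta"]}
first_only = sort_first_appearance(docs)
with_dupes = sort_with_duplicates(docs)
```

Option (a) amounts to sorting on the minimum value of the field,
which is why a precomputed single-valued field can stand in for it.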

The use case
   "show me documents relevant to things close to a location"
seems like a pretty broad one that any geospatial-aware
search engine would like to handle; and I imagine in many cases
a single document might refer to multiple addresses/locations.

In another case, I was asked if the application could "sort the
incidents by the age of rape victims".   And while most incidents
involved a single victim, some had 2 or more.  The idea wasn't
to impose some total ordering but rather to make it quick to find
documents involving younger people.  I realize I can work
around that one by adding a single-valued "min-age" column.
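
A minimal sketch of that workaround, assuming the documents are
built as plain dicts before being sent to the indexer. The field
names "victim_ages" and "min_age" are hypothetical, and the final
sort only imitates what an ascending sort on the new single-valued
field would do.

```python
def add_min_age(doc, ages_field="victim_ages", min_field="min_age"):
    """Return a copy of doc with a single-valued minimum-age field added."""
    out = dict(doc)
    ages = doc.get(ages_field) or []
    if ages:
        out[min_field] = min(ages)
    return out


# Hypothetical incident documents; "c" has no ages recorded.
incidents = [
    {"id": "a", "victim_ages": [34]},
    {"id": "b", "victim_ages": [41, 19]},
    {"id": "c", "victim_ages": []},
]
indexed = [add_min_age(doc) for doc in incidents]

# Ascending sort on the derived field; documents without it go last.
ranked = sorted(indexed, key=lambda d: d.get("min_age", float("inf")))
ranked_ids = [d["id"] for d in ranked]
```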

For the spatial one, where different users might pick different
center points, I can't think of any good workaround beyond Jonathan's
idea of facets -- perhaps overlaying some map grid on the data
and using facets for that.
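
One way the grid overlay might look, as a sketch only: each
location in a document is mapped to a cell id, and the set of cell
ids would go into a multi-valued string field to facet on. The
function names, the "row_col" id format, and the default cell size
are all assumptions of mine, not anything Solr provides.

```python
import math


def grid_cell(lat, lon, cell_deg=0.01):
    """Map a latitude/longitude pair to a 'row_col' grid cell id."""
    row = math.floor((lat + 90.0) / cell_deg)
    col = math.floor((lon + 180.0) / cell_deg)
    return f"{row}_{col}"


def cells_for_doc(locations, cell_deg=0.01):
    """Distinct cell ids for every location mentioned in one document."""
    return sorted({grid_cell(lat, lon, cell_deg) for lat, lon in locations})


# Hypothetical coordinates for one document, using coarse 1-degree cells.
locations = [(37.5, -122.5), (37.6, -122.4), (40.2, -74.1)]
doc_cells = cells_for_doc(locations, cell_deg=1.0)
center_cell = grid_cell(37.5, -122.5, cell_deg=1.0)
```

A user's chosen center point would be mapped to its own cell the
same way, and documents sharing that cell (or its neighbors) could
be pulled up by filtering or faceting on the field. This gives
coarse "nearby" buckets rather than a true distance sort.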




> On 27 October 2010 17:56, Jonathan Rochkind <rochk...@jhu.edu> wrote:
> 
>> I would suggest that trying to sort on a multi-token/multi-value value in
>> the first place ought to always raise an exception. Are there any reasons
>> why you'd EVER want to do this, with the way it currently works?  Letting
>> people do this and only _sometimes_ raise an exception, but never do
>> anything that's actually reasonable, just adds confusion for newbies.
>>
>> Alternately, perhaps sorting on a multi-valued or tokenized field ought to
>> sort only on the FIRST token found in the first value of the field, but not sure how
>> feasible that is to code.
>>
>> Ron, for your particular use case -- lucene sorting just can't really do
>> that, I'm not sure there's a WAY to code sorting that works on multi-valued
>> fields.  A given lucene/solr search results set only includes each document
>> ONCE.  So where would that document appear in your sort on a multi-valued
>> field?   A different solution is required.  I too sometimes have similar use
>> cases, and my best ideas about how to solve them involve using faceting ---
>> you can facet on a multi-valued field, and you can sort facets--but you can
>> only sort facets by "index order", a strict byte-by-byte sort.  Which
>> doesn't always work for me either.  I haven't quite figured out the solution
>> to this sort of problem.
>>
>>
>> Ron Mayer wrote:
>>
>>> Lance Norskog wrote:
>>>
>>>
>>>> You may not sort on a tokenized field. You may not sort on a multiValued
>>>> field. You can only have one term in a field.
>>>>
>>>> If there are more search terms than documents, A) sorting doesn't mean
>>>> anything and B) Lucene will throw an exception.
>>>>
>>>>
>>>
>>> Is that considered a feature, or an annoyance/bug?
>>>
>>> One of the things I'm using Solr for is to store a whole bunch of
>>> documents about crime events that contain information roughly
>>> like this:
>>>
>>> "the gang member ran the red light at 100 main st, and
>>>  continued driving to 500 main street where he hit a car. He
>>>  fled his car and ran to 789 second avenue where he hijacked
>>>  another car and drove to his house at 654 someother st"
>>>
>>> If I do a search for the name of that gang member's gang,
>>> I'd really really like to be able to sort my documents by
>>> distance from a location -- for example to quickly find
>>> any documents referring to gang activity in a neighborhood.
>>>
>>> And I'd really like to see this document near the top
>>> of my search results whether the user chose 100 main, 500 main,
>>> 790 second, or 650 someother street  as his center point for
>>> sorting his search.
>>>
>>>
>>> If I wanted that so badly I'd be willing to try coding it
>>> so you _could_ sort on multiValued fields, would people want
>>> that feature?   If so - would someone know off the top of
>>> their head where I should get started looking in the code?
>>>
>>> Or is it considered a feature that solr currently disallows it?
>>>
>>>
>>>
>>>
> 
