Hi Tanguy,

On Jan 11, 2012, at 6:14 AM, Tanguy Moal wrote:

> Dear ML,
> 
> I'm developing something that relies on Solr's spatial capabilities.
> 
> I'm using Solr 3.5, have been reading 
> http://wiki.apache.org/solr/SpatialSearch#Spatial_Query_Parameters and have 
> the basic behaviours I wanted working.
> I use geofilt on a latlong field, with geodist() in the bf parameter.
> 
> When I do q=*:*&fq={!geofilt pt=x,y d=r unit=km 
> sfield=coordinates}&defType=edismax, everything works fine.
> 
> But in some cases, documents don't have coordinates.
> For example, some of them refer to a city, so they have coordinates, while 
> others are not so precisely geolocated and simply refer to a broader area, a 
> region or a state, if you will.

You've seen this, right?
http://wiki.apache.org/solr/SpatialSearch#How_to_combine_with_a_sub-query_to_expand_results

> I tried different queries:
> 
> - Include results from a broader area: q=*:*&fq=(state:FL OR 
> _query_:"{!geofilt ...}").
> => That works (i.e. results show up), but not as expected: this only 
> returns documents having FL as the value of the state field AND some value in 
> the coordinates field *or* documents around my point, but not documents 
> without a value in the coordinates field…

Your explanation of what happens is not consistent with what this query does.  
The filter query is an OR, not an AND.  The XML example docs that come with Solr 
don't all include a value in the "store" LatLonType field, so if what you claim 
is true, you should be able to prove it with a query against that data set we 
all have.  Please try to do so; I think you are mistaken.
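If it helps, here is one way to build such a query against the example data. This is a minimal Java sketch that only constructs the request URL; it assumes the stock example server at http://localhost:8983/solr with the bundled example docs loaded (LatLonType field "store", category field "cat"). The `cat:electronics` clause, the point 45.15,-93.85 and the 5 km radius are illustrative choices, and `OrGeofiltQuery`/`buildUrl` are hypothetical names. Paste the printed URL into a browser and check which returned docs have a "store" value and which don't.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds a /select URL that ORs a plain field clause with a geofilt sub-query.
// Assumptions: Solr example server plus bundled example docs; values are illustrative.
public class OrGeofiltQuery {

    // Form-encodes a parameter value (Java 10+ overload); spaces become '+'.
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    static String buildUrl(String base, String clause, String sfield,
                           double lat, double lon, double km) {
        // Nested local-params query through the _query_ "magic" field
        String geofilt = "{!geofilt sfield=" + sfield + " pt=" + lat + "," + lon + " d=" + km + "}";
        String fq = clause + " OR _query_:\"" + geofilt + "\"";
        return base + "/select?q=" + enc("*:*") + "&fq=" + enc(fq) + "&fl=" + enc("id,cat,store");
    }

    public static void main(String[] args) {
        // Docs matching the clause should come back whether or not they have a "store" value
        System.out.println(buildUrl("http://localhost:8983/solr",
                "cat:electronics", "store", 45.15, -93.85, 5));
    }
}
```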

> - Include results from a broader area, feeling lucky: 
> q=*:*&fq=((state:FL%20AND%20-coordinates:[*%20TO%20*])%20OR%20_query_:"{!geofilt%20pt=x,y%20d=r%20unit=km%20sfield=coordinates}")
> 
> => which does what it is asked to: it returns both the results with FL in the 
> state field and no value in the coordinates field *plus* results within a 
> radius around a point. *But* the problem is that in that case, the Solr 
> search layer dies unconditionally with the following stack trace:
>> Problem accessing /solr/geo_xpe/select. Reason:
>> 
>>    null
>> 
>> java.lang.NullPointerException
>>    at 
>> org.apache.lucene.spatial.DistanceUtils.parsePoint(DistanceUtils.java:351)
>>    at org.apache.solr.schema.LatLonType.getRangeQuery(LatLonType.java:95)
>>    at 
>> org.apache.solr.search.SolrQueryParser.getRangeQuery(SolrQueryParser.java:165)
...
> Of course, it doesn't make sense to expect the distance computation to work 
> with documents lacking a value in the coordinate field!

Arguably this is a bug.  LatLonType doesn't handle open-ended range queries, and 
it doesn't defensively check for a null argument either.  This will happen 
whether there is indexed data or not.
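To make the failure mode concrete: judging by the stack trace, the open-ended `*` endpoint reaches the point parser as null while the query is being parsed, before any index data is consulted. A defensive check would reject it up front with a readable message. The following is a hypothetical Java sketch of such a check; `LatLonRangeGuard`, `isBoundedEndpoint` and `parsePoint` are illustrative names, not Solr's actual code.

```java
// Hypothetical sketch of a defensive range-endpoint check (not Solr's actual code).
public class LatLonRangeGuard {

    // An open-ended endpoint ("*") apparently arrives as null; treat both as unbounded.
    static boolean isBoundedEndpoint(String endpoint) {
        return endpoint != null && !endpoint.trim().equals("*");
    }

    // Parses "lat,lon", failing with a descriptive message instead of a NullPointerException.
    static double[] parsePoint(String endpoint) {
        if (!isBoundedEndpoint(endpoint)) {
            throw new IllegalArgumentException(
                    "Open-ended range queries are not supported on LatLonType fields");
        }
        String[] parts = endpoint.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected \"lat,lon\" but got: " + endpoint);
        }
        return new double[] {
                Double.parseDouble(parts[0].trim()), Double.parseDouble(parts[1].trim())
        };
    }

    public static void main(String[] args) {
        double[] p = parsePoint("45.15,-93.85");
        System.out.println(p[0] + " " + p[1]);
        try {
            parsePoint(null); // roughly what an open-ended endpoint turns into
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```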

[* TO *] queries are slow, particularly when the field has many distinct values 
(say, a thousand or more).  If you want to perform this type of query, instead 
index an additional boolean field that indicates whether the other field has a 
value, and filter on that.  This would be a good use of an 
UpdateRequestProcessor, but you can just as well do it elsewhere.
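A minimal sketch of the "do it elsewhere" option, in plain Java with no Solr dependencies: set the flag on the client before sending each document, then filter on it instead of `-coordinates:[* TO *]`. It assumes a hypothetical boolean field named `has_coordinates` in your schema (for instance declared with the example schema's `boolean` type) and represents documents as simple maps; `HasCoordinatesFlag` and its methods are illustrative names, so adapt them to however you actually build documents.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Client-side sketch: index a boolean flag (hypothetical field "has_coordinates")
// and filter on it instead of running a [* TO *] range query on the LatLonType field.
public class HasCoordinatesFlag {
    static final String COORD_FIELD = "coordinates";
    static final String FLAG_FIELD = "has_coordinates"; // hypothetical boolean field

    // Returns a copy of the document with the flag set from the coordinates value.
    static Map<String, Object> withFlag(Map<String, ?> doc) {
        Map<String, Object> out = new LinkedHashMap<>(doc);
        Object coords = doc.get(COORD_FIELD);
        boolean present = coords != null && !coords.toString().trim().isEmpty();
        out.put(FLAG_FIELD, present);
        return out;
    }

    // Docs in this state without coordinates, plus docs within km of the point.
    static String stateOrRadiusFilter(String state, double lat, double lon, double km) {
        return "(state:" + state + " AND " + FLAG_FIELD + ":false) OR _query_:\"{!geofilt sfield="
                + COORD_FIELD + " pt=" + lat + "," + lon + " d=" + km + "}\"";
    }

    public static void main(String[] args) {
        Map<String, Object> city = new LinkedHashMap<>();
        city.put("id", "city-1");
        city.put(COORD_FIELD, "25.77,-80.19");
        Map<String, Object> region = new LinkedHashMap<>();
        region.put("id", "region-1");
        region.put("state", "FL");
        System.out.println(withFlag(city));
        System.out.println(withFlag(region));
        System.out.println(stateOrRadiusFilter("FL", 25.77, -80.19, 50));
    }
}
```

Since the flag is an ordinary indexed term, the filter avoids both the range-query cost and the LatLonType range-query code path that throws the NullPointerException above.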

> From a user perspective, it could be helpful to be able to define a default 
> distance to be returned for documents missing a value in the coordinate 
> field, if something like sortMissingFirst or sortMissingLast is specified 
> on the field:
> * sortMissingLast="true" could be obtained with a +Inf distance returned if 
> there is no value in the field
> * sortMissingFirst="true" could be obtained with a 0 distance returned if 
> there is no value in the field
> 
> I may be misunderstanding concepts, but those sorting attributes seem to 
> apply only to sorting and not to the document selection process (geofilt)? I 
> know that since Solr 3.5 it's possible to define sortMissing(Last|First) on 
> trie-based fields, but I don't know what happens for fields defined this way:
> ...
> <types>
>    ...
> <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" 
> omitNorms="true" positionIncrementGap="0"/>
> <fieldType name="latlong" class="solr.LatLonType" indexed="true" 
> sortMissingLast="true" omitNorms="true" subFieldType="double" />
>    ...
> </types>
> ...
> <fields>
>    ...
> <field name="coordinates" type="latlong" indexed="true" stored="true" 
> multiValued="false"/>
>    ...
> </fields>
> ...
> 
> Help is welcome!

Indeed, sortMissingLast/sortMissingFirst are used only in sorting and play no 
part in whether a document matches or not.  And for LatLonType, they won't do 
anything.  LatLonType uses a pair of double fields under the hood, as seen in 
your schema excerpt.  You could put those attributes on that subfield type, but 
I don't think that would work.  I was playing around with blank values 
yesterday and found that a blank value results in a very large distance from 
the query point… I forget the exact value, but you can try it yourself.
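For what it's worth, the "+Inf distance for missing values" idea can be emulated on the client when result sets are small, and doing so shows why it is purely an ordering concern: the mapping changes where a document sorts, never whether it matched. A minimal Java sketch, assuming each document carries optional lat/lon values and using the haversine formula with an approximate mean Earth radius of 6371 km; `MissingLastDistanceSort`, `Doc` and the method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Client-side sketch of "missing last" distance ordering (hypothetical names).
public class MissingLastDistanceSort {
    static final double EARTH_RADIUS_KM = 6371.0; // approximate mean radius

    static final class Doc {
        final String id;
        final Double lat; // null when the document has no coordinates
        final Double lon;

        Doc(String id, Double lat, Double lon) {
            this.id = id;
            this.lat = lat;
            this.lon = lon;
        }
    }

    // Great-circle distance in km via the haversine formula.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Missing coordinates map to +Infinity, so such docs sort after every real distance.
    static double distanceOrInfinity(Doc d, double lat, double lon) {
        if (d.lat == null || d.lon == null) {
            return Double.POSITIVE_INFINITY;
        }
        return haversineKm(lat, lon, d.lat, d.lon);
    }

    // Orders already-matched docs by distance; nothing is filtered out here.
    static List<String> sortedIds(List<Doc> docs, double lat, double lon) {
        List<Doc> copy = new ArrayList<>(docs);
        copy.sort(Comparator.comparingDouble(d -> distanceOrInfinity(d, lat, lon)));
        List<String> ids = new ArrayList<>();
        for (Doc d : copy) {
            ids.add(d.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
                new Doc("region", null, null),
                new Doc("far", 40.0, -75.0),
                new Doc("near", 45.2, -93.9));
        System.out.println(sortedIds(docs, 45.15, -93.85));
    }
}
```

Mapping missing values to 0 instead would give the sortMissingFirst behaviour; either way, the documents being ordered are the ones the filter already selected.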

~ David Smiley
