[ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708605#action_12708605
 ] 

patrick o'leary commented on SOLR-773:
--------------------------------------

Sorry for not getting into this sooner.

Let's take a step back for a second and ask a couple of questions; my thoughts 
follow each one.

1) What is the goal we want to achieve?
   - Provide a first iteration of geographical search in Solr.
   - Bring a popular external plugin in out of the cold and into the ASF and 
Solr. This helps Solr users out and increases the number of developers from 
one to many.

2) What is the level of commitment to, and the road map for, spatial solutions 
in Lucene and Solr?
   - The primary goal of Solr is to be a text search engine, not a GIS search 
engine. There are other, better ways to do GIS search without reinventing the 
wheel and shoehorning it into Lucene 
   (e.g. persistent doc ID mappings that can be referenced outside of Lucene, 
so that tools like PostGIS can be used).
   - We can never fully solve everyone's needs at once, so let's start with 
what we have and iterate on it.
   - I'm happy with any improvements as long as they keep to two goals: 
A. don't make it stupid; B. don't make it complex.

3) Raw math through trie data structures, spatial IDs (geohash), or tier IDs 
(Cartesian tiers): which one?
   - Why not all? Again, we can't solve everyone's needs, so why not give them 
the tools to help themselves?

As for benchmarking, I have run some tests recently using tdouble precision 0 
on ~1 million docs covering the state of NY.
Top density was ~300,000 docs in the Manhattan and Brooklyn area.

Returning all results, avg of 100 hits:
Trie Double: 108ms
Cartesian Tier: 12ms

The reason for the difference is that with trie ranges you are doing 2 sets of 
range filters/queries.
With Cartesian tiers you are doing 1 iteration over maybe 4 to 16 fielded IDs.
Switching the _localTier fields from sdouble to tdouble might improve that 
further; I haven't tried, since 12ms is something I can live with.
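
To make the comparison concrete, here is a minimal sketch of the grid-cell idea. 
It is not the locallucene implementation: the ID scheme (row * cells + col on a 
plain lat/lon grid) and the names cell_id and covering_cell_ids are assumptions 
made up for illustration. The point is only that a search box maps to a handful 
of discrete IDs, so the filter becomes a few term lookups instead of two 
numeric range scans.

```python
def cell_id(lat, lon, tier):
    """Return one integer ID for the grid cell containing (lat, lon).

    Hypothetical scheme: tier n splits the lat/lon plane into 2**n x 2**n cells.
    """
    cells = 2 ** tier
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return row * cells + col


def covering_cell_ids(lat, lon, radius_deg, tier):
    """Return the sorted IDs of every cell touched by the box lat/lon +/- radius_deg."""
    cells = 2 ** tier
    lo_lat, hi_lat = max(lat - radius_deg, -90.0), min(lat + radius_deg, 90.0)
    lo_lon, hi_lon = max(lon - radius_deg, -180.0), min(lon + radius_deg, 180.0)
    first = cell_id(lo_lat, lo_lon, tier)   # bottom-left corner cell
    last = cell_id(hi_lat, hi_lon, tier)    # top-right corner cell
    rows = range(first // cells, last // cells + 1)
    cols = range(first % cells, last % cells + 1)
    return sorted(r * cells + c for r in rows for c in cols)


# A small box around New York City at a fine tier touches only a few cells.
ids = covering_cell_ids(40.7, -74.0, 0.1, tier=10)
```

Picking the tier so that the box spans only a few cells is what keeps the ID 
list short; a tier that is too fine for the radius would produce many more IDs.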

However, the distance calculation is the killer: 300,000 docs took about 1.8 
seconds in a single thread on a 3.2GHz machine.
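
For a sense of scale, the figures above work out to roughly 6 microseconds per 
document. The sketch below shows the kind of per-document work involved, 
assuming a haversine great-circle distance; the comment does not say which 
distance formula was benchmarked, so haversine_km and within_radius are 
illustrative names only.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, points, radius_km):
    """Keep only the (lat, lon) points within radius_km of center."""
    clat, clon = center
    return [p for p in points if haversine_km(clat, clon, p[0], p[1]) <= radius_km]


# Per-document cost implied by the benchmark: 1.8 s over 300,000 docs.
per_doc_us = 1.8 / 300_000 * 1e6  # about 6 microseconds per document
```

Because this cost is paid once per candidate, anything that shrinks the 
candidate set before the distance step (such as the tier filter) pays off 
directly.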
 
I was working on some additional features in locallucene, such as polylines 
and convex hulls. Using the Cartesian tier IDs, these can give some basic, 
quick features such as intersect and contains. A nifty side effect of having 
sorted IDs is nearby results.

Also, faceting on tier IDs can give you hot-spot results.
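
As a rough illustration of the hot-spot idea, faceting on a tier ID field 
amounts to counting documents per cell and reading off the largest counts. The 
sketch below does that in memory; hot_spots and the sample IDs are hypothetical 
and stand in for what a facet query would return.

```python
from collections import Counter


def hot_spots(doc_cell_ids, top=3):
    """Return the `top` most populated cells as (cell_id, doc_count) pairs."""
    return Counter(doc_cell_ids).most_common(top)


# One tier ID per document; the values here are made-up sample data.
sample_ids = [7, 7, 9, 12, 12, 12]
busiest = hot_spots(sample_ids, top=2)
```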
One final feature: the projection method is an implementation of IProjector, 
which allows you to create your own projection.
Currently I'm using Sinusoidal, but you can write your own, such as 
- Google Mercator (I use a similar quad grid concept, just a different 
projection method) 
- Open Map
etc.
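
A pluggable projection could look something like the sketch below. The 
IProjector method signatures are not shown in this comment, so the Projector 
protocol and its project(lat, lon) method are assumptions for illustration, 
not the locallucene API. The formulas are the standard sinusoidal and 
spherical Mercator projections on a unit sphere.

```python
import math
from typing import Protocol, Tuple


class Projector(Protocol):
    """Hypothetical interface: map degrees lat/lon to planar (x, y)."""

    def project(self, lat_deg: float, lon_deg: float) -> Tuple[float, float]: ...


class SinusoidalProjector:
    """Sinusoidal projection: x = lon * cos(lat), y = lat (radians)."""

    def project(self, lat_deg: float, lon_deg: float) -> Tuple[float, float]:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        return (lon * math.cos(lat), lat)


class MercatorProjector:
    """Spherical Mercator: x = lon, y = ln(tan(pi/4 + lat/2)) (radians)."""

    def project(self, lat_deg: float, lon_deg: float) -> Tuple[float, float]:
        lat, lon = math.radians(lat_deg), math.radians(lon_deg)
        return (lon, math.log(math.tan(math.pi / 4 + lat / 2)))


def project_all(projector: Projector, points):
    """Project a list of (lat, lon) points with whichever projector is supplied."""
    return [projector.project(lat, lon) for lat, lon in points]
```

The grid code only ever sees the projected (x, y), which is why swapping the 
projection does not require changing the tiering logic.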

There's a lot that can be done, but we should stay focused on the primary 
goals and iterate, iterate, iterate. 

> Incorporate Local Lucene/Solr
> -----------------------------
>
>                 Key: SOLR-773
>                 URL: https://issues.apache.org/jira/browse/SOLR-773
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
> SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
