[jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr

Uwe Schindler (JIRA) Wed, 13 May 2009 01:37:20 -0700

    [ 
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708825#action_12708825
 ]


Uwe Schindler commented on SOLR-773:
------------------------------------

{quote}
Agreed on the first, not 100% certain on the second.  On the second, this issue 
is the gate keeper.  If people reviewing the patch feel there are better ways 
to do things, then we should work through them before committing.  What you are 
effectively seeing is an increase in the developers working on from 1 to many, 
it's just not on committed code.
{quote}
I agree with iterating on the patch, and on LocalLucene as well (not only 
LocalSolr).

{quote}
On the first point, I don't follow.  Isn't LocalLucene and LocalSolr, just 
exactly a GIS search capability for Lucene/Solr?  I'm not sure if I would 
categorize it as shoe-horning.  There are many things that Lucene/Solr can 
power, GIS search with text is one of them.  By committing this patch (or some 
variation), we are saying Solr is going to support it.  Of course, there are 
other ways to do it, but that doesn't preclude it from L/S.  The combination of 
text search plus GIS search is very powerful, as you know. 
{quote}

Yes, and in the past we tried solutions that use unique doc ids to join an 
RDBMS used for the geo search with Lucene used for the full text part. The 
biggest problem is that these join operations are very inefficient if many 
documents are affected. Lucene as a full text engine has the great advantage 
that it can display results very fast without retrieving all hits (you normally 
display only the best-ranking ones). If you combine it with a database, you 
have to intersect the results in a HitCollector while filling the 
PriorityQueue. RDBMS also have the problem that they always wrap select 
statements in "transactions" and only deliver the results once the query has 
completed, which adds further latency. Doing the geo query completely in Lucene 
is about a hundred times faster in most cases for our search in PANGAEA (with 
TrieRange).
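
To make the cost of that join concrete, here is a minimal sketch in plain Java. 
It does not use the Lucene API: JoinCollectorSketch, ScoredDoc and collect() 
are hypothetical names that only mirror the role a HitCollector plays. The 
BitSet is assumed to hold the doc ids returned by the external geo query, 
which must be complete before the first hit can be collected.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Lucene code: intersects full text hits with the
// doc ids delivered by an external geo query while keeping only the top N.
final class JoinCollectorSketch {

    static final class ScoredDoc {
        final int docId;
        final float score;

        ScoredDoc(int docId, float score) {
            this.docId = docId;
            this.score = score;
        }
    }

    private final BitSet geoMatches; // assumption: ids from the RDBMS geo query
    private final int topN;
    // Min-heap on score, so the weakest retained hit is always at the head.
    private final PriorityQueue<ScoredDoc> queue;

    JoinCollectorSketch(BitSet geoMatches, int topN) {
        if (topN < 1) {
            throw new IllegalArgumentException("topN must be positive");
        }
        this.geoMatches = geoMatches;
        this.topN = topN;
        this.queue = new PriorityQueue<>(
                (ScoredDoc a, ScoredDoc b) -> Float.compare(a.score, b.score));
    }

    // Called once per full text hit; this is where the join happens.
    void collect(int docId, float score) {
        if (!geoMatches.get(docId)) {
            return; // hit is outside the geo result, drop it
        }
        if (queue.size() < topN) {
            queue.add(new ScoredDoc(docId, score));
        } else if (score > queue.peek().score) {
            queue.poll();
            queue.add(new ScoredDoc(docId, score));
        }
    }

    // Returns the retained hits, best score first.
    List<ScoredDoc> topDocs() {
        List<ScoredDoc> result = new ArrayList<>(queue);
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }
}
```

Every full text hit has to be checked against the geo result, even though 
only topN hits are ever displayed.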

{quote}
Still, I think Yonik's main point is why reinvent the wheel when it comes to 
things like distributed search and the need for custom code for indexing, etc. 
when they likely can be handled through function queries and field types and 
therefore all of Solr's current functionality would just work.  The other 
capabilities (like sorting by a FunctionQuery) is icing on the cake that helps 
solve other problems as well.
{quote}

I also agree that we should think about reimplementing those parts of the code 
that can easily be done with "standard" Lucene/Solr tools. I would count 
TrieRange among these, even though it is not "standard" today: it is generic, 
not bound to geo, and will hopefully move to Lucene Core as NumericRangeQuery & 
utils.

In my opinion, LocalLucene should be as generic as possible and should not add 
too many custom datatypes, specific index structures, fixed field names etc. A 
problem of most available GIS solutions for relational databases is that you 
are tied to specific database schemas. E.g. for our search at PANGAEA, we also 
want to display the results of the Lucene query as a map. But for that you 
cannot use a common GIS solution, because these do not know how to extract the 
data from Lucene.

Soon I will start a small project to add a plugin to GeoServer's feature store 
that does not use an RDBMS, shape files or whatever for the features, but uses 
Lucene instead. With that it may also be possible to retrieve the geo objects 
(in our case data sets with lat/lon) and display them in a WMS using 
OpenLayers, stream them to Google Earth using the GeoServer KML Streaming API 
(using TrieRange to support the bounding box filter) and so on.

About your benchmarks:
I suspect that you have warmed up the readers, but I think you should get 
faster performance out of TrieRange. In my opinion, you should not use doubles 
for lat/lon; just use ints and scale the float lat/lon by multiplying with 1E7 
to get 7 decimal digits (which is surely enough for geo; 180*1E7 should be 
<Integer.MAX_VALUE, too).
In general, the biggest speed improvement of TrieRange in comparison to other 
range queries can be seen if the range contains a lot of distinct values and so 
hits many documents. E.g. you will also get 100 ms if you do a search around 
the African continent that contains thousands of hits, each having a different 
lat/lon pair! How does LocalLucene behave with that?
Because of this, I would implement the Tiers using tint or tfloat or whatever.
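
The int scaling suggested above could be sketched like this. GeoIntCodec and 
its methods are hypothetical names used for illustration only; they are not 
part of Lucene, Solr or LocalLucene. The sketch only shows that lat/lon scaled 
by 1E7 fits into an int, and that a bounding box then becomes two int range 
checks.

```java
// Hypothetical helper, not part of Lucene, Solr or LocalLucene.
final class GeoIntCodec {

    // 1E7 keeps 7 decimal digits of a degree value.
    static final double SCALE = 1e7;

    // Converts a latitude or longitude in degrees to a scaled int.
    static int encode(double degrees) {
        // Written so that NaN is rejected as well.
        if (!(degrees >= -180.0 && degrees <= 180.0)) {
            throw new IllegalArgumentException("degrees out of range: " + degrees);
        }
        // Math.round returns a long; |value| <= 1.8E9 is below
        // Integer.MAX_VALUE (2147483647), so the cast cannot overflow.
        return (int) Math.round(degrees * SCALE);
    }

    // Converts a scaled int back to degrees.
    static double decode(int encoded) {
        return encoded / SCALE;
    }

    // Bounding box test on encoded values: one int range per axis.
    static boolean inBox(int lat, int lon,
                         int minLat, int maxLat, int minLon, int maxLon) {
        return lat >= minLat && lat <= maxLat
            && lon >= minLon && lon <= maxLon;
    }
}
```

A box that crosses the 180 degree meridian would need two longitude ranges; 
the sketch does not handle that case.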


> Incorporate Local Lucene/Solr
> -----------------------------
>
>                 Key: SOLR-773
>                 URL: https://issues.apache.org/jira/browse/SOLR-773
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, 
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, 
> SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr 
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
