[
https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708825#action_12708825
]
Uwe Schindler commented on SOLR-773:
------------------------------------
{quote}
Agreed on the first, not 100% certain on the second. On the second, this issue
is the gate keeper. If people reviewing the patch feel there are better ways
to do things, then we should work through them before committing. What you are
effectively seeing is an increase in the developers working on from 1 to many,
it's just not on committed code.
{quote}
I agree with iterating on the patch, and also on LocalLucene (not only
LocalSolr).
{quote}
On the first point, I don't follow. Isn't LocalLucene and LocalSolr, just
exactly a GIS search capability for Lucene/Solr? I'm not sure if I would
categorize it as shoe-horning. There are many things that Lucene/Solr can
power, GIS search with text is one of them. By committing this patch (or some
variation), we are saying Solr is going to support it. Of course, there are
other ways to do it, but that doesn't preclude it from L/S. The combination of
text search plus GIS search is very powerful, as you know.
{quote}
Yes, and in the past we tried solutions that use unique doc ids to do joins
between an RDBMS used for geo search and Lucene used for the full text part.
The biggest problem is that these join operations are very inefficient if many
documents are affected. Lucene as a full text engine has the great advantage of
displaying the results very fast without retrieving all the hits (you normally
display only the best ranking ones). If you combine it with databases, you have
to intersect the results in a HitCollector while filling the PriorityQueue.
RDBMSs have the problem that they always wrap select statements in
"transactions" and only deliver the results when the query is completely done.
This adds an additional time lag. Doing the geo query completely in Lucene is,
for our search in PANGAEA, about a hundred times faster in most cases (with
TrieRange).
{quote}
Still, I think Yonik's main point is why reinvent the wheel when it comes to
things like distributed search and the need for custom code for indexing, etc.
when they likely can be handled through function queries and field types and
therefore all of Solr's current functionality would just work. The other
capabilities (like sorting by a FunctionQuery) is icing on the cake that helps
solve other problems as well.
{quote}
I also agree about reconsidering the parts of the code that could easily be
done with "standard" Lucene/Solr tools (I would count TrieRange among them,
even though it is not "standard" today, but it is generic, not bound to geo,
and will hopefully move to Lucene Core as NumericRangeQuery & utils).
In my opinion, LocalLucene should be as generic as possible and should not add
too many custom datatypes, specific index structures, fixed field names etc. A
problem with most GIS solutions available for relational databases is that you
are tied to specific database schemas. E.g. for our search at PANGAEA, we also
want to display the results of the Lucene query as a map. But for that you
cannot use a common GIS solution, because they do not know how to extract the
data from Lucene.
Soon I will start a small project to add a plugin to GeoServer's feature
store that uses Lucene for the features instead of an RDBMS, shape files or
whatever. With that it may also be possible to retrieve the geo objects (in our
case data sets with lat/lon) and display them in a WMS using OpenLayers, stream
them to Google Earth using the GeoServer KML Streaming API (using TrieRange to
support the bounding box filter) and so on.
About your benchmarks:
I suspect that you have warmed up the readers, but I think you should get
faster performance out of TrieRange. In my opinion, you should not use doubles
for lat/lon; just use ints and scale the floating-point lat/lon by multiplying
with 1E7 to get 7 decimal digits (which is surely enough for geo; 180*1E7 =
1.8E9 is below Integer.MAX_VALUE, too).
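To illustrate the scaling idea, here is a minimal sketch in plain Java. The
class name GeoScale and its method names are made up for this example and are
not part of Lucene, Solr or LocalLucene; it only shows the arithmetic of
mapping a lat/lon in degrees to an int with 7 decimal digits and back.

```java
// Hypothetical helper (not a Lucene/Solr API): stores lat/lon as scaled ints.
final class GeoScale {
    // Assumed scale factor from the comment above: 1E7 keeps 7 decimal digits.
    static final double SCALE = 1E7;

    private GeoScale() {}

    // Converts degrees in [-180, 180] to a scaled int.
    // 180 * 1E7 = 1,800,000,000, which is below Integer.MAX_VALUE (2,147,483,647).
    static int toScaledInt(double degrees) {
        if (Double.isNaN(degrees) || degrees < -180.0 || degrees > 180.0) {
            throw new IllegalArgumentException("degrees out of range: " + degrees);
        }
        // Math.round returns a long; the range check above makes the cast safe.
        return (int) Math.round(degrees * SCALE);
    }

    // Converts a scaled int back to degrees.
    static double toDegrees(int scaled) {
        return scaled / SCALE;
    }

    public static void main(String[] args) {
        int lat = toScaledInt(53.5333);   // arbitrary sample latitude
        int lon = toScaledInt(-180.0);    // westernmost longitude
        System.out.println(lat + " " + lon + " " + toDegrees(lat));
    }
}
```

The resulting ints could then be indexed in whatever int-based range field is
chosen, with the bounding box expressed as two int ranges (one for lat, one
for lon) after converting the box corners with the same scale factor.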
In general, the biggest speed improvement of TrieRange in comparison to other
range queries can be seen when the range contains a lot of distinct values and
so hits many documents. E.g. you will also get 100 ms if you do a search around
the African continent, which contains thousands of hits, each having a
different lat/lon pair! How does LocalLucene behave with that?
Because of this, I would implement the Tiers using tint or tfloat or whatever.
> Incorporate Local Lucene/Solr
> -----------------------------
>
> Key: SOLR-773
> URL: https://issues.apache.org/jira/browse/SOLR-773
> Project: Solr
> Issue Type: New Feature
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch,
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch,
> SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch,
> SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project. It has some Solr
> components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene