[ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708825#action_12708825 ]
Uwe Schindler commented on SOLR-773:
------------------------------------

{quote}
Agreed on the first, not 100% certain on the second. On the second, this issue is the gate keeper. If people reviewing the patch feel there are better ways to do things, then we should work through them before committing. What you are effectively seeing is an increase in the developers working on from 1 to many, it's just not on committed code.
{quote}

I agree with iterating on the patch, and on LocalLucene as well (not only LocalSolr).

{quote}
On the first point, I don't follow. Isn't LocalLucene and LocalSolr, just exactly a GIS search capability for Lucene/Solr? I'm not sure if I would categorize it as shoe-horning. There are many things that Lucene/Solr can power, GIS search with text is one of them. By committing this patch (or some variation), we are saying Solr is going to support it. Of course, there are other ways to do it, but that doesn't preclude it from L/S. The combination of text search plus GIS search is very powerful, as you know.
{quote}

Yes, and in the past we tried solutions that use unique doc IDs to join an RDBMS used for geo search with Lucene used for the full-text part. The biggest problem is that these join operations are very inefficient when many documents are affected. As a full-text engine, Lucene has the great advantage of displaying results very quickly without retrieving all hits (you normally display only the best-ranking ones). If you combine it with a database, you have to intersect the results in a HitCollector while filling the PriorityQueue. An RDBMS also always wraps select statements in "transactions" and delivers results only when the query has completely finished, which adds further latency. Doing the geo query completely in Lucene (with TrieRange) is about a hundred times faster in most cases for our search at PANGAEA.

{quote}
Still, I think Yonik's main point is why reinvent the wheel when it comes to things like distributed search and the need for custom code for indexing, etc. when they likely can be handled through function queries and field types and therefore all of Solr's current functionality would just work. The other capabilities (like sorting by a FunctionQuery) is icing on the cake that helps solve other problems as well.
{quote}

I also agree that we should think about reimplementing those parts of the code that can easily be done with "standard" Lucene/Solr tools. I would count TrieRange among those, even though it is not "standard" today: it is generic, not bound to geo, and will hopefully move to Lucene Core as NumericRangeQuery & utils. In my opinion, LocalLucene should be as generic as possible and should not add too many custom datatypes, specific index structures, fixed field names, etc.

A problem with most available GIS solutions for relational databases is that you are tied to specific database schemas. For example, for our search at PANGAEA we also want to display the results of the Lucene query as a map. You cannot use common GIS solutions for that, because they do not know how to extract the data from Lucene. Soon I will start a small project to add a plugin to GeoServer's feature store that uses Lucene for the features instead of an RDBMS, shape files, or the like. With that it may also be possible to retrieve the geo objects (in our case data sets with lat/lon) and display them in a WMS using OpenLayers, stream them to Google Earth using the GeoServer KML Streaming API (using TrieRange to support the bounding box filter), and so on.

About your benchmarks: I suspect that you have warmed up the readers, but I still think you should get better performance out of TrieRange.

In my opinion, you should not use doubles for lat/lon; just use ints and scale the floating-point lat/lon by multiplying by 1E7 to keep 7 decimal digits (which is surely enough for geo; 180*1E7 is also < Integer.MAX_VALUE). In general, TrieRange's biggest speed improvement over other range queries shows when the range contains many distinct values and therefore hits many documents. For example, you will also get 100 ms for a search around the African continent that contains thousands of hits, each with a different lat/lon pair! How does LocalLucene behave in that case? Because of this, I would implement the Tiers using tint, tfloat, or the like.

> Incorporate Local Lucene/Solr
> -----------------------------
>
>                 Key: SOLR-773
>                 URL: https://issues.apache.org/jira/browse/SOLR-773
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch, SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
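
For reference, the integer scaling suggested in the comment (multiply lat/lon by 1E7 and store the result as an int) could look roughly like the sketch below. The class and method names (GeoScale, toScaledInt, toDegrees) are hypothetical and are not part of LocalLucene, Lucene, or Solr. The range check and the rounding mode are assumptions made for the example. The sketch only illustrates the arithmetic, including why 180*1E7 still fits into an int.

```java
/** Hypothetical helper illustrating the lat/lon-to-int scaling; not a Lucene/Solr API. */
public final class GeoScale {
    /** Scale factor from the comment: 1E7 keeps 7 decimal digits. */
    public static final double SCALE = 1e7;

    private GeoScale() {}

    /** Converts a latitude or longitude in degrees to a scaled int. */
    public static int toScaledInt(double degrees) {
        // Assumption: valid input lies in [-180, 180]; anything else is rejected.
        if (Double.isNaN(degrees) || degrees < -180.0 || degrees > 180.0) {
            throw new IllegalArgumentException("degrees out of range: " + degrees);
        }
        // Math.round returns a long; because |degrees| <= 180, the result is at most
        // 1,800,000,000 in magnitude, below Integer.MAX_VALUE (2,147,483,647).
        return (int) Math.round(degrees * SCALE);
    }

    /** Converts a scaled int back to degrees (accurate to about 7 decimal digits). */
    public static double toDegrees(int scaled) {
        return scaled / SCALE;
    }

    public static void main(String[] args) {
        int maxLon = toScaledInt(180.0);
        System.out.println(maxLon);                     // 1800000000
        System.out.println(maxLon < Integer.MAX_VALUE); // true
    }
}
```

Values scaled this way could then be indexed in an int-based trie field (tint) rather than a double-based one, which is the direction the comment proposes.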