The "SolrAdaptersForLuceneSpatial4" page has been changed by DavidSmiley:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4?action=diff&rev1=4&rev2=5

Comment:
Updated intro text; more to come...

  = Lucene / Solr 4 Spatial =
- This document describes how to use the new spatial functionality in Lucene / Solr 4. The bulk of the implementation lives in the new Lucene spatial module in v4 committed on March 13th. It replaces the former "Lucene spatial contrib" in v3. The Solr piece is small as it only needs to provide field types which are essentially adapters to the code in the Lucene spatial module. Furthermore, understand that the shape implementations and other core spatial code that isn't related to Lucene is held in another new open-source project called Spatial4j. Presently, polygon support requires an additional dependency -- JTS. As of this writing, 28-June 2012, the Solr portion has yet to be introduced into Solr trunk. It should come into Solr via SOLR-3304 "soon".
+ This document describes how to use the new spatial functionality in Lucene / Solr 4. The bulk of the implementation lives in the new Lucene 4 spatial module. It replaces the former "Lucene spatial contrib" in v3. The Solr piece is small as it only needs to provide field types which are essentially adapters to the code in the Lucene spatial module. Furthermore, understand that the shape implementations and other core spatial code that isn't related to Lucene is held in another new open-source project called [[https://github.com/spatial4j/spatial4j|Spatial4j]]. Presently, polygon support requires an additional dependency -- [[http://sourceforge.net/projects/jts-topo-suite/|JTS]].

  == New features, over Solr 3 spatial ==
- Note: "Solr 3 spatial" refers to the spatial support introduced in that version of Solr which still exists in v4. Solr 3 spatial does ''not'' actually use Lucene 3's spatial contrib module aside from DistanceUtils.java.
+ Note: "Solr 3 spatial" refers to the spatial support introduced in that version of Solr which still exists in v4. Except for a small utility class, Solr 3 spatial does ''not'' actually use Lucene 3's defunct spatial contrib module.

- These features describe what developer-users of Lucene/Solr 4 will appreciate. Under the hood, it's a framework designed to be extended for different so-called spatial strategies. I'll assume here the RecursivePrefixTreeStrategy as it should address most use-cases and it's has the best tests.
+ These features describe what developer-users of Lucene/Solr 4 will appreciate. Under the hood, it's a framework designed to be extended for different so-called "spatial strategies". I'll assume here the RecursivePrefixTreeStrategy as it should address most use-cases and it has the best tests.

- * Multi-value indexes. This is key for any project that geocodes natural language documents, since a variable number of locations are extracted from text.
- * Index shapes with area, not just points. An indexed shape is essentially pixelated (i.e. gridded) to a configured resolution per shape. Note: If extremely high precision of the edges of the shape needs to be retained for accurate searching, then this solution probably won't scale well compared to other approaches such as those that index the bounding box but retain the original shape vector. Note: this capability sorely needs testing.
- * A polygon shape. It can be the indexed shape or query shape. Note: This requires the JTS dependency. The polygon assumes a Mercator / Cartesian projection, and consequently doesn't support pole-wrap. As of 1 June 2012 in Spatial4j 0.3-SNAPSHOT, it does support dateline crossing.
+ * Multi-valued indexed fields. This is critical for storing the results of automatic place extraction from text using natural language processing techniques with a gazetteer (a variant of "geocoding"), since a variable number of locations will be found.
+ * Index shapes with area, not just points. An indexed shape is essentially pixelated (i.e. gridded) to a configured resolution per shape. By default that resolution is defined by a percentage of the overall shape size, and it applies to query shapes too. Note: If extremely high precision of shape edges needs to be retained for accurate indexing, then this solution probably won't scale too well at indexing time (big indexes, slow indexing). On the other hand, query shapes generally scale well to the maximum configured precision regardless of shape size. Note: indexing shapes with area sorely [[https://issues.apache.org/jira/browse/LUCENE-4419|needs testing]].
+ * Polygon, LineString and other new shapes. All shapes are supported as indexed shapes and query shapes. Note: Shapes other than point, rectangle and circle are supported via JTS -- an otherwise optional dependency. JTS views the world as a flat plane; the latitude and longitude are mapped to this plane directly. It uses Euclidean math operations, not Geodesic ones. By and large this isn't a problem, although it can be if the vertices are particularly far apart longitudinally. Spatial4j adapts shapes that cross the dateline to be compatible with JTS, and you shouldn't notice a problem (notwithstanding unknown bugs). It does not support shapes covering the poles yet. Consequently if you want to index or query by the Antarctica polygon for example, you are out of luck for now.
+ * Rectangles with user-specifiable corners. Oddly, Solr 3 spatial only supports the bounding box of a circle.
- * Multi-value distance sort / score boost. Note: this is a preliminary unoptimized implementation that uses a fair amount of RAM.
+ * Multi-value distance sort / score boost. Note: this is a preliminary unoptimized implementation that uses a fair amount of RAM. An alternative should be provided in the future.
- * Configurable precision which can vary per shape at both index & query time. This enhances the performance. Solr 3 indexes and queries based on the full precision of a double for latitude and longitude, which is excessive for nearly any use-case.
- * Fast filtering. The code was benchmarked once showing it outperforms Solr 3's "LatLonType" at its own game (single valued indexed points), and a 3rd party anecdotally reported it was faster on his large index. It hasn't been benchmarked in well over a year now though, and this is a TODO item. Also, Solr 3 LatLonType sometimes requires all the points to be in memory, whereas the new spatial module here doesn't for filtering.
+ * Configurable precision which can vary per shape at query time (and sort of at index time). This enhances the performance.
+ * Fast filtering. The code was benchmarked once showing it outperforms Solr 3's "LatLonType" at its own game (single valued indexed points), and several 3rd parties anecdotally reported the same, especially for multi-million document indices. It is based on SOLR-2155 which was benchmarked in January 2010; so a new benchmark is a TODO item. Also, Solr 3 LatLonType sometimes requires all the points to be in memory, whereas the new spatial module here doesn't for filtering.

  Of course, the basics in Solr 3 not mentioned here are implemented in this framework. For example, lat-lon bounding boxes and circles.
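
For readers new to the idea that a shape is "essentially pixelated (i.e. gridded) to a configured resolution", the following is a rough, self-contained sketch of the general technique using a toy quad tree. It is not the code in the Lucene spatial module or Spatial4j. The names `GridCoverSketch`, `Rect` and `cover`, and the A/B/C/D cell labels, are all invented here for illustration. It assumes a plain lat-lon rectangle that does not cross the dateline.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative only: covers a rectangle with quad-tree cells down to a
 * maximum depth. All names here are hypothetical; this is not Lucene's
 * RecursivePrefixTreeStrategy.
 */
final class GridCoverSketch {

    /** Axis-aligned rectangle in degrees; dateline wrap is not handled. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX; this.minY = minY;
            this.maxX = maxX; this.maxY = maxY;
        }

        /** True if the interiors overlap (touching edges do not count). */
        boolean intersects(Rect o) {
            return minX < o.maxX && o.minX < maxX
                && minY < o.maxY && o.minY < maxY;
        }

        /** True if the other rectangle lies entirely inside this one. */
        boolean contains(Rect o) {
            return minX <= o.minX && o.maxX <= maxX
                && minY <= o.minY && o.maxY <= maxY;
        }
    }

    /**
     * Returns one token per grid cell needed to cover the shape.
     * A larger maxLevels means smaller cells, i.e. higher precision
     * and (usually) more tokens to index or query.
     */
    static List<String> cover(Rect shape, int maxLevels) {
        List<String> out = new ArrayList<>();
        recurse(new Rect(-180, -90, 180, 90), "", shape, maxLevels, out);
        return out;
    }

    private static void recurse(Rect cell, String token, Rect shape,
                                int maxLevels, List<String> out) {
        if (!cell.intersects(shape)) {
            return; // this cell is irrelevant to the shape
        }
        // Stop at the configured depth, or early when the cell lies
        // wholly inside the shape (finer cells would add nothing).
        if (token.length() == maxLevels || shape.contains(cell)) {
            out.add(token);
            return;
        }
        double midX = (cell.minX + cell.maxX) / 2;
        double midY = (cell.minY + cell.maxY) / 2;
        // Four children: A = NW, B = NE, C = SW, D = SE.
        recurse(new Rect(cell.minX, midY, midX, cell.maxY), token + "A", shape, maxLevels, out);
        recurse(new Rect(midX, midY, cell.maxX, cell.maxY), token + "B", shape, maxLevels, out);
        recurse(new Rect(cell.minX, cell.minY, midX, midY), token + "C", shape, maxLevels, out);
        recurse(new Rect(midX, cell.minY, cell.maxX, midY), token + "D", shape, maxLevels, out);
    }
}
```

In this sketch, `GridCoverSketch.cover(new GridCoverSketch.Rect(0, 0, 180, 90), 3)` yields the single token `B`, because the north-east quadrant is covered exactly at the first level. A small rectangle such as (10, 10, 11, 11) needs deeper, longer tokens before its edges are approximated well, which is the trade-off between precision and index size mentioned in the feature list.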