The "SolrAdaptersForLuceneSpatial4" page has been changed by DavidSmiley:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4?action=diff&rev1=4&rev2=5

Comment:
Updated intro text; more to come...

  
  = Lucene / Solr 4 Spatial =
  
- This document describes how to use the new spatial functionality in Lucene / 
Solr 4.  The bulk of the implementation lives in the new Lucene spatial module 
in v4 committed on March 13th.  It replaces the former "Lucene spatial contrib" 
in v3.  The Solr piece is small as it only needs to provide field types which 
are essentially adapters to the code in the Lucene spatial module.  
Furthermore, understand that the shape implementations and other core spatial 
code that isn't related to Lucene is held in another new open-source project 
called Spatial4j.  Presently, polygon support requires an additional dependency 
-- JTS.  As of this writing, 28-June 2012, the Solr portion has yet to be 
introduced into Solr trunk. It should come into Solr via SOLR-3304 "soon".
+ This document describes how to use the new spatial functionality in Lucene / 
Solr 4.  The bulk of the implementation lives in the new Lucene 4 spatial 
module.  It replaces the former "Lucene spatial contrib" in v3.  The Solr piece 
is small as it only needs to provide field types which are essentially adapters 
to the code in the Lucene spatial module.  Furthermore, understand that the 
shape implementations and other core spatial code that isn't related to Lucene 
is held in another new open-source project called 
[[https://github.com/spatial4j/spatial4j|Spatial4j]].  Presently, polygon 
support requires an additional dependency -- 
[[http://sourceforge.net/projects/jts-topo-suite/|JTS]].
  
  
  == New features, over Solr 3 spatial ==
  
- Note: "Solr 3 spatial" refers to the spatial support introduced in that 
version of Solr which still exists in v4.  Solr 3 spatial does ''not'' actually 
use Lucene 3's spatial contrib module aside from DistanceUtils.java.
+ Note: "Solr 3 spatial" refers to the spatial support introduced in that 
version of Solr which still exists in v4.  Except for a small utility class, 
Solr 3 spatial does ''not'' actually use Lucene 3's defunct spatial contrib 
module.
  
- These features describe what developer-users of Lucene/Solr 4 will 
appreciate.  Under the hood, it's a framework designed to be extended for 
different so-called spatial strategies.  I'll assume here the 
RecursivePrefixTreeStrategy as it should address most use-cases and it's has 
the best tests.
+ The features below are the ones that developer-users of Lucene/Solr 4 will 
appreciate.  Under the hood, the module is a framework designed to be extended 
with different so-called "spatial strategies".  I'll assume the 
RecursivePrefixTreeStrategy here, as it should address most use cases and has 
the best tests.
  
-  * Multi-value indexes.  This is key for any project that geocodes natural 
language documents, since a variable number of locations are extracted from 
text.
-  * Index shapes with area, not just points.  An indexed shape is essentially 
pixelated (i.e. gridded) to a configured resolution per shape.  Note: If 
extremely high precision of the edges of the shape needs to be retained for 
accurate searching, then this solution probably won't scale well compared to 
other approaches such as those that index the bounding box but retain the 
original shape vector.  Note: this capability sorely needs testing.
-  * A polygon shape.  It can be the indexed shape or query shape.  Note: This 
requires the JTS dependency.  The polygon assumes a Mercator / Cartesian 
projection, and consequently doesn't support pole-wrap.  As of 1 June 2012 in 
Spatial4j 0.3-SNAPSHOT, it does support dateline crossing.
+  * Multi-valued indexed fields.  This is critical for storing the results of 
automatic place extraction from text using natural language processing 
techniques with a gazetteer (a variant of "geocoding"), since a variable number 
of locations will be found.
+  * Index shapes with area, not just points.  An indexed shape is essentially 
pixelated (i.e. gridded) to a configured resolution per shape.  By default that 
resolution is defined by a percentage of the overall shape size, and it applies 
to query shapes too.  Note: If extremely high precision of shape edges needs to 
be retained for accurate indexing, then this solution probably won't scale too 
well at indexing time (big indexes, slow indexing).  On the other hand, query 
shapes generally scale well to the maximum configured precision regardless of 
shape size.  Note: indexing shapes with area sorely 
[[https://issues.apache.org/jira/browse/LUCENE-4419|needs testing]].
+  * Polygon, LineString and other new shapes.  All shapes are supported as 
indexed shapes and query shapes.  Note: Shapes other than point, rectangle and 
circle are supported via JTS -- an otherwise optional dependency.  JTS views 
the world as a flat plane; the latitude and longitude are mapped to this plane 
directly.  It uses Euclidean math operations, not geodesic ones.  By and large 
this isn't a problem, although it can be if the vertices are particularly far 
apart longitudinally.  Spatial4j adapts shapes that cross the dateline to be 
compatible with JTS, and you shouldn't notice a problem (notwithstanding 
unknown bugs).  Shapes covering the poles are not supported yet.  
Consequently, if you want to index or query by the Antarctica polygon, for 
example, you are out of luck for now.
+  * Rectangles with user-specifiable corners.  Oddly, Solr 3 spatial only 
supports the bounding box of a circle.
-  * Multi-value distance sort / score boost.  Note: this is a preliminary 
unoptimized implementation that uses a fair amount of RAM. 
+  * Multi-value distance sort / score boost.  Note: this is a preliminary 
unoptimized implementation that uses a fair amount of RAM.  An alternative 
should be provided in the future.
-  * Configurable precision which can vary per shape at both index & query 
time.  This enhances the performance.  Solr 3 indexes and queries based on the 
full precision of a double for latitude and longitude, which is excessive for 
nearly any use-case.
-  * Fast filtering.  The code was benchmarked once showing it outperforms Solr 
3's "LatLonType" at its own game (single valued indexed points), and a 3rd 
party anecdotally reported it was faster on his large index.  It hasn't been 
benchmarked in well over a year now though, and this is a TODO item.  Also, 
Solr 3 LatLonType sometimes requires all the points to be in memory, whereas 
the new spatial module here doesn't for filtering.
+  * Configurable precision, which can vary per shape at query time (and, to a 
limited extent, at index time).  This improves performance.
+  * Fast filtering.  The code was benchmarked once, showing that it 
outperforms Solr 3's "LatLonType" at its own game (single-valued indexed 
points), and several 3rd parties anecdotally reported the same, especially for 
multi-million document indices.  It is based on SOLR-2155, which was 
benchmarked in January 2010, so a new benchmark is a TODO item.  Also, Solr 3 
LatLonType sometimes requires all the points to be in memory, whereas the new 
spatial module here doesn't for filtering.
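
To make the "resolution is defined by a percentage of the overall shape size" 
point in the list above more concrete, here is a minimal Python sketch.  It is 
not the module's code.  The function names, the use of the bounding box's 
half-diagonal as the "shape size", the quad-tree style grid that halves a 
360-degree world at each level, and the 2.5% figure are all assumptions chosen 
for illustration.

```python
import math


def dist_err(width_deg, height_deg, dist_err_pct):
    """Allowed error in degrees, taken as a fraction of the shape's size.

    Assumption: "shape size" is the distance from the center of the shape's
    bounding box to one of its corners.
    """
    half_diagonal = math.hypot(width_deg / 2.0, height_deg / 2.0)
    return half_diagonal * dist_err_pct


def grid_level(dist_err_deg, max_levels=12, world_width_deg=360.0):
    """Shallowest grid level whose cell width is <= dist_err_deg.

    Assumption: a hypothetical grid whose cell width halves at each level,
    starting from the full world width.  The result is capped at max_levels,
    which stands in for the maximum configured precision.
    """
    for level in range(1, max_levels + 1):
        cell_width = world_width_deg / (2 ** level)
        if cell_width <= dist_err_deg:
            return level
    return max_levels


# A large shape needs fewer levels than a small one for the same percentage.
large = grid_level(dist_err(10.0, 10.0, 0.025))
small = grid_level(dist_err(1.0, 1.0, 0.025), max_levels=20)
print(large, small)
```

Under these assumptions the same percentage yields a coarser grid for a large 
shape and a finer one for a small shape, which is one way to read "precision 
which can vary per shape".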
  
  Of course, the basics in Solr 3 not mentioned here are implemented in this 
framework.  For example, lat-lon bounding boxes and circles.
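
The remark above that flat-plane (Euclidean) math can be a problem "if the 
vertices are particularly far apart longitudinally" can be illustrated with a 
small Python sketch.  It compares the midpoint of a polygon edge computed 
directly on the latitude/longitude plane with the midpoint of the great-circle 
arc between the same two vertices.  The function names are hypothetical, the 
sphere is assumed to be perfect, and none of this is JTS or Spatial4j code.

```python
import math


def flat_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint when latitude/longitude are treated as plain x/y coordinates."""
    return ((lat1 + lat2) / 2.0, (lon1 + lon2) / 2.0)


def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint of the great-circle arc on a unit sphere.

    Assumption: the two points are not antipodal (otherwise the midpoint is
    undefined and the normalization below would divide by zero).
    """
    def to_xyz(lat, lon):
        phi, lam = math.radians(lat), math.radians(lon)
        return (math.cos(phi) * math.cos(lam),
                math.cos(phi) * math.sin(lam),
                math.sin(phi))

    x1, y1, z1 = to_xyz(lat1, lon1)
    x2, y2, z2 = to_xyz(lat2, lon2)
    x, y, z = x1 + x2, y1 + y2, z1 + z2
    norm = math.sqrt(x * x + y * y + z * z)
    lat = math.degrees(math.asin(z / norm))
    lon = math.degrees(math.atan2(y, x))
    return (lat, lon)


# Two vertices at latitude 60, close together and then far apart in longitude.
near_flat = flat_midpoint(60.0, -1.0, 60.0, 1.0)
near_gc = great_circle_midpoint(60.0, -1.0, 60.0, 1.0)
far_flat = flat_midpoint(60.0, -60.0, 60.0, 60.0)
far_gc = great_circle_midpoint(60.0, -60.0, 60.0, 60.0)
print(near_flat, near_gc)
print(far_flat, far_gc)
```

For the nearby vertices the two midpoints almost coincide, while for the 
vertices 120 degrees apart the great-circle midpoint lies well north of the 
flat-plane one.  That gap is the kind of discrepancy the note is warning about.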
  
