Re: Solr geospatial index?

2015-01-11 Thread Matteo Tarantino
Wow, thank you David!
You are really kind to spend your time writing all this information for
me. This will be very helpful for my thesis work.

Thank you again.
MT



2015-01-11 2:46 GMT+01:00 david.w.smi...@gmail.com:

 Hello Matteo,

 Welcome. You are not bothering me or us; you are asking in the right place.

 Jack’s right in terms of the field type dictating how it works.

 LatLonType simply stores the latitude and longitude internally as
 separate floating point fields, and it does efficient range queries over
 them for bounding-box queries.  Lucene has remarkably fast/efficient range
 queries over numbers based on a Trie/PrefixTree. In fact, systems like
 TitanDB leave such queries to Lucene.  For point-radius queries, it iterates
 over all of the values in memory in a brute-force fashion (not scalable, but
 it may be fine).
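
To make the LatLonType description concrete, here is a minimal Python sketch (it is not Solr's code). It assumes two sorted lists stand in for the two numeric fields, answers a bounding box by intersecting two one-dimensional range lookups, and answers point-radius by checking every stored point with the haversine formula. The names `LatLonSketch` and `haversine_km`, the sample cities, and the Earth radius constant are illustrative assumptions, and dateline wrap-around is ignored.

```python
import math
from bisect import bisect_left, bisect_right

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


class LatLonSketch:
    """Hypothetical stand-in for a field that keeps lat and lon separately."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> (lat, lon)
        # One sorted "index" per coordinate, like two independent numeric fields.
        self.by_lat = sorted((lat, doc) for doc, (lat, _) in self.docs.items())
        self.by_lon = sorted((lon, doc) for doc, (_, lon) in self.docs.items())

    @staticmethod
    def _range(pairs, lo, hi):
        """Doc ids whose key lies in [lo, hi], found by binary search."""
        keys = [key for key, _ in pairs]
        start, stop = bisect_left(keys, lo), bisect_right(keys, hi)
        return {doc for _, doc in pairs[start:stop]}

    def bbox(self, min_lat, max_lat, min_lon, max_lon):
        """Bounding box = intersection of two 1-D range queries."""
        return (self._range(self.by_lat, min_lat, max_lat)
                & self._range(self.by_lon, min_lon, max_lon))

    def within_radius(self, lat, lon, radius_km):
        """Point-radius = brute-force distance check over every point."""
        return {doc for doc, (dlat, dlon) in self.docs.items()
                if haversine_km(lat, lon, dlat, dlon) <= radius_km}


index = LatLonSketch({
    "rome": (41.9028, 12.4964),
    "milan": (45.4642, 9.19),
    "paris": (48.8566, 2.3522),
})
print(sorted(index.bbox(40, 46, 8, 13)))
print(sorted(index.within_radius(41.9028, 12.4964, 100)))
```

The bounding-box path touches only the entries inside each range, while the radius path visits every document, which is the scalability difference described above.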

 BBoxField is similar in spirit to LatLonType; each side of an indexed
 rectangle gets its own floating point field internally.

 Note that for both listed above, the underlying storage and range queries
 use built-in numeric fields.
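
As a rough illustration of why four numeric fields are enough for rectangles, the Python sketch below expresses two spatial predicates purely as one-dimensional comparisons on the four sides. It is only a sketch under assumptions: the `Rect` type, the function names, and the sample extents are hypothetical, and rectangles that cross the dateline are not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """A rectangle stored as four numbers, one per side."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float


def intersects(indexed: Rect, query: Rect) -> bool:
    """True if the rectangles overlap: four independent range conditions."""
    return (indexed.min_x <= query.max_x and indexed.max_x >= query.min_x
            and indexed.min_y <= query.max_y and indexed.max_y >= query.min_y)


def is_within(indexed: Rect, query: Rect) -> bool:
    """True if the indexed rectangle lies entirely inside the query."""
    return (indexed.min_x >= query.min_x and indexed.max_x <= query.max_x
            and indexed.min_y >= query.min_y and indexed.max_y <= query.max_y)


# Rough, illustrative extents (x = longitude, y = latitude).
italy = Rect(6.6, 35.5, 18.5, 47.1)
alps_query = Rect(5.0, 44.0, 16.0, 48.0)
europe_query = Rect(-25.0, 34.0, 45.0, 72.0)

print(intersects(italy, alps_query), is_within(italy, alps_query))
print(is_within(italy, europe_query))
```

Each comparison constrains a single side, so each one can be answered by an ordinary numeric range query on that side's field.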

 SpatialRecursivePrefixTreeFieldType (RPT for short) is interesting in that
 it supports indexing essentially any shape by representing the indexed
 shape as multiple grid squares.  Non-point shapes (e.g. a polygon) are
 approximated; if you need accuracy, you should additionally store the
 vector geometry and validate the results in a 2nd pass (see
 SerializedDVStrategy for help with that).  RPT, like Lucene’s numeric
 fields, uses a Trie/PrefixTree but encodes two dimensions, not one.
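
The grid-square idea can be sketched in a few lines of Python. This is a simplified quad-tree, not the actual RPT implementation: the digit labels "0" to "3", the function names, and the use of a rectangle as the indexed shape are all assumptions made for illustration. A shape is covered by cells; a cell fully inside the shape is kept at a coarse level, and a cell that only partly overlaps is subdivided until `max_level`, where it is kept as an approximation.

```python
WORLD = (-180.0, -90.0, 180.0, 90.0)  # (min_x, min_y, max_x, max_y)


def point_token(x, y, max_level):
    """Cell token of a point: one digit per quad-tree level."""
    x0, y0, x1, y1 = WORLD
    digits = {(False, True): "0", (True, True): "1",
              (False, False): "2", (True, False): "3"}
    token = ""
    for _ in range(max_level):
        xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
        east, north = x >= xm, y >= ym
        token += digits[(east, north)]
        x0, x1 = (xm, x1) if east else (x0, xm)
        y0, y1 = (ym, y1) if north else (y0, ym)
    return token


def cover(shape, max_level, cell=WORLD, token=""):
    """Tokens of the grid cells that represent a rectangular shape."""
    x0, y0, x1, y1 = cell
    sx0, sy0, sx1, sy1 = shape
    if sx1 <= x0 or sx0 >= x1 or sy1 <= y0 or sy0 >= y1:
        return []  # cell and shape do not overlap
    inside = sx0 <= x0 and x1 <= sx1 and sy0 <= y0 and y1 <= sy1
    if inside or len(token) == max_level:
        return [token]  # exact if inside, otherwise an approximation
    xm, ym = (x0 + x1) / 2, (y0 + y1) / 2
    quads = {"0": (x0, ym, xm, y1), "1": (xm, ym, x1, y1),
             "2": (x0, y0, xm, ym), "3": (xm, y0, x1, ym)}
    cells = []
    for digit, sub in quads.items():
        cells += cover(shape, max_level, sub, token + digit)
    return cells


def matches(token, cells):
    """A point matches if any prefix of its token is an indexed cell."""
    cell_set = set(cells)
    return any(token[:k] in cell_set for k in range(len(token) + 1))


aligned = cover((0, 0, 90, 45), max_level=4)       # falls on a grid square
approx = cover((10, 40, 14, 43), max_level=3)      # much smaller than a cell
print(aligned, approx)
print(matches(point_token(12.5, 41.9, 4), aligned))  # inside the shape
print(matches(point_token(30.0, 30.0, 3), approx))   # outside, yet matches
```

The last call shows the approximation at work: the point (30, 30) is outside the small rectangle but shares its level-3 cell, which is why a second pass against the real geometry is needed when accuracy matters.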

 The Trie/PrefixTree concept underlies both RPT and numeric fields, which
 are approaches to using Lucene’s terms index to encode prefixes.  So the
 big point here is that Lucene/Solr doesn’t have side indexes using
 fundamentally different technologies for different types of data.
 Lucene’s one versatile index looks up terms (for keyword search), numbers,
 AND 2-d spatial.  For keyword search, the term is a word; for numbers, the
 term represents a contiguous range of values (e.g. 100-200); and for 2-d
 spatial, a term is a grid square (a 2-D range).
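
The "a term is a range of values" idea for numbers can be sketched as follows. This is a simplified illustration in Python, not Lucene's implementation: the `(shift, prefix)` tuples, the 16-bit width, and the 4-bit step are assumptions chosen to keep the example small, and the ranges here are aligned to powers of two. Each value is indexed under several terms of decreasing precision, and a range query is rewritten into a small set of such terms.

```python
def index_terms(value, bits=16, step=4):
    """Terms stored for one value: its prefix at every precision level."""
    return {(shift, value >> shift) for shift in range(0, bits, step)}


def range_terms(lo, hi, bits=16, step=4):
    """Smallest set of prefix terms that exactly covers [lo, hi]."""
    terms = set()

    def visit(prefix, shift):
        block_lo = prefix << shift
        block_hi = block_lo + (1 << shift) - 1
        if block_hi < lo or block_lo > hi:
            return  # block lies outside the range
        if lo <= block_lo and block_hi <= hi:
            terms.add((shift, prefix))  # whole block matches: one term
            return
        # Partly covered block: descend to the next, finer precision level.
        for child in range(1 << step):
            visit((prefix << step) | child, shift - step)

    for top in range(1 << step):
        visit(top, bits - step)
    return terms


query = range_terms(100, 200)
print(len(query))                       # number of terms looked up
print(bool(index_terms(150) & query))   # 150 is inside the range
print(bool(index_terms(250) & query))   # 250 is outside the range
```

A document matches when any of its stored terms is among the query's terms, so the range 100-200 is answered with a couple of dozen term lookups instead of one lookup per distinct value.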

 I am aware many other DBs put spatial data in R-Trees, and I have no
 interest in investing energy in doing that in Lucene.  That isn’t to say I
 think that other DBs shouldn’t be using R-Trees.  I think systems based on
 sorted keys/terms (like Lucene, Cassandra, Accumulo, HBase, and others)
 already have a powerful/versatile index, such that adding something
 different doesn’t warrant the complexity.  And Lucene’s underlying index
 continues to improve.  I am most excited about an “auto-prefixing”
 technique McCandless has been working on that will bring performance up to
 the next level for numeric & spatial data in Lucene’s index.

 If you’d like to learn more about RPT and Lucene/Solr spatial, I suggest
 my “Spatial Deep Dive” presentation at Lucene Revolution in San Diego, May
 2013:  Lucene / Solr 4 Spatial Deep Dive
 https://www.youtube.com/watch?v=L2cUGv0Rebs&list=PLsj1Ri57ZE94ulvk2vI_WoJrDYs3ckmH0&index=31
 Also, my article here illustrates some RPT concepts in terms of indexing:
 http://opensourceconnections.com/blog/2014/04/11/indexing-polygons-in-lucene-with-accuracy/

 ~ David Smiley
 Freelance Apache Lucene/Solr Search Consultant/Developer
 http://www.linkedin.com/in/davidwsmiley

 On Sat, Jan 10, 2015 at 10:26 AM, Matteo Tarantino 
 matteo.tarant...@gmail.com wrote:

 Hi all,
 I hope I'm not bothering you, but I think I'm writing to the only mailing
 list that can help me with my question.

 I am writing my master's thesis about Geographical Information Retrieval
 (GIR) and I'm using Solr to create a small geospatial search engine.
 Reading papers about GIR, I noticed that these systems use a separate data
 structure (like an R-tree http://it.wikipedia.org/wiki/R-tree) to save
 the geographical coordinates of documents, but I have found nothing about
 how Solr manages coordinates.

 Can someone help me and, most of all, point me to documents
 that explain how and where Solr saves spatial information?

 Thank you in advance
 Matteo





Solr geospatial index?

2015-01-10 Thread Matteo Tarantino
Hi all,
I hope I'm not bothering you, but I think I'm writing to the only mailing list
that can help me with my question.

I am writing my master's thesis about Geographical Information Retrieval
(GIR) and I'm using Solr to create a small geospatial search engine.
Reading papers about GIR, I noticed that these systems use a separate data
structure (like an R-tree http://it.wikipedia.org/wiki/R-tree) to save
the geographical coordinates of documents, but I have found nothing about
how Solr manages coordinates.

Can someone help me and, most of all, point me to documents
that explain how and where Solr saves spatial information?

Thank you in advance
Matteo


Re: Solr geospatial index?

2015-01-10 Thread Matteo Tarantino
Thank you for your reply.
I have read the documentation, but I still don't understand whether Solr
creates two different indexes, one for the text of the documents and one for
the geographic information of the documents (something like this:
http://imgur.com/E0R3alo )

2015-01-10 17:03 GMT+01:00 Jack Krupansky jack.krupan...@gmail.com:

 See the Solr reference guide section on Spatial Search:
 https://cwiki.apache.org/confluence/display/solr/Spatial+Search

 -- Jack Krupansky
