Re: Sorting results for spatial search

2018-02-01 Thread Leila Deljkovic
Hey David,

Thanks for your suggestions! I think I’ve got the right behaviour now; I’ve 
done fq={!parent which=is_parent:true score=total v='+is_parent:false 
+{!func}density'} desc instead of sort=…

Side note: the grid cells can be POLYGON or MULTIPOLYGON, so BBoxField didn’t 
work when I tried it and I had to resort to RptWithGeometrySpatialField. I’m not 
sure yet how it will do performance-wise.



> On 2/02/2018, at 5:24 AM, David Smiley <david.w.smi...@gmail.com> wrote:
> 
> quote: "The problem is that this includes children that DON’T touch the
> search area in the sum. How can I only include the shapes from the first
> query above in my sort?"
> 
> Unless I'm misunderstanding your intent, I think this is a simple matter of
> adding the spatial filter to the parent join query you are sorting on.  So
> something like this (not tested):
> 
> sort=query($sortQ) desc
> sortQ={!parent which=is_parent:true score=total}
>  +is_parent:false
>  +{!func}density
>  +gridcell_rpt:"Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40,
> -20 70)))"
> 
> Separately from your question, you state that these are grid cells and thus
> rectangles.  For rectangles, I recommend using BBoxField, which will
> probably overall perform better (smaller index, faster queries).  If you
> need an RPT field nonetheless (heatmaps?) then you could use the more
> concise ENVELOPE syntax but it shouldn't matter since a polygon that is a
> rectangle will internally be optimized to be one.
> 
> On Wed, Jan 31, 2018 at 3:33 PM Leila Deljkovic <
> leila.deljko...@koordinates.com> wrote:
> 
>> Hiya,
>> 
>> So I have some nested documents in my index with this kind of structure:
>> {
>>   "id": "parent",
>>   "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
>>   "density": "30",
>>   "_childDocuments_": [
>>     {
>>       "id": "child1",
>>       "gridcell_rpt": "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))",
>>       "density": "25"
>>     },
>>     {
>>       "id": "child2",
>>       "gridcell_rpt": "MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15 5)))",
>>       "density": "5"
>>     }
>>   ]
>> }
>> 
>> The parent document is a WKT shape, and its children are “grid cells”,
>> which are just divisions of the main shape (i.e. the parent shape cut up
>> into child shapes). The “density” is the feature count in each shape.
>> When I query (through the Solr UI) I use “Intersects” to return parents
>> which touch the search area (note that if a child is touching, the parent
>> must also be touching).
>> 
>> e.g. fq={!field f=gridcell_rpt}Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70)))
>> 
>> and I want to sort the results by the sum of the densities of all the
>> children touching the search area (so which parent has children that touch
>> the search area, and how big the sum of these children’s densities is)
>>something like {!parent which=is_parent:true score=total
>> v='+is_parent:false +{!func}density'} desc
>> 
>> The problem is that this includes children that DON’T touch the search
>> area in the sum. How can I only include the shapes from the first query
>> above in my sort?
>> 
>> Cheers :)
> 
> -- 
> Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
> LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
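
The suggestion quoted above can be assembled into request parameters roughly as follows. This is a minimal Python sketch, not tested against a live Solr. It assumes the two parameters whose names are missing in the archived message are sort and sortQ. The default q of *:* and the function name build_sort_params are placeholders that do not come from the thread.

```python
from urllib.parse import urlencode

# Search area taken from the thread.
SEARCH_AREA = "POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70))"


def build_sort_params(search_area_wkt, q="*:*"):
    """Build Solr request parameters that sort parents by child density.

    q defaults to *:* purely as a placeholder (assumption); the thread
    does not say which main query is used.
    """
    spatial_clause = 'gridcell_rpt:"Intersects(' + search_area_wkt + ')"'
    # Children must be non-parents AND intersect the search area; each
    # child's score includes its density, and score=total sums child
    # scores per parent.
    sort_q = (
        "{!parent which=is_parent:true score=total}"
        "+is_parent:false +{!func}density +" + spatial_clause
    )
    return {
        "q": q,
        # Filter from the original question.
        "fq": "{!field f=gridcell_rpt}Intersects(" + search_area_wkt + ")",
        "sort": "query($sortQ) desc",
        "sortQ": sort_q,
    }


params = build_sort_params(SEARCH_AREA)
query_string = urlencode(params)  # ready to append to a /select URL
```

Keeping the spatial clause inside sortQ is what limits the sum to children that touch the search area; the fq alone only decides which documents are returned.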



Sorting results for spatial search

2018-01-31 Thread Leila Deljkovic
Hiya,

So I have some nested documents in my index with this kind of structure:
{
  "id": "parent",
  "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
  "density": "30",
  "_childDocuments_": [
    {
      "id": "child1",
      "gridcell_rpt": "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))",
      "density": "25"
    },
    {
      "id": "child2",
      "gridcell_rpt": "MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15 5)))",
      "density": "5"
    }
  ]
}

The parent document is a WKT shape, and its children are “grid cells”, which 
are just divisions of the main shape (i.e. the parent shape cut up into child 
shapes). The “density” is the feature count in each shape. When I query (through 
the Solr UI) I use “Intersects” to return parents which touch the search area 
(note that if a child is touching, the parent must also be touching).

e.g. fq={!field f=gridcell_rpt}Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70)))

and I want to sort the results by the sum of the densities of all the children 
touching the search area (so which parent has children that touch the search 
area, and how big the sum of these children’s densities is)
something like {!parent which=is_parent:true score=total 
v='+is_parent:false +{!func}density'} desc

The problem is that this includes children that DON’T touch the search area in 
the sum. How can I only include the shapes from the first query above in my 
sort?

Cheers :)
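
To make the problem concrete, here is a small Python sketch of the document above and the two sums in question. It is an illustration only: the is_parent field is assumed from the queries in this thread (it is not shown in the sample document), densities are written as integers rather than strings, and the set of matching child ids is a hypothetical stand-in for whatever Solr's spatial filter would actually match.

```python
import json

parent = {
    "id": "parent",
    "is_parent": True,  # assumed field; the queries in this thread filter on it
    "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
    "density": 30,
    "_childDocuments_": [
        {
            "id": "child1",
            "is_parent": False,
            "gridcell_rpt": "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))",
            "density": 25,
        },
        {
            "id": "child2",
            "is_parent": False,
            "gridcell_rpt": "MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15 5)))",
            "density": 5,
        },
    ],
}


def total_density(doc, matching_ids=None):
    """Sum child densities, optionally restricted to the given child ids."""
    children = doc["_childDocuments_"]
    if matching_ids is not None:
        children = [c for c in children if c["id"] in matching_ids]
    return sum(c["density"] for c in children)


# JSON body in the shape Solr's update handler accepts for nested documents.
payload = json.dumps([parent], indent=2)

unfiltered = total_density(parent)            # every child counts
filtered = total_density(parent, {"child1"})  # hypothetical: only child1 touches the area
```

The unfiltered sum is what the sort in the question produces; the filtered sum is what is wanted.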

Sort for spatial search

2018-01-30 Thread Leila Deljkovic
Hiya,

So I have some nested documents in my index with this kind of structure:
{
  "id": "parent",
  "gridcell_rpt": "POLYGON((30 10, 40 40, 20 40, 10 20, 30 10))",
  "density": "30",
  "_childDocuments_": [
    {
      "id": "child1",
      "gridcell_rpt": "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)))",
      "density": "25"
    },
    {
      "id": "child2",
      "gridcell_rpt": "MULTIPOLYGON(((15 5, 40 10, 10 20, 5 10, 15 5)))",
      "density": "5"
    }
  ]
}

The parent document is a WKT shape, and its children are “grid cells”, which 
are just divisions of the main shape (i.e. the parent shape cut up into child 
shapes). The “density” is the feature count in each shape. When I query (through 
the Solr UI) I use “Intersects” to return parents which touch the search area 
(note that if a child is touching, the parent must also be touching).

e.g. fq={!field f=gridcell_rpt}Intersects(POLYGON((-20 70, -50 80, -20 20, 30 60, -10 40, -20 70)))

and I want to sort the results by the sum of the densities of all the children 
touching the search area (so which parent has children that touch the search 
area, and how big the sum of these children’s densities is)
something like {!parent which=is_parent:true score=total 
v='+is_parent:false +{!func}density'} desc

The problem is that this includes children that DON’T touch the search area in 
the sum. How can I only include the shapes from the first query above in my 
sort?

Cheers :)

Re: Solr 7 spatial search and WKT

2018-01-17 Thread Leila Deljkovic
Hi Emir

I’ve been following one of the only examples I could find on how to index a 
POLYGON, which does specify the field as multiValued:

Configuration: schema.xml

 
Index a polygon (JavaScript syntax around WKT):
{"id":"1", "geo_rpt":
"POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))"}

Indexing one MULTIPOLYGON works also, but trying to enter them as a list like 
you would for any other multiValued field does not work. I couldn’t find 
explicitly that RptWithGeometrySpatialField supports multiValued, but according 
to the Solr docs, it is derived from SpatialRecursivePrefixTreeFieldType (RPT) 
which supports multiValued and is “configured just like RPT except that the 
default distErrPct is 0.15 (higher than 0.025)…”

The reason I’m trying to index multiple shapes per document is that in the 
index, each “layer” (document) has grid cells associated with it (they help 
describe the density of features across the layer; more density = smaller grid 
cells in an area). Indexing the grid cells will allow me to figure out how 
relevant a result for a search extent on a map might be; a layer could cover an 
entire country but be dense in major cities, so if I am looking for a major 
city, I’d want to boost this search result. Hope that makes sense. I’m not sure 
if flattening into a single shape would work for this purpose.

Thanks :)

> On 17/01/2018, at 10:12 PM, Emir Arnautović <emir.arnauto...@sematext.com> 
> wrote:
> 
> Hi Leila,
> I haven’t used spatial in a while and did not test this, but based on the 
> error, it seems that multiValued is not supported for this field type. Can you 
> index a single MULTIPOLYGON? Can you flatten your geometry into a single 
> MULTIPOLYGON or MULTIGEOMETRY (if supported)? Can you explain why you need a 
> multiValued field?
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 17 Jan 2018, at 00:09, Leila Deljkovic <leila.deljko...@koordinates.com> 
>> wrote:
>> 
>> Hi all,
>> 
>> I need to index multiple POLYGONS/MULTIPOLYGONS per document; I’m trying to 
>> use multiValued RptWithGeometrySpatialField and I’m getting this error:
>> 
>> Exception writing document id leila_test to the index; possible analysis 
>> error: DocValuesField "gridcell_rpt" appears more than once in this document 
>> (only one value is allowed per field)
>> 
>> This is what I’m indexing:
>> {
>>  "id": "leila_test",
>>  "gridcell_rpt": ["POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))",
>> "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 
>> 5)))"]
>> }
>> 
>> This is what’s in my schema.xml:
>>  
>>  …
>>  >   
>>  distanceUnits="kilometers" autoIndex="true"/>
>> 
>> I’m pretty confused on why this isn’t working. I can’t find an example of 
>> multiValued RptWithGeometrySpatialField anywhere -_-
>> 
>> Thanks :)
> 
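
For anyone who does want to try the flattening suggested in the quoted reply, here is a minimal Python sketch that merges several POLYGON / MULTIPOLYGON WKT strings into one MULTIPOLYGON by plain string handling. The function name to_multipolygon is made up for this example. It does not validate the geometry, and a MULTIPOLYGON whose members overlap may be rejected or mishandled by the spatial library, so treat it as an assumption-laden illustration. It also discards which cell each polygon came from, which is the concern raised above.

```python
def to_multipolygon(wkt_shapes):
    """Merge POLYGON / MULTIPOLYGON WKT strings into one MULTIPOLYGON string."""
    parts = []
    for wkt in wkt_shapes:
        text = wkt.strip()
        upper = text.upper()
        if upper.startswith("MULTIPOLYGON"):
            body = text[len("MULTIPOLYGON"):].strip()
            # Drop the outermost parentheses: (((..)), ((..))) -> ((..)), ((..))
            parts.append(body[1:-1].strip())
        elif upper.startswith("POLYGON"):
            # A POLYGON body ((..)) is already one member of a MULTIPOLYGON.
            parts.append(text[len("POLYGON"):].strip())
        else:
            raise ValueError("unsupported WKT type: " + text[:20])
    return "MULTIPOLYGON(" + ", ".join(parts) + ")"


# The two shapes from the failing document above.
merged = to_multipolygon([
    "POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))",
    "MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 5)))",
])
```

The merged string can then be indexed as a single value in gridcell_rpt, avoiding the "appears more than once" error, at the cost of no longer having one shape per cell.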



Solr 7 spatial search and WKT

2018-01-16 Thread Leila Deljkovic
Hi all,

I need to index multiple POLYGONS/MULTIPOLYGONS per document; I’m trying to use 
multiValued RptWithGeometrySpatialField and I’m getting this error:

Exception writing document id leila_test to the index; possible analysis error: 
DocValuesField "gridcell_rpt" appears more than once in this document (only one 
value is allowed per field)

This is what I’m indexing:
{
"id": "leila_test",
"gridcell_rpt": ["POLYGON((30 10, 10 20, 20 40, 40 40, 30 10))",
"MULTIPOLYGON(((30 20, 45 40, 10 40, 30 20)), ((15 5, 40 10, 10 20, 5 10, 15 
5)))"]
}

This is what’s in my schema.xml:

…

Spatial search - indexing WKT data

2018-01-14 Thread Leila Deljkovic
Hi,

I have some data in WKT string format (either POLYGON or MULTIPOLYGON) and I’d 
like to index it in Solr 7.0. As there are multiple polygons in every WKT 
string, I’d ideally like to index them as a multiValued BBoxField (I couldn’t 
find anything to confirm it, but it looks like multiValued is a valid attribute 
for BBoxField). Has anyone indexed WKT data in Solr before? Is it necessary to 
convert it to CSV (I would do that first but I’m having trouble exporting it as 
CSV…)?

Thanks

Spatial search, nested docs, feature density

2018-01-10 Thread Leila Deljkovic
Hi,

https://lucene.apache.org/solr/guide/7_0/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments
 


I have never used nested documents, but a bit of background on what I’m doing 
is that a spatial data layer consisting of features (points, lines, polygons, 
or an aerial image) is split up into sections (grid cells) based on the density 
of these features over the layer; smaller grid cells indicate high density of 
features in that area. 

I need to rank results based on density of features and whether dense areas of 
the layer overlap with the region of the map I am searching in. This is 
important because a layer could cover an entire country. For example, if I query 
for “roads”, the layer would be dense in urban areas, where there are more 
roads, and less dense in rural areas. If I am searching for a particular city, 
this layer would be of interest to me even though it covers the entire country. 
The idea is for the original layer to be the parent document (which is what 
should be returned when a query is made) and for the child documents to be the 
individual grid cells (each holding the geometry of the cell and a density field 
for the features inside the cell). 

I would like to know if it is possible to rank the parent document based on a 
function which aggregates fields from the child documents (in this case, the 
density field). There is not much info on this that I could find online.

Thanks
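
A toy Python sketch of the ranking idea described above, outside Solr. Everything in it is hypothetical: the layers, the cell coordinates and the density values are invented for illustration, and the grid cells are simplified to axis-aligned rectangles so that overlap can be checked without a geometry library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One grid cell: an axis-aligned rectangle with a feature count."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float
    density: int


def intersects(cell, extent):
    """True if the cell overlaps the extent (min_x, min_y, max_x, max_y)."""
    min_x, min_y, max_x, max_y = extent
    return not (
        cell.max_x < min_x or cell.min_x > max_x
        or cell.max_y < min_y or cell.min_y > max_y
    )


def layer_score(cells, extent):
    """Aggregate child values: sum the density of cells overlapping the extent."""
    return sum(c.density for c in cells if intersects(c, extent))


# Hypothetical layers: "roads" has small dense cells near a city,
# "rivers" is one large sparse cell covering everything.
layers = {
    "roads": [Cell(0, 0, 10, 10, 5), Cell(10, 0, 12, 2, 80), Cell(12, 0, 14, 2, 60)],
    "rivers": [Cell(0, 0, 20, 20, 12)],
}
city_extent = (10.5, 0.5, 11.5, 1.5)

ranked = sorted(
    layers,
    key=lambda name: layer_score(layers[name], city_extent),
    reverse=True,
)
```

Here the parent (layer) is ranked by an aggregate over only those children (cells) that overlap the search extent, which is the behaviour being asked for.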

Re: Spatial search (and nested docs)

2018-01-10 Thread Leila Deljkovic
Hi Emir,

Thanks for the reply. My problem has been simplified a bit now. 

https://lucene.apache.org/solr/guide/7_0/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-NestedChildDocuments

I have never used nested documents, but a bit of background is that a spatial 
data layer consisting of features (points, lines, polygons, or an aerial image) 
is split up into sections (grid cells) based on the density of these features 
over the layer; smaller grid cells indicate high density of features in that 
area. 

I need to rank results based on density of features and whether dense areas of 
the layer overlap with the region of the map I am searching in. This is 
important because a layer could cover an entire country. For example, if I query 
for “roads”, the layer would be dense in urban areas, where there are more 
roads, and less dense in rural areas. If I am searching for a particular city, 
this layer would be of interest to me even though it covers the entire country. 
The idea is for the original layer to be the parent document (which is what 
should be returned when a query is made) and for the child documents to be the 
individual grid cells (each holding the geometry of the cell and a density field 
for the features inside the cell). 

I would like to know if it is possible to rank the parent document based on a 
function which aggregates fields from the child documents (in this case, the 
density field). There is not much info on this that I could find online.

Thanks

> On 10/01/2018, at 11:58 PM, Emir Arnautović <emir.arnauto...@sematext.com> 
> wrote:
> 
> Hi Leila,
> Maybe I need to refresh my spatial terminology, but I am having trouble 
> following your case. Can you explain a bit more: what is the dataset that is 
> indexed, what are the query inputs, and what should the result be? The one 
> thing that puzzles me the most is “nested documents”.
> 
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> 
> 
> 
>> On 10 Jan 2018, at 04:15, Leila Deljkovic <leila.deljko...@koordinates.com> 
>> wrote:
>> 
>> Hi,
>> 
>> I’m quite new to Solr and am interested in using spatial search for 
>> geospatial data (Solr 7.1).
>> 
>> One problem is addressing feature density over a layer and using this to 
>> determine if a layer would be a relevant result over a search extent. I’d 
>> like to know whether it is feasible to “split” a data layer into nested 
>> documents and index them, then at query time count the number of nested 
>> documents that coincide with the search extent. Or maybe make use of 
>> overlapRatio or similar.
>> 
>> Thanks
>> 
>> 
> 



Spatial search

2018-01-09 Thread Leila Deljkovic
Hi,

I’m quite new to Solr and am interested in using spatial search for geospatial 
data (Solr 7.1).

One problem is addressing feature density over a layer and using this to 
determine if a layer would be a relevant result over a search extent. I’d like 
to know whether it is feasible to “split” a data layer into nested documents 
and index them, then at query time count the number of nested documents that 
coincide with the search extent. Or maybe make use of overlapRatio or similar.

Thanks