[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2017-01-25 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838651#comment-15838651
 ] 

David Smiley commented on SOLR-5170:


Jeff, I wound up doing this today; see SOLR-10039.  I plan to close this issue 
on the completion of that issue.

> Spatial multi-value distance sort via DocValues
> ---
>
> Key: SOLR-5170
> URL: https://issues.apache.org/jira/browse/SOLR-5170
> Project: Solr
>  Issue Type: New Feature
>  Components: spatial
>Reporter: David Smiley
>Assignee: David Smiley
> Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, 
> SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt
>
>
> The attached patch implements spatial multi-value distance sorting.  In other 
> words, a document can have more than one point per field, and using a 
> provided function query, it will return the distance to the closest point.  
> The data goes into binary DocValues, and as-such it's pretty friendly to 
> realtime search requirements, and it only uses 8 bytes per point.
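
The behaviour described above, returning the distance from a query point to the closest of a document's points, can be sketched outside of Solr as follows. This is only an illustration, not the patch's code: the class and method names are hypothetical, and the haversine formula with a mean Earth radius is an assumption about how distance is measured.

```java
final class ClosestPointDistance {
    // Mean Earth radius in kilometres (an assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0087714;

    /** Great-circle distance between two lat/lon points given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Distance from the query point to the closest of a document's points.
     * Each entry of points is {lat, lon}; a document with no points yields
     * positive infinity, so it would sort last in an ascending sort.
     */
    static double minDistanceKm(double[][] points, double queryLat, double queryLon) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : points) {
            best = Math.min(best, haversineKm(p[0], p[1], queryLat, queryLon));
        }
        return best;
    }
}
```

Sorting documents by `minDistanceKm` in ascending order would then give the multi-value distance sort the issue title refers to.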



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2017-01-10 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816053#comment-15816053
 ] 

Jeff Wartes commented on SOLR-5170:
---

Well, yes, I'm interested. I've got enough other work projects going at the 
moment that I'm not sure I'll be able to dedicate much time in the next month or 
two, but I wouldn't mind trying to chip away at it.

I don't want to pollute this issue, so if you have a few minutes, and could 
drop me an email with any pointers about the code areas involved, or references 
to any prior art you're aware of, I expect that'd accelerate things a lot. 
Thanks.




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2017-01-09 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812936#comment-15812936
 ] 

David Smiley commented on SOLR-5170:


The fastest is very likely "LatLonDocValuesField", currently hiding out in 
Lucene sandbox.  There are some really clever tricks it does.

Interested in adding a Solr adapter for it?




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2017-01-09 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812871#comment-15812871
 ] 

Jeff Wartes commented on SOLR-5170:
---

It's coming up on two years, and I'm aware there have been some significant 
changes to areas like docvalues and geospatial since the last update to this 
issue. 

What's the state of the world now? 
If you have entities with multiple locations, and you want to filter and sort, 
is this patch still the highest-performance option available? I'm more willing 
to give up on the real-time-friendliness these days, if that changes the answer.




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2015-04-23 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509521#comment-14509521
 ] 

Jeff Wartes commented on SOLR-5170:
---


I got tired of maintaining a custom solr build process for the sole purpose of 
this patch at my work, especially given the deployment changes in Solr 5.0.
Since this patch really just adds new classes, I pulled those files out into a 
freestanding repository that builds a jar, copied the necessary infrastructure 
to allow the tests to run, and posted that here:

https://github.com/randomstatistic/SOLR-5170

This repo contains the necessary API changes to the patch to support Solr 5.0. 
I have not bothered to update the patch in Jira here with those changes, and 
going forward, I'll probably continue to only push changes to that repo unless 
someone asks otherwise.




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-06-30 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048437#comment-14048437
 ] 

David Smiley commented on SOLR-5170:


Thanks for maintaining the patch, [~jwartes].  Sorry, I won't have time for 
a while to get to this, which is kinda blocked by another issue (SOLR-4329).  
Going with the SortedSetDocValues approach is kinda tempting.




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2014-01-07 Thread Jeff Wartes (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864738#comment-13864738
 ] 

Jeff Wartes commented on SOLR-5170:
---

I've been using this patch with some minor tweaks and Solr 4.3.1 in production 
for about six months now. Since I was applying it again against 4.6 this 
morning, I figured I should attach my tweaks and mention that it passes tests 
against 4.6.

This does NOT address the design issues David raises in the initial comment. 
The changes vs. the initial patch file allow it to be applied against a greater 
range of Solr versions, and bring it a little closer to feeling the same as 
geofilt's params.




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-23 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748324#comment-13748324
 ] 

David Smiley commented on SOLR-5170:


I'm slowly working on a benchmark using [Google 
Caliper|http://code.google.com/p/caliper/]; but I have limited time on vacation 
at the moment.

Bill: "it adds up" is not a memory concern; it's speed/performance overhead.  
And your reference to geofilt and caching is largely irrelevant -- this is 
about sorting.  The cache in question (be it DocValues or whatever) exists to 
hold all points in memory; it does *not* hold distance-sorted results that may 
or may not be re-used in another query.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-22 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748318#comment-13748318
 ] 

Bill Bell commented on SOLR-5170:
-

David,

How many points is the limit when it adds up? Does it give an OOM exception? 
Or does it just take longer and longer to respond? 

In most use cases there is almost no need to cache the geospatial search 
results, since most users are running queries from multiple locations (with 
GeoIP targeting). At least that is our use case. If the corpus of points is 
large, is there an approximation that can be used to reduce it before running 
the circle radius check? For example, fq={!cache=false cost=10}lat:[X TO Y] AND 
long:[X1 TO Y1], and then apply fq={!geofilt cost=100} or geodist?

We have found that doing that speeds things up... I wonder if the code could 
just do that for us?
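
The two-stage approach described above (a cheap latitude/longitude range check first, then the exact circle check only for the survivors) can be sketched as follows. This is a hedged illustration rather than anything Solr does internally: the class and method names are hypothetical, the distance is assumed to be haversine on a sphere, and the box is derived from the circle so that it never rejects a point the circle would accept.

```java
final class BoxThenCircle {
    // Mean Earth radius in kilometres (an assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0087714;

    /** Exact (and comparatively expensive) great-circle distance in km. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Cheap pre-filter: is the point inside the circle's bounding box? */
    static boolean insideBox(double lat, double lon, double cLat, double cLon, double radiusKm) {
        double r = radiusKm / EARTH_RADIUS_KM;           // angular radius in radians
        if (Math.abs(lat - cLat) > Math.toDegrees(r)) {
            return false;                                // outside the latitude band
        }
        double ratio = Math.sin(r) / Math.cos(Math.toRadians(cLat));
        if (r >= Math.PI / 2 || ratio >= 1.0) {
            return true;                                 // circle reaches a pole: all longitudes qualify
        }
        double dLon = Math.abs(lon - cLon);
        if (dLon > 180.0) {
            dLon = 360.0 - dLon;                         // wrap across the dateline
        }
        return dLon <= Math.toDegrees(Math.asin(ratio));
    }

    /** Box check first; the exact distance is computed only for points that pass it. */
    static boolean withinRadius(double lat, double lon, double cLat, double cLon, double radiusKm) {
        return insideBox(lat, lon, cLat, cLon, radiusKm)
                && haversineKm(lat, lon, cLat, cLon) <= radiusKm;
    }
}
```

The box check costs only a few comparisons for the many points that lie far from the centre, which is the same motivation as giving the range filter a lower cost than the geofilt in the query above.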






[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-17 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742905#comment-13742905
 ] 

Robert Muir commented on SOLR-5170:
---

Err, that conversation is both wrong and totally irrelevant. It's based on some 
bogus apples-and-oranges faceting benchmarks those guys did before, where they 
spent lots of time optimizing that silly facet vint decode, whereas sortedset 
is the simplest thing that can work and was done in like 2 days.

I've said it before: I think it's good to reinvestigate removing the BINARY type 
completely. If I have to go optimize some loops somewhere in order to make that 
happen, fine, it's worth it to me to remove this useless shit. 

I don't think you should refactor Solr around broken assumptions and misleading 
benchmarks.




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742654#comment-13742654
 ] 

Robert Muir commented on SOLR-5170:
---

Why use BINARY vs SORTED_SET? That has a much easier fit in Solr to boot. It's 
designed for multiple values...




[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues

2013-08-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742738#comment-13742738
 ] 

David Smiley commented on SOLR-5170:


Hi Rob.
The other DocValues types are a better fit to Solr's API, yes.  Assuming each 
point is encoded into 8 bytes (2x4 binary encoded floats) and added as a value 
with SortedSetDocValuesField, this still means one lookup per point.  If there 
are a lot of points per document, then the overhead adds up ([as Shai 
noted|https://issues.apache.org/jira/browse/LUCENE-4583?focusedCommentId=13652097&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13652097]).
  Granted I didn't measure this overhead, but I'd rather SOLR-4329 get 
addressed somehow so BinaryDocValues can be used elegantly and then users don't 
have to pay an unnecessary price per point dereference.
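
To make the 8-bytes-per-point encoding concrete (two 4-byte floats per point), here is a minimal sketch that packs all of a document's points into one byte array, the kind of blob a binary DocValues field could hold so that a single lookup returns every point. The class and method names are hypothetical, and the byte order and lat-then-lon layout are assumptions; the patch's actual layout may differ.

```java
import java.nio.ByteBuffer;

final class PackedPoints {
    // Two 4-byte floats (latitude, longitude) per point.
    static final int BYTES_PER_POINT = 2 * Float.BYTES;

    /** Packs {lat0, lon0, lat1, lon1, ...} into one big-endian byte array. */
    static byte[] encode(float[] latLonPairs) {
        if (latLonPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected lat/lon pairs");
        }
        ByteBuffer buf = ByteBuffer.allocate(latLonPairs.length * Float.BYTES);
        for (float v : latLonPairs) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Number of points stored in a blob produced by encode. */
    static int pointCount(byte[] blob) {
        if (blob.length % BYTES_PER_POINT != 0) {
            throw new IllegalArgumentException("blob length is not a multiple of 8");
        }
        return blob.length / BYTES_PER_POINT;
    }

    /** Unpacks a blob back into {lat0, lon0, lat1, lon1, ...}. */
    static float[] decode(byte[] blob) {
        float[] out = new float[pointCount(blob) * 2];
        ByteBuffer buf = ByteBuffer.wrap(blob);
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }
}
```

With this layout a document with N points costs one value fetch plus a sequential read of 8*N bytes, whereas storing each point as its own value means one lookup per point, which is the per-point overhead discussed above.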
