[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838651#comment-15838651 ]

David Smiley commented on SOLR-5170:
------------------------------------

Jeff, I wound up doing this today; see SOLR-10039. I plan to close this issue on the completion of that issue.

> Spatial multi-value distance sort via DocValues
> -----------------------------------------------
>
>                 Key: SOLR-5170
>                 URL: https://issues.apache.org/jira/browse/SOLR-5170
>             Project: Solr
>          Issue Type: New Feature
>          Components: spatial
>            Reporter: David Smiley
>            Assignee: David Smiley
>         Attachments: SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, SOLR-5170_spatial_multi-value_sort_via_docvalues.patch, SOLR-5170_spatial_multi-value_sort_via_docvalues.patch.txt
>
>
> The attached patch implements spatial multi-value distance sorting. In other
> words, a document can have more than one point per field, and using a
> provided function query, it will return the distance to the closest point.
> The data goes into binary DocValues, and as-such it's pretty friendly to
> realtime search requirements, and it only uses 8 bytes per point.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
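The issue summary above says the function query returns the distance to the closest of a document's points. As a rough illustration of that idea only (this is not code from the attached patch; the class name, method names, and the flat lat/lon array layout are all assumptions made for the sketch), the per-document computation could look like this in plain Java:

```java
/**
 * Hypothetical sketch: distance from a query point to the closest of a
 * document's points. Not taken from the SOLR-5170 patch.
 */
final class ClosestPointDistance {
    // Mean Earth radius in kilometres (assumed constant for this sketch).
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle distance in km between two lat/lon pairs given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding pushing the value out of asin's domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Points are laid out as [lat0, lon0, lat1, lon1, ...].
     * Returns positive infinity when the document has no points.
     */
    static double minDistanceKm(float[] latLonPairs, double queryLat, double queryLon) {
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < latLonPairs.length; i += 2) {
            double d = haversineKm(queryLat, queryLon, latLonPairs[i], latLonPairs[i + 1]);
            best = Math.min(best, d);
        }
        return best;
    }

    public static void main(String[] args) {
        // A document with two locations; the sort key is the nearer one.
        float[] docPoints = {40.7f, -74.0f, 51.5f, -0.1f};
        System.out.println(minDistanceKm(docPoints, 48.85, 2.35));
    }
}
```

Sorting by this value ranks a multi-location document by its nearest location, which is the behaviour the summary describes. Returning infinity for a document with no points is a choice made for the sketch, not something the issue specifies.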
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15816053#comment-15816053 ]

Jeff Wartes commented on SOLR-5170:
-----------------------------------

Well, yes, I'm interested. I've got enough other work projects going at the moment that I'm not sure I'll be able to dedicate much time in the next month or two, but I wouldn't mind trying to chip at it. I don't want to pollute this issue, so if you have a few minutes and could drop me an email with any pointers about the code areas involved, or references to any prior art you're aware of, I expect that'd accelerate things a lot. Thanks.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812936#comment-15812936 ]

David Smiley commented on SOLR-5170:
------------------------------------

The fastest is very likely "LatLonDocValuesField", currently hiding out in Lucene sandbox. There are some really clever tricks it does. Interested in adding a Solr adapter for it?
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15812871#comment-15812871 ]

Jeff Wartes commented on SOLR-5170:
-----------------------------------

It's coming up on two years, and I'm aware there have been some significant changes to areas like docvalues and geospatial since the last update to this issue. What's the state of the world now? If you have entities with multiple locations, and you want to filter and sort, is this patch still the highest-performance option available? I'm more willing to give up on the real-time-friendliness these days, if that changes the answer.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509521#comment-14509521 ]

Jeff Wartes commented on SOLR-5170:
-----------------------------------

I got tired of maintaining a custom Solr build process at work for the sole purpose of this patch, especially given the deployment changes in Solr 5.0. Since this patch really just adds new classes, I pulled those files out into a freestanding repository that builds a jar, copied the necessary infrastructure to allow the tests to run, and posted that here: https://github.com/randomstatistic/SOLR-5170

This repo contains the API changes the patch needs to support Solr 5.0. I have not bothered to update the patch in Jira here with those changes, and going forward, I'll probably continue to only push changes to that repo unless someone asks otherwise.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048437#comment-14048437 ]

David Smiley commented on SOLR-5170:
------------------------------------

Thanks for maintaining the patch, [~jwartes]. Sorry, I won't have time for a while to get to this, which is kinda blocked by another issue (SOLR-4329). Going with the SortedSetDocValues approach is kinda tempting.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13864738#comment-13864738 ]

Jeff Wartes commented on SOLR-5170:
-----------------------------------

I've been using this patch with some minor tweaks and Solr 4.3.1 in production for about six months now. Since I was applying it again against 4.6 this morning, I figured I should attach my tweaks, and mention that it passes tests against 4.6.

This does NOT address the design issues David raises in the initial comment. The changes vs. the initial patch file allow it to be applied against a greater range of Solr versions, and bring it a little closer to feeling the same as geofilt's params.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748324#comment-13748324 ]

David Smiley commented on SOLR-5170:
------------------------------------

I'm slowly working on a benchmark using [Google Caliper|http://code.google.com/p/caliper/], but I have limited time on vacation at the moment.

Bill: the "adds up" is not a memory concern; it's speed/performance overhead. And your reference to geofilt and caching is largely irrelevant -- this is about sorting. The cache in question (be it DocValues or whatever) exists to hold all points in memory; it's *not* distance-sorted results that may or may not be re-used in another query.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748318#comment-13748318 ]

Bill Bell commented on SOLR-5170:
---------------------------------

David,

How many points is the limit when it adds up? Does it give an OOM exception? Or does it just take longer and longer to respond? In most use cases there is almost no need to cache the geospatial search results, since most users are running queries from multiple locations (with GeoIP targeting). At least that is our use case.

If the corpus of points is large, is there an approximation that can be used to reduce it before running the circle radius check? For example, fq={!cache=false cost=10}lat:[X TO Y] AND long:[X1 TO Y1], and then apply fq={!geofilt cost=100} or geodist? We have found that doing that speeds things up... Wonder if the code could just do that for us?
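Bill's suggestion above (a cheap lat/lon range check first, then the exact circle test only on survivors) can be sketched outside Solr as follows. This is a minimal plain-Java illustration under stated assumptions: the class and method names are hypothetical, the Earth is treated as a sphere, and nothing here reflects how geofilt or the patch is actually implemented.

```java
/**
 * Hypothetical sketch of "bounding box first, exact radius second".
 * Not Solr code; names and constants are assumptions for illustration.
 */
final class BoxThenCircle {
    static final double EARTH_RADIUS_KM = 6371.0088;

    /** Great-circle distance in km between two lat/lon pairs given in degrees. */
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double sinLat = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2)) * sinLon * sinLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** True if (lat, lon) lies within radiusKm of the centre (cLat, cLon). */
    static boolean withinRadius(double lat, double lon, double cLat, double cLon, double radiusKm) {
        double r = radiusKm / EARTH_RADIUS_KM;   // angular radius in radians
        double dLatDeg = Math.toDegrees(r);      // latitude half-height of the box

        // Longitude half-width grows with latitude; if the circle reaches a pole
        // (or spans a hemisphere), every longitude is inside the box.
        double ratio = Math.sin(r) / Math.cos(Math.toRadians(cLat));
        double dLonDeg = (r >= Math.PI / 2 || ratio >= 1.0)
                ? 180.0
                : Math.toDegrees(Math.asin(ratio));

        // Cheap rejection: two subtractions and comparisons, no trigonometry per point.
        if (Math.abs(lat - cLat) > dLatDeg) {
            return false;
        }
        double lonDiff = Math.abs(lon - cLon);
        if (lonDiff > 180.0) {
            lonDiff = 360.0 - lonDiff;           // wrap across the dateline
        }
        if (lonDiff > dLonDeg) {
            return false;
        }

        // Expensive exact check, run only for points that survive the box.
        return haversineKm(lat, lon, cLat, cLon) <= radiusKm;
    }

    public static void main(String[] args) {
        System.out.println(withinRadius(48.85, 2.35, 51.5, -0.1, 400.0));
    }
}
```

The box is a superset of the circle, so the prefilter never rejects a point the exact check would accept; it only saves work on points that are clearly too far away. In Solr terms this mirrors ordering a low-cost range filter ahead of a high-cost geofilt, as in Bill's example.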
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742905#comment-13742905 ]

Robert Muir commented on SOLR-5170:
-----------------------------------

Err, that conversation is both wrong and totally irrelevant. It's based on some bogus apples-and-oranges faceting benchmarks those guys did before, where they spent lots of time optimizing that silly facet vint decode, whereas sortedset is the simplest thing that can work and was done in like 2 days.

I've said it before: I think it's good to reinvestigate removing the BINARY type completely. If I have to go optimize some loops somewhere in order to make that happen, fine, it's worth it to me to remove this useless shit.

I don't think you should refactor Solr around broken assumptions and misleading benchmarks.
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742654#comment-13742654 ]

Robert Muir commented on SOLR-5170:
-----------------------------------

Why use BINARY vs SORTED_SET? That has a much easier fit in Solr to boot. It's designed for multiple values...
[jira] [Commented] (SOLR-5170) Spatial multi-value distance sort via DocValues
[ https://issues.apache.org/jira/browse/SOLR-5170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13742738#comment-13742738 ]

David Smiley commented on SOLR-5170:
------------------------------------

Hi Rob. The other DocValues types are a better fit to Solr's API, yes. Assuming each point is encoded into 8 bytes (two 4-byte binary-encoded floats) and added as a value with SortedSetDocValuesField, this still means one lookup per point. If there are a lot of points per document, then the overhead adds up ([as Shai noted|https://issues.apache.org/jira/browse/LUCENE-4583?focusedCommentId=13652097&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13652097]). Granted, I didn't measure this overhead, but I'd rather SOLR-4329 get addressed somehow so that BinaryDocValues can be used elegantly and users don't have to pay an unnecessary price per point dereference.
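To make the "8 bytes per point" layout David mentions concrete, here is a minimal plain-Java sketch that packs all of a document's points into one byte array, so a single read returns every point. It is an illustration only: the class and method names are hypothetical, the lat-then-lon ordering and big-endian byte order (ByteBuffer's default) are assumptions, and the patch's actual encoding may differ.

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch: pack a document's points as consecutive
 * (lat, lon) float pairs, 8 bytes per point. Not the patch's code.
 */
final class PackedPoints {
    // Two 4-byte floats per point.
    static final int BYTES_PER_POINT = 2 * Float.BYTES;

    /** Encodes [lat0, lon0, lat1, lon1, ...] into a single byte blob. */
    static byte[] encode(float[] latLonPairs) {
        if (latLonPairs.length % 2 != 0) {
            throw new IllegalArgumentException("expected lat/lon pairs");
        }
        ByteBuffer buf = ByteBuffer.allocate(latLonPairs.length * Float.BYTES);
        for (float v : latLonPairs) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    /** Decodes a blob produced by encode() back into [lat0, lon0, ...]. */
    static float[] decode(byte[] blob) {
        if (blob.length % BYTES_PER_POINT != 0) {
            throw new IllegalArgumentException("blob length is not a multiple of 8");
        }
        ByteBuffer buf = ByteBuffer.wrap(blob);
        float[] out = new float[blob.length / Float.BYTES];
        for (int i = 0; i < out.length; i++) {
            out[i] = buf.getFloat();
        }
        return out;
    }

    /** Number of points stored in the blob. */
    static int pointCount(byte[] blob) {
        return blob.length / BYTES_PER_POINT;
    }

    public static void main(String[] args) {
        byte[] blob = encode(new float[] {40.7f, -74.0f, 51.5f, -0.1f});
        System.out.println(pointCount(blob) + " points in " + blob.length + " bytes");
    }
}
```

With a per-document blob like this, a reader fetches one value and walks it sequentially, whereas storing each point as a separate sorted-set value implies one dereference per point. Whether that difference matters in practice is exactly what the thread leaves unmeasured.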