Re: Distance sort on a multi-value field
This is actually pretty far afield from my original subject, but it turns out that I also had issues with NRT and multi-field geospatial performance in Solr 4, so I'll follow up on that.

I've been testing and working with David's SOLR-5170 patch ever since he posted it, and I pushed it into production with only some cosmetic changes a few hours ago. I have a relatively low update and query rate for this particular query type (something like 2 updates/sec and 10 queries/sec), but a short autoSoftCommit time (5 sec). Based on the data so far, this patch looks like it has brought my average response time down from 4 seconds to about 50 ms.

Very nice!

On 8/20/13 7:37 PM, David Smiley (@MITRE.org) <dsmi...@mitre.org> wrote:
> The distance sorting code in SOLR-2155 is roughly equivalent to the code that RPT uses (RPT has its lineage in SOLR-2155, after all). I just reviewed it to double-check. It's possible the behavior is slightly better in SOLR-2155 because the cache (a Solr cache) contains normal hard references, whereas RPT has one based on weak references, which will linger longer. But I think the likelihood of OOM is the same.
>
> Anyway, the current best option is https://issues.apache.org/jira/browse/SOLR-5170, which I posted a few days ago.
>
> ~ David
Re: Distance sort on a multi-value field
Awesome! Be sure to watch the JIRA issue as it develops. The patch will improve (I've already improved it but haven't posted the update yet), and one day a solution is bound to get committed.

~ David

Jeff Wartes wrote:
> This is actually pretty far afield from my original subject, but it turns out that I also had issues with NRT and multi-field geospatial performance in Solr 4, so I'll follow up on that.
>
> I've been testing and working with David's SOLR-5170 patch ever since he posted it, and I pushed it into production with only some cosmetic changes a few hours ago. I have a relatively low update and query rate for this particular query type (something like 2 updates/sec and 10 queries/sec), but a short autoSoftCommit time (5 sec). Based on the data so far, this patch looks like it has brought my average response time down from 4 seconds to about 50 ms.
>
> Very nice!

-----
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4086226.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Distance sort on a multi-value field
The distance sorting code in SOLR-2155 is roughly equivalent to the code that RPT uses (RPT has its lineage in SOLR-2155, after all). I just reviewed it to double-check. It's possible the behavior is slightly better in SOLR-2155 because the cache (a Solr cache) contains normal hard references, whereas RPT has one based on weak references, which will linger longer. But I think the likelihood of OOM is the same.

Anyway, the current best option is https://issues.apache.org/jira/browse/SOLR-5170, which I posted a few days ago.

~ David

Billnbell wrote:
> We have been using 2155 for over 6 months in production with over 2M hits every 10 minutes. No OOM yet.
>
> 2155 seems great; would this issue be any worse than 2155?

-----
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4085797.html
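To make the hard-reference versus weak-reference distinction above concrete, here is a small Python sketch. It is only an analogy: the `Reader` class, `load_points` helper, and both cache variables are hypothetical stand-ins, not Solr or Lucene code. One difference worth keeping in mind is that CPython reclaims the weak entry as soon as the last strong reference goes away, whereas on a JVM a weakly referenced entry is only cleared at some later garbage collection, which is presumably the lingering mentioned above.

```python
import gc
import weakref


class Reader:
    """Hypothetical stand-in for an index reader that cached data belongs to."""

    def __init__(self, name):
        self.name = name


def load_points(reader):
    """Placeholder for the expensive step of loading per-document points."""
    return [(54.729696, -98.525391)]


# Hard-reference cache: the entry stays until something removes it explicitly.
hard_cache = {}

# Weak-key cache: the entry disappears once its reader is no longer referenced.
weak_cache = weakref.WeakKeyDictionary()

reader = Reader("segment-1")
hard_cache[reader.name] = load_points(reader)
weak_cache[reader] = load_points(reader)

# Simulate the reader being closed and replaced (for example after a commit).
del reader
gc.collect()

print("hard cache entries:", len(hard_cache))
print("weak cache entries:", len(weak_cache))
```

In this sketch the hard cache needs its own eviction policy to avoid growing without bound, while the weak cache relies on the garbage collector and so has to be rebuilt for every new reader.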
Re: Distance sort on a multi-value field
We have been using 2155 for over 6 months in production with over 2M hits every 10 minutes. No OOM yet.

2155 seems great; would this issue be any worse than 2155?

On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes <jwar...@whitepages.com> wrote:
> Hm, "Give me all the stores that only have branches in this area" might be a plausible use case for farthest distance. That's essentially a "contains" question though, so maybe that's already supported? I guess it depends on how contains/intersects/etc. handle multi-values. I feel like multi-value interaction really deserves its own section in the documentation.
>
> I'm aware of the memory issue, but it seems like if you want to sort multi-valued points, it's either this or try to pull in the 2155 patch. In general I'd rather go with the thing that's being maintained.
>
> Thanks for the code pointer. You're right, that doesn't look like something I can easily use for more general aggregate scoring control. Ah well.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
Distance sort on a multi-value field
I'm still pondering aggregate-type operations for scoring multi-valued fields (original thread: http://goo.gl/zOX53f), and it occurred to me that distance-sort with SpatialRecursivePrefixTreeFieldType must be doing something like that.

Somewhat surprisingly I don't see this in the documentation anywhere, but I presume the example query (from http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4):

q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}

assigns the distance/score based on the *closest* lat/long if the sfield is a multi-valued field.

That's a reasonable default, but it's a bit arbitrary. Can I sort based on the *furthest* lat/long in the document? Or the average distance?

Does anyone know more about how this works and could give me some pointers?

Thanks.
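For anyone wanting to send the example query from a client, the following is a minimal Python sketch of how it could be URL-encoded. The base URL, the core name `collection1`, and the `sort` and `fl` parameters are assumptions added for illustration; only the `sfield`, `pt`, and `d` values come from the example query itself.

```python
from urllib.parse import urlencode


def build_geofilt_url(base_url, sfield, lat, lon, d_km):
    """Return a select URL whose q is a geofilt query scored by distance."""
    q = "{!geofilt score=distance sfield=%s pt=%s,%s d=%s}" % (sfield, lat, lon, d_km)
    params = {
        "q": q,
        # Assumption: with score=distance, ascending score puts the nearest first.
        "sort": "score asc",
        # Assumption: return the score so the computed distance is visible.
        "fl": "*,score",
    }
    # urlencode escapes the braces, spaces, and comma in the local-params syntax.
    return base_url + "?" + urlencode(params)


# Hypothetical host and core name; adjust to the actual deployment.
url = build_geofilt_url(
    "http://localhost:8983/solr/collection1/select",
    "geo", 54.729696, -98.525391, 10,
)
print(url)
```

Encoding matters here because the `{!geofilt ...}` local-params block contains characters that are not safe to send raw in a URL.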
Re: Distance sort on a multi-value field
On 8/14/13 2:26 PM, Jeff Wartes <jwar...@whitepages.com> wrote:
> I'm still pondering aggregate-type operations for scoring multi-valued fields (original thread: http://goo.gl/zOX53f), and it occurred to me that distance-sort with SpatialRecursivePrefixTreeFieldType must be doing something like that.

It isn't.

> Somewhat surprisingly I don't see this in the documentation anywhere, but I presume the example query (from http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4):
>
> q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}
>
> assigns the distance/score based on the *closest* lat/long if the sfield is a multi-valued field.

Yes it does.

> That's a reasonable default, but it's a bit arbitrary. Can I sort based on the *furthest* lat/long in the document? Or the average distance?
>
> Anyone know more about how this works and could give me some pointers?

I considered briefly supporting the farthest distance but dismissed it as I saw no real use case. I didn't think of the average distance; that's plausible. Anyway, your best bet is to dig into the code. The relevant part is ShapeFieldCacheDistanceValueSource.

FYI, something to keep in mind: https://issues.apache.org/jira/browse/LUCENE-4698

~ David
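To illustrate what closest, farthest, and average distance mean for a multi-valued point field, here is a small client-side Python sketch. It is not the code in ShapeFieldCacheDistanceValueSource, and nothing in it is a Solr API: the function names, the `mode` argument, the sample documents, and the use of the haversine formula with a 6371 km Earth radius are all assumptions made for illustration. Per the reply above, only the closest behavior is what the distance sort actually does.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def doc_distance(points, query_pt, mode="closest"):
    """Collapse the distances of all of a document's points into one sort value."""
    dists = [haversine_km(pt, query_pt) for pt in points]
    if not dists:
        raise ValueError("document has no points")
    if mode == "closest":
        return min(dists)
    if mode == "farthest":
        return max(dists)
    if mode == "average":
        return sum(dists) / len(dists)
    raise ValueError("unknown mode: %s" % mode)


def sort_docs(docs, query_pt, mode):
    """Return document ids ordered by ascending aggregated distance."""
    return sorted(docs, key=lambda doc_id: doc_distance(docs[doc_id], query_pt, mode))


# Hypothetical documents, each with a multi-valued list of (lat, lon) points.
docs = {
    "A": [(0.0, 0.0), (0.0, 10.0)],  # one point on the query point, one far away
    "B": [(0.0, 1.0), (0.0, 2.0)],   # both points moderately close
}
query = (0.0, 0.0)

for mode in ("closest", "farthest", "average"):
    print(mode, sort_docs(docs, query, mode))
```

With these sample points, document A sorts first by closest distance but second by farthest or average distance, which is why the choice of aggregate is not just a detail.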
Re: Distance sort on a multi-value field
Hm, "Give me all the stores that only have branches in this area" might be a plausible use case for farthest distance. That's essentially a "contains" question though, so maybe that's already supported? I guess it depends on how contains/intersects/etc. handle multi-values. I feel like multi-value interaction really deserves its own section in the documentation.

I'm aware of the memory issue, but it seems like if you want to sort multi-valued points, it's either this or try to pull in the 2155 patch. In general I'd rather go with the thing that's being maintained.

Thanks for the code pointer. You're right, that doesn't look like something I can easily use for more general aggregate scoring control. Ah well.

On 8/14/13 12:35 PM, Smiley, David W. <dsmi...@mitre.org> wrote:
> On 8/14/13 2:26 PM, Jeff Wartes <jwar...@whitepages.com> wrote:
>> I'm still pondering aggregate-type operations for scoring multi-valued fields (original thread: http://goo.gl/zOX53f), and it occurred to me that distance-sort with SpatialRecursivePrefixTreeFieldType must be doing something like that.
>
> It isn't.
>
>> Somewhat surprisingly I don't see this in the documentation anywhere, but I presume the example query (from http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4):
>>
>> q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}
>>
>> assigns the distance/score based on the *closest* lat/long if the sfield is a multi-valued field.
>
> Yes it does.
>
>> That's a reasonable default, but it's a bit arbitrary. Can I sort based on the *furthest* lat/long in the document? Or the average distance?
>>
>> Anyone know more about how this works and could give me some pointers?
>
> I considered briefly supporting the farthest distance but dismissed it as I saw no real use case. I didn't think of the average distance; that's plausible. Anyway, your best bet is to dig into the code. The relevant part is ShapeFieldCacheDistanceValueSource.
>
> FYI, something to keep in mind: https://issues.apache.org/jira/browse/LUCENE-4698
>
> ~ David