Re: Distance sort on a multi-value field

2013-08-22 Thread Jeff Wartes

This is actually pretty far afield from my original subject, but it turns
out that I also had issues with NRT and multi-valued geospatial
performance in Solr 4, so I'll follow that up.


I've been testing and working with David's SOLR-5170 patch ever since he
posted it, and I pushed it into production, with only some cosmetic
changes, a few hours ago.
I have a relatively low update and query rate for this particular query
type (something like 2 updates/sec and 10 queries/sec) but a short
autoSoftCommit time (5 sec). Based on the data so far, this patch looks
like it has brought my average response time down from 4 seconds to about
50 ms.

Very nice!
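For scale, here is the plain arithmetic on the figures above, assuming steady rates. NrtFigures and its methods are illustrative names only, not anything from Solr.

```java
// Back-of-the-envelope arithmetic on the production figures quoted above.
// Assumes steady rates; NrtFigures and its methods are illustrative names only.
public class NrtFigures {

    // Events (queries or updates) arriving between two soft commits.
    static double perCommitInterval(double ratePerSec, double softCommitSec) {
        return ratePerSec * softCommitSec;
    }

    // How many times faster the new average response time is.
    static double speedup(double beforeMs, double afterMs) {
        return beforeMs / afterMs;
    }

    public static void main(String[] args) {
        double queriesPerSec = 10.0; // ~10 queries/sec
        double updatesPerSec = 2.0;  // ~2 updates/sec
        double softCommitSec = 5.0;  // autoSoftCommit of 5 sec

        System.out.println("queries per soft commit: "
                + perCommitInterval(queriesPerSec, softCommitSec)); // 50.0
        System.out.println("updates per soft commit: "
                + perCommitInterval(updatesPerSec, softCommitSec)); // 10.0
        System.out.println("speedup: " + speedup(4000.0, 50.0) + "x"); // 80.0x
    }
}
```

So roughly 50 queries and 10 updates arrive between consecutive soft commits, and the reported drop from 4 s to 50 ms is about 80x.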



On 8/20/13 7:37 PM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote:
 [...]



Re: Distance sort on a multi-value field

2013-08-22 Thread David Smiley (@MITRE.org)
Awesome!

Be sure to watch the JIRA issue as it develops.  The patch will keep
improving (I've already improved it but haven't posted the update yet), and
one day a solution is bound to get committed.

~ David


Jeff Wartes wrote:
 [...]

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4086226.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distance sort on a multi-value field

2013-08-20 Thread David Smiley (@MITRE.org)
The distance sorting code in SOLR-2155 is roughly equivalent to the code that
RPT uses (RPT has its lineage in SOLR-2155 after all).  I just reviewed it
to double-check.  It's possible the behavior is slightly better in SOLR-2155
because the cache (a Solr cache) contains normal hard references whereas RPT
has one based on weak references, which will linger longer.  But I think the
likelihood of OOM is the same.
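One way to read the point about reference types, sketched with plain JDK collections: a size-bounded cache holding hard references drops entries once it is full, while a cache keyed by weak references keeps an entry for as long as its key (for example, an index reader) is still in use elsewhere, however many entries that adds up to. This is an illustration under those assumptions, not the actual SOLR-2155 or RPT cache code; CacheStyles and its methods are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.WeakHashMap;

// Illustrative only: contrasts two cache styles; not Solr's or Lucene's actual classes.
public class CacheStyles {

    // Size-bounded LRU cache with hard references: entries go away only on eviction.
    static <K, V> Map<K, V> boundedLru(final int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) { // true = access order
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Weakly keyed cache: an entry lives as long as its key is strongly reachable
    // elsewhere, with no size bound at all.
    static <K, V> Map<K, V> weakKeyed() {
        return new WeakHashMap<>();
    }

    public static void main(String[] args) {
        Map<String, double[]> lru = boundedLru(2);
        lru.put("seg1", new double[]{1});
        lru.put("seg2", new double[]{2});
        lru.put("seg3", new double[]{3}); // evicts seg1, the least recently used
        System.out.println(lru.keySet()); // [seg2, seg3]

        Map<Object, double[]> weak = weakKeyed();
        Object reader = new Object(); // stands in for a hypothetical index reader
        weak.put(reader, new double[]{1, 2, 3});
        System.out.println(weak.get(reader).length); // 3, kept while 'reader' is in use
    }
}
```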

Anyway, the current best option is
https://issues.apache.org/jira/browse/SOLR-5170, which I posted a few days
ago.

~ David


Billnbell wrote:
 [...]

-
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Distance-sort-on-a-multi-value-field-tp4084666p4085797.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Distance sort on a multi-value field

2013-08-17 Thread William Bell
We have been using SOLR-2155 in production for over 6 months, with over 2M
hits every 10 minutes. No OOM yet.

SOLR-2155 seems great. Would this issue be any worse than SOLR-2155?



On Wed, Aug 14, 2013 at 4:08 PM, Jeff Wartes jwar...@whitepages.com wrote:
 [...]

-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Distance sort on a multi-value field

2013-08-14 Thread Jeff Wartes

I'm still pondering aggregate-type operations for scoring multi-valued
fields (original thread: http://goo.gl/zOX53f ), and it occurred to me
that distance-sort with SpatialRecursivePrefixTreeFieldType must be doing
something like that.

Somewhat surprisingly, I don't see this in the documentation anywhere, but
I presume the example query (from
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4):

q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}

assigns the distance/score based on the *closest* lat/long if the sfield
is a multi-valued field.
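A sketch of sending the wiki example as a distance sort. It assumes that with score=distance the document score is the distance, so sort=score asc lists the nearest documents first. The base URL and core name are hypothetical, and the class only builds the URL-encoded parameters (the Charset overload of URLEncoder.encode needs Java 10+).

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Builds the query string for the geofilt example, sorted by the distance-based score.
// Assumption: score=distance makes "score" the distance, so "score asc" is nearest first.
public class GeofiltRequest {

    static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8); // Java 10+
    }

    static String buildQueryString(String sfield, double lat, double lon, double d) {
        String q = "{!geofilt score=distance sfield=" + sfield
                + " pt=" + lat + "," + lon + " d=" + d + "}";
        return "q=" + enc(q) + "&sort=" + enc("score asc") + "&fl=" + enc("*,score");
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/collection1/select?"; // hypothetical endpoint
        System.out.println(base + buildQueryString("geo", 54.729696, -98.525391, 10));
    }
}
```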

That's a reasonable default, but it's a bit arbitrary. Can I sort based on
the *furthest* lat/long in the document? Or the average distance?

Anyone know more about how this works and could give me some pointers?

Thanks.



Re: Distance sort on a multi-value field

2013-08-14 Thread Smiley, David W.


On 8/14/13 2:26 PM, Jeff Wartes jwar...@whitepages.com wrote:

 I'm still pondering aggregate-type operations for scoring multi-valued
 fields (original thread: http://goo.gl/zOX53f ), and it occurred to me
 that distance-sort with SpatialRecursivePrefixTreeFieldType must be doing
 something like that.

It isn't.

 Somewhat surprisingly I don't see this in the documentation anywhere, but
 I presume the example query: (from:
 http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4)
 q={!geofilt score=distance sfield=geo pt=54.729696,-98.525391 d=10}

 assigns the distance/score based on the *closest* lat/long if the sfield
 is a multi-valued field.

Yes it does.

 That's a reasonable default, but it's a bit arbitrary. Can I sort based on
 the *furthest* lat/long in the document? Or the average distance?

 Anyone know more about how this works and could give me some pointers?

I briefly considered supporting the farthest distance but dismissed it, as
I saw no real use case.  I hadn't thought of the average distance; that's
plausible.  Anyway, your best bet is to dig into the code.  The
relevant part is ShapeFieldCacheDistanceValueSource.
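The difference between the closest-point behavior confirmed above and the farthest or average alternatives can be sketched in plain Java. This is an illustration, not the logic of ShapeFieldCacheDistanceValueSource: it assumes each document's points are available as {lat, lon} pairs, uses the haversine great-circle formula, and takes 6371.0087714 km as the mean Earth radius. MultiValueDistance and Aggregate are hypothetical names.

```java
// Illustrative per-document distance aggregation over a multi-valued point field.
// Not Lucene code: points are {lat, lon} pairs and distances use the haversine formula.
public class MultiValueDistance {
    static final double EARTH_RADIUS_KM = 6371.0087714; // assumed mean Earth radius

    enum Aggregate { CLOSEST, FARTHEST, AVERAGE }

    // Great-circle distance in km between two lat/lon points given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Reduces the distances from the query point to every value in one document.
    static double docDistance(double qLat, double qLon, double[][] points, Aggregate agg) {
        if (points.length == 0) throw new IllegalArgumentException("document has no points");
        double min = Double.POSITIVE_INFINITY, max = 0, sum = 0;
        for (double[] p : points) {
            double d = haversineKm(qLat, qLon, p[0], p[1]);
            min = Math.min(min, d);
            max = Math.max(max, d);
            sum += d;
        }
        switch (agg) {
            case CLOSEST:  return min;                  // the behavior confirmed in this thread
            case FARTHEST: return max;
            default:       return sum / points.length;  // AVERAGE
        }
    }

    public static void main(String[] args) {
        double qLat = 54.729696, qLon = -98.525391; // query point from the wiki example
        double[][] doc = { {54.729696, -98.525391}, {55.0, -98.0}, {50.0, -100.0} }; // hypothetical
        for (Aggregate agg : Aggregate.values()) {
            System.out.printf("%s: %.1f km%n", agg, docDistance(qLat, qLon, doc, agg));
        }
    }
}
```

Sorting on FARTHEST or AVERAGE instead of CLOSEST only changes the reduction step; the per-point distance work is the same.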

FYI something to keep in mind:
https://issues.apache.org/jira/browse/LUCENE-4698

~ David



Re: Distance sort on a multi-value field

2013-08-14 Thread Jeff Wartes

Hm, "Give me all the stores that only have branches in this area" might be
a plausible use case for farthest distance.
That's essentially a contains question, though, so maybe that's already
supported? I guess it depends on how contains/intersects/etc. handle
multiple values. I feel like multi-value interaction really deserves its own
section in the documentation.
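The "only have branches in this area" case reduces to a farthest-distance test: if "this area" is modeled as a circle of radius d around the query point, a store qualifies exactly when its farthest branch is within d, which is the same as every branch being within d. The sketch checks that both formulations agree. It is not how Solr evaluates contains/intersects on multi-valued fields, and the class name and branch coordinates are hypothetical.

```java
// Sketch: "all branches within d" is the same test as "farthest branch <= d".
// Areas are modeled as circles; names and coordinates are hypothetical.
public class AllBranchesWithin {
    static final double EARTH_RADIUS_KM = 6371.0087714; // assumed mean Earth radius

    // Great-circle distance in km (haversine formula), inputs in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Farthest distance from the query point to any branch (branches[i] = {lat, lon}).
    static double farthestKm(double qLat, double qLon, double[][] branches) {
        if (branches.length == 0) throw new IllegalArgumentException("no branches");
        double max = 0;
        for (double[] b : branches) max = Math.max(max, haversineKm(qLat, qLon, b[0], b[1]));
        return max;
    }

    // Contains-style formulation: every branch lies inside the circle.
    static boolean allWithin(double qLat, double qLon, double[][] branches, double dKm) {
        if (branches.length == 0) throw new IllegalArgumentException("no branches");
        for (double[] b : branches) {
            if (haversineKm(qLat, qLon, b[0], b[1]) > dKm) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        double qLat = 54.729696, qLon = -98.525391, dKm = 10;
        double[][] storeA = { {54.80, -98.50}, {54.70, -98.60} }; // both branches nearby
        double[][] storeB = { {54.80, -98.50}, {49.90, -97.14} }; // one branch far away
        for (double[][] store : new double[][][] { storeA, storeB }) {
            System.out.println(allWithin(qLat, qLon, store, dKm)
                    + " " + (farthestKm(qLat, qLon, store) <= dKm));
        }
    }
}
```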


I'm aware of the memory issue, but it seems like if you want to sort
multi-valued points, it's either this or pulling in the SOLR-2155 patch. In
general, I'd rather go with the thing that's being maintained.


Thanks for the code pointer. You're right, that doesn't look like
something I can easily use for more general aggregate scoring control. Ah
well.



On 8/14/13 12:35 PM, Smiley, David W. dsmi...@mitre.org wrote:
 [...]