RE: Strategies for sorting by array, when you can't sort by array?

2011-08-04 Thread Olson, Ron
For anyone who comes across this topic in the future, I solved the problem 
this way: by agreement with the stakeholders, on the presumption that no one 
would look at more than 5000 records, I modified my search code so that, if the 
user chooses to sort by name, I set the number of rows to return 
(query.setRows) to 5000. I then put all the result records into a list, sort 
it, and, depending on which page the user is on, extract that page's subset of 
the 5000 and return it.
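
The approach described above can be sketched roughly as follows. This is a 
minimal illustration in plain Python, not the actual search code: the document 
shape (a dict with a "names" list), the helper names matched_name and 
sorted_page, and the choice of sort key (the alphabetically first name that 
contains the search term) are all assumptions made for the example. In the 
real system the list would come from a Solr query capped at 5000 rows.

```python
from typing import Dict, List

MAX_ROWS = 5000  # cap agreed with the stakeholders, as described in the post


def matched_name(doc: Dict, term: str) -> str:
    """Sort key (assumed): the alphabetically first name containing the term."""
    term = term.lower()
    hits = [name.lower() for name in doc["names"] if term in name.lower()]
    # Documents with no matching name are pushed to the end of the ordering.
    return min(hits) if hits else "\uffff"


def sorted_page(docs: List[Dict], term: str, page: int, page_size: int) -> List[Dict]:
    """Sort the capped result list in memory, then slice out one page."""
    capped = docs[:MAX_ROWS]
    ordered = sorted(capped, key=lambda d: matched_name(d, term))
    start = page * page_size
    return ordered[start:start + page_size]


# Hypothetical sample data standing in for the records returned by the search.
docs = [
    {"id": 1, "names": ["Zed Adams", "Bob Smith"]},
    {"id": 2, "names": ["Alice Smith"]},
    {"id": 3, "names": ["Carl Smithers", "Aaron Brown"]},
]
first_page = sorted_page(docs, "smith", page=0, page_size=2)
second_page = sorted_page(docs, "smith", page=1, page_size=2)
```

Because the whole capped list is sorted before slicing, the order stays 
consistent from one page to the next, at the cost of re-sorting on each request 
unless the sorted list is cached.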

There is a small performance hit on initial searching for common names (e.g. 
Smith, Jones, etc.), but the performance is still far more acceptable than the 
legacy system Solr is meant to replace (a few seconds as opposed to twenty(!) 
minutes).

Most certainly there are better ways, but this one worked for me, and I wanted 
to make sure it was added to the pool of options for anyone who comes across 
this problem in the future.

Thanks to everyone who offered suggestions!

Ron

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, August 03, 2011 11:36 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Not so much that it's a corner case in the sense of being unusual
necessarily (I'm not sure); it's just something that fundamentally
doesn't fit well into Lucene's architecture.

I'm not sure that filing a JIRA will be much use. It's really unclear
how one would get Lucene to do this, it would be significant work,
and it's unlikely any Solr developer is going to decide to spend
significant time on it unless they need it for their own clients.

On 8/3/2011 11:40 AM, Olson, Ron wrote:
 *Sigh*...I had thought maybe reversing it would work, but that would require 
 creating a whole new index, on a separate core, as the existing index is used 
 for other purposes. Plus, given the volume of data, that would be a big deal, 
 update-wise. What would be better would be to remove that particular sort 
 option-button on the webpage. ;)

 I'll create a Jira issue, but in the meanwhile I'll have to come up with 
 something else. I guess I didn't realize how much of a corner case this 
 problem is. :)

 Thanks for the suggestions!

 Ron

 -----Original Message-----
 From: Smiley, David W. [mailto:dsmi...@mitre.org]
 Sent: Wednesday, August 03, 2011 10:26 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Strategies for sorting by array, when you can't sort by array?

 Hi Ron.
 This is an interesting problem you have. One idea would be to create an index 
 with the entity relationship going in the other direction.  So instead of one 
 to many, go many to one.  You would end up with multiple documents with 
 varying names but repeated parent entity information -- perhaps simply using 
 just an ID which is used as a lookup. Do a search on this name field, sorting 
 by a non-tokenized variant of the name field. Use Result-Grouping to 
 consolidate multiple matches of a name to the same parent document. This 
 whole idea might very well be academic since duplicating all the parent 
 entity information for searching on that too might be a bit more than you 
 care to bother with. And I don't think Solr 4's join feature addresses this 
 use case. In the end, I think Solr could be modified to support this, with 
 some work. It would make a good feature request in JIRA.

 ~ David Smiley

 On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:

 Hi all-

 Well, this is a problem. I have a list of names as a multi-valued field and 
 I am searching on this field and need to return the results sorted. I know 
 from searching and reading the documentation (and getting the error) that 
 sorting on a multi-valued field isn't possible. Okay, so, what I haven't 
 found is any really good solution/workaround to the problem. I was wondering 
 what strategies others have used to overcome this particular situation; 
 collapsing the individual names into a single field with copyField doesn't 
 work because the name searched may not be the first name in the field.

 Thanks for any hints/tips/tricks.

 Ron

 DISCLAIMER: This electronic message, including any attachments, files or 
 documents, is intended only for the addressee and may contain CONFIDENTIAL, 
 PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
 recipient, you are hereby notified that any use, disclosure, copying or 
 distribution of this message or any of the information included in or with 
 it is  unauthorized and strictly prohibited.  If you have received this 
 message in error, please notify the sender immediately by reply e-mail and 
 permanently delete and destroy this message and its attachments, along with 
 any copies thereof. This message does not create any contractual obligation 
 on behalf of the sender or Law Bulletin Publishing Company.
 Thank you.


Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Mike Sokolov
Although you weren't very clear about it, it sounds as if you want the 
results to be sorted by a name that actually matched the query?  In 
general that is not going to be easy, since it is not something that can 
be computed in advance and thus indexed.
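
To illustrate why such a sort key cannot be indexed ahead of time, here is a 
small sketch in plain Python (not Solr code). The function sort_key and the 
sample names are hypothetical; the point is only that the same document 
produces a different sort value for each query, so there is no single value to 
store at index time.

```python
from typing import List, Optional


def sort_key(names: List[str], term: str) -> Optional[str]:
    """Return the alphabetically first name that contains the query term."""
    term = term.lower()
    matches = sorted(n.lower() for n in names if term in n.lower())
    return matches[0] if matches else None


# One document, one multi-valued field, three different queries.
names = ["Bob Smith", "Ann Jones"]
key_for_smith = sort_key(names, "smith")
key_for_jones = sort_key(names, "jones")
key_for_brown = sort_key(names, "brown")
```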



-Mike


RE: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
Right, the search term is the sort field. I can manually sort an individual 
page, but when the user clicks on the next page, the sort is reset, visually.


Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Smiley, David W.
Hi Ron.
This is an interesting problem you have. One idea would be to create an index 
with the entity relationship going in the other direction.  So instead of one 
to many, go many to one.  You would end up with multiple documents with varying 
names but repeated parent entity information -- perhaps simply using just an ID 
which is used as a lookup. Do a search on this name field, sorting by a 
non-tokenized variant of the name field. Use Result-Grouping to consolidate 
multiple matches of a name to the same parent document. This whole idea might 
very well be academic since duplicating all the parent entity information for 
searching on that too might be a bit more than you care to bother with. And I 
don't think Solr 4's join feature addresses this use case. In the end, I think 
Solr could be modified to support this, with some work. It would make a good 
feature request in JIRA.
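
As a rough model of the many-to-one idea above, the following plain-Python 
sketch flattens each parent record into one small document per name, searches 
and sorts on the name, and then keeps only the first hit per parent, which is 
the part Result-Grouping would handle in Solr. It does not call Solr at all; 
the field names ("name", "parent_id"), the sample records, and the helper names 
are assumptions for illustration only.

```python
from typing import Dict, List

# Hypothetical parent records, each with a multi-valued list of names.
parents = {
    "p1": {"title": "Record 1", "names": ["Bob Smith", "Zed Adams"]},
    "p2": {"title": "Record 2", "names": ["Alice Smith"]},
    "p3": {"title": "Record 3", "names": ["Carl Smith", "Ann Smith"]},
}


def build_name_docs(parents: Dict[str, Dict]) -> List[Dict]:
    """Invert the relationship: one document per (name, parent ID) pair."""
    return [
        {"name": name, "parent_id": pid}
        for pid, parent in parents.items()
        for name in parent["names"]
    ]


def search_grouped(name_docs: List[Dict], term: str) -> List[Dict]:
    """Match on name, sort by the whole name, keep the first hit per parent."""
    term = term.lower()
    hits = sorted(
        (d for d in name_docs if term in d["name"].lower()),
        key=lambda d: d["name"].lower(),  # stands in for a non-tokenized sort field
    )
    seen = set()
    grouped = []
    for doc in hits:
        if doc["parent_id"] not in seen:  # consolidate multiple matches per parent
            seen.add(doc["parent_id"])
            grouped.append(doc)
    return grouped


results = search_grouped(build_name_docs(parents), "smith")
parent_order = [doc["parent_id"] for doc in results]
```

Each name document is single-valued, so sorting on it is well defined; the 
parent ID is then used to look up the full record.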

~ David Smiley


RE: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
*Sigh*...I had thought maybe reversing it would work, but that would require 
creating a whole new index, on a separate core, as the existing index is used 
for other purposes. Plus, given the volume of data, that would be a big deal, 
update-wise. What would be better would be to remove that particular sort 
option-button on the webpage. ;)

I'll create a Jira issue, but in the meanwhile I'll have to come up with 
something else. I guess I didn't realize how much of a corner case this 
problem is. :)

Thanks for the suggestions!

Ron


Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Jonathan Rochkind
There's no great way to do this. I understand your problem as: It's a 
multi-valued field, but you want to sort on whichever of those values 
matched the query, not on the values that didn't. (Not entirely clear 
what to do if the documents are in the result set because of a match in 
an entirely different field!)


I would sometimes like to do that too, and haven't really been able to 
come up with any great way to do it.


Something involving faceting kind of gets you closer, but ends up being 
a huge pain and doesn't get you (or at least me) all the way to 
supporting the interface I'd really want.



Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Jonathan Rochkind
Not so much that it's a corner case in the sense of being unusual 
necessarily (I'm not sure); it's just something that fundamentally 
doesn't fit well into Lucene's architecture.


I'm not sure that filing a JIRA will be much use. It's really unclear 
how one would get Lucene to do this, it would be significant work, 
and it's unlikely any Solr developer is going to decide to spend 
significant time on it unless they need it for their own clients.

