RE: Strategies for sorting by array, when you can't sort by array?

2011-08-04 Thread Olson, Ron
For anyone who comes across this topic in the future, I "solved" the problem 
this way: by agreement with the stakeholders, on the presumption that no one 
would look at more than 5000 records, I modified my search code so that, if the 
user chooses to sort by name, I set the row count to return 
(query.setRows) to 5000. I then put all the result records into a list, sort 
the list, and, depending on what page the user is on, extract that subset of 
the 5000 and return it.
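
For illustration only, here is a minimal sketch of the sort-then-slice step described above. It assumes the (up to) 5000 records have already been fetched into a list; the Solr call itself is left out. The class name, method name, and zero-based page numbering are hypothetical and are not taken from my actual search code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedPageSketch {
    // Cap agreed with the stakeholders in the message above.
    static final int MAX_ROWS = 5000;

    // Hypothetical helper: sorts the already-fetched records in memory and
    // returns one page of them. Pages are numbered from 0 (an assumption).
    static <T> List<T> sortedPage(List<T> fetched, Comparator<? super T> order,
                                  int page, int pageSize) {
        // Copy at most MAX_ROWS records so the caller's list is left untouched.
        List<T> sorted = new ArrayList<>(fetched.subList(0, Math.min(fetched.size(), MAX_ROWS)));
        sorted.sort(order);
        // Clamp both ends so a page past the end yields an empty list.
        int from = Math.min(page * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }

    public static void main(String[] args) {
        List<String> names = List.of("Smith, Bob", "Jones, Amy", "Smith, Al", "Jones, Zed", "Smithe, Cy");
        System.out.println(sortedPage(names, Comparator.naturalOrder(), 0, 2));
        System.out.println(sortedPage(names, Comparator.naturalOrder(), 2, 2));
    }
}
```

Because the whole capped list is sorted before slicing, the order stays consistent from page to page, at the cost of re-sorting (or caching) the list on each request.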

There is a small performance hit on initial searching for common names (e.g. 
Smith, Jones, etc.), but the performance is still far more acceptable than the 
legacy system Solr is meant to replace (a few seconds as opposed to twenty(!) 
minutes).

Most certainly there are better ways, but this one worked for me, and I wanted 
to make sure it was added to the pool of options for anyone who comes across 
this problem in the future.

Thanks to everyone who offered suggestions!

Ron

-Original Message-
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, August 03, 2011 11:36 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Not so much that it's a corner case in the sense of being unusual
necessarily (I'm not sure), it's just something that fundamentally
doesn't fit well into Lucene's architecture.

I'm not sure that filing a JIRA will be much use, it's really unclear
how one would get Lucene to do this, it would be significant work to do,
and it's unlikely any Solr developer is going to decide to spend
significant time on it unless they need it for their own clients.

On 8/3/2011 11:40 AM, Olson, Ron wrote:
> *Sigh*...I had thought maybe reversing it would work, but that would require 
> creating a whole new index, on a separate core, as the existing index is used 
> for other purposes. Plus, given the volume of data, that would be a big deal, 
> update-wise. What would be better would be to remove that particular sort 
> option-button on the webpage. ;)
>
> I'll create a Jira issue, but in the meanwhile I'll have to come up with 
> something else. I guess I didn't realize how much of a "corner case" this 
> problem is. :)
>
> Thanks for the suggestions!
>
> Ron
>
> -Original Message-
> From: Smiley, David W. [mailto:dsmi...@mitre.org]
> Sent: Wednesday, August 03, 2011 10:26 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Strategies for sorting by array, when you can't sort by array?
>
> Hi Ron.
> This is an interesting problem you have. One idea would be to create an index 
> with the entity relationship going in the other direction.  So instead of one 
> to many, go many to one.  You would end up with multiple documents with 
> varying names but repeated parent entity information -- perhaps simply using 
> just an ID which is used as a lookup. Do a search on this name field, sorting 
> by a non-tokenized variant of the name field. Use Result-Grouping to 
> consolidate multiple matches of a name to the same parent document. This 
> whole idea might very well be academic since duplicating all the parent 
> entity information for searching on that too might be a bit more than you 
> care to bother with. And I don't think Solr 4's join feature addresses this 
> use case. In the end, I think Solr could be modified to support this, with 
> some work. It would make a good feature request in JIRA.
>
> ~ David Smiley
>
> On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:
>
>> Hi all-
>>
>> Well, this is a problem. I have a list of names as a multi-valued field and 
>> I am searching on this field and need to return the results sorted. I know 
>> from searching and reading the documentation (and getting the error) that 
>> sorting on a multi-valued field isn't possible. Okay, so, what I haven't 
>> found is any real good solution/workaround to the problem. I was wondering 
>> what strategies others have done to overcome this particular situation; 
>> collapsing the individual names into a single field with copyField doesn't 
>> work because the name searched may not be the first name in the field.
>>
>> Thanks for any hints/tips/tricks.
>>
>> Ron
>>
>> DISCLAIMER: This electronic message, including any attachments, files or 
>> documents, is intended only for the addressee and may contain CONFIDENTIAL, 
>> PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
>> recipient, you are hereby notified that any use, disclosure, copying or 
>> distribution of this message or any of the information included in or with 
>> it is  unauthorized and strictly prohibited.  If you hav

Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Jonathan Rochkind
Not so much that it's a corner case in the sense of being unusual 
necessarily (I'm not sure), it's just something that fundamentally 
doesn't fit well into Lucene's architecture.

I'm not sure that filing a JIRA will be much use, it's really unclear 
how one would get Lucene to do this, it would be significant work to do, 
and it's unlikely any Solr developer is going to decide to spend 
significant time on it unless they need it for their own clients.


On 8/3/2011 11:40 AM, Olson, Ron wrote:

*Sigh*...I had thought maybe reversing it would work, but that would require 
creating a whole new index, on a separate core, as the existing index is used 
for other purposes. Plus, given the volume of data, that would be a big deal, 
update-wise. What would be better would be to remove that particular sort 
option-button on the webpage. ;)

I'll create a Jira issue, but in the meanwhile I'll have to come up with something else. 
I guess I didn't realize how much of a "corner case" this problem is. :)

Thanks for the suggestions!

Ron

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 03, 2011 10:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Hi Ron.
This is an interesting problem you have. One idea would be to create an index 
with the entity relationship going in the other direction.  So instead of one 
to many, go many to one.  You would end up with multiple documents with varying 
names but repeated parent entity information -- perhaps simply using just an ID 
which is used as a lookup. Do a search on this name field, sorting by a 
non-tokenized variant of the name field. Use Result-Grouping to consolidate 
multiple matches of a name to the same parent document. This whole idea might 
very well be academic since duplicating all the parent entity information for 
searching on that too might be a bit more than you care to bother with. And I 
don't think Solr 4's join feature addresses this use case. In the end, I think 
Solr could be modified to support this, with some work. It would make a good 
feature request in JIRA.

~ David Smiley

On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:


Hi all-

Well, this is a problem. I have a list of names as a multi-valued field and I 
am searching on this field and need to return the results sorted. I know from 
searching and reading the documentation (and getting the error) that sorting on 
a multi-valued field isn't possible. Okay, so, what I haven't found is any real 
good solution/workaround to the problem. I was wondering what strategies others 
have done to overcome this particular situation; collapsing the individual 
names into a single field with copyField doesn't work because the name searched 
may not be the first name in the field.

Thanks for any hints/tips/tricks.

Ron

DISCLAIMER: This electronic message, including any attachments, files or 
documents, is intended only for the addressee and may contain CONFIDENTIAL, 
PROPRIETARY or LEGALLY PRIVILEGED information.  If you are not the intended 
recipient, you are hereby notified that any use, disclosure, copying or 
distribution of this message or any of the information included in or with it 
is  unauthorized and strictly prohibited.  If you have received this message in 
error, please notify the sender immediately by reply e-mail and permanently 
delete and destroy this message and its attachments, along with any copies 
thereof. This message does not create any contractual obligation on behalf of 
the sender or Law Bulletin Publishing Company.
Thank you.



Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Jonathan Rochkind
There's no great way to do this. I understand your problem as: It's a 
multi-valued field, but you want to sort on whichever of those values 
matched the query, not on the values that didn't. (Not entirely clear 
what to do if the documents are in the result set because of a match in 
an entirely different field!)
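
To make "sort on whichever value matched" concrete, here is a small sketch that picks a per-document sort key on the client side. The case-insensitive substring test is only a hypothetical stand-in for whatever the real query analysis does, and the class and method names are made up for illustration. An empty result corresponds to the unclear case where the document matched through some other field.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;

class MatchedNameKey {
    // Returns the alphabetically first value of the multi-valued field that
    // "matches" the term, or empty if none of the values match.
    // Assumption: matching is approximated by a case-insensitive substring test.
    static Optional<String> sortKey(List<String> names, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return names.stream()
                .filter(n -> n.toLowerCase(Locale.ROOT).contains(needle))
                .min(String.CASE_INSENSITIVE_ORDER);
    }

    public static void main(String[] args) {
        System.out.println(sortKey(List.of("Adams, Zoe", "Smith, Al", "Smithson, Bo"), "smith"));
        System.out.println(sortKey(List.of("Adams, Zoe"), "smith"));
    }
}
```

The key depends on the query term, so it can only be computed at search time, which is why it cannot simply be stored in the index ahead of time.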


I would sometimes like to do that too, and haven't really been able to 
come up with any great way to do it.


Something involving faceting kind of gets you closer, but ends up being 
a huge pain and doesn't get you (or at least me) all the way to 
supporting the interface I'd really want.


On 8/3/2011 10:39 AM, Olson, Ron wrote:

Hi all-

Well, this is a problem. I have a list of names as a multi-valued field and I 
am searching on this field and need to return the results sorted. I know from 
searching and reading the documentation (and getting the error) that sorting on 
a multi-valued field isn't possible. Okay, so, what I haven't found is any real 
good solution/workaround to the problem. I was wondering what strategies others 
have done to overcome this particular situation; collapsing the individual 
names into a single field with copyField doesn't work because the name searched 
may not be the first name in the field.

Thanks for any hints/tips/tricks.

Ron




RE: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
*Sigh*...I had thought maybe reversing it would work, but that would require 
creating a whole new index, on a separate core, as the existing index is used 
for other purposes. Plus, given the volume of data, that would be a big deal, 
update-wise. What would be better would be to remove that particular sort 
option-button on the webpage. ;)

I'll create a Jira issue, but in the meanwhile I'll have to come up with 
something else. I guess I didn't realize how much of a "corner case" this 
problem is. :)

Thanks for the suggestions!

Ron

-Original Message-
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 03, 2011 10:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Hi Ron.
This is an interesting problem you have. One idea would be to create an index 
with the entity relationship going in the other direction.  So instead of one 
to many, go many to one.  You would end up with multiple documents with varying 
names but repeated parent entity information -- perhaps simply using just an ID 
which is used as a lookup. Do a search on this name field, sorting by a 
non-tokenized variant of the name field. Use Result-Grouping to consolidate 
multiple matches of a name to the same parent document. This whole idea might 
very well be academic since duplicating all the parent entity information for 
searching on that too might be a bit more than you care to bother with. And I 
don't think Solr 4's join feature addresses this use case. In the end, I think 
Solr could be modified to support this, with some work. It would make a good 
feature request in JIRA.

~ David Smiley

On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:

> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I 
> am searching on this field and need to return the results sorted. I know from 
> searching and reading the documentation (and getting the error) that sorting 
> on a multi-valued field isn't possible. Okay, so, what I haven't found is any 
> real good solution/workaround to the problem. I was wondering what strategies 
> others have done to overcome this particular situation; collapsing the 
> individual names into a single field with copyField doesn't work because the 
> name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron


Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Smiley, David W.
Hi Ron.
This is an interesting problem you have. One idea would be to create an index 
with the entity relationship going in the other direction.  So instead of one 
to many, go many to one.  You would end up with multiple documents with varying 
names but repeated parent entity information -- perhaps simply using just an ID 
which is used as a lookup. Do a search on this name field, sorting by a 
non-tokenized variant of the name field. Use Result-Grouping to consolidate 
multiple matches of a name to the same parent document. This whole idea might 
very well be academic since duplicating all the parent entity information for 
searching on that too might be a bit more than you care to bother with. And I 
don't think Solr 4's join feature addresses this use case. In the end, I think 
Solr could be modified to support this, with some work. It would make a good 
feature request in JIRA.
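
As a rough in-memory illustration of the many-to-one idea (not Solr code): each parent record is flattened into one row per name, the rows matching the search term are sorted by name, and repeated parents are collapsed to their first hit, which is roughly the role Result-Grouping would play. All class, field, and method names here are hypothetical, and the substring match is only a stand-in for a real query.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class NameRowSketch {
    // One row per (name, parent) pair: the "many to one" direction.
    static final class NameRow {
        final String name;
        final String parentId;
        NameRow(String name, String parentId) { this.name = name; this.parentId = parentId; }
    }

    // Turns parentId -> names into one row per name.
    static List<NameRow> flatten(Map<String, List<String>> parents) {
        List<NameRow> rows = new ArrayList<>();
        parents.forEach((id, names) -> names.forEach(n -> rows.add(new NameRow(n, id))));
        return rows;
    }

    // Filters rows by the term, sorts by name, then keeps each parent once,
    // in the position of its first (alphabetically smallest) matching name.
    static List<String> parentsSortedByMatch(List<NameRow> rows, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        Set<String> ordered = new LinkedHashSet<>();
        rows.stream()
            .filter(r -> r.name.toLowerCase(Locale.ROOT).contains(needle))
            .sorted(Comparator.comparing((NameRow r) -> r.name, String.CASE_INSENSITIVE_ORDER))
            .forEachOrdered(r -> ordered.add(r.parentId));
        return new ArrayList<>(ordered);
    }

    public static void main(String[] args) {
        Map<String, List<String>> parents = Map.of(
            "P1", List.of("Smith, Bob", "Smithers, Al"),
            "P2", List.of("Jones, Amy", "Smith, Al"),
            "P3", List.of("Brown, Di"));
        System.out.println(parentsSortedByMatch(flatten(parents), "smith"));
    }
}
```

Since every row holds exactly one name, the sort field is single-valued, which is what sidesteps the multi-valued sorting restriction.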

~ David Smiley

On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:

> Hi all-
> 
> Well, this is a problem. I have a list of names as a multi-valued field and I 
> am searching on this field and need to return the results sorted. I know from 
> searching and reading the documentation (and getting the error) that sorting 
> on a multi-valued field isn't possible. Okay, so, what I haven't found is any 
> real good solution/workaround to the problem. I was wondering what strategies 
> others have done to overcome this particular situation; collapsing the 
> individual names into a single field with copyField doesn't work because the 
> name searched may not be the first name in the field.
> 
> Thanks for any hints/tips/tricks.
> 
> Ron



RE: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Olson, Ron
Right, the search term is the sort field. I can manually sort an individual 
page, but when the user clicks on the next page, the sort is "reset", visually.

-Original Message-
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, August 03, 2011 9:52 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Although you weren't very clear about it, it sounds as if you want the
results to be sorted by a name that actually matched the query?  In
general that is not going to be easy, since it is not something that can
be computed in advance and thus indexed.


-Mike

On 08/03/2011 10:39 AM, Olson, Ron wrote:
> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I 
> am searching on this field and need to return the results sorted. I know from 
> searching and reading the documentation (and getting the error) that sorting 
> on a multi-valued field isn't possible. Okay, so, what I haven't found is any 
> real good solution/workaround to the problem. I was wondering what strategies 
> others have done to overcome this particular situation; collapsing the 
> individual names into a single field with copyField doesn't work because the 
> name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron


Re: Strategies for sorting by array, when you can't sort by array?

2011-08-03 Thread Mike Sokolov
Although you weren't very clear about it, it sounds as if you want the 
results to be sorted by a name that actually matched the query?  In 
general that is not going to be easy, since it is not something that can 
be computed in advance and thus indexed.



-Mike

On 08/03/2011 10:39 AM, Olson, Ron wrote:

Hi all-

Well, this is a problem. I have a list of names as a multi-valued field and I 
am searching on this field and need to return the results sorted. I know from 
searching and reading the documentation (and getting the error) that sorting on 
a multi-valued field isn't possible. Okay, so, what I haven't found is any real 
good solution/workaround to the problem. I was wondering what strategies others 
have done to overcome this particular situation; collapsing the individual 
names into a single field with copyField doesn't work because the name searched 
may not be the first name in the field.

Thanks for any hints/tips/tricks.

Ron
