RE: Strategies for sorting by array, when you can't sort by array?
For anyone who comes across this topic in the future, I "solved" the problem this way: by agreement with the stakeholders, and on the presumption that no one would look at more than 5000 records, I modified my search code so that, if the user chooses to sort by name, the number of rows to return (query.setRows) is set to 5000. I then put all the result records into a list, sort it, and, depending on which page the user is on, extract that subset of the 5000 and return it.

There is a small performance hit on the initial search for common names (e.g. Smith, Jones), but the performance is still far more acceptable than that of the legacy system Solr is meant to replace (a few seconds as opposed to twenty(!) minutes).

Most certainly there are better ways, but this one worked for me, and I wanted to make sure it was added to the pool of options for anyone who comes across this problem in the future.

Thanks to everyone who offered suggestions!

Ron

-----Original Message-----
From: Jonathan Rochkind [mailto:rochk...@jhu.edu]
Sent: Wednesday, August 03, 2011 11:36 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Not so much that it's a corner case in the sense of being unusual, necessarily (I'm not sure); it's just something that fundamentally doesn't fit well into Lucene's architecture.

I'm not sure that filing a JIRA will be much use. It's really unclear how one would get Lucene to do this, it would be significant work to do, and it's unlikely any Solr developer is going to decide to spend significant time on it unless they need it for their own clients.
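A minimal sketch of the workaround Ron describes in this message (cap the number of rows, sort the whole list in memory, then slice out the requested page). The Solr call itself is left out: the `fetched` list stands in for whatever comes back after `query.setRows(5000)`. The class name `InMemorySortPager`, the zero-based `pageNumber`, and the `pageSize` parameter are assumptions made for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: sorts a capped result list in memory and returns one page. */
final class InMemorySortPager {
    /** Cap taken from the message above (the value passed to query.setRows). */
    static final int MAX_ROWS = 5000;

    /**
     * Returns the requested page of the sorted results.
     * pageNumber is zero-based; a page past the end yields an empty list.
     */
    static <T> List<T> page(List<T> fetched, Comparator<? super T> order,
                            int pageNumber, int pageSize) {
        if (pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageNumber must be >= 0 and pageSize > 0");
        }
        // Never look at more than the agreed cap, even if more rows were fetched.
        int limit = Math.min(fetched.size(), MAX_ROWS);
        List<T> sorted = new ArrayList<>(fetched.subList(0, limit));
        sorted.sort(order);

        // Use long arithmetic so a large page number cannot overflow int.
        int from = (int) Math.min((long) pageNumber * pageSize, (long) sorted.size());
        int to = (int) Math.min((long) from + pageSize, (long) sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

Because the whole capped list is re-sorted on every call, the order stays consistent from page to page, which is the point of the workaround; caching the sorted list between page requests would be an obvious refinement, but the message does not say whether that was done.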
Re: Strategies for sorting by array, when you can't sort by array?
Not so much that it's a corner case in the sense of being unusual, necessarily (I'm not sure); it's just something that fundamentally doesn't fit well into Lucene's architecture.

I'm not sure that filing a JIRA will be much use. It's really unclear how one would get Lucene to do this, it would be significant work to do, and it's unlikely any Solr developer is going to decide to spend significant time on it unless they need it for their own clients.

On 8/3/2011 11:40 AM, Olson, Ron wrote:
> *Sigh*...I had thought maybe reversing it would work, but that would require creating a whole new index, on a separate core, as the existing index is used for other purposes. Plus, given the volume of data, that would be a big deal, update-wise. What would be better would be to remove that particular sort option-button on the webpage. ;)
>
> I'll create a Jira issue, but in the meanwhile I'll have to come up with something else. I guess I didn't realize how much of a "corner case" this problem is. :)
>
> Thanks for the suggestions!
>
> Ron
>
> DISCLAIMER: This electronic message, including any attachments, files or documents, is intended only for the addressee and may contain CONFIDENTIAL, PROPRIETARY or LEGALLY PRIVILEGED information. If you are not the intended recipient, you are hereby notified that any use, disclosure, copying or distribution of this message or any of the information included in or with it is unauthorized and strictly prohibited. If you have received this message in error, please notify the sender immediately by reply e-mail and permanently delete and destroy this message and its attachments, along with any copies thereof. This message does not create any contractual obligation on behalf of the sender or Law Bulletin Publishing Company. Thank you.
Re: Strategies for sorting by array, when you can't sort by array?
There's no great way to do this. I understand your problem as: it's a multi-valued field, but you want to sort on whichever of those values matched the query, not on the values that didn't. (It's not entirely clear what to do if the documents are in the result set because of a match in an entirely different field!)

I would sometimes like to do that too, and haven't really been able to come up with any great way to do it. Something involving faceting kind of gets you closer, but ends up being a huge pain and doesn't get you (or at least me) all the way to supporting the interface I'd really want.

On 8/3/2011 10:39 AM, Olson, Ron wrote:
> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I am searching on this field and need to return the results sorted. I know from searching and reading the documentation (and getting the error) that sorting on a multi-valued field isn't possible. Okay, so, what I haven't found is any real good solution/workaround to the problem. I was wondering what strategies others have used to overcome this particular situation; collapsing the individual names into a single field with copyField doesn't work because the name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron
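To make "sort on whichever of those values matched the query" concrete, here is a rough client-side sketch, not anything Solr does internally. It assumes the results have already been fetched, and it approximates "matched" with a case-insensitive substring test, which is only a stand-in for Solr's real analysis and matching. The `Doc` class, its fields, and the method names are hypothetical. Documents with no matching name (for example, ones that matched on a different field) are simply placed last, which is one possible answer to the open question raised above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative only: orders documents by the name value that matched the search term. */
final class MatchedNameSort {
    /** Hypothetical stand-in for a result document with a multi-valued name field. */
    static final class Doc {
        final String id;
        final List<String> names;

        Doc(String id, List<String> names) {
            this.id = id;
            this.names = names;
        }
    }

    /**
     * Returns the alphabetically first name containing the term
     * (case-insensitive), or null if no name in the document matches.
     */
    static String matchedName(Doc doc, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        return doc.names.stream()
                .filter(n -> n.toLowerCase(Locale.ROOT).contains(needle))
                .min(String.CASE_INSENSITIVE_ORDER)
                .orElse(null);
    }

    /** Returns a sorted copy; documents without a matching name go last. */
    static List<Doc> sortByMatchedName(List<Doc> docs, String term) {
        List<Doc> sorted = new ArrayList<>(docs);
        sorted.sort(Comparator.comparing(
                (Doc d) -> matchedName(d, term),
                Comparator.nullsLast(String.CASE_INSENSITIVE_ORDER)));
        return sorted;
    }
}
```

The sort key depends on the search term, so it has to be computed per query rather than stored at index time, which is why this does not map onto an ordinary indexed sort field.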
RE: Strategies for sorting by array, when you can't sort by array?
*Sigh*...I had thought maybe reversing it would work, but that would require creating a whole new index, on a separate core, as the existing index is used for other purposes. Plus, given the volume of data, that would be a big deal, update-wise. What would be better would be to remove that particular sort option-button on the webpage. ;)

I'll create a Jira issue, but in the meanwhile I'll have to come up with something else. I guess I didn't realize how much of a "corner case" this problem is. :)

Thanks for the suggestions!

Ron

-----Original Message-----
From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: Wednesday, August 03, 2011 10:26 AM
To: solr-user@lucene.apache.org
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Hi Ron.
This is an interesting problem you have. One idea would be to create an index with the entity relationship going in the other direction. So instead of one to many, go many to one. You would end up with multiple documents with varying names but repeated parent entity information -- perhaps simply using just an ID which is used as a lookup. Do a search on this name field, sorting by a non-tokenized variant of the name field. Use Result-Grouping to consolidate multiple matches of a name to the same parent document. This whole idea might very well be academic since duplicating all the parent entity information for searching on that too might be more than you care to bother with. And I don't think Solr 4's join feature addresses this use case. In the end, I think Solr could be modified to support this, with some work. It would make a good feature request in JIRA.

~ David Smiley
Re: Strategies for sorting by array, when you can't sort by array?
Hi Ron.
This is an interesting problem you have. One idea would be to create an index with the entity relationship going in the other direction. So instead of one to many, go many to one. You would end up with multiple documents with varying names but repeated parent entity information -- perhaps simply using just an ID which is used as a lookup. Do a search on this name field, sorting by a non-tokenized variant of the name field. Use Result-Grouping to consolidate multiple matches of a name to the same parent document.

This whole idea might very well be academic, since duplicating all the parent entity information for searching on that too might be more than you care to bother with. And I don't think Solr 4's join feature addresses this use case. In the end, I think Solr could be modified to support this, with some work. It would make a good feature request in JIRA.

~ David Smiley

On Aug 3, 2011, at 10:39 AM, Olson, Ron wrote:

> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I am searching on this field and need to return the results sorted. I know from searching and reading the documentation (and getting the error) that sorting on a multi-valued field isn't possible. Okay, so, what I haven't found is any real good solution/workaround to the problem. I was wondering what strategies others have used to overcome this particular situation; collapsing the individual names into a single field with copyField doesn't work because the name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron
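The many-to-one idea in this message could look roughly like the sketch below: one flat document per name, each carrying only the parent ID, plus the request parameters for searching, sorting, and grouping. The field names (`id`, `parent_id`, `name`, `name_sort`), the ID scheme, and the class itself are invented for illustration; `sort`, `group`, and `group.field` are standard Solr request parameters, but whether Result Grouping behaves acceptably on a given index and Solr version is something to verify. Query-term escaping is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: flattens a parent record into one document per name. */
final class NameDocFlattener {
    /** Emits one flat document per name; only the parent ID is repeated. */
    static List<Map<String, String>> flatten(String parentId, List<String> names) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < names.size(); i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", parentId + "-" + i);   // hypothetical unique key scheme
            doc.put("parent_id", parentId);      // lookup key back to the parent entity
            doc.put("name", names.get(i));       // tokenized field, used for searching
            doc.put("name_sort", names.get(i));  // non-tokenized copy, used for sorting
            docs.add(doc);
        }
        return docs;
    }

    /** Parameters for a name search sorted by name and grouped by parent. */
    static Map<String, String> queryParams(String term) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "name:" + term);        // assumes term needs no escaping
        params.put("sort", "name_sort asc");    // single-valued, so sortable
        params.put("group", "true");            // enable Result Grouping
        params.put("group.field", "parent_id"); // collapse matches per parent
        return params;
    }
}
```

Each flat document has exactly one name, so the sort field is single-valued and the original restriction on sorting multi-valued fields no longer applies; the cost, as the message notes, is a second index and the duplication that comes with it.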
RE: Strategies for sorting by array, when you can't sort by array?
Right, the search term is the sort field. I can manually sort an individual page, but when the user clicks on the next page, the sort is "reset", visually.

-----Original Message-----
From: Mike Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, August 03, 2011 9:52 AM
To: solr-user@lucene.apache.org
Cc: Olson, Ron
Subject: Re: Strategies for sorting by array, when you can't sort by array?

Although you weren't very clear about it, it sounds as if you want the results to be sorted by a name that actually matched the query? In general that is not going to be easy, since it is not something that can be computed in advance and thus indexed.

-Mike
Re: Strategies for sorting by array, when you can't sort by array?
Although you weren't very clear about it, it sounds as if you want the results to be sorted by a name that actually matched the query? In general that is not going to be easy, since it is not something that can be computed in advance and thus indexed.

-Mike

On 08/03/2011 10:39 AM, Olson, Ron wrote:
> Hi all-
>
> Well, this is a problem. I have a list of names as a multi-valued field and I am searching on this field and need to return the results sorted. I know from searching and reading the documentation (and getting the error) that sorting on a multi-valued field isn't possible. Okay, so, what I haven't found is any real good solution/workaround to the problem. I was wondering what strategies others have used to overcome this particular situation; collapsing the individual names into a single field with copyField doesn't work because the name searched may not be the first name in the field.
>
> Thanks for any hints/tips/tricks.
>
> Ron