Re: TermsQuery Result Ordering
If it's worth the effort to you, you could write a custom scorer that "somehow" pulls these terms out and does what you require. I suppose some kind of clever function query might work, but again, it would probably be custom.

Frankly, though, I wouldn't go there until I'd exhausted either my resources or my users' patience. In the worst case, you could break the query up into N sub-queries and sort the results in the app.

Best,
Erick

On Thu, Oct 19, 2017 at 6:59 AM, Webster Homer wrote:
> Thank you, Erick.
>
> That is exactly what I thought. Indeed, we don't care about Solr's scoring;
> as I said, we do care that the order of the terms is maintained, hence the
> requirement for boosting the term values.
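The "N sub-queries, sort in the app" fallback could be sketched roughly as follows. This is only an illustration under stated assumptions: `run_query` is a hypothetical callable standing in for whatever Solr client the application uses, the batch size of 500 is arbitrary, each batch is sent as a `{!terms}` query (one possible choice, not something the thread prescribes), and `structure_id` is assumed to be single-valued.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def batches(ids: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def terms_query(field: str, ids: Sequence[str]) -> str:
    """Build an unscored terms query string for one batch of IDs."""
    return "{!terms f=%s}%s" % (field, ",".join(ids))


def fetch_in_order(
    ranked_ids: Sequence[str],
    run_query: Callable[[str], List[Dict[str, str]]],  # hypothetical client hook
    field: str = "structure_id",
    batch_size: int = 500,  # arbitrary illustrative value
) -> List[Dict[str, str]]:
    """Run one sub-query per batch, then restore the structure-search order."""
    # The position of each ID in the structure-search output is the sort key.
    rank = {term: pos for pos, term in enumerate(ranked_ids)}
    docs: List[Dict[str, str]] = []
    for batch in batches(ranked_ids, batch_size):
        docs.extend(run_query(terms_query(field, batch)))
    # sorted() is stable, so documents sharing an ID keep their fetch order.
    return sorted(docs, key=lambda doc: rank[doc[field]])
```

Because the ordering is rebuilt from the original ID list, Solr's scores are never consulted, which matches the requirement that only the structure-search order matters.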
Re: TermsQuery Result Ordering
Thank you, Erick.

That is exactly what I thought. Indeed, we don't care about Solr's scoring; as I said, we do care that the order of the terms is maintained, hence the requirement for boosting the term values.

On Wed, Oct 18, 2017 at 4:23 PM, Erick Erickson wrote:
> bq: Can I boost the Terms in the terms query
>
> I'm pretty sure you can't.
> [...]
> Basically, the code that TermsQueryParser uses bypasses scoring on the
> theory that these very large OR clauses are usually useless for
> scoring; your application is an outlier.

--
This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Re: TermsQuery Result Ordering
bq: Can I boost the Terms in the terms query

I'm pretty sure you can't. But how many of these do you have? You can always increase the maxBooleanClauses limit in solrconfig.xml. It's primarily there to say "having this many clauses is usually a bad idea, so proceed with caution". I've seen 10,000 and higher used before; you're really only limited by memory.

And I'm going to guess that your application doesn't have a high query rate, so you can likely make maxBooleanClauses very high.

Basically, the code that TermsQueryParser uses bypasses scoring on the theory that these very large OR clauses are usually useless for scoring; your application is an outlier. But you knew that already ;)

Best,
Erick

On Wed, Oct 18, 2017 at 9:42 AM, Webster Homer wrote:
> Can I boost the Terms in the terms query? Is there a way to order the
> results? e.g. would the results be returned in the same order I specified
> the terms?
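As a rough illustration of the trade-off described here, an application might decide per request whether the ID list still fits in a boosted boolean query or has to fall back to an unscored terms query. The sketch below is an assumption-laden example, not Solr code: `plan_query` and `boosted_or_query` are hypothetical helper names, and the limit of 1024 is only an assumed value; the real limit is whatever maxBooleanClauses is set to in solrconfig.xml.

```python
from typing import Sequence, Tuple

# Assumed value for illustration; use the maxBooleanClauses setting
# from your own solrconfig.xml.
ASSUMED_MAX_BOOLEAN_CLAUSES = 1024


def boosted_or_query(field: str, ranked: Sequence[Tuple[str, int]]) -> str:
    """Build field:(id^boost OR id^boost ...) from (id, boost) pairs."""
    clauses = " OR ".join(f"{term}^{boost}" for term, boost in ranked)
    return f"{field}:({clauses})"


def plan_query(
    field: str,
    ranked: Sequence[Tuple[str, int]],
    limit: int = ASSUMED_MAX_BOOLEAN_CLAUSES,
) -> Tuple[str, bool]:
    """Return (query string, scored).

    scored=True  -> boosted boolean query; Solr's score carries the order.
    scored=False -> terms query; the boosts are dropped, so the caller
                    has to restore the order itself.
    """
    if len(ranked) <= limit:
        return boosted_or_query(field, ranked), True
    ids = ",".join(term for term, _ in ranked)
    return "{!terms f=%s}%s" % (field, ids), False
```

Raising maxBooleanClauses simply moves the point at which the second branch is taken; with a low query rate, that may be all that is needed.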
TermsQuery Result Ordering
I have an application which currently uses a boolean query. The query could have a large number of boolean terms. I know that the TermsQuery doesn't have the same limitations as the boolean query. However, I need to maintain the order of the original terms.

The query terms from the boolean query are actually values returned by a chemical structure search, which are returned in order of their relevancy in the structure search. I maintain the order by giving them a boost which is a function of the relevancy from the structure search.

structure_id:(12345^800 OR 12356^750 OR abcde^600 ...

This approach gives me the results in the order I need them in. I'd love to use the TermsQuery instead, as it doesn't have the same limitations.

Can I boost the Terms in the terms query? Is there a way to order the results? e.g. would the results be returned in the same order I specified the terms?

Thanks,
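For readers who want to see the boosting approach concretely, here is a minimal sketch of how such a query string might be generated. The mapping from relevancy to boost is an assumption made for illustration (a relevancy in the range 0 to 1 multiplied by 1000); the message does not say which function is actually used, and `relevancy_to_boost` and `boosted_query` are hypothetical names. IDs are assumed to contain no characters that need escaping in the query syntax.

```python
from typing import Sequence, Tuple


def relevancy_to_boost(relevancy: float, scale: int = 1000) -> int:
    """Hypothetical mapping: scale a 0..1 relevancy to an integer boost."""
    return round(relevancy * scale)


def boosted_query(field: str, hits: Sequence[Tuple[str, float]]) -> str:
    """Build a boosted OR query from (id, relevancy) pairs.

    `hits` is expected in structure-search order; the boosts make Solr's
    score reproduce that order.
    """
    clauses = " OR ".join(
        f"{term}^{relevancy_to_boost(relevancy)}" for term, relevancy in hits
    )
    return f"{field}:({clauses})"


if __name__ == "__main__":
    hits = [("12345", 0.8), ("12356", 0.75), ("abcde", 0.6)]
    print(boosted_query("structure_id", hits))
```

Note that two hits with the same relevancy would receive the same boost, so their relative order in the results would not be guaranteed by this scheme alone.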