Re: TermsQuery Result Ordering

2017-10-19 Thread Erick Erickson
If it's worth the effort to you, you could write a custom scorer that "somehow"
pulled these terms out and did what you require. I suppose some kind of
clever function query might work, but again it would probably need custom code.

Frankly, though, I wouldn't go there until I'd exhausted either my resources
or my user's patience.

In the worst case, you could break it up into N sub-queries and sort the results
in the app.
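A minimal sketch of that fallback. It assumes three things: the ranked IDs are split into batches, each batch is sent as its own sub-query, and the hits are merged by the original structure-search rank. `run_query` is a hypothetical callable standing in for whatever Solr client the application uses; it takes a batch of IDs and returns the matching documents as dicts. `batch_size` defaults to 1024, Solr's default `maxBooleanClauses`, so each batch could also be sent as a boosted boolean query.

```python
from typing import Callable, Dict, Iterator, List, Sequence


def batches(ids: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` IDs, preserving order."""
    if size < 1:
        raise ValueError("batch size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def search_in_rank_order(ranked_ids: Sequence[str],
                         run_query: Callable[[Sequence[str]], List[Dict]],
                         field: str = "structure_id",
                         batch_size: int = 1024) -> List[Dict]:
    """Run one sub-query per batch, then sort all hits by original rank."""
    rank = {doc_id: i for i, doc_id in enumerate(ranked_ids)}
    hits: List[Dict] = []
    for batch in batches(ranked_ids, batch_size):
        hits.extend(run_query(batch))  # hypothetical Solr call
    # IDs missing from the ranking (should not happen) sort last.
    return sorted(hits, key=lambda d: rank.get(d[field], len(rank)))


def fake_run_query(batch):
    # Stand-in for a real Solr call; returns hits in arbitrary order.
    return [{"structure_id": doc_id} for doc_id in reversed(batch)]


ordered = search_in_rank_order(["12345", "12356", "abcde"], fake_run_query,
                               batch_size=2)
print([d["structure_id"] for d in ordered])  # ['12345', '12356', 'abcde']
```

Sorting once after all batches return keeps the merge simple, and the result does not depend on the order in which Solr returns hits within a batch.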

Best,
Erick



Re: TermsQuery Result Ordering

2017-10-19 Thread Webster Homer
Thank you, Erick.

That is exactly what I thought. Indeed, we don't care about Solr's scoring;
as I said, we do care that the order of the terms is maintained, hence the
requirement for boosting the term values.



-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Re: TermsQuery Result Ordering

2017-10-18 Thread Erick Erickson
bq: Can I boost the Terms in the terms query

I'm pretty sure you can't. But how many of these do you have? You can
always increase the maxBooleanClauses limit in solrconfig.xml. It's
primarily there to say "having this many clauses is usually a bad
idea, so proceed with caution". I've seen 10,000 and higher used
before; you're really only limited by memory.

And I'm going to guess that your application doesn't have a high query
rate, so you can likely set maxBooleanClauses very high.

Basically, the code behind the terms query parser (TermsQParserPlugin)
bypasses scoring on the theory that these very large OR clauses are
usually useless for scoring. Your application is an outlier, but you
knew that already ;)
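Because the terms query parser gives every match the same score, one way to get the structure-search order back is to send the IDs as a single `{!terms}` query and re-sort the returned documents in the application. A minimal sketch, assuming each returned document is a dict carrying its `structure_id` and that all matches fit in one page of results (`rows` set high enough). The response shape is an assumption, not something stated in this thread.

```python
from typing import Dict, List, Sequence


def terms_query(field: str, ids: Sequence[str]) -> str:
    # Solr's terms query parser; its default separator is a comma, so IDs
    # containing commas would need a different separator.
    return "{!terms f=" + field + "}" + ",".join(ids)


def restore_rank_order(docs: List[Dict], field: str,
                       ranked_ids: Sequence[str]) -> List[Dict]:
    """Sort Solr docs by the position of their ID in the original ranking."""
    rank = {doc_id: i for i, doc_id in enumerate(ranked_ids)}
    # Documents whose ID is not in the ranking go last; sorted() is stable.
    return sorted(docs, key=lambda d: rank.get(d[field], len(rank)))


ranked = ["12345", "12356", "abcde"]
print(terms_query("structure_id", ranked))
# {!terms f=structure_id}12345,12356,abcde
docs = [{"structure_id": "abcde"}, {"structure_id": "12345"},
        {"structure_id": "12356"}]
print([d["structure_id"] for d in restore_rank_order(docs, "structure_id", ranked)])
# ['12345', '12356', 'abcde']
```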


Best,
Erick



TermsQuery Result Ordering

2017-10-18 Thread Webster Homer
I have an application which currently uses a boolean query. The query could
have a large number of boolean terms. I know that the TermsQuery doesn't
have the same limitations as the boolean query. However, I need to maintain
the order of the original terms.

The query terms from the boolean query are actually values returned by a
chemical structure search, which are returned in order of their relevancy
in the structure search. I maintain the order by giving them a boost which
is a function of the relevancy from the structure search.

structure_id:(12345^800 OR 12356^750 OR abcde^600 ...
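A minimal sketch of building a boosted query like the one shown above from ranked structure-search hits. It assumes the hits arrive as (id, relevancy) pairs. The message only says the boost is "a function of" the relevancy, so `boost_for` is a hypothetical stand-in that rounds the score to an integer; any monotonically increasing function keeps the ranking. IDs containing Lucene special characters would need escaping, which this sketch does not do. Each ID becomes one clause, so the total still counts against maxBooleanClauses.

```python
from typing import Callable, Iterable, Tuple


def boost_for(relevancy: float) -> int:
    # Hypothetical mapping from structure-search relevancy to a Lucene boost.
    return max(1, round(relevancy))


def boosted_query(field: str,
                  hits: Iterable[Tuple[str, float]],
                  boost: Callable[[float], int] = boost_for) -> str:
    """Build field:(id1^b1 OR id2^b2 ...) from (id, relevancy) pairs."""
    clauses = [f"{doc_id}^{boost(score)}" for doc_id, score in hits]
    if not clauses:
        raise ValueError("no hits to query for")
    return f"{field}:({' OR '.join(clauses)})"


print(boosted_query("structure_id",
                    [("12345", 800.0), ("12356", 750.0), ("abcde", 600.0)]))
# structure_id:(12345^800 OR 12356^750 OR abcde^600)
```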

This approach gives me the results in the order I need them in. I'd love to
use the TermsQuery instead, as it doesn't have the same limitations.

Can I boost the Terms in the terms query? Is there a way to order the
results? For example, would the results be returned in the same order in
which I specified the terms?

Thanks,
