[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094834#comment-13094834
 ] 

Robert Muir commented on LUCENE-3130:
-

Well, I don't think it prohibits it? 

This kinda refactoring/feature would be really good. If we could refactor our 
queryparser stuff to make it easier to customize how queries are built, you 
know that's exactly the kind of thing we should be doing!

I'm just saying I think right now it would be painful to do with lucene's core 
QP given its current design.


> Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should 
> give lower boosts
> ---
>
> Key: LUCENE-3130
> URL: https://issues.apache.org/jira/browse/LUCENE-3130
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Hoss Man
>
> A recent thread asked if there was any way to use QueryTime synonyms such that 
> matches on the original term specified by the user would score higher than 
> matches on the synonym.  It occurred to me later that a float Attribute could 
> be set by the SynonymFilter in such situations, and QueryParser could use 
> that float as a boost in the resulting Query.  (This would be fairly 
> straightforward for the simple "synonyms => BooleanQuery" case, but we'd have 
> to decide how to handle the case of synonyms with multiple terms that produce 
> MTPQ, possibly just punt for now)
> Likewise, there may be other TokenFilters that "inject" artificial tokens at 
> query time where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for 
> back compat could default to "1.0" (or null to not set a boost at all)
> Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied 
> the boost attribute into the payload attribute, these same filters could give 
> "penalizing" payloads to terms when used at index time.
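
The single-term case from the issue description above can be sketched in plain 
Java without any Lucene classes. Everything here is an assumption made for 
illustration: BoostedToken stands in for a token carrying a float boost 
attribute, expand() plays the role of a synonym filter, and toQuery() plays the 
role of a query parser that renders one position as an OR group. None of these 
names exist in Lucene.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a token that carries a float boost attribute. */
final class BoostedToken {
    final String term;
    final float boost; // 1.0f means "leave the score alone"

    BoostedToken(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }
}

final class SynonymBoostSketch {
    /** Emits the original term at boost 1.0, then each synonym at synonymBoost. */
    static List<BoostedToken> expand(String term, Map<String, List<String>> synonyms,
                                     float synonymBoost) {
        List<BoostedToken> out = new ArrayList<>();
        out.add(new BoostedToken(term, 1.0f));
        for (String syn : synonyms.getOrDefault(term, List.of())) {
            out.add(new BoostedToken(syn, synonymBoost));
        }
        return out;
    }

    /** Renders the tokens as one OR group, appending ^boost only when it is not 1.0. */
    static String toQuery(String field, List<BoostedToken> tokens) {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < tokens.size(); i++) {
            BoostedToken t = tokens.get(i);
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(t.term);
            if (t.boost != 1.0f) {
                sb.append('^').append(String.format(Locale.ROOT, "%.1f", t.boost));
            }
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of("car", List.of("auto", "vehicle"));
        System.out.println(toQuery("body", expand("car", synonyms, 0.5f)));
    }
}
```

With a default of 1.0 the rendered query is unchanged, which matches the back 
compat default suggested in the description. Multi-term synonyms are not 
handled here at all.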

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094828#comment-13094828
 ] 

Jan Høydahl commented on LUCENE-3130:
-

I'm not saying that most people use or want single-term synonyms, but since 
that's the only thing that works query-time with Solr now, most people accept 
that limitation, and it's still useful. I tend to split synonyms in two 
dictionaries - a separate one with multi term synonyms for use on index side. 
Not a perfect situation, but that's what we've got and how people use it.

I'm not into the code, and I have no reason to doubt your judgement that the 
cost/benefit prohibits adding this to current QParser :-(




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094554#comment-13094554
 ] 

Robert Muir commented on LUCENE-3130:
-

Jan, not sure that most people only use single-term synonyms... if this is the 
case maybe we should rethink our synonyms implementation because multi-word 
adds a ton of complexity!

Another reason I suggested avoiding adding this to the core queryparser is 
because it's going to be challenging to allow this optional boosting in a 
flexible way (just look at getFieldQuery... it's very hairy). I think in the 
ideal case, we somehow restructure all this code so that subclasses have more 
control over how the query is created... however I think this might be 
challenging just given how the code is structured now.

The reason I think it would be best exposed as a 'hook' to subclasses (versus 
adding a "deboost synonyms" option directly to the core QP), is that I think 
people are going to want to customize how this works, e.g. control it per-field 
and things like that.
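
A rough sketch of what such a hook could look like, in plain Java with no 
Lucene dependency. HookedQueryBuilder, synonymBoost() and 
PerFieldDeboostBuilder are hypothetical names invented for this illustration; 
they are not part of the core QP, and the real getFieldQuery is far more 
involved.

```java
import java.util.Map;

/** Hypothetical base class: builds term clauses and exposes one overridable hook. */
class HookedQueryBuilder {
    /** Hook: the boost to apply to synonym terms in the given field. Default: none. */
    protected float synonymBoost(String field) {
        return 1.0f;
    }

    /** Renders field:term, appending ^boost only for synonyms with a non-default boost. */
    final String termClause(String field, String term, boolean isSynonym) {
        float boost = isSynonym ? synonymBoost(field) : 1.0f;
        String clause = field + ":" + term;
        return boost == 1.0f ? clause : clause + "^" + boost;
    }
}

/** Hypothetical subclass: controls the synonym deboost per field. */
final class PerFieldDeboostBuilder extends HookedQueryBuilder {
    private final Map<String, Float> boostByField;

    PerFieldDeboostBuilder(Map<String, Float> boostByField) {
        this.boostByField = boostByField;
    }

    @Override
    protected float synonymBoost(String field) {
        // Fields without an entry keep the default boost of 1.0.
        return boostByField.getOrDefault(field, 1.0f);
    }
}
```

The base class keeps today's behavior, and only a subclass that overrides the 
hook changes how synonyms are weighted.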

At the end of the day, a queryparser subclass could always override getFieldQuery 
completely and do this now, but that's not great either because the code is so 
hairy :(

This kind of feature might be easier to implement with the new queryparser in 
contrib, but I'm not sure.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094545#comment-13094545
 ] 

Jan Høydahl commented on LUCENE-3130:
-

The core of this issue is providing a mechanism for deboosting synonyms, and as 
long as it works with single-term synonyms that at least covers what most 
people use today. I propose we handle that first. Agree that it would be nice 
with a query-parser which can handle multi word synonyms. But that could be 
handled incrementally in a separate issue.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094535#comment-13094535
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
But for the synonym case, what remains is to modify the QueryParser to act on 
the already-present TypeAttribute, is that so? If so, let's open another issue 
for that.
{quote}

I think so? Though it might be more useful not to modify the core queryparser 
for this? The reason is that such a feature is geared towards synonyms and 
multi-word synonyms don't work well with it... So maybe instead add this to a 
simpler queryparser that *does* work well with multi-word synonyms by default?




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094505#comment-13094505
 ] 

Jan Høydahl commented on LUCENE-3130:
-

Robert, two fields work great for supporting stuff like phonetic and 
stem/non-stem search, and also lower/exact-case search although index size 
could be lower with a one-field approach. Let's let those use cases rest for now.

But for the synonym case, what remains is to modify the QueryParser to act on 
the already-present TypeAttribute, is that so? If so, let's open another issue 
for that.





[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094454#comment-13094454
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
Let's get back to the original issue: we need some way to let the "original" 
form of a term have higher weight than the alternative forms generated by 
analysis (whether those are synonyms, stems, lowercase or what have you).
{quote}

I'm not sure we do! see my last response. I think 2 fields is just fine.

As for things like synonyms, these already set TypeAttribute. So if your 
consumer wants to do something on synonyms, look for type = "" or 
whatever it already sets.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-08-31 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094397#comment-13094397
 ] 

Jan Høydahl commented on LUCENE-3130:
-

Let's get back to the original issue: we need some way to let the "original" 
form of a term have higher weight than the alternative forms generated by 
analysis (whether those are synonyms, stems, lowercase or what have you).

Is tagging the added tokens with a tokenType, and then enabling the QParsers to 
act on these tokenTypes a viable way forward?




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-26 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055050#comment-13055050
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
Currently I use a separate field for phonetic normalization and include it with 
a lower weight in DisMax. If phonetic variant instead was stored alongside the 
original with posIncr=0 and tokenType=phonetic, I could instead specify a 
deboost factor for phonetic terms and even highlighting would work ootb!
{quote}

This doesn't make any sense to me: how is this "better" shoved into one field 
than two fields? I don't see any advantage at all. field A with original terms 
and field B with phonetic terms is no less efficient in the index than having 
field AB with both mixed up, but keeping them separate keeps code and 
configurations simple.

As for the highlighting, that sounds like a highlighting problem, not an 
analysis problem. If it's often the case that users use things like copyField 
and do this boosting, then highlighting in Solr needs to be fixed to correlate 
the offsets back to the original stored field: but we need not make analysis 
more complicated because of this limitation.


{quote}
If the LowerCaseFilter would keep the original token and add a lowercased token 
on same posIncr with tokenType=lowercase, we could support case insensitive 
match with preference for correct case.
{quote}

I don't think we should complicate our tokenfilters with such things: in this 
case I think it would just make the code more complicated and make relevance 
worse: often case is totally meaningless and boosting terms for some arbitrary 
reason will skew scores.

This is for the same reason as above. If you want to do this, I think you 
should use two fields, one with no case, and one with case, and boost one of 
them. 
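
As a minimal sketch of the two-field approach, assuming the original text is 
indexed once with case preserved and once lowercased: the field suffixes 
"_exact" and "_lower" and the class name are made up for this example, and the 
output is just a query string, not a Lucene Query object.

```java
import java.util.Locale;

final class TwoFieldQuerySketch {
    /**
     * Builds a query that matches either field and boosts the case-preserving one,
     * so an exact-case match scores higher than a case-insensitive match.
     */
    static String build(String baseField, String userTerm, float exactBoost) {
        String exact = baseField + "_exact:" + userTerm + "^" + exactBoost;
        String folded = baseField + "_lower:" + userTerm.toLowerCase(Locale.ROOT);
        return "(" + exact + " OR " + folded + ")";
    }

    public static void main(String[] args) {
        System.out.println(build("title", "Apple", 2.0f));
    }
}
```

The boost lives entirely in the query, so it can be tuned without touching the 
analysis chain or reindexing.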





[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-26 Thread JIRA

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055040#comment-13055040
 ] 

Jan Høydahl commented on LUCENE-3130:
-

The feature is absolutely needed. Probably it's enough to be able to specify a 
global term boost factor per query for all synonyms, so Robert's method would 
work for me.

Another use case is phonetic variants. Currently I use a separate field for 
phonetic normalization and include it with a lower weight in DisMax. If the 
phonetic variant were instead stored alongside the original with posIncr=0 and 
tokenType=phonetic, I could instead specify a deboost factor for phonetic terms 
and even highlighting would work ootb!

Yet another is lower/upper case search. If the LowerCaseFilter would keep the 
original token and add a lowercased token on same posIncr with 
tokenType=lowercase, we could support case insensitive match with preference 
for correct case.

If the user needs a different boost for different fields, perhaps the TokenType 
name could be configurable on each filter.




[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052785#comment-13052785
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
why don't you consider an attribute that denotes "this term is worth less than 
a typical term" a general description of the text?
{quote}

I would rather it be more descriptive: for example an attribute that notes this 
is a Synonym is useful. If it's a named entity, mark it as a named entity.

But I don't want to see float values cranked into the tokenfilters, I think 
this is messy. I think the analysis process should instead describe up the text 
for the consumer (e.g. queryparser) to do with as they please: i.e. in this 
case it's the queryparser's job to then turn this into some concrete query that 
does something magic: maybe that's downboosting synonyms, but maybe it would do 
something different, like drop them altogether.
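
To illustrate the split being argued for here, the following plain-Java sketch 
has the analysis side only label tokens with a type, while the consumer picks a 
policy. TypedToken, TypeDrivenConsumer, SynonymPolicy and the type label 
"SYNONYM" are all assumptions for this example, not Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a token that carries a descriptive type label. */
final class TypedToken {
    final String term;
    final String type;

    TypedToken(String term, String type) {
        this.term = term;
        this.type = type;
    }
}

final class TypeDrivenConsumer {
    /** Assumed label for injected synonyms; a real filter may use a different value. */
    static final String SYNONYM_TYPE = "SYNONYM";

    /** What the consumer chooses to do with tokens labeled as synonyms. */
    enum SynonymPolicy { DEBOOST, DROP }

    /** Turns typed tokens into query clauses; no float ever appears in the tokens. */
    static List<String> clauses(List<TypedToken> tokens, SynonymPolicy policy, float deboost) {
        List<String> out = new ArrayList<>();
        for (TypedToken t : tokens) {
            if (!SYNONYM_TYPE.equals(t.type)) {
                out.add(t.term); // ordinary token: keep as-is
            } else if (policy == SynonymPolicy.DEBOOST) {
                out.add(t.term + "^" + deboost); // synonym kept with a lower weight
            }
            // SynonymPolicy.DROP: the synonym is skipped entirely
        }
        return out;
    }
}
```

The same token stream yields different queries depending on the policy, which 
is the point: the boost value is a consumer decision, not an analysis one.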





[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-06-21 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052771#comment-13052771
 ] 

Hoss Man commented on LUCENE-3130:
--

bq. A QP can already solve this issue today, simply by boosting down terms with 
positionIncrement = 0.

That assumes:
a) that every TokenFilter which might inject terms like this will always put 
the most important one first
b) that the amount of boost should be fixed

what i'm suggesting is that we make this more flexible so that people wiring 
together their apps and analyzers have an easy way to guide the queryParser's 
behavior.  if we have a well defined attribute for this, people can have 
custom analysis that specifies arbitrary boosts in cases we may not be able to 
specifically anticipate. (synonyms, entity recognition, common word demoting, 
etc..)
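
For comparison, the positionIncrement = 0 approach quoted above can be sketched 
in a few lines of plain Java. PosIncToken and PositionIncrementDeboost are 
hypothetical names for this illustration; the sketch makes both assumptions 
visible, since the boost is one fixed number and the first token at a position 
is always treated as the important one.

```java
import java.util.List;

/** Hypothetical stand-in for a token with a position increment. */
final class PosIncToken {
    final String term;
    final int positionIncrement;

    PosIncToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

final class PositionIncrementDeboost {
    /**
     * Returns one boost per token: 1.0 for tokens that advance the position,
     * and the fixed stackedBoost for tokens stacked on the previous position.
     */
    static float[] boostsFor(List<PosIncToken> tokens, float stackedBoost) {
        float[] boosts = new float[tokens.size()];
        for (int i = 0; i < tokens.size(); i++) {
            boosts[i] = tokens.get(i).positionIncrement == 0 ? stackedBoost : 1.0f;
        }
        return boosts;
    }
}
```

If a filter happened to emit the injected term first and the original term at 
positionIncrement 0, this scheme would deboost the original instead, which is 
assumption (a) in the list above.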

bq. But I really think the implementation details of QP should remain in QP, 
the analysis chain should instead be general and describe up the text.

why don't you consider an attribute that denotes "this term is worth less than 
a typical term" a general description of the text?

bq. Otherwise, things get really confusing, e.g. what should a ShingleFilter do 
when it combines two tokens that have different BoostAttributes?

It does whatever it already does when it encounters two tokens that may have 
attributes it doesn't know about (ignore them when creating the new token, if i 
remember correctly).  Unrecognized attributes aren't a new problem.

bq. If you do what you describe, what if you then want to tweak the ranking for 
synonyms? You must reindex.

how is that any different from any other aspect of index time synonyms?  if you 
use them you *always* have to reindex when you change your synonyms.

I'm not arguing that index time synonyms are a good idea in general, and i'm 
not arguing that this "we look for BoostAttributes on tokens" feature of the QP 
would be useful (or even a good idea) for everyone.  I'm arguing that having 
such a feature would give people who are already customizing their analysis an 
easy way to influence the behavior of the query parser (w/o subclassing), one 
that could still work in conjunction with other techniques.



[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-05-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037387#comment-13037387
 ] 

Robert Muir commented on LUCENE-3130:
-

{quote}
Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied the 
boost attribute into the payload attribute, these same filters could give 
"penalizing" payloads to terms when used at index time) could give "penalizing" 
payloads to terms.
{quote}

Again, I think this is at the wrong level. If you do what you describe, what if 
you then want to tweak the ranking for synonyms? You must reindex.

Instead, it's far better to use TypeAsPayloadFilter and put the type into the 
payload. Then you can tweak scorePayload() to your heart's content to adjust the 
ranking without reindexing all documents.
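
A minimal sketch of that separation, in plain Java rather than against the Lucene API: `TypePayloadScorer` is a hypothetical class, and its `scorePayload(byte[])` is a simplified stand-in whose signature is an assumption, not Lucene's. The point it tries to show is that the index stores only the token type, while the numeric weight lives in search-time configuration.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical search-time scorer; not a Lucene class. */
final class TypePayloadScorer {
    // Search-time policy: token type -> score factor. Changing it needs no reindex.
    private final Map<String, Float> factorByType = new HashMap<>();

    void setFactor(String type, float factor) {
        factorByType.put(type, factor);
    }

    /** What an index-time filter is assumed to store: the type as UTF-8 bytes. */
    static byte[] encodeType(String type) {
        return type.getBytes(StandardCharsets.UTF_8);
    }

    /** Simplified stand-in for a scorePayload() hook: decode the type, look up its factor. */
    float scorePayload(byte[] payload) {
        if (payload == null) {
            return 1.0f; // no payload: neutral factor
        }
        String type = new String(payload, StandardCharsets.UTF_8);
        return factorByType.getOrDefault(type, 1.0f);
    }
}
```

Here the stored bytes for a type such as `SYNONYM` never change; calling `setFactor("SYNONYM", 0.5f)` and later `setFactor("SYNONYM", 0.8f)` alters the score factor for the same payload, which is the "tweak without reindexing" property described above.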



[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts

2011-05-21 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037386#comment-13037386
 ] 

Robert Muir commented on LUCENE-3130:
-

Hi Hoss Man,

I don't think I agree that a boost attribute is the best way to implement this.

A QP can already solve this issue today, simply by boosting down terms with 
positionIncrement = 0. This would solve all of the cases you listed, without 
making these tokenstreams more complicated.
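
For illustration only, a minimal plain-Java sketch of that approach. `PositionIncrementBooster` is a hypothetical helper, not part of Lucene, and the single `stackedBoost` parameter is an assumption about how such a query parser might be configured.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not a Lucene class. */
final class PositionIncrementBooster {
    /**
     * Returns one boost per token: 1.0 for a token that advances the position,
     * and stackedBoost for a token stacked on the previous one (increment 0).
     */
    static List<Float> boosts(int[] positionIncrements, float stackedBoost) {
        List<Float> out = new ArrayList<>();
        for (int inc : positionIncrements) {
            out.add(inc == 0 ? stackedBoost : 1.0f);
        }
        return out;
    }
}
```

For increments `{1, 0, 1}` and a `stackedBoost` of `0.5f` this yields `[1.0, 0.5, 1.0]`. The token streams stay unchanged; the policy lives entirely in the consumer.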

If such a QP really needs to know more than positionIncrement=0, then a better 
approach would be to set token types (need not be TypeAttribute, could be 
something more strongly-typed), to indicate synonym, phonetic variation, etc.

But I really think the implementation details of QP should remain in QP, the 
analysis chain should instead be general and describe the text.

Otherwise, things get really confusing, e.g. what should a ShingleFilter do 
when it combines two tokens that have different BoostAttributes? But with 
types, this is no problem at all, because the ShingleFilter can simply set the 
type to 'shingle' and it's unambiguous; it's up to the consumer to do whatever 
it wants with this.
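
The type-based alternative can be sketched as follows, again in plain Java with hypothetical names (`TypedToken`, `TypedShingles`) and an assumed type-to-boost map on the consumer side; none of this is Lucene API.

```java
import java.util.Map;

/** Hypothetical stand-in for a token that carries a type; not a Lucene class. */
final class TypedToken {
    final String term;
    final String type;

    TypedToken(String term, String type) {
        this.term = term;
        this.type = type;
    }
}

/** Hypothetical helpers showing producer and consumer sides. */
final class TypedShingles {
    /** Producer side: the combined token gets type "shingle", whatever its parts were. */
    static TypedToken shingle(TypedToken a, TypedToken b) {
        return new TypedToken(a.term + " " + b.term, "shingle");
    }

    /** Consumer side: the caller's policy maps a type to a boost; unknown types are neutral. */
    static float boostFor(TypedToken token, Map<String, Float> policy) {
        return policy.getOrDefault(token.type, 1.0f);
    }
}
```

Combining a `word` token with a `SYNONYM` token needs no rule for merging two numbers: the result is simply of type `shingle`, and a consumer whose policy maps `shingle` to `0.8f` gets `0.8f` back from `boostFor`.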


