[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094834#comment-13094834 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

Well, I don't think it prohibits it? This kind of refactoring/feature would be really good. If we could refactor our queryparser stuff to make it easier to customize how queries are built, you know that's exactly the kind of thing we should be doing!

I'm just saying I think right now it would be painful to do with Lucene's core QP given its current design.

> Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
> ---
>
> Key: LUCENE-3130
> URL: https://issues.apache.org/jira/browse/LUCENE-3130
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Hoss Man
>
> A recent thread asked if there was any way to use QueryTime synonyms such that
> matches on the original term specified by the user would score higher than
> matches on the synonym. It occurred to me later that a float Attribute could
> be set by the SynonymFilter in such situations, and QueryParser could use
> that float as a boost in the resulting Query. This would be fairly
> straightforward for the simple "synonyms => BooleanQuery" case, but we'd have
> to decide how to handle the case of synonyms with multiple terms that produce
> MTPQ (possibly just punt for now).
> Likewise, there may be other TokenFilters that "inject" artificial tokens at
> query time where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for
> back compat could default to "1.0" (or null to not set a boost at all).
> Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied
> the boost attribute into the payload attribute, these same filters could give
> "penalizing" payloads to terms when used at index time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
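Editor's illustration of the issue description quoted above. This is a minimal sketch of the simple "synonyms => BooleanQuery" case, assuming each token already carries a float boost set during analysis. BoostedToken and BoostedSynonymQuery are hypothetical names invented for this sketch; they are not Lucene classes, and the output is a plain query-syntax string rather than a real Query object. The multi-term synonym case is not covered.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for a token that carries a float boost; not a Lucene class. */
final class BoostedToken {
    final String term;
    final float boost; // 1.0f means "no penalty", matching the proposed back-compat default

    BoostedToken(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }
}

/** Hypothetical consumer that turns boosted tokens into a boolean OR query string. */
final class BoostedSynonymQuery {
    /** ORs all terms together and appends ^boost only where the boost differs from 1.0. */
    static String build(String field, List<BoostedToken> tokens) {
        List<String> clauses = new ArrayList<>();
        for (BoostedToken t : tokens) {
            String clause = field + ":" + t.term;
            if (t.boost != 1.0f) {
                clause += "^" + String.format(Locale.ROOT, "%.1f", t.boost);
            }
            clauses.add(clause);
        }
        return "(" + String.join(" OR ", clauses) + ")";
    }
}
```

With the user's term "tv" at boost 1.0 and the injected synonym "television" at boost 0.5, build("body", ...) yields a query in which a match on the original term outweighs a match on the synonym.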
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094828#comment-13094828 ]

Jan Høydahl commented on LUCENE-3130:
-------------------------------------

I'm not saying that most people use or want single-term synonyms, but since that's the only thing that works query-time with Solr now, most people accept that limitation, and it's still useful. I tend to split synonyms into two dictionaries, with a separate one holding multi-term synonyms for use on the index side. Not a perfect situation, but that's what we've got and how people use it.

I'm not into the code, and I have no reason to doubt your judgement that the cost/benefit prohibits adding this to the current QParser :-(
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094554#comment-13094554 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

Jan, I'm not sure that most people only use single-term synonyms... if this is the case, maybe we should rethink our synonyms implementation, because multi-word adds a ton of complexity!

Another reason I suggested avoiding adding this to the core queryparser is that it's going to be challenging to allow this optional boosting in a flexible way (just look at getFieldQuery... it's very hairy).

I think in the ideal case, we somehow restructure all this code so that subclasses have more control over how the query is created... however, I think this might be challenging just given how the code is structured now.

The reason I think it would be best exposed as a 'hook' to subclasses (versus adding a "deboost synonyms" option directly to the core QP) is that I think people are going to want to customize how this works, e.g. control it per-field and things like that.

At the end of the day, a queryparser could always override getFieldQuery completely and do this now, but that's not great either because the code is so hairy :(

This kind of feature might be easier to implement with the new queryparser in contrib, but I'm not sure.
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094545#comment-13094545 ]

Jan Høydahl commented on LUCENE-3130:
-------------------------------------

The core of this issue is providing a mechanism for deboosting synonyms, and as long as it works with single-term synonyms, that at least covers what most people use today. I propose we handle that first.

Agreed that it would be nice to have a query parser which can handle multi-word synonyms. But that could be handled incrementally in a separate issue.
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094535#comment-13094535 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

{quote}
But for the synonym case, what remains is to modify the QueryParser to act on the already-present TypeAttribute, is that so? If so, let's open another issue for that.
{quote}

I think so? Though it might be more useful not to modify the core queryparser for this? The reason is that such a feature is geared towards synonyms, and multi-word synonyms don't work well with it... So maybe instead do a simpler queryparser that *does* work well with multi-word synonyms by default?
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094505#comment-13094505 ]

Jan Høydahl commented on LUCENE-3130:
-------------------------------------

Robert, two fields work great for supporting stuff like phonetic and stem/non-stem search, and also lower/exact-case search, although index size could be lower with a one-field approach. Let's let those use cases rest for now.

But for the synonym case, what remains is to modify the QueryParser to act on the already-present TypeAttribute, is that so? If so, let's open another issue for that.
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094454#comment-13094454 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

{quote}
Let's get back to the original issue: we need some way to let the "original" form of a term have higher weight than the alternative forms generated by analysis (whether those are synonyms, stems, lowercase or what have you).
{quote}

I'm not sure we do! See my last response. I think 2 fields is just fine.

As for things like synonyms, these already set TypeAttribute. So if your consumer wants to do something on synonyms, look for type = "" or whatever it already sets.
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094397#comment-13094397 ]

Jan Høydahl commented on LUCENE-3130:
-------------------------------------

Let's get back to the original issue: we need some way to let the "original" form of a term have higher weight than the alternative forms generated by analysis (whether those are synonyms, stems, lowercase or what have you).

Is tagging the added tokens with a tokenType, and then enabling the QParsers to act on these tokenTypes, a viable way forward?
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055050#comment-13055050 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

{quote}
Currently I use a separate field for phonetic normalization and include it with a lower weight in DisMax. If phonetic variant instead was stored alongside the original with posIncr=0 and tokenType=phonetic, I could instead specify a deboost factor for phonetic terms and even highlighting would work ootb!
{quote}

This doesn't make any sense to me: how is this "better" shoved into one field than two fields? I don't see any advantage at all. Field A with original terms and field B with phonetic terms is no less efficient in the index than having field AB with both mixed up, but keeping them separate keeps code and configurations simple.

As for the highlighting, that sounds like a highlighting problem, not an analysis problem. If it's often the case that users use things like copyField and do this boosting, then highlighting in Solr needs to be fixed to correlate the offsets back to the original stored field, but we need not make analysis more complicated because of this limitation.

{quote}
If the LowerCaseFilter would keep the original token and add a lowercased token on same posIncr with tokenType=lowercase, we could support case insensitive match with preference for correct case.
{quote}

I don't think we should complicate our tokenfilters with such things: in this case I think it would just make the code more complicated and make relevance worse. Often case is totally meaningless, and boosting terms for some arbitrary reason will skew scores.

This is for the same reason as above. If you want to do this, I think you should use two fields, one with no case and one with case, and boost one of them.
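Editor's illustration of the two-field alternative described above. This is a minimal sketch, assuming one field indexed with case preserved and a second field indexed lowercased, with the boost applied at query time to the exact-case field. TwoFieldQuery and the field names used below are hypothetical; in a real setup each field's own analyzer would do the normalization, which is approximated here with toLowerCase.

```java
import java.util.Locale;

/** Hypothetical helper that queries an exact-case field and a case-folded field together. */
final class TwoFieldQuery {
    /**
     * Builds a query-syntax string that boosts a match in the exact-case field
     * and leaves a match in the folded field at the default weight.
     */
    static String build(String exactField, String foldedField, String userTerm, float exactBoost) {
        // Stand-in for the folded field's analysis chain (an assumption of this sketch).
        String folded = userTerm.toLowerCase(Locale.ROOT);
        String boost = String.format(Locale.ROOT, "%.1f", exactBoost);
        return "(" + exactField + ":" + userTerm + "^" + boost
                + " OR " + foldedField + ":" + folded + ")";
    }
}
```

Because the weight lives in the query rather than in the analysis chain or the index, changing it would not require touching the token filters.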
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055040#comment-13055040 ]

Jan Høydahl commented on LUCENE-3130:
-------------------------------------

The feature is absolutely needed. Probably it's enough to be able to specify a global term boost factor per query for all synonyms, so Robert's method would work for me.

Another use case is phonetic variants. Currently I use a separate field for phonetic normalization and include it with a lower weight in DisMax. If the phonetic variant were instead stored alongside the original with posIncr=0 and tokenType=phonetic, I could specify a deboost factor for phonetic terms, and even highlighting would work ootb!

Yet another is lower/upper case search. If the LowerCaseFilter would keep the original token and add a lowercased token at the same position (posIncr=0) with tokenType=lowercase, we could support case-insensitive match with preference for correct case.

If the user needs different boosts for different fields, perhaps the TokenType name could be configurable on each filter.
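Editor's illustration of the per-type deboost idea in the comment above. This is a minimal sketch, assuming each token carries a type string such as "phonetic" or "lowercase" and that the consumer is configured with one factor per type. TypedToken and TypeBoostPolicy are hypothetical names; nothing here is an existing Lucene or Solr API.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical token carrying a type label set during analysis; not a Lucene class. */
final class TypedToken {
    final String term;
    final String type;

    TypedToken(String term, String type) {
        this.term = term;
        this.type = type;
    }
}

/** Hypothetical query-side policy mapping token types to deboost factors. */
final class TypeBoostPolicy {
    private final Map<String, Float> boostByType;

    TypeBoostPolicy(Map<String, Float> boostByType) {
        this.boostByType = new HashMap<>(boostByType); // defensive copy
    }

    /** Returns the configured factor for the token's type, or 1.0 for unconfigured types. */
    float boostFor(TypedToken token) {
        Float factor = boostByType.get(token.type);
        return factor == null ? 1.0f : factor;
    }
}
```

Keeping the numbers in the consumer means the analysis chain only labels tokens, and the factors can change per query without reanalysis.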
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052785#comment-13052785 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

{quote}
why don't you consider an attribute that denotes "this term is worth less than a typical term" a general description of the text?
{quote}

I would rather it be more descriptive: for example, an attribute that notes this is a synonym is useful. If it's a named entity, mark it as a named entity. But I don't want to see float values cranked into the tokenfilters; I think this is messy.

I think the analysis process should instead describe the text for the consumer (e.g. queryparser) to do with as they please. In this case it's the queryparser's job to then turn this into some concrete query that does something magic: maybe that's downboosting synonyms, but maybe it would do something different, like drop them altogether.
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052771#comment-13052771 ] Hoss Man commented on LUCENE-3130: -- bq. A QP can already solve this issue today, simply by boosting down terms with positionIncrement = 0. That assumes: a) that every TokenFilter which might inject terms like this will always put the most important one first b) that the amount of boost should be fixed what i'm suggesting is that we make this more flexible so that people wiring together their apps and analyzers have an easy way to guide the queryParsers behavior. if we have allow a well defined attribute for this people can have custom analysis that specify arbitrary boosts in cases we may not be able to specificly anticipate. (synonyms, entity recognition, common word demoting, etc..) bq. But I really think the implementation details of QP should remain in QP, the analysis chain should instead be general and describe up the text. why don't you consider an attribute that denotes "this term is worth less then a typical term" a general description of the text? bq. Otherwise, things get really confusing, e.g. what should a ShingleFilter do when it combines two tokens that have different BoostAttributes? It does whatever it already does when it encounters two tokens that may have attributes it doesn't know about (ignore them when creating the new token, if i remember correctly). Unrecognized attributes isn't a new problem. bq. If you do what you describe, what if you then want to tweak the ranking for synonyms? You must reindex. how is that any different from any other aspect of index time synonyms? if you use them you *always* have to reindex when you change your synonyms. I'm not arguing that index time synonyms is a good idea in general, i'm not arguing that this "we look for BoostAttributes on tokens" feature of the QP would be useful (or even a good idea for everyone). 
I'm arguing that having such a feature would provide an easy way for people who are already customizing their analysis to modify/influence the behavior of the query parser (w/o subclassing), in a way that could still easily work in conjunction with other techniques.

> Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
> ---
>
> Key: LUCENE-3130
> URL: https://issues.apache.org/jira/browse/LUCENE-3130
> Project: Lucene - Java
> Issue Type: Improvement
> Reporter: Hoss Man
>
> A recent thread asked if there was any way to use query-time synonyms such that
> matches on the original term specified by the user would score higher than
> matches on the synonym. It occurred to me later that a float Attribute could
> be set by the SynonymFilter in such situations, and QueryParser could use
> that float as a boost in the resulting Query. (This would be fairly
> straightforward for the simple "synonyms => BooleanQuery" case, but we'd have
> to decide how to handle the case of synonyms with multiple terms that produce
> MTPQ, possibly just punt for now.)
> Likewise, there may be other TokenFilters that "inject" artificial tokens at
> query time where it also might make sense to have a reduced "boost" factor...
> * SynonymFilter
> * CommonGramsFilter
> * WordDelimiterFilter
> * etc...
> In all of these cases, the amount of the "boost" could be configured, and for
> back compat could default to "1.0" (or null to not set a boost at all).
> Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied
> the boost attribute into the payload attribute, these same filters could give
> "penalizing" payloads to terms when used at index time.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
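As a rough illustration of the proposal in the comment above (this is not code from the issue or from Lucene), the sketch below shows how a per-token boost set during analysis could be copied onto query clauses by a parser. `BoostedToken`, `WeightedClause`, and `BoostAwareParserSketch` are hypothetical stand-ins invented for this example; Lucene's real attribute and query classes are deliberately not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a token carrying a float boost attribute.
final class BoostedToken {
    final String term;
    final float boost; // 1.0f means "no adjustment"

    BoostedToken(String term, float boost) {
        this.term = term;
        this.boost = boost;
    }
}

// Hypothetical stand-in for one clause of the resulting query.
final class WeightedClause {
    final String field;
    final String term;
    final float boost;

    WeightedClause(String field, String term, float boost) {
        this.field = field;
        this.term = term;
        this.boost = boost;
    }

    @Override
    public String toString() {
        return field + ":" + term + "^" + boost;
    }
}

final class BoostAwareParserSketch {
    // Analysis side: the original term keeps full weight, injected
    // synonyms get a configurable (assumed) reduced weight.
    static List<BoostedToken> expand(String original, List<String> synonyms, float synonymBoost) {
        List<BoostedToken> out = new ArrayList<>();
        out.add(new BoostedToken(original, 1.0f));
        for (String s : synonyms) {
            out.add(new BoostedToken(s, synonymBoost));
        }
        return out;
    }

    // Parser side: copy each token's boost onto its clause, with no
    // knowledge of which filter set it or why.
    static List<WeightedClause> toClauses(String field, List<BoostedToken> tokens) {
        List<WeightedClause> clauses = new ArrayList<>();
        for (BoostedToken t : tokens) {
            clauses.add(new WeightedClause(field, t.term, t.boost));
        }
        return clauses;
    }

    public static void main(String[] args) {
        List<BoostedToken> tokens = expand("car", Arrays.asList("automobile", "auto"), 0.5f);
        for (WeightedClause c : toClauses("body", tokens)) {
            System.out.println(c);
        }
    }
}
```

The point of the sketch is only the division of labor: the analysis step decides how much each token is worth, and the parser applies that number without subclassing.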
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037387#comment-13037387 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

{quote}
Furthermore: if we add a new BoostAttrToPayloadAttrFilter that just copied the boost attribute into the payload attribute, these same filters could give "penalizing" payloads to terms when used at index time.
{quote}

Again, I think this is at the wrong level. If you do what you describe, what if you then want to tweak the ranking for synonyms? You must reindex.

Instead, it's far better to use TypeAsPayloadFilter and put the type into the payload. Then you can tweak scorePayload() to your heart's content to adjust the ranking without reindexing all documents.
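The type-as-payload idea in the comment above can be sketched in a few lines. This is a simplified model under stated assumptions, not Lucene code: `TypePayloadScoringSketch`, `encodeType`, and the `factorByType` map are hypothetical names, and the `scorePayload` method here only borrows the name of the hook mentioned in the comment; it does not reproduce its real signature.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class TypePayloadScoringSketch {
    // Index time: store the token's type label (not a number) as payload bytes.
    static byte[] encodeType(String type) {
        return type.getBytes(StandardCharsets.UTF_8);
    }

    // Query time: decode the stored type and look up its current factor.
    // Types with no configured factor are scored neutrally (1.0f).
    static float scorePayload(byte[] payload, Map<String, Float> factorByType) {
        String type = new String(payload, StandardCharsets.UTF_8);
        Float factor = factorByType.get(type);
        return factor == null ? 1.0f : factor;
    }

    public static void main(String[] args) {
        byte[] stored = encodeType("SYNONYM"); // written once, at index time

        Map<String, Float> factors = new HashMap<>();
        factors.put("SYNONYM", 0.5f);
        System.out.println(scorePayload(stored, factors));

        // Retune the ranking later: only the map changes, the stored bytes do not.
        factors.put("SYNONYM", 0.8f);
        System.out.println(scorePayload(stored, factors));
    }
}
```

Because the index holds a label rather than a baked-in penalty, the factor can be changed at search time, which is the reindexing argument made in the comment.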
[jira] [Commented] (LUCENE-3130) Use BoostAttribute in in TokenFilters to denote Terms that QueryParser should give lower boosts
[ https://issues.apache.org/jira/browse/LUCENE-3130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037386#comment-13037386 ]

Robert Muir commented on LUCENE-3130:
-------------------------------------

Hi Hoss Man, I don't think I agree that a boost attribute is the best way to implement this.

A QP can already solve this issue today, simply by boosting down terms with positionIncrement = 0. This would solve all of the cases you listed, without making these tokenstreams more complicated.

If such a QP really needs to know more than positionIncrement = 0, then a better approach would be to set token types (need not be TypeAttribute, could be something more strongly typed) to indicate synonym, phonetic variation, etc.

But I really think the implementation details of QP should remain in QP; the analysis chain should instead be general and describe the text.

Otherwise, things get really confusing, e.g. what should a ShingleFilter do when it combines two tokens that have different BoostAttributes? But with types, this is no problem at all, because the ShingleFilter can simply set the type to 'shingle' and it's unambiguous... it's up to the consumer to do whatever it wants with this.
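The positionIncrement-based approach described in the comment above could look roughly like the following. This is a minimal sketch under stated assumptions, not the behavior of any existing Lucene query parser: `PosToken` and `PositionIncrementBoostSketch` are hypothetical names, and the single `stackedBoost` value is an assumed parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a token and its position increment.
final class PosToken {
    final String term;
    final int positionIncrement; // 0 means "stacked on the previous token's position"

    PosToken(String term, int positionIncrement) {
        this.term = term;
        this.positionIncrement = positionIncrement;
    }
}

final class PositionIncrementBoostSketch {
    // Parser-side rule: the first token at a position keeps full weight,
    // every token stacked on it (increment 0) gets the reduced boost.
    static List<Float> boostsFor(List<PosToken> tokens, float stackedBoost) {
        List<Float> boosts = new ArrayList<>();
        for (PosToken t : tokens) {
            boosts.add(t.positionIncrement == 0 ? stackedBoost : 1.0f);
        }
        return boosts;
    }

    public static void main(String[] args) {
        List<PosToken> tokens = Arrays.asList(
                new PosToken("car", 1),        // term as typed by the user
                new PosToken("automobile", 0), // injected at the same position
                new PosToken("rental", 1));
        List<Float> boosts = boostsFor(tokens, 0.5f);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(tokens.get(i).term + "^" + boosts.get(i));
        }
    }
}
```

Note that the rule uses one fixed boost and treats whichever token comes first at a position as the important one; those are the two assumptions questioned elsewhere in this thread.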