[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12787021#action_12787021 ]
Hoss Man commented on SOLR-1553: -------------------------------- Thoughts while reading the code... * the code is kind of hard to read ... there's a serious dirth of comments * reads very kludgy, clearly a hacked up version of DisMax ... probably want to refactor some helper functions (that can then be documented) * the clause.field and getFieldName functionality is dangerous for people migrating from edismax->dismax (users guessing field names can query on fields the solr admin doesn't want them to query on) ... we need an option to turn that off. ** one really nice thing about the field query support though: it looks like it would really be easy to add support for arbitrary field name aliasing with something like f.someFieldAlias.qf=realFieldA^3+realFieldB^4 ** perhaps getFieldName should only work for fields explicitly enumerated in a param? * why is "TO" listed as an operator when building up the phrase boost fields? (line 296) ... if range queries are supported, then shouldn't the upper/lower bounds also be striped out of the clauses list? ** accepting range queries also seems like something that people should be able to disable * apparently "pf" was changed to iteratively build boosting phrase queries for every 'pair' of words, and "pf3" is a new param to build boosting phrase queries for every 'triple' of words in the input. while this certainly seems useful, it's not back-compatable .. why not restore 'pf' to it's original purpose, and add "pf2" for hte pairs? * what is the motivation for ExtendedSolrQueryParser.makeDismax? ... i see that the boost queries built from the pf and pf3 fields are put in BooleanQueries instead of DisjunctionMaxQueries ... but why? (if the user searches for a phrase that's common in many fields of one document, that document is going to get a huge score boost regardless of the "tie" value, which kind of defeats the point of what the dismax parser is trying to do) * we should remove the extremely legacy "/* legacy logic */" for dealing with "bq" ... almost no one should care about that, we really don't need to carry it forward in a new parser. * there are a lot of empty catch blocks that seem like they should at least log a warning or debug message. * ExtendedAnalyzer feels like a really big hack ... i'm not certain, but i don't think it works correctly if a CharFilter is declared. * we need to document all these new params ("pf3", "lowercaseOperators", "boost", Thoughts while testing it out on some really hairy edge cases that break the old dismax parser... * this is really cool * this is really freaking cool. * still has a problem with search strings like "foo &&" and "foo ||" ... i suspect it would be an easy fix to recognize these just like AND/OR are recognized and escaped. * once we fix some of hte issues mentioned above, we should absolutely register this using the name "dismax" by default, and register the old one as "oldDismax" with a note in CHANGES.txt telling people to use defType=oldDismax if they really need it. > extended dismax query parser > ---------------------------- > > Key: SOLR-1553 > URL: https://issues.apache.org/jira/browse/SOLR-1553 > Project: Solr > Issue Type: New Feature > Reporter: Yonik Seeley > Fix For: 1.5 > > Attachments: SOLR-1553.patch > > > An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.