Thanks! I'll check it out.

On Fri, Aug 12, 2016 at 12:05 PM, Susheel Kumar <susheel2...@gmail.com> wrote:
> Not exactly sure what you are looking for from chaining the results, but similar
> functionality is available in Streaming Expressions, where the results of inner
> expressions are passed to outer expressions and so on:
> https://cwiki.apache.org/confluence/display/solr/Streaming+Expressions
>
> HTH
> Susheel
>
> On Fri, Aug 12, 2016 at 1:08 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
>
> > Hossman - many thanks again for your comprehensive and very helpful answer!
> >
> > All,
> >
> > I am (possibly misremembering) reading something about being able to pass
> > the results of one query to another query... Essentially "chaining" result
> > sets.
> >
> > I have looked in the docs and can't find anything on a quick search -- I may
> > have been reading about the Re-Ranking feature, which doesn't help me (I
> > know because I just tried, and it seems to return all results anyway, just
> > re-ranking the number specified in the reRankDocs parameter...)
> >
> > Is there a way to (cleanly) send the results of one query to another query
> > for further processing? Essentially, pass ONLY the results (including an
> > empty set of results) to another query for processing?
> >
> > thanks...
> >
> > On Thu, Aug 11, 2016 at 6:19 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
> >
> > > Thanks!
> > >
> > > To answer your questions, while I digest the rest of that information...
> > >
> > > I'm using the hon-lucene-synonyms.5.0.4.jar from here:
> > > https://github.com/healthonnet/hon-lucene-synonyms
> > >
> > > The config looks like this - and IIRC, is simply a copy from the
> > > recommended config on the site mentioned above.
> > >
> > > <queryParser name="synonym_edismax" class="com.github.healthonnet.search.SynonymExpandingExtendedDismaxQParserPlugin">
> > >   <!-- You can define more than one synonym analyzer in the following list.
> > >        For example, you might have one set of synonyms for English, one for French,
> > >        one for Spanish, etc.
> > >   -->
> > >   <lst name="synonymAnalyzers">
> > >     <!-- Name your analyzer something useful, e.g. "analyzer_en", "analyzer_fr", "analyzer_es", etc.
> > >          If you only have one, the name doesn't matter (hence "myCoolAnalyzer").
> > >     -->
> > >     <lst name="myCoolAnalyzer">
> > >       <!-- We recommend a PatternTokenizerFactory that tokenizes based on whitespace and quotes.
> > >            This seems to work best with most people's synonym files.
> > >            For details, read the discussion here:
> > >            http://github.com/healthonnet/hon-lucene-synonyms/issues/26
> > >       -->
> > >       <lst name="tokenizer">
> > >         <str name="class">solr.PatternTokenizerFactory</str>
> > >         <str name="pattern"><![CDATA[(?:\s|\")+]]></str>
> > >       </lst>
> > >       <!-- The ShingleFilterFactory outputs synonyms of multiple token lengths (e.g. unigrams, bigrams, trigrams, etc.).
> > >            The default here is to assume you don't have any synonyms longer than 4 tokens.
> > >            You can tweak this depending on what your synonyms look like.
> > >            E.g. if you only have unigrams, you can remove it entirely, and if your synonyms
> > >            are up to 7 tokens in length, you should set the maxShingleSize to 7.
> > >       -->
> > >       <lst name="filter">
> > >         <str name="class">solr.ShingleFilterFactory</str>
> > >         <str name="outputUnigramsIfNoShingles">true</str>
> > >         <str name="outputUnigrams">true</str>
> > >         <str name="minShingleSize">2</str>
> > >         <str name="maxShingleSize">4</str>
> > >       </lst>
> > >       <!-- This is where you set your synonym file. For the unit tests and "Getting Started"
> > >            examples, we use example_synonym_file.txt.
> > >            This plugin will work best if you keep expand set to true and have all your
> > >            synonyms comma-separated (rather than =>-separated).
> > >       -->
> > >       <lst name="filter">
> > >         <str name="class">solr.SynonymFilterFactory</str>
> > >         <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
> > >         <str name="synonyms">example_synonym_file.txt</str>
> > >         <str name="expand">true</str>
> > >         <str name="ignoreCase">true</str>
> > >       </lst>
> > >     </lst>
> > >   </lst>
> > > </queryParser>
> > >
> > > On Thu, Aug 11, 2016 at 6:01 PM, Chris Hostetter <hossman_luc...@fucit.org> wrote:
> > >
> > >> : First let me say that this is very possibly the "X-Y problem", so let me
> > >> : state up front what my ultimate need is -- then I'll ask about the thing I
> > >> : imagine might help... which, of course, is heavily biased in the direction
> > >> : of my experience coding Java and writing SQL...
> > >>
> > >> Thank you so much for asking your question this way!
> > >>
> > >> Right off the bat, the background you've provided seems suspicious...
> > >>
> > >> : I have a piece of a query that calculates a score based on a "weighting"
> > >> ...
> > >> : The specific line is this:
> > >> : <str name="bf">product(field(category_weight),20)</str>
> > >> :
> > >> : What I just realized is that when I query Solr for a string that has NO
> > >> : matches in the entire corpus, I still get a slew of results because EVERY
> > >> : doc has the weighting value in the category_weight field - and therefore
> > >> : every doc gets some score.
> > >>
> > >> ...that is *NOT* how dismax and edismax normally work.
> > >>
> > >> While both the "bf" and "bq" params result in "additive" boosting, and the
> > >> implementation of that "additive boost" comes from adding new optional
> > >> clauses to the top-level BooleanQuery that is executed, that only happens
> > >> after the "main" query (from your "q" param) is added to that top-level
> > >> BooleanQuery as a "mandatory" clause.
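The point about mandatory versus optional clauses comes down to how a Lucene BooleanQuery decides whether a document matches: once at least one required ("+", MUST) clause is present, optional (SHOULD) clauses only add to the score; with no required clause, matching any single optional clause is enough. The Python sketch below is a toy model of just that rule, not Lucene's implementation: it ignores MUST_NOT, FILTER, minimumShouldMatch and scoring, and the `Clause`, `term` and `match_all` names are hypothetical. `match_all` stands in for the `FunctionQuery` produced by `bf`, which matches every document.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

# Toy model only: a "document" is a dict of field name -> set of indexed terms.
Doc = Dict[str, Set[str]]


@dataclass
class Clause:
    required: bool                  # True for a "+" (MUST) clause, False for SHOULD
    matches: Callable[[Doc], bool]  # does this clause match the document?


def boolean_matches(clauses: List[Clause], doc: Doc) -> bool:
    """Simplified BooleanQuery matching: ignores MUST_NOT, FILTER and minimumShouldMatch."""
    required = [c for c in clauses if c.required]
    if required:
        # At least one MUST clause: all of them must match; SHOULD clauses only add score.
        return all(c.matches(doc) for c in required)
    # No MUST clause: at least one SHOULD clause must match.
    return any(c.matches(doc) for c in clauses)


def term(field: str, value: str) -> Callable[[Doc], bool]:
    return lambda doc: value in doc.get(field, set())


def match_all(doc: Doc) -> bool:
    # Stands in for FunctionQuery(product(...)) from bf, which matches every doc.
    return True


doc = {"text": {"unrelated", "words"}}

# Normal (e)dismax shape: main query mandatory, bf clause optional.
normal = [Clause(True, term("text", "bogus")), Clause(False, match_all)]
# Shape Hoss describes for the synonym_edismax output: nothing mandatory.
no_mandatory = [Clause(False, term("text", "bogus")), Clause(False, match_all)]

print(boolean_matches(normal, doc))        # False
print(boolean_matches(no_mandatory, doc))  # True
```

The only difference between the two lists is the `required` flag on the main-query clause, which is exactly the "+" that is missing from the top level of the synonym_edismax parsed query.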
> > >>
> > >> So, for example, "bf=true()" and "bq=*:*" should match & boost every doc,
> > >> but with the techproducts configs/data these requests still don't match
> > >> anything...
> > >>
> > >> /select?defType=edismax&q=bogus&bf=true()&bq=*:*&debug=query
> > >> /select?defType=dismax&q=bogus&bf=true()&bq=*:*&debug=query
> > >>
> > >> ...and if you look at the debug output, the parsed query shows that the
> > >> "bogus" part of the query is mandatory...
> > >>
> > >> +DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) FunctionQuery(const(true))
> > >>
> > >> (I didn't use "pf" in that example, but the effect is the same: the "pf"
> > >> based clauses are optional, while the "qf" based clauses are mandatory.)
> > >>
> > >> If you compare that example to your debug output, you'll notice a
> > >> difference in structure. It's a bit hard to see in your example (if you
> > >> simplify your qf, pf, and q fields it should be more obvious), but
> > >> AFAICT the "main" parts of your query are getting wrapped in an extra
> > >> layer of parens (ie: an extra BooleanQuery) which is *not* mandatory in
> > >> the top-level query ... I don't see *any* mandatory clauses in your top-level
> > >> BooleanQuery, which is why any match on a bf or bq function is
> > >> enough to cause a document to match.
> > >>
> > >> I suspect the reason your parsed query structure is so different has to do
> > >> with this...
> > >>
> > >> : <str name="defType">synonym_edismax</str>
> > >>
> > >> 1) how exactly is "synonym_edismax" defined in your solrconfig.xml?
> > >> 2) what QParserPlugin are you using to implement that?
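Comparing structures by eye gets hard once synonyms expand the query, as in the debug output quoted later in this thread. The Python helper below is a hypothetical, heuristic sketch (not a Solr or Lucene API): it splits a parsed-query string on whitespace at parenthesis depth 0, skipping over double-quoted phrases, and reports which top-level clauses carry the "+" that makes them mandatory. It assumes the `parsedquery` value has already been JSON-decoded, so `\"` appears as a plain `"`.

```python
from typing import List, Tuple


def top_level_clauses(parsed: str) -> List[Tuple[str, bool]]:
    """Split a Lucene query toString() into its top-level clauses.

    Splits on whitespace at parenthesis depth 0, ignoring anything inside
    double-quoted phrases. Returns (clause, is_mandatory) pairs, where
    is_mandatory means the clause starts with '+'. Heuristic only: it does not
    implement the full Lucene query syntax.
    """
    clauses: List[str] = []
    current: List[str] = []
    depth = 0
    in_quote = False
    prev = ""
    for ch in parsed:
        if ch == '"' and prev != "\\":
            in_quote = not in_quote
        elif not in_quote:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch.isspace() and depth == 0:
                # Whitespace between top-level clauses ends the current clause.
                if current:
                    clauses.append("".join(current))
                    current = []
                prev = ch
                continue
        current.append(ch)
        prev = ch
    if current:
        clauses.append("".join(current))
    return [(c, c.startswith("+")) for c in clauses]


hoss = "+DisjunctionMaxQuery((text:bogus)) MatchAllDocsQuery(*:*) FunctionQuery(const(true))"
print([c for c, mandatory in top_level_clauses(hoss) if mandatory])
# ['+DisjunctionMaxQuery((text:bogus))']
```

Run on the synonym_edismax `parsedquery`, the same check should report four top-level clauses (two parenthesised groups and two `FunctionQuery` clauses) with none marked mandatory, which matches Hoss's reading.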
> > >>
> > >> I suspect whatever QParserPlugin you are using has a bug in it :)
> > >>
> > >> If you can't fix the bug, one possible workaround would be to abandon bf
> > >> and bq params completely, and instead wrap the query it produces in a
> > >> {!boost} parser with whatever function you want (using functions like
> > >> sum() or product() to combine multiple functions, and query() to incorporate
> > >> your current bq param). Doing this will require changing how you specify
> > >> your input (example below) and it will result in *multiplicative* boosts --
> > >> so your scores will be very different, and you will likely have to adjust your
> > >> constants, but: 1) multiplicative boosts are almost always what people
> > >> *really* want anyway; 2) it will ensure the boosts are only applied to
> > >> things matching your main query, no matter how that query parser works or
> > >> what bugs it has.
> > >>
> > >> Example of using {!boost} to wrap an arbitrary other parser...
> > >>
> > >> instead of...
> > >> defType=foofoo
> > >> q=barbarbar
> > >>
> > >> use...
> > >> q={!boost b=$func defType=foofoo v=$qq}
> > >> qq=barbarbar
> > >> func=sum(something,somethingelse)
> > >>
> > >> https://cwiki.apache.org/confluence/display/solr/Other+Parsers
> > >> https://cwiki.apache.org/confluence/display/solr/Function+Queries
> > >>
> > >> :
> > >> : What I would like is to return zero results if there is no match for the
> > >> : querystring. My collection is small enough that I don't care if the actual
> > >> : calculation runs on each doc (although that's wasteful) -- I just don't
> > >> : want to see results come back for zero matches to the querystring.
> > >> :
> > >> : (The /select endpoint does this of course, but my custom endpoint includes
> > >> : this "weighting" piece and therefore returns every doc in the corpus
> > >> : because they all have the weighting.)
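Applied to the /foo handler, Hoss's foofoo/barbarbar template might look like the Python sketch below, which only builds and URL-encodes the request parameters. It assumes `bf` and `bq` have been removed from the /foo defaults as Hoss suggests; if they are left in, the inner synonym_edismax parser would likely still read them from the request params. The combined function is a hypothetical merge of the two existing `bf` values, and `boost_wrapped_params` is a hypothetical helper, not part of any Solr client.

```python
from urllib.parse import urlencode


def boost_wrapped_params(user_query: str, main_parser: str, boost_func: str) -> dict:
    """Request params that wrap `main_parser` in Solr's {!boost} query parser.

    {!boost} multiplies the score of each document matched by the wrapped
    query by the value of `b`, so the boost function cannot by itself cause
    additional documents to match.
    """
    return {
        # $func and $qq are parameter dereferences: Solr substitutes the
        # values of the "func" and "qq" request params.
        "q": "{!boost b=$func defType=%s v=$qq}" % main_parser,
        "qq": user_query,
        "func": boost_func,
    }


params = boost_wrapped_params(
    user_query='"μ-heavy chain disease"',
    main_parser="synonym_edismax",
    # Hypothetical: the two old bf functions combined with sum(); the constants
    # (20 and 4) were tuned for additive boosts and will likely need retuning.
    boost_func="sum(product(field(category_weight),20),product(query($titleQuery),4))",
)
print("/foo?" + urlencode(params))
```

The `titleQuery` default can stay in the handler config, since `query($titleQuery)` dereferences it at request time.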
> > >> :
> > >> : ====================
> > >> : Enter my imagined solution... The potential X-Y problem...
> > >> : ====================
> > >> :
> > >> : So - given that I come from a programming background, I immediately start
> > >> : thinking of an if statement ...
> > >> :
> > >> : if(some_score_for_the_primary_search_string) {
> > >> :   run_the_category_weight_calculation;
> > >> : } else {
> > >> :   do_NOT_run_category_weight_calc;
> > >> : }
> > >> :
> > >> : Another way of thinking of it would be something like the "WHERE" clause in
> > >> : SQL...
> > >> :
> > >> : run_category_weight_calculation WHERE "searchstring" is found in the
> > >> : document, not otherwise.
> > >> :
> > >> : I'm aware that things could be handled on the client side of my web app,
> > >> : but if possible, I'd like the interface to Solr to be as clean as possible,
> > >> : and massage incoming Solr data as little as possible.
> > >> :
> > >> : In other words, do NOT return any docs if the querystring (and any
> > >> : synonyms) match zero docs.
> > >> :
> > >> : Here is the endpoint XML for the query. I've highlighted the specific line
> > >> : that is causing the unintended results...
> > >> :
> > >> : <requestHandler name="/foo" class="solr.SearchHandler">
> > >> :   <!-- default values for query parameters can be specified, these
> > >> :        will be overridden by parameters in the request
> > >> :   -->
> > >> :   <lst name="defaults">
> > >> :     <str name="echoParams">all</str>
> > >> :     <int name="rows">20</int>
> > >> :     <!-- Query settings -->
> > >> :     <str name="df">text</str>
> > >> :     <!-- <str name="df">title</str> -->
> > >> :     <str name="defType">synonym_edismax</str>
> > >> :     <str name="synonyms">true</str>
> > >> :     <!-- The line below balances out the weighting of exact matches to the
> > >> :          synonym phrase entered by the user
> > >> :          with the category_weight calculation and the titleQuery calc.
> > >> :          These numbers exist in a balance and
> > >> :          if one is raised or lowered, the others (probably) need to change
> > >> :          as well. It may be better to go with decimals
> > >> :          for all of them... .4 instead of 4 and 2 instead of 20 and 2.5
> > >> :          instead of 25.
> > >> :          In the end, I'm not sure it really matters, but don't change one
> > >> :          without changing the others
> > >> :          unless you've tested and are sure you want the results -->
> > >> :     <float name="synonyms.originalBoost">1.5</float>
> > >> :     <float name="synonyms.synonymBoost">1.1</float>
> > >> :     <str name="mm">75%</str>
> > >> :     <str name="q.alt">*:*</str>
> > >> :     <str name="rows">20</str>
> > >> :     <str name="fq">meta_doc_type:chapterDoc</str>
> > >> :     <str name="bq">{!synonym_edismax qf='title' synonyms='true' synonyms.originalBoost='2.5' synonyms.synonymBoost='1.1' bf='' bq='' v=$q}</str>
> > >> :     <str name="fl">id category_weight title category_ss score contentType</str>
> > >> :     <str name="titleQuery">{!edismax qf='title' bf='' bq='' v=$q}</str>
> > >> :     =====================================================
> > >> :     *<str name="bf">product(field(category_weight),20)</str>*
> > >> :     =====================================================
> > >> :     <str name="bf">product(query($titleQuery),4)</str>
> > >> :     <str name="qf">text contentType^1000</str>
> > >> :     <str name="wt">python</str>
> > >> :     <str name="debug">true</str>
> > >> :     <str name="debug.explain.structured">true</str>
> > >> :     <str name="indent">true</str>
> > >> :     <str name="echoParams">all</str>
> > >> :   </lst>
> > >> : </requestHandler>
> > >> :
> > >> : And here is the debug output for a query. (This was a test for synonyms,
> > >> : which you'll see in the output.)
> > >> : The original query string was, of course, "μ-heavy chain disease".
> > >> :
> > >> : You'll note that although there is no score in the first doc explain for
> > >> : the actual querystring, the highlighted section does get a score for
> > >> : product(double(category_weight)=1.5,const(20))
> > >> :
> > >> : ... which is the thing that is currently causing all the docs in the
> > >> : collection to "match" even though the querystring is not in any of them.
> > >> :
> > >> : "debug":{ "rawquerystring":"\"μ-heavy chain disease\"",
> > >> :   "querystring":"\"μ-heavy chain disease\"",
> > >> :   "parsedquery":"(DisjunctionMaxQuery((text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
> > >> :     ((+DisjunctionMaxQuery((text:\"mu heavy chain disease\" | (contentType:\"mu heavy chain disease\")^1000.0)))/no_coord^1.1)
> > >> :     ((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0)))/no_coord^1.1)
> > >> :     ((+DisjunctionMaxQuery((text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0)))/no_coord^1.1)
> > >> :     ((+DisjunctionMaxQuery((text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0)))/no_coord^1.1))
> > >> :     ((DisjunctionMaxQuery((title:\"μ heavy chain disease\"))^2.5
> > >> :     ((+DisjunctionMaxQuery((title:\"mu heavy chain disease\")))/no_coord^1.1)
> > >> :     ((+DisjunctionMaxQuery((title:\"μ hcd\")))/no_coord^1.1)
> > >> :     ((+DisjunctionMaxQuery((title:\"μ heavy chain disease\")))/no_coord^1.1)
> > >> :     ((+DisjunctionMaxQuery((title:\"μ hcd\")))/no_coord^1.1)))
> > >> :     FunctionQuery(product(double(category_weight),const(20)))
> > >> :     FunctionQuery(product(query(+(title:\"μ heavy chain disease\"),def=0.0),const(4)))",
> > >> :   "parsedquery_toString":"(((text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.5
> > >> :     ((+(text:\"mu heavy chain
> > >> :     disease\" | (contentType:\"mu heavy chain disease\")^1000.0))^1.1)
> > >> :     ((+(text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0))^1.1)
> > >> :     ((+(text:\"μ heavy chain disease\" | (contentType:\"μ heavy chain disease\")^1000.0))^1.1)
> > >> :     ((+(text:\"μ hcd\" | (contentType:\"μ hcd\")^1000.0))^1.1))
> > >> :     ((((title:\"μ heavy chain disease\"))^2.5
> > >> :     ((+(title:\"mu heavy chain disease\"))^1.1)
> > >> :     ((+(title:\"μ hcd\"))^1.1)
> > >> :     ((+(title:\"μ heavy chain disease\"))^1.1)
> > >> :     ((+(title:\"μ hcd\"))^1.1)))
> > >> :     product(double(category_weight),const(20))
> > >> :     product(query(+(title:\"μ heavy chain disease\"),def=0.0),const(4))",
> > >> :   "explain":{
> > >> :     "33d808fe-6ccf-4305-a643-48e94de34d18":{ "match":true, "value":30.0,
> > >> :       "description":"sum of:", "details":[{ "match":true, "value":30.0,
> > >> :       "description":"FunctionQuery(product(double(category_weight),const(20))), product of:",
> > >> : =====================================================
> > >> : *"details":[{ "match":true, "value":30.0,
> > >> : "description":"product(double(category_weight)=1.5,const(20))"}, {*
> > >> : =====================================================
> > >> :
> > >> : "match":true, "value":1.0, "description":"boost"}, { "match":true, "value":
> > >> : 1.0, "description":"queryNorm"}]}, {
> > >> :
> > >>
> > >> -Hoss
> > >> http://www.lucidworks.com/
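Susheel's Streaming Expressions suggestion is about nesting: an inner source stream emits tuples and an outer decorator consumes only those tuples, so an inner search that matches nothing leaves the outer expression with nothing to process. The Python sketch below just composes such an expression as a string and URL-encodes it for a collection's `/stream` handler. The collection name `mycollection`, the query and the sort fields are hypothetical, the `search()` source is pointed at the `/export` handler (which requires docValues on the `fl` and `sort` fields), and `top()` is used only as one example of a decorator.

```python
from urllib.parse import urlencode


def search_expr(collection: str, q: str, fl: str, sort: str) -> str:
    # Source stream: search() against the /export handler, which streams
    # every matching doc (fl and sort fields need docValues).
    return 'search(%s, q="%s", fl="%s", sort="%s", qt="/export")' % (collection, q, fl, sort)


def top_expr(n: int, inner: str, sort: str) -> str:
    # Decorator: top() reads the tuples emitted by the inner stream and
    # keeps the n best according to its own sort.
    return 'top(n=%d, %s, sort="%s")' % (n, inner, sort)


inner = search_expr("mycollection", "text:disease", "id,category_weight", "id asc")
expr = top_expr(5, inner, "category_weight desc")
print(expr)
print("/solr/mycollection/stream?" + urlencode({"expr": expr}))
```

Other decorators can wrap the same inner `search()` in the same way; the composition pattern, not `top()` itself, is the part that corresponds to "chaining" result sets.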