Re: Why do I get different results for the same query with two Solr versions?
Tulsi wrote
> Can you post the managed schema and solrconfig content here ?

Schema for the 4.6 index (I omitted all non-relevant data):

Schema for the 7.5 index (I omitted all non-relevant data):

About the solrconfig.xml file - I don't think I can share it because it may contain sensitive information. Is there something specific in this file that may be relevant to our discussion?

Tulsi wrote
> Do try the solr admin analysis screen once as well to see the behaviour for this field.
> https://lucene.apache.org/solr/guide/7_6/index.html

I looked at the analysis screen, but it wasn't helpful. That's why I started using the "debug=query" parameter and the content of parsedquery.

--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Why do I get different results for the same query with two Solr versions?
Can you post the managed schema and solrconfig content here ? Do try the solr admin analysis screen once as well to see the behaviour for this field. https://lucene.apache.org/solr/guide/7_6/index.html On Sun, 27 Dec, 2020, 6:54 pm nettadalet, wrote: > Thank you, that was helpful! > > For Solr 4.6 I get > "parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")" > > For Solr 7.5 I get > "parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki > +TITLE_ItemCode_t:7)))" > > So this is the cause of the difference in the search result, but I still > don't know why the parsedquery is different between the two versions. > Any idea/guess? > Is it some internal implementation that changed sometime between 4.6 and > 7.5? > > > > -- > Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html >
Re: Re:Re: Why do I get different results for the same query with two Solr versions?
Hi, thanks for the comment, but I tried both "sow=false" and "sow=true" and I still get the same result.

For query (TITLE_ItemCode_t:KI_7) I still see:

Solr 4.6:
"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"

Solr 7.5:
"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki +TITLE_ItemCode_t:7)))"

Tulsi wrote
> Hi ,
> Yes this look like related to sow (split on whitespace) param default
> behaviour change in solr 7.
>
> The sow parameter (short for "Split on Whitespace") now defaults to
> false, which allows support for multi-word synonyms out of the box.
> This parameter is used with the eDismax and standard/"lucene" query
> parsers. If this parameter is not explicitly specified as true, query
> text will not be split on whitespace before analysis.
>
> https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
>
> On Sun, 27 Dec, 2020, 8:25 pm nettadalet, < nsteinberg@ > wrote:
>
>> I added "defType=lucene" to both searches to make sure I use the same
>> query parser, but it didn't change the results.
Re:Re: Re:Re: Why do I get different results for the same query with two Solr versions?
SOW defaults to false? But here it seems to behave as if it were true, right?

For Solr 7.5 I get
"parsedquery":"+(+(text1:ki7 (+text1:ki +text1:7)))"

At 2020-12-28 01:13:29, "Tulsi Das" wrote:
>Hi ,
>Yes this look like related to sow (split on whitespace) param default
>behaviour change in solr 7.
>
>The sow parameter (short for "Split on Whitespace") now defaults to
>false, which allows support for multi-word synonyms out of the box.
>This parameter is used with the eDismax and standard/"lucene" query
>parsers. If this parameter is not explicitly specified as true, query
>text will not be split on whitespace before analysis.
>
>https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html
>
>On Sun, 27 Dec, 2020, 8:25 pm nettadalet, wrote:
>
>> I added "defType=lucene" to both searches to make sure I use the same query
>> parser, but it didn't change the results.
Re: Re:Re: Why do I get different results for the same query with two Solr versions?
Hi,
Yes, this looks related to the change in the default behaviour of the sow (split on whitespace) param in Solr 7.

The sow parameter (short for "Split on Whitespace") now defaults to false, which allows support for multi-word synonyms out of the box. This parameter is used with the eDismax and standard/"lucene" query parsers. If this parameter is not explicitly specified as true, query text will not be split on whitespace before analysis.

https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html

On Sun, 27 Dec, 2020, 8:25 pm nettadalet, wrote:
> I added "defType=lucene" to both searches to make sure I use the same query
> parser, but it didn't change the results.
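One way to check whether sow is responsible is to send the same query with sow pinned to each value and compare the parsedquery in the debug output. The following is only a sketch: the host, port and core name in BASE_URL are assumptions for illustration, and the field name is the one used in this thread.

```python
from urllib.parse import urlencode

# Hypothetical base URL: replace host, port and core name with your own.
BASE_URL = "http://localhost:8983/solr/mycore/select"


def build_debug_url(query, sow):
    """Return a select URL that pins sow and asks Solr for the parsed query."""
    params = {
        "q": query,
        "defType": "lucene",               # standard query parser
        "sow": "true" if sow else "false",  # split on whitespace
        "debug": "query",                  # adds parsedquery to the response
        "wt": "json",
    }
    return BASE_URL + "?" + urlencode(params)


url_split = build_debug_url("TITLE_ItemCode_t:KI_7", sow=True)
url_nosplit = build_debug_url("TITLE_ItemCode_t:KI_7", sow=False)
print(url_split)
print(url_nosplit)
```

If the parsedquery is identical for both URLs, the sow default is probably not what changed the behaviour for this particular query.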
Re: Re:Re: Why do I get different results for the same query with two Solr versions?
I added "defType=lucene" to both searches to make sure I use the same query parser, but it didn't change the results.
Re: Re:Re: Why do I get different results for the same query with two Solr versions?
I'm not sure how to check the implementation of the query parser, or how to change the query parser that I use. I think I'm using the standard query parser.

I use Solr Admin to run the queries. If I look at the URL, I see

Solr 4.6: select?q=TITLE_ItemCode_t:KI_7&fl=TITLE_ItemCode_t
Solr 7.5: select?q=TITLE_ItemCode_t:KI_7&fl=TITLE_ItemCode_t

Should I change something? Where should I look?
Re:Re: Why do I get different results for the same query with two Solr versions?
Which query parser are you using? I think that to answer your question, you need to check the implementation of the query parser.

At 2020-12-27 21:23:59, "nettadalet" wrote:
>Thank you, that was helpful!
>
>For Solr 4.6 I get
>"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"
>
>For Solr 7.5 I get
>"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki
>+TITLE_ItemCode_t:7)))"
>
>So this is the cause of the difference in the search result, but I still
>don't know why the parsedquery is different between the two versions.
>Any idea/guess?
>Is it some internal implementation that changed sometime between 4.6 and
>7.5?
Re: Why do I get different results for the same query with two Solr versions?
Thank you, that was helpful!

For Solr 4.6 I get
"parsedquery": "PhraseQuery(TITLE_ItemCode_t:\"ki 7\")"

For Solr 7.5 I get
"parsedquery":"+(+(TITLE_ItemCode_t:ki7 (+TITLE_ItemCode_t:ki +TITLE_ItemCode_t:7)))"

So this is the cause of the difference in the search result, but I still don't know why the parsedquery is different between the two versions.
Any idea/guess?
Is it some internal implementation that changed sometime between 4.6 and 7.5?
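The two parsed queries are enough to explain the different hit counts. The sketch below is a toy model, not Solr's analysis chain: it assumes the word delimiter filter splits each value at the underscore and at letter/digit boundaries and lowercases the parts, and it ignores catenated tokens and the optional ki7 clause. The six values are the ones from the original question.

```python
import re

VALUES = [
    "KI_d5e7b43a", "KI_b7c490bd", "KI_7df2f026",
    "KI_fa7d129d", "KI_5867aec7", "KI_7c3c0b93",
]


def split_tokens(text):
    """Assumed analysis: lowercase, then split into letter runs and digit runs."""
    return re.findall(r"[a-z]+|[0-9]+", text.lower())


def phrase_match(tokens, phrase):
    """True if the phrase terms occur next to each other, in order (4.6 style)."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))


def all_terms_match(tokens, terms):
    """True if every term occurs somewhere in the value (7.5 style)."""
    return all(term in tokens for term in terms)


query_terms = split_tokens("KI_7")  # the two query terms: ki and 7

phrase_hits = [v for v in VALUES if phrase_match(split_tokens(v), query_terms)]
boolean_hits = [v for v in VALUES if all_terms_match(split_tokens(v), query_terms)]

print("phrase query hits:", phrase_hits)
print("boolean query hits:", boolean_hits)
```

Under these assumptions the phrase form only matches values where 7 directly follows ki, while the boolean form matches every value that contains a standalone 7 anywhere, which is consistent with 2 results on 4.6 and 6 results on 7.5.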
Re: Why do I get different results for the same query with two Solr versions?
Hi,
Try adding debug=true or debug=query in the URL and look at the formed query at the end. That will tell you why the results are different.

On Thu, 24 Dec, 2020, 8:05 pm nettadalet, wrote:

> Hello,
>
> I have the same field type defined in Solr 4.6 and Solr 7.5. When I
> search with both versions, I get different results, and I don't know why.
>
> I have the following *field type definition in Solr 4.6*:
> positionIncrementGap="1000">
> words="stopwords.txt" />
> generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
>
> synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> ignoreCase="true"
> words="stopwords.txt"
> />
> generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
>
> I have the following *field type definition in Solr 7.5*:
> positionIncrementGap="1000">
> words="stopwords.txt" />
> generateWordParts="1"
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="0"/>
>
> synonyms="synonyms.txt"
> ignoreCase="true" expand="true"/>
> ignoreCase="true"
> words="stopwords.txt"
> />
> generateWordParts="1"
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="0"/>
>
> * I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of
> solr.WordDelimiterGraphFilterFactory so the field types will be more alike,
> but the result was the same.
>
> I have the following *6 values set for field text1 of type text_type1 for 6
> different documents* (the type(s) from above):
> KI_d5e7b43a
> KI_b7c490bd
> KI_7df2f026
> KI_fa7d129d
> KI_5867aec7
> KI_7c3c0b93
>
> My query is *text1=KI_7*.
> Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
> Using Solr 7.5, I get all 6 results.
>
> Questions:
> 1. How come I get different results with the same data, when my field
> definitions are the same (as far as I can tell)?
>
> 2. What are the expected results?
> I think that the results Solr 7.5 returns are the correct ones, since at
> the end of the analysis I get *KI* as a term and *7* as a term, both
> during the indexing analysis and the query analysis, so, to my
> understanding, all 6 results should be found.
> Is this correct? If not, what am I missing? What don't I understand
> correctly?
>
> I would very much appreciate a full/partial answer, but even a link that
> could explain at least the expected results part would be great.
>
> Thanks in advance, I know this might be a tough one to answer [Hope not
> :)]
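When debug=query is set and the response format is JSON, the parsed query is reported under the debug section of the response. A minimal sketch of pulling it out so that the two versions can be compared side by side; the sample response strings here are hand-built for illustration from the parsedquery values reported elsewhere in this thread, not captured Solr output.

```python
import json


def extract_parsed_query(response_text):
    """Return debug.parsedquery from a Solr JSON response, or None if absent."""
    data = json.loads(response_text)
    return data.get("debug", {}).get("parsedquery")


# Hand-built sample responses (illustrative only).
sample_46 = json.dumps(
    {"debug": {"parsedquery": 'PhraseQuery(TITLE_ItemCode_t:"ki 7")'}}
)
sample_75 = json.dumps(
    {"debug": {"parsedquery": "+(+(TITLE_ItemCode_t:ki7 "
                              "(+TITLE_ItemCode_t:ki +TITLE_ItemCode_t:7)))"}}
)

parsed_46 = extract_parsed_query(sample_46)
parsed_75 = extract_parsed_query(sample_75)
print("4.6:", parsed_46)
print("7.5:", parsed_75)
print("same parsed query:", parsed_46 == parsed_75)
```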
Why do I get different results for the same query with two Solr versions?
Hello,

I have the same field type defined in Solr 4.6 and Solr 7.5. When I search with both versions, I get different results, and I don't know why.

I have the following *field type definition in Solr 4.6*:

I have the following *field type definition in Solr 7.5*:

* I tried to use solr.WordDelimiterFilterFactory with Solr 7.5 instead of solr.WordDelimiterGraphFilterFactory so the field types will be more alike, but the result was the same.

I have the following *6 values set for field text1 of type text_type1 for 6 different documents* (the type(s) from above):
KI_d5e7b43a
KI_b7c490bd
KI_7df2f026
KI_fa7d129d
KI_5867aec7
KI_7c3c0b93

My query is *text1=KI_7*.
Using Solr 4.6, I get 2 results - KI_7df2f026, KI_7c3c0b93
Using Solr 7.5, I get all 6 results.

Questions:
1. How come I get different results with the same data, when my field definitions are the same (as far as I can tell)?

2. What are the expected results?
I think that the results Solr 7.5 returns are the correct ones, since at the end of the analysis I get *KI* as a term and *7* as a term, both during the indexing analysis and the query analysis, so, to my understanding, all 6 results should be found.
Is this correct? If not, what am I missing? What don't I understand correctly?

I would very much appreciate a full/partial answer, but even a link that could explain at least the expected results part would be great.

Thanks in advance, I know this might be a tough one to answer [Hope not :)]
Re: different results in numFound vs using the cursor
> : I am going to adjust my schema, re-index, and try again. See if that > : doesn't fix this problem. I didn't know that having the uniqueKey be a > : textField was a bad idea. > > > https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey > > "The fieldType of uniqueKey must not be analyzed" > > (hence my comment baout "possible, but hard to get right ... you can use > something like the KeywordTokenizer, but at that point you might as well > use StrField except in some really esoteric special situations) > > Good news. I added a field called ID, and made it string. Then I deleted documents, re-indexed my data, and tried the search again. Now solrResults size and numFound size are exactly the same. Thanks for your help. Rhys
Re: different results in numFound vs using the cursor
: > whoa... that's not normal .. what *exactly* does the fieldType declaration
: > (with all analyzers) look like, and what does the field declaration
: > look like?
: >
:
:

NOTE: "text_general" != "text_gen_sort"

Assuming your "text_general" declaration looks like it does in the _default config set, then using that for uniqueKey or sorting is definitely not a good idea.

If you were *actually* using SortableTextField for your uniqueKeyField ... well, that should be ok to *sort* on, but i still wouldn't suggest using it as a uniqueKey field ... honestly not sure what behavior that might have with things like deleteById, etc...

: I am going to adjust my schema, re-index, and try again. See if that
: doesn't fix this problem. I didn't know that having the uniqueKey be a
: textField was a bad idea.

https://lucene.apache.org/solr/guide/8_3/other-schema-elements.html#OtherSchemaElements-UniqueKey

"The fieldType of uniqueKey must not be analyzed"

(hence my comment about "possible, but hard to get right" ... you can use something like the KeywordTokenizer, but at that point you might as well use StrField except in some really esoteric special situations)

-Hoss
http://www.lucidworks.com/
Re: different results in numFound vs using the cursor
On Tue, Nov 12, 2019 at 12:18 PM Chris Hostetter wrote: > > : > a) What is the fieldType of the uniqueKey field in use? > : > > : > : It is a textField > > whoa... that's not normal .. what *exactly* does the fieldType declaration > (with all analyzers) look like, and what does the declaration > look like? > > > you should really never use TextField for a uniqueKey ... it's possible, > but incredibly tricky to get "right". > > I am going to adjust my schema, re-index, and try again. See if that doesn't fix this problem. I didn't know that having the uniqueKey be a textField was a bad idea. > Independent from that, "sorting" on a TextField doesn't always do what you > might think (again: depending on the analysis in use) > > With a cursorMark you have other factors to consider: i bet what's > happening is that the post-analysis terms for your docs result it > duplicate values, so the cursorMark is skipping all docs that have hte > same (post analysis) sort value ... this could also manifest itself in > other weird ways, like trying to deleteById. > > Step #1: switch to using a simple StrField for your uniqueKey field and > see if htat solves all your problems. > > Thanks, doing this now. Rhys
Re: different results in numFound vs using the cursor
: > a) What is the fieldType of the uniqueKey field in use?
: >
:
: It is a textField

whoa... that's not normal .. what *exactly* does the fieldType declaration (with all analyzers) look like, and what does the field declaration look like?

you should really never use TextField for a uniqueKey ... it's possible, but incredibly tricky to get "right".

Independent from that, "sorting" on a TextField doesn't always do what you might think (again: depending on the analysis in use)

With a cursorMark you have other factors to consider: i bet what's happening is that the post-analysis terms for your docs result in duplicate values, so the cursorMark is skipping all docs that have the same (post analysis) sort value ... this could also manifest itself in other weird ways, like trying to deleteById.

Step #1: switch to using a simple StrField for your uniqueKey field and see if that solves all your problems.

-Hoss
http://www.lucidworks.com/
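The skipping effect described above can be illustrated with a toy model. This is not Solr's cursor implementation: it simply assumes each page resumes strictly after the last sort value returned, which is enough to show why duplicate sort values lose documents while unique values do not.

```python
def page_with_cursor(sort_keys, rows):
    """Collect keys page by page, resuming strictly after the last key seen."""
    ordered = sorted(sort_keys)
    collected = []
    last = None  # plays the role of the cursor mark in this toy model
    while True:
        remaining = [k for k in ordered if last is None or k > last]
        page = remaining[:rows]
        if not page:
            break
        collected.extend(page)
        last = page[-1]
    return collected


unique_keys = ["a", "b", "c", "d", "e"]
duplicate_keys = ["a", "b", "b", "b", "c"]  # e.g. distinct ids that analyze to the same term

print(len(page_with_cursor(unique_keys, rows=2)), "of", len(unique_keys))
print(len(page_with_cursor(duplicate_keys, rows=2)), "of", len(duplicate_keys))
```

With unique keys every document is returned. With duplicates, the documents that share the value at a page boundary are never seen, so the collected total falls short of the number of matching documents.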
Re: different results in numFound vs using the cursor
On Mon, Nov 11, 2019 at 8:32 PM Chris Hostetter wrote: > > Based on the info provided, it's hard to be certain, but reading between > the lines here are hte assumptions i'm making... > > 1) your core name is "dbtr" > 2) the uniqueId field for the "dbtr" core is "debtor_id" > > ..are those assumptions correct? > Yes they are. Sorry I didn't provide that from the beginning. > Two key pieces of information that doesn't seem to be assumable from the > imfo you've provided: > > a) What is the fieldType of the uniqueKey field in use? > It is a textField > b) how are you determining that "The numFound: 35008" > > I do a preliminary query to the solr core and print out the numFound from this: my $solrResponse = $ua->post( $solrURI ); my $decoded = decode_json( $solrResponse->{_content} ); my $numFound = $decoded->{response}{numFound}; > ... > > You show the code that prints out "size of solrResults: 22006" but nothing > in your code ever prints $numFound. there is a snippet of code at the top > I am printing numFound every time it loops. This should remain constant, because it is the total of all documents found. It's not really necessary that I am printing it. The number of docs is the size that I also print, and that is 1000 every time, until the last little bit, and then it is 6 docs found. > of your perl logic that seems disconnected from the rest of the code which > makes me think that before you do anything with a cursor you are already > parsing some *other* query response to get $numFound that way... > > I am running this query first, to get the cursor set: "http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*" This sets the cursor, and then returns a cursorMark that I start using in order to grab 1000 documents at a time. > ...what exactly does all the code *before* this look like? 
what is the > request that you are using to get that initial '$solrResponse' that you > are parsing to extract '$numFound' are you sure it's exactly the same as > the query whose cursor you are iterating over? > > query from before the loop: "http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=*" query in the loop: http://10.40.10.14:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoElMTg1MzE= I do have some logic to make sure i grab the first 1000 from the first query, but other than that, it's a simple loop. > It looks like you are (also) extracting 'my $numFound = > $decoded->{response}{numFound};' on every (cusor) request ... what do you > get if add this to your cursor loop... > >print STDERR "numFound = $numFound at '$cursor'"; > > numFound is always 35008 because that is how many total documents are found. The number of docs in the response is the number that I care about, because that shows me how many came back for this slice. > ...because unless documents are being added/deleted as you iterate over > hte cursor, the numFound value should be consistent on each request. > > numFound is consistently 35008. Thanks Rhys
Re: different results in numFound vs using the cursor
Based on the info provided, it's hard to be certain, but reading between the lines here are the assumptions i'm making...

1) your core name is "dbtr"
2) the uniqueId field for the "dbtr" core is "debtor_id"

..are those assumptions correct?

Two key pieces of information that don't seem to be assumable from the info you've provided:

a) What is the fieldType of the uniqueKey field in use?
b) how are you determining that "The numFound: 35008" ...

You show the code that prints out "size of solrResults: 22006" but nothing in your code ever prints $numFound. there is a snippet of code at the top of your perl logic that seems disconnected from the rest of the code which makes me think that before you do anything with a cursor you are already parsing some *other* query response to get $numFound that way...

: i am using this logic in perl:
:
: my $decoded = decode_json( $solrResponse->{_content} );
: my $numFound = $decoded->{response}{numFound};
:
: $cursor = "*";
: $prevCursor = '';
:
: while ( $prevCursor ne $cursor )
: {
:   my $solrURI = "\"http://[SOLR URL]:8983/solr/";
:   $solrURI .= $fdat{core};
	...

...what exactly does all the code *before* this look like? what is the request that you are using to get that initial '$solrResponse' that you are parsing to extract '$numFound'? are you sure it's exactly the same as the query whose cursor you are iterating over?

It looks like you are (also) extracting 'my $numFound = $decoded->{response}{numFound};' on every (cursor) request ... what do you get if you add this to your cursor loop...

   print STDERR "numFound = $numFound at '$cursor'";

...because unless documents are being added/deleted as you iterate over the cursor, the numFound value should be consistent on each request.

-Hoss
http://www.lucidworks.com/
different results in numFound vs using the cursor
i am using this logic in perl:

  my $decoded = decode_json( $solrResponse->{_content} );
  my $numFound = $decoded->{response}{numFound};

  $cursor = "*";
  $prevCursor = '';

  while ( $prevCursor ne $cursor )
  {
    my $solrURI = "\"http://[SOLR URL]:8983/solr/";
    $solrURI .= $fdat{core};

    $solrSort = ( $fdat{core} eq 'dbtr' ) ? "debtor_id+asc" : "id+asc";
    $solrOptions = "/select?indent=on&rows=$getrows&sort=$solrSort&q=";
    $solrURI .= $solrOptions;
    $solrURI .= $query;

    $solrURI .= ( $prevCursor eq '' ) ? "&cursorMark=*\"": "&cursorMark=$cursor\"";

    print STDERR "solrURI '$solrURI'\n";

    my $solrResponse = $ua->post( $solrURI );
    my $decoded = decode_json( $solrResponse->{_content} );
    my $numFound = $decoded->{response}{numFound};

    foreach my $d ( $decoded->{response}{docs} )
    {
      my @docs = @$d;
      print STDERR "size of docs '" . scalar( @docs ) . "'\n";
      foreach my $r ( @docs )
      {
        if ( $fdat{cust_num} and $fdat{core} eq 'dbtr' )
        {
          push ( @solrResults, $r->{debtor_id} );
        }
        elsif ( $fdat{cust_num} and $fdat{core} eq 'debt' )
        {
          push ( @solrResults, $r->{debt_id} );
        }
      }
    }

    $prevCursor = ( $prevCursor eq '' ) ? "*" : $cursor;
    $cursor = $decoded->{nextCursorMark};
    print STDERR "cursor '$cursor'\n";
    print STDERR "prevCursor '$prevCursor'\n";
    print STDERR "size of solrResults '" . scalar( @solrResults ) . "'\n";
  }

print out:

  http://[SOLR URL]:8983/solr/debt/select?indent=on&rows=1000&sort=id+asc&q=debt_id: 608384 OR debt_id: 393291&cursorMark=AoEmMzkzMjkx
  The numFound: 35008
  final size of solrResults: 22006

Am I missing something I should be using with cursorMark? Or is this expected? I've checked my logic, and I'm using the cursors the way this page is using them in examples:
https://lucene.apache.org/solr/guide/6_6/pagination-of-results.html

Thanks

Rhys
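For comparison, the usual shape of a cursor loop is: start with cursorMark=*, pass each response's nextCursorMark into the next request, and stop when the returned mark equals the one that was sent. The sketch below shows only that control flow in Python. The fetch_page callable and the in-memory fake_fetch stub are hypothetical stand-ins for the HTTP request, so the loop can be run without a Solr server.

```python
def fetch_all(fetch_page, rows=1000):
    """Iterate a cursor until nextCursorMark stops changing; return all docs."""
    cursor = "*"
    docs = []
    while True:
        response = fetch_page(cursor, rows)
        docs.extend(response["response"]["docs"])
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            break  # same mark returned: nothing left to read
        cursor = next_cursor
    return docs


# Hypothetical stand-in for the Solr request, backed by an in-memory list.
ALL_DOCS = [{"debt_id": i} for i in range(5)]


def fake_fetch(cursor, rows):
    start = 0 if cursor == "*" else int(cursor)
    page = ALL_DOCS[start:start + rows]
    next_cursor = str(start + len(page)) if page else cursor
    return {
        "response": {"numFound": len(ALL_DOCS), "docs": page},
        "nextCursorMark": next_cursor,
    }


results = fetch_all(fake_fetch, rows=2)
print(len(results), "docs collected of", len(ALL_DOCS))
```

In a real client, fetch_page would issue the select request with the same q and sort on every call, and the sort would have to include the uniqueKey field.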
Re: Different results due to sharding and problems with interesting terms in MLT
Hi Salman, 1. For 1st one: One suggestion could be, don't create [@, ., -, _, +, #, *] as individual tokens. I guess you need to update your tokenizer in that case. 2. For the second issue, is the score of both the results same? If the score is same and the queries are same then the reason would be Lucene doc ID. I have also observed the same thing in Solr 7.6.0, and my reason for that was, docID for the same doc could be different in both the nodes. so for making the same record order what you can do is, add "id desc" as very last stage of sorting Regards, Lucky Sharma On Sat, 28 Sep, 2019, 8:22 am Salmaan Rashid Syed, < salmaan.ras...@mroads.com> wrote: > Hi Solr Users, > > I have two questions, > > 1) I am working on Solr 7.6 and I have incorporated MLT feature into it. I > need to allow users to search on emails and skills, so I have allowed few > of the special characters such as [@, ., -, _, +, #, *]. I am not using > stemmer as it is removing letter "s" from many of the useful words like > "AngularJS" to "AngularJ". > > Now when I enter a processed text as query into the search bar, I get "." > as the "*most interesting term*" boosted by the highest order usually. I > can't figure out how to remove this from interesting terms without removing > it from the field I am searching in. > > 2) I have 2 shards per collections on two nodes 8983 and 7574 in cloud > mode. I am getting different results for same query. > > I have come to know through reading forums and documentation that this is > happening due to sharding and due to calculation of stats on individual > sharding rather than on entire collection. So I implemented one of the > solutions mentioned in forum/documentations in solrconfig.xml as follows, > > > > It still doesn't works and gives different results for same query. Please > let me know what can be done to avoid these issues. > > Regards, > Salmaan >
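The effect of adding the id as the last sort criterion can be shown without Solr. In this sketch (document ids and scores are made up for illustration) two nodes return equally scored documents in a different internal order. Sorting by score alone preserves that difference, while sorting by score and then id gives the same order on both.

```python
def rank_by_score(docs):
    """Sort by score only; ties keep whatever order the node produced."""
    return sorted(docs, key=lambda d: d["score"], reverse=True)


def rank_with_tiebreak(docs):
    """Sort by score desc, then id desc, using two stable sorts."""
    by_id = sorted(docs, key=lambda d: d["id"], reverse=True)
    return sorted(by_id, key=lambda d: d["score"], reverse=True)


# Same documents, different internal order on each node (made-up data).
node_a = [{"id": "a", "score": 1.0}, {"id": "b", "score": 1.0}, {"id": "c", "score": 2.0}]
node_b = [{"id": "b", "score": 1.0}, {"id": "a", "score": 1.0}, {"id": "c", "score": 2.0}]

ids = lambda docs: [d["id"] for d in docs]

print(ids(rank_by_score(node_a)), ids(rank_by_score(node_b)))
print(ids(rank_with_tiebreak(node_a)), ids(rank_with_tiebreak(node_b)))
```

In Solr terms this corresponds to a sort such as "score desc, id desc", assuming id is the uniqueKey field.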
Different results due to sharding and problems with interesting terms in MLT
Hi Solr Users,

I have two questions,

1) I am working on Solr 7.6 and I have incorporated the MLT feature into it. I need to allow users to search on emails and skills, so I have allowed a few of the special characters such as [@, ., -, _, +, #, *]. I am not using a stemmer as it removes the letter "s" from many useful words, e.g. "AngularJS" becomes "AngularJ".

Now when I enter a processed text as query into the search bar, I usually get "." as the "*most interesting term*", with the highest boost. I can't figure out how to remove this from interesting terms without removing it from the field I am searching in.

2) I have 2 shards per collection on two nodes, 8983 and 7574, in cloud mode. I am getting different results for the same query.

I have learned from reading forums and documentation that this happens because of sharding: stats are calculated on individual shards rather than on the entire collection. So I implemented one of the solutions mentioned in the forums/documentation in solrconfig.xml as follows,

It still doesn't work and gives different results for the same query. Please let me know what can be done to avoid these issues.

Regards,
Salmaan
Re:Solr query fetching different results
Your query seems simple enough that this may not be your issue, but just mentioning it: Your collection has 1 shard. Depending on how the query is sent, queries to 1 shard collections can sometimes get interpreted as a "distributed query" and sometimes as a "non-distributed query". These have different code paths that should *in theory* give identical results.

When we made some code extensions to Solr in our private plugins, we decided not to support both code paths and so instead we use shortCircuit=false (we set this in the config of our ) to force use of the distributed query code path. (We want our change to work for both our 60 shard collection and our 1 shard collection.) This gives us more consistent results from different ways of invoking the search.

But, again, your query seems too simple for this to be the cause -- why would the distributed vs non-distributed return different results for this??

From: solr-user@lucene.apache.org At: 09/19/19 06:20:30 To: solr-user@lucene.apache.org
Subject: Solr query fetching different results

Hi all,

There is something "strange" happening in our Solr cluster. If I execute a query from the server, via solarium client, I get one result. If I execute the same or similar query from admin Panel, I get another result. If I go to Admin Panel - Collections - Select Collection and click "Reload", and then repeat the query, the result I get is consistent with the one I get from the server via solarium client. So I picked the query that is getting executed, from Solr logs. Evidently, the query was going to different nodes.

Query that went from Admin Panel, went to node 4 and fetched 0 documents

2019-09-19 05:02:04.549 INFO (qtp434091818-205178) [c:paymetryproducts s:shard1 r:*core_node4* x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request [paymetryproducts_shard1_replica_n2] webapp=/solr path=/select params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0* status=0 QTime=0

Query that went from solarium client running on a server, went to node 3 and fetched 4 documents

2019-09-19 05:06:41.511 INFO (qtp434091818-17) [c:paymetryproducts s:shard1 r:*core_node3* x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request [paymetryproducts_shard1_replica_n1] webapp=/solr path=/select params={q=category_id:5a0aeaeea6bc7239cc21ee39&json.nl=flat&omitHeader=true&fl=ID&start=0&rows=90&wt=json} *hits=4* status=0 QTime=104

What could be causing this strange behaviour? How can I fix this?

Solr version: 7.3
Shard count: 1
replicationFactor: 2
maxShardsPerNode: 1

Regards,
Jayadevan
Re: Solr query fetching different results
Multiple replicas of the same shard will execute their autocommits at different wall clock times. Thus there may be a _temporary_ time when newly-indexed document is found by a query that happens to get served by replica1 but not by replica2. If you have a timestamp in the doc, and a soft commit interval of, say, 1 minute, you can test whether this is the case by adding &fq=timestamp:[* TO NOW-2MINUTE]. In that case you should see identical returns. Best, Erick On Thu, Sep 19, 2019 at 1:20 AM Jayadevan Maymala wrote: > > Hi all, > > There is something "strange' happening in our Solr cluster. If I execute a > query from the server, via solarium client, I get one result. If I execute > the same or similar query from admin Panel, I get another result. If I go > to Admin Panel - Collections - Select Collection and click "Reload", and > then repeat the query, the result I get is consistent with the one I get > from the server via solarium client. So I picked the query that is getting > executed, from Solr logs. Evidently, the query was going to different nodes. > > Query that went from Admin Panel, went to node 4 and fetched 0 documents > 2019-09-19 05:02:04.549 INFO (qtp434091818-205178) > [c:paymetryproducts s:shard1 r:*core_node4* > x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request > [paymetryproducts_shard1_replica_n2] webapp=/solr path=/select > params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0* > status=0 QTime=0 > > > Query that went from solarium client running on a server, went to node 3 > and fetched 4 documents > > 2019-09-19 05:06:41.511 INFO (qtp434091818-17) > [c:paymetryproducts s:shard1 r:*core_node3* > x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request > [paymetryproducts_shard1_replica_n1] webapp=/solr path=/select > params={q=category_id:5a0aeaeea6bc7239cc21ee39&json.nl=flat&omitHeader=true&fl=ID&start=0&rows=90&wt=json} > *hits=4* status=0 QTime=104 > > What could be causing this strange behaviour? How can I fix this? 
> SOlr Version - 7.3 > Shard count: 1 > replicationFactor: 2 > maxShardsPerNode: 1 > > Regards, > Jayadevan
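To test the commit-timing explanation, the same filtered query can be sent to each replica core directly with distrib=false, so that each replica answers from its own index, and the hit counts compared. The sketch below only builds the two URLs. The host names are assumptions, the core names are taken from the log lines in this thread, and "timestamp" is the hypothetical field name used in the suggestion above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Assumed host names; the core names come from the log lines in the thread.
REPLICAS = {
    "core_node3": "http://solr-node3:8983/solr/paymetryproducts_shard1_replica_n1",
    "core_node4": "http://solr-node4:8983/solr/paymetryproducts_shard1_replica_n2",
}


def replica_query_url(core_url, query, age_filter):
    """Build a select URL answered by this core only (distrib=false)."""
    params = {
        "q": query,
        "fq": age_filter,      # ignore documents newer than the commit window
        "distrib": "false",    # do not forward to other replicas
        "rows": "0",           # only numFound is needed for the comparison
        "wt": "json",
    }
    return core_url + "/select?" + urlencode(params)


urls = {
    name: replica_query_url(
        core_url,
        "category_id:5a0aeaeea6bc7239cc21ee39",
        "timestamp:[* TO NOW-2MINUTE]",
    )
    for name, core_url in REPLICAS.items()
}
for name, url in urls.items():
    print(name, url)
```

If both replicas report the same numFound with the filter applied but different counts without it, the difference is most likely just commit timing.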
Solr query fetching different results
Hi all,

There is something "strange" happening in our Solr cluster. If I execute a query from the server, via solarium client, I get one result. If I execute the same or similar query from admin Panel, I get another result. If I go to Admin Panel - Collections - Select Collection and click "Reload", and then repeat the query, the result I get is consistent with the one I get from the server via solarium client. So I picked the query that is getting executed, from Solr logs. Evidently, the query was going to different nodes.

Query that went from Admin Panel, went to node 4 and fetched 0 documents

2019-09-19 05:02:04.549 INFO (qtp434091818-205178) [c:paymetryproducts s:shard1 r:*core_node4* x:paymetryproducts_shard1_replica_n2] o.a.s.c.S.Request [paymetryproducts_shard1_replica_n2] webapp=/solr path=/select params={q=category_id:5a0aeaeea6bc7239cc21ee39&_=1568868718031} *hits=0* status=0 QTime=0

Query that went from solarium client running on a server, went to node 3 and fetched 4 documents

2019-09-19 05:06:41.511 INFO (qtp434091818-17) [c:paymetryproducts s:shard1 r:*core_node3* x:paymetryproducts_shard1_replica_n1] o.a.s.c.S.Request [paymetryproducts_shard1_replica_n1] webapp=/solr path=/select params={q=category_id:5a0aeaeea6bc7239cc21ee39&json.nl=flat&omitHeader=true&fl=ID&start=0&rows=90&wt=json} *hits=4* status=0 QTime=104

What could be causing this strange behaviour? How can I fix this?

Solr version: 7.3
Shard count: 1
replicationFactor: 2
maxShardsPerNode: 1

Regards,
Jayadevan
Re: Consecutive calls to a query give different results
Here's Mike McCandless' blog on the topic: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents The same options he mentions are available in Solr, as both use Lucene under the covers. The long and short of it is that you can have a significant number of deleted documents in your index, depending on the update pattern. One thing Mike doesn't mention is at the root of why I'm so negative about optimize (and forceMerge is just an optimize that only mashes segments together if they have > X% deleted docs). Let's say your max segment size is 5G, and you optimize an index down to a single 100G segment. That segment will _not_ be merged until it has < 2.5G live docs. That's not a typo: 97.5% deleted docs. You could ameliorate this somewhat by specifying the number of segments after optimizing (default is 1). Say you determine that you have 100G of live data; you could specify 20 segments for optimize. This would be better, I'd guess, but I haven't tested it personally. Best, Erick On Fri, Sep 8, 2017 at 10:36 AM, Webster Homer wrote: > Thank you, Erick Erickson and Shawn Heisey for your excellent answers. > For some of our collections, it would seem that an occasional optimize > would be a good thing. However we have some collections that are updated > constantly > > Would using the commit expungeDeletes help mitigate the issue? > > I also came across a discussion of Lucene merge policies. and the > TieredMergePolicy. > Is there documentation about this? I notice that a couple of our replicas > in some of our collections have ~30% deleted documents which I would think > would contribute to the problem. > I have at least 3 collections that are updated constantly, and would not > lend themselves to being optimized what is the best approach for these? > > Thanks > > On Fri, Sep 8, 2017 at 9:47 AM, Shawn Heisey wrote: > >> On 9/7/2017 8:54 AM, Webster Homer wrote: >> > I am not concerned about deleted documents. 
I am concerned that the same >> > search gives different results after each search. The top document seems >> to >> > cycle between 3 different documents >> > >> > I have an enhanced collections info api call that calls the core admin >> api >> > to get the index information for the replica. >> > When I said the numdocs were the same I meant exactly that. maxdocs and >> > deleted documents are not the same for the replicas, but the number of >> > numdocs is. >> > >> > Or are you saying that the search is looking at deleted documents >> wouldn't >> > that be a very significant bug? >> >> Lucene score calculations take a lot of information in the index into >> account when calculating the score. That includes deleted documents, >> because they are part of the index. When you delete a document, Lucene >> just makes a note saying "internal document ID number is deleted." >> The actual information for that document is not removed from the index, >> because doing so could take a very long time. >> >> When you make queries against a replicated SolrCloud, the queries are >> load balanced across the entire cloud, so different queries will hit >> different replicas. With different numbers of deleted documents in >> different replicas (which is not unusual), the scores are going to come >> out a little bit different on each query. If you're sorting by score >> (which is the default sort), that *can* affect the order. Your replicas >> have a fairly high percentage of deleted documents, so there is a lot of >> extra information affecting the scores. The relative difference in the >> deleted document count between the replicas is high as well, so multiple >> queries could be substantially different. >> >> It is not a bug that Lucene and Solr look at deleted documents. >> Removing deleted document information from things like the score >> calculation would be VERY computationally intense, bordering on the >> impossible. To assure good performance, Lucene doesn't even try. 
>> Because the way Lucene tracks deleted documents is with a list of >> internal Lucene document IDs, those documents are easily removed from >> *results*, but their contents are an integral part of the index and that >> information can only be truly removed by completely rewriting (merging) >> the index segments. >> >> You can get rid of all deleted documents with an optimize operation, >> which is a forced merge of the entire index down to one segment -- but >> just like it sounds, that is a complete rewrite of the index. It >> involves a huge amount of CPU resources and disk I
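The arithmetic behind the 97.5% figure in Erick's reply can be written out as a small sketch. It takes the rule of thumb from that message at face value (a segment is left alone until its live data drops below half of the maximum segment size); the function and variable names are made up for illustration and are not Solr or Lucene APIs.

```python
def deleted_fraction_before_merge(segment_gb, max_segment_gb):
    """Fraction of a segment that must be deleted before it is merged again,
    assuming it becomes eligible once live data < half the max segment size."""
    live_threshold_gb = max_segment_gb / 2
    if segment_gb <= live_threshold_gb:
        # Already small enough to be considered for merging
        return 0.0
    return 1 - live_threshold_gb / segment_gb


# One 100G segment after optimize, 5G max segment size (numbers from the message):
# roughly 0.975, i.e. 97.5% of the segment has to be deleted first.
print(deleted_fraction_before_merge(100, 5))

# Optimizing 100G of live data down to 20 segments gives 5G segments instead,
# which under the same assumption need to be half deleted.
print(deleted_fraction_before_merge(100 / 20, 5))
```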
Re: Consecutive calls to a query give different results
Thank you, Erick Erickson and Shawn Heisey, for your excellent answers. For some of our collections, it would seem that an occasional optimize would be a good thing. However, we have some collections that are updated constantly. Would using commit with expungeDeletes help mitigate the issue? I also came across a discussion of Lucene merge policies and the TieredMergePolicy. Is there documentation about this? I notice that a couple of our replicas in some of our collections have ~30% deleted documents, which I would think would contribute to the problem. I have at least 3 collections that are updated constantly and would not lend themselves to being optimized. What is the best approach for these? Thanks On Fri, Sep 8, 2017 at 9:47 AM, Shawn Heisey wrote: > On 9/7/2017 8:54 AM, Webster Homer wrote: > > I am not concerned about deleted documents. I am concerned that the same > > search gives different results after each search. The top document seems > to > > cycle between 3 different documents > > > > I have an enhanced collections info api call that calls the core admin > api > > to get the index information for the replica. > > When I said the numdocs were the same I meant exactly that. maxdocs and > > deleted documents are not the same for the replicas, but the number of > > numdocs is. > > > > Or are you saying that the search is looking at deleted documents > wouldn't > > that be a very significant bug? > > Lucene score calculations take a lot of information in the index into > account when calculating the score. That includes deleted documents, > because they are part of the index. When you delete a document, Lucene > just makes a note saying "internal document ID number is deleted." > The actual information for that document is not removed from the index, > because doing so could take a very long time. 
> > When you make queries against a replicated SolrCloud, the queries are > load balanced across the entire cloud, so different queries will hit > different replicas. With different numbers of deleted documents in > different replicas (which is not unusual), the scores are going to come > out a little bit different on each query. If you're sorting by score > (which is the default sort), that *can* affect the order. Your replicas > have a fairly high percentage of deleted documents, so there is a lot of > extra information affecting the scores. The relative difference in the > deleted document count between the replicas is high as well, so multiple > queries could be substantially different. > > It is not a bug that Lucene and Solr look at deleted documents. > Removing deleted document information from things like the score > calculation would be VERY computationally intense, bordering on the > impossible. To assure good performance, Lucene doesn't even try. > Because the way Lucene tracks deleted documents is with a list of > internal Lucene document IDs, those documents are easily removed from > *results*, but their contents are an integral part of the index and that > information can only be truly removed by completely rewriting (merging) > the index segments. > > You can get rid of all deleted documents with an optimize operation, > which is a forced merge of the entire index down to one segment -- but > just like it sounds, that is a complete rewrite of the index. It > involves a huge amount of CPU resources and disk I/O, and can severely > impact normal indexing and query operations while it's happening. If > the collection is extremely large, an optimize could take hours. For > indexes that change rapidly, optimize is strongly discouraged, except as > an occasional "clean things up" operation, run during non-peak times. > > Thanks, > Shawn
Re: Consecutive calls to a query give different results
On 9/7/2017 8:54 AM, Webster Homer wrote: > I am not concerned about deleted documents. I am concerned that the same > search gives different results after each search. The top document seems to > cycle between 3 different documents > > I have an enhanced collections info api call that calls the core admin api > to get the index information for the replica. > When I said the numdocs were the same I meant exactly that. maxdocs and > deleted documents are not the same for the replicas, but the number of > numdocs is. > > Or are you saying that the search is looking at deleted documents wouldn't > that be a very significant bug? Lucene score calculations take a lot of information in the index into account when calculating the score. That includes deleted documents, because they are part of the index. When you delete a document, Lucene just makes a note saying "internal document ID number is deleted." The actual information for that document is not removed from the index, because doing so could take a very long time. When you make queries against a replicated SolrCloud, the queries are load balanced across the entire cloud, so different queries will hit different replicas. With different numbers of deleted documents in different replicas (which is not unusual), the scores are going to come out a little bit different on each query. If you're sorting by score (which is the default sort), that *can* affect the order. Your replicas have a fairly high percentage of deleted documents, so there is a lot of extra information affecting the scores. The relative difference in the deleted document count between the replicas is high as well, so multiple queries could be substantially different. It is not a bug that Lucene and Solr look at deleted documents. Removing deleted document information from things like the score calculation would be VERY computationally intense, bordering on the impossible. To assure good performance, Lucene doesn't even try. 
Because the way Lucene tracks deleted documents is with a list of internal Lucene document IDs, those documents are easily removed from *results*, but their contents are an integral part of the index and that information can only be truly removed by completely rewriting (merging) the index segments. You can get rid of all deleted documents with an optimize operation, which is a forced merge of the entire index down to one segment -- but just like it sounds, that is a complete rewrite of the index. It involves a huge amount of CPU resources and disk I/O, and can severely impact normal indexing and query operations while it's happening. If the collection is extremely large, an optimize could take hours. For indexes that change rapidly, optimize is strongly discouraged, except as an occasional "clean things up" operation, run during non-peak times. Thanks, Shawn
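To make Shawn's point concrete, here is a rough sketch of how the same term can end up with a different inverse document frequency on two replicas. It assumes the idf formula documented for Lucene's BM25Similarity, ln(1 + (N - n + 0.5) / (n + 0.5)), purely as an example (the collection may be configured with a different similarity), and it assumes that deleted documents still count toward both N and n until they are merged away. The document counts are the shard1 maxDocs figures quoted elsewhere in this thread; the document frequencies are invented for illustration.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """idf as documented for Lucene's BM25Similarity."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# doc_count: maxDocs per replica as reported in the thread.
# doc_freq: hypothetical values; they differ because each replica still
# holds a different set of not-yet-merged deleted documents.
replicas = {
    "core_node1": {"doc_count": 611592, "doc_freq": 1200},
    "core_node3": {"doc_count": 571737, "doc_freq": 1150},
}

for name, stats in replicas.items():
    # The two replicas hold the same live documents but print different idf values
    print(name, round(bm25_idf(**stats), 4))
```

A small difference like this in one term's weight can be enough to reorder documents whose total scores are already close.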
Re: Consecutive calls to a query give different results
We have several cloud collections, but this one is updated once a day with a partial load, and once a week with a full load, followed by a delete which is based upon an index_date field (timestamp of the Solr record). For this and related collections, optimizing once per day is probably acceptable. We do have other collections that are updated every 15 minutes; from what you write, I don't think those could be optimized. On Thu, Sep 7, 2017 at 5:10 PM, Erick Erickson wrote: > bq: So apparently it IS essential to run optimize after a data load > > Don't do this if you can avoid it, you run the risk of excessive > amounts of your index consisting of deleted documents unless you are > following a process whereby you periodically (and I'm talking at least > hours, if not once per day) index data then don't change the index for > a bunch more hours. > > You're missing the point when it comes to deleted docs. Different > replicas of the _same_ shard commit at different wall clock times due > to network delays. Therefore, which segments are merged will not be > identical between replicas when a commit happens, since commits are > local. > > So replica1 may merge segments 1, 3, 6 into segment 7 > replica2 may merge segments 1, 2, 4 into segment 7 > > Here's the key: Now replica1 may have 100 deleted documents (ones > marked as deleted but still in segments 2, 4 and 5) > replica2 may have 90 deleted > documents (the ones still in segments 3, 5 and 6) > > The statistics in the term frequency and document frequency for some > terms are _not_ the same. Therefore the scoring will be slightly > different. Therefore, depending on which replica serves the query, the > order of docs may be somewhat different if the scores are close. > > optimizing squeezes all the deleted documents out of all the replicas > so the scores become identical. > > This doesn't happen, of course, if you have only one replica. 
> > Best, > Erick > > On Thu, Sep 7, 2017 at 8:13 AM, Webster Homer > wrote: > > We have several solr clouds, a couple of them have only 1 replica per > > shard. We have never observed the problem when we have a single replica > > only when there are multiple replicas per shard. > > > > On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer > > wrote: > > > >> the scores are not the same > >> Doc > >> 305340 432.44238 > >> C2646 428.24185 > >> 12837 430.61722 > >> > >> One other thing. I just ran optimize and now document 305340 is > >> consistently the top score. > >> So apparently it IS essential to run optimize after a data load > >> > >> Note we see this behavior fairly commonly on our solr cloud instances. > >> This was not the first time. This particular situation was on a > development > >> system > >> > >> On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer > >> wrote: > >> > >>> the scores are not the same > >>> Doc > >>> 305340 432.44238 > >>> > >>> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings < > >>> hastings.recurs...@gmail.com> wrote: > >>> > >>>> "I am concerned that the same > >>>> search gives different results after each search. The top document > seems > >>>> to > >>>> cycle between 3 different documents" > >>>> > >>>> > >>>> if you do debug query on the search, are the scores for the top 3 > >>>> documents > >>>> the same or not? you can easily have three documents with the same > >>>> score, > >>>> so when you have a result set that is ranked 1-1-1-2-3-4 you can > >>>> expect > >>>> 1-1-1 to rotate based on whatever. use a second element like id to > your > >>>> ranking perhaps. > >>>> > >>>> > >>>> > >>>> > >>>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer < > webster.ho...@sial.com> > >>>> wrote: > >>>> > >>>> > I am not concerned about deleted documents. I am concerned that the > >>>> same > >>>> > search gives different results after each search. 
The top document > >>>> seems to > >>>> > cycle between 3 different documents > >>>> > > >>>> > I have an enhanced collections info api call that calls the core > admin > >>>> api > >>>
Re: Consecutive calls to a query give different results
bq: So apparently it IS essential to run optimize after a data load Don't do this if you can avoid it; you run the risk of excessive amounts of your index consisting of deleted documents unless you are following a process whereby you periodically (and I'm talking at least hours, if not once per day) index data and then don't change the index for a bunch more hours. You're missing the point when it comes to deleted docs. Different replicas of the _same_ shard commit at different wall clock times due to network delays. Therefore, which segments are merged will not be identical between replicas when a commit happens, since commits are local. So replica1 may merge segments 1, 3, 6 into segment 7, while replica2 may merge segments 1, 2, 4 into segment 7. Here's the key: now replica1 may have 100 deleted documents (the ones marked as deleted but still in segments 2, 4 and 5), while replica2 may have 90 deleted documents (the ones still in segments 3, 5 and 6). The term frequency and document frequency statistics for some terms are therefore _not_ the same, so the scoring will be slightly different. Depending on which replica serves the query, the order of docs may be somewhat different if the scores are close. Optimizing squeezes all the deleted documents out of all the replicas so the scores become identical. This doesn't happen, of course, if you have only one replica. Best, Erick On Thu, Sep 7, 2017 at 8:13 AM, Webster Homer wrote: > We have several solr clouds, a couple of them have only 1 replica per > shard. We have never observed the problem when we have a single replica > only when there are multiple replicas per shard. > > On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer > wrote: > >> the scores are not the same >> Doc >> 305340 432.44238 >> C2646 428.24185 >> 12837 430.61722 >> >> One other thing. I just ran optimize and now document 305340 is >> consistently the top score. 
>> So apparently it IS essential to run optimize after a data load >> >> Note we see this behavior fairly commonly on our solr cloud instances. >> This was not the first time. This particular situation was on a development >> system >> >> On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer >> wrote: >> >>> the scores are not the same >>> Doc >>> 305340 432.44238 >>> >>> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings < >>> hastings.recurs...@gmail.com> wrote: >>> >>>> "I am concerned that the same >>>> search gives different results after each search. The top document seems >>>> to >>>> cycle between 3 different documents" >>>> >>>> >>>> if you do debug query on the search, are the scores for the top 3 >>>> documents >>>> the same or not? you can easily have three documents with the same >>>> score, >>>> so when you have a result set that is ranked 1-1-1-2-3-4 you can >>>> expect >>>> 1-1-1 to rotate based on whatever. use a second element like id to your >>>> ranking perhaps. >>>> >>>> >>>> >>>> >>>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer >>>> wrote: >>>> >>>> > I am not concerned about deleted documents. I am concerned that the >>>> same >>>> > search gives different results after each search. The top document >>>> seems to >>>> > cycle between 3 different documents >>>> > >>>> > I have an enhanced collections info api call that calls the core admin >>>> api >>>> > to get the index information for the replica. >>>> > When I said the numdocs were the same I meant exactly that. maxdocs and >>>> > deleted documents are not the same for the replicas, but the number of >>>> > numdocs is. >>>> > >>>> > Or are you saying that the search is looking at deleted documents >>>> wouldn't >>>> > that be a very significant bug? 
>>>> > >>>> > The four replicas: >>>> > shard1 >>>> > core_node1 >>>> > "numDocs": 383817, >>>> > "maxDocs": 611592, >>>> > "deletedDocs": 227775, >>>> > "size": "2.49 GB", >>>> > "lastModified": "2017-09-07T08:18:03.639Z", >>>> > "current": true, >>>> > "version": 35644, >>>> > "segmentCount"
Re: Consecutive calls to a query give different results
We have several Solr clouds; a couple of them have only 1 replica per shard. We have never observed the problem with a single replica, only when there are multiple replicas per shard. On Thu, Sep 7, 2017 at 10:08 AM, Webster Homer wrote: > the scores are not the same > Doc > 305340 432.44238 > C2646 428.24185 > 12837 430.61722 > > One other thing. I just ran optimize and now document 305340 is > consistently the top score. > So apparently it IS essential to run optimize after a data load > > Note we see this behavior fairly commonly on our solr cloud instances. > This was not the first time. This particular situation was on a development > system > > On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer > wrote: > >> the scores are not the same >> Doc >> 305340 432.44238 >> >> On Thu, Sep 7, 2017 at 10:02 AM, David Hastings < >> hastings.recurs...@gmail.com> wrote: >> >>> "I am concerned that the same >>> search gives different results after each search. The top document seems >>> to >>> cycle between 3 different documents" >>> >>> >>> if you do debug query on the search, are the scores for the top 3 >>> documents >>> the same or not? you can easily have three documents with the same >>> score, >>> so when you have a result set that is ranked 1-1-1-2-3-4 you can >>> expect >>> 1-1-1 to rotate based on whatever. use a second element like id to your >>> ranking perhaps. >>> >>> >>> >>> >>> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer >>> wrote: >>> >>> > I am not concerned about deleted documents. I am concerned that the >>> same >>> > search gives different results after each search. The top document >>> seems to >>> > cycle between 3 different documents >>> > >>> > I have an enhanced collections info api call that calls the core admin >>> api >>> > to get the index information for the replica. >>> > When I said the numdocs were the same I meant exactly that. maxdocs and >>> > deleted documents are not the same for the replicas, but the number of >>> > numdocs is. 
>>> > >>> > Or are you saying that the search is looking at deleted documents >>> wouldn't >>> > that be a very significant bug? >>> > >>> > The four replicas: >>> > shard1 >>> > core_node1 >>> > "numDocs": 383817, >>> > "maxDocs": 611592, >>> > "deletedDocs": 227775, >>> > "size": "2.49 GB", >>> > "lastModified": "2017-09-07T08:18:03.639Z", >>> > "current": true, >>> > "version": 35644, >>> > "segmentCount": 28 >>> > >>> > core_node3 >>> > "numDocs": 383817, >>> > "maxDocs": 571737, >>> > "deletedDocs": 187920, >>> > "size": "2.85 GB", >>> > "lastModified": "2017-09-07T08:18:03.634Z", >>> > "current": false, >>> > "version": 35562, >>> > "segmentCount": 36 >>> > shard2 >>> > core_node2 >>> > "numDocs": 385326, >>> > "maxDocs": 529214, >>> > "deletedDocs": 143888, >>> > "size": "2.13 GB", >>> > "lastModified": "2017-09-07T08:18:03.632Z", >>> > "current": true, >>> > "version": 34783, >>> > "segmentCount": 24 >>> > core_node4 >>> > "numDocs": 385326, >>> > "maxDocs": 488201, >>> > "deletedDocs": 102875, >>> > "size": "1.96 GB", >>> > "lastModified": "2017-09-07T08:18:03.633Z", >>> > "current": true, >>> > "version": 34932, >>> > "segmentCount": 21 >>> > >>> > >>> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley >>> wrote: >>> > >>> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson < >>> erickerick...@gmail.com >>> > > >>> > > wrote: >>> > > > bq: and deleted documents are irrelevant to term statistics... >>> > > > >>> &
Re: Consecutive calls to a query give different results
The scores are not the same. Doc 305340 432.44238 C2646 428.24185 12837 430.61722 One other thing: I just ran optimize and now document 305340 is consistently the top score. So apparently it IS essential to run optimize after a data load. Note that we see this behavior fairly commonly on our Solr cloud instances; this was not the first time. This particular situation was on a development system. On Thu, Sep 7, 2017 at 10:04 AM, Webster Homer wrote: > the scores are not the same > Doc > 305340 432.44238 > > On Thu, Sep 7, 2017 at 10:02 AM, David Hastings < > hastings.recurs...@gmail.com> wrote: > >> "I am concerned that the same >> search gives different results after each search. The top document seems >> to >> cycle between 3 different documents" >> >> >> if you do debug query on the search, are the scores for the top 3 >> documents >> the same or not? you can easily have three documents with the same score, >> so when you have a result set that is ranked 1-1-1-2-3-4 you can >> expect >> 1-1-1 to rotate based on whatever. use a second element like id to your >> ranking perhaps. >> >> >> >> >> On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer >> wrote: >> >> > I am not concerned about deleted documents. I am concerned that the same >> > search gives different results after each search. The top document >> seems to >> > cycle between 3 different documents >> > >> > I have an enhanced collections info api call that calls the core admin >> api >> > to get the index information for the replica. >> > When I said the numdocs were the same I meant exactly that. maxdocs and >> > deleted documents are not the same for the replicas, but the number of >> > numdocs is. >> > >> > Or are you saying that the search is looking at deleted documents >> wouldn't >> > that be a very significant bug? 
>> > >> > The four replicas: >> > shard1 >> > core_node1 >> > "numDocs": 383817, >> > "maxDocs": 611592, >> > "deletedDocs": 227775, >> > "size": "2.49 GB", >> > "lastModified": "2017-09-07T08:18:03.639Z", >> > "current": true, >> > "version": 35644, >> > "segmentCount": 28 >> > >> > core_node3 >> > "numDocs": 383817, >> > "maxDocs": 571737, >> > "deletedDocs": 187920, >> > "size": "2.85 GB", >> > "lastModified": "2017-09-07T08:18:03.634Z", >> > "current": false, >> > "version": 35562, >> > "segmentCount": 36 >> > shard2 >> > core_node2 >> > "numDocs": 385326, >> > "maxDocs": 529214, >> > "deletedDocs": 143888, >> > "size": "2.13 GB", >> > "lastModified": "2017-09-07T08:18:03.632Z", >> > "current": true, >> > "version": 34783, >> > "segmentCount": 24 >> > core_node4 >> > "numDocs": 385326, >> > "maxDocs": 488201, >> > "deletedDocs": 102875, >> > "size": "1.96 GB", >> > "lastModified": "2017-09-07T08:18:03.633Z", >> > "current": true, >> > "version": 34932, >> > "segmentCount": 21 >> > >> > >> > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley wrote: >> > >> > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson < >> erickerick...@gmail.com >> > > >> > > wrote: >> > > > bq: and deleted documents are irrelevant to term statistics... >> > > > >> > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_? >> > > >> > > One can make it work either way ;-) >> > > Whether a document is marked as deleted or not has no effect on term >> > > statistics (i.e. irrelevant) >> > > OR documents marked for deletion still count in term statistics (i.e. >> > > relevant) >> > > >> > > I guess I used the former because we don't go out of our way to still >> > > include deleted documents... it's just a side effect of the index >> > > structure that we don't (and can't easily) update statistics when a >> > > docume
Re: Consecutive calls to a query give different results
the scores are not the same Doc 305340 432.44238 On Thu, Sep 7, 2017 at 10:02 AM, David Hastings < hastings.recurs...@gmail.com> wrote: > "I am concerned that the same > search gives different results after each search. The top document seems to > cycle between 3 different documents" > > > if you do debug query on the search, are the scores for the top 3 documents > the same or not? you can easily have three documents with the same score, > so when you have a result set that is ranked 1-1-1-2-3-4 you can expect > 1-1-1 to rotate based on whatever. use a second element like id to your > ranking perhaps. > > > > > On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer > wrote: > > > I am not concerned about deleted documents. I am concerned that the same > > search gives different results after each search. The top document seems > to > > cycle between 3 different documents > > > > I have an enhanced collections info api call that calls the core admin > api > > to get the index information for the replica. > > When I said the numdocs were the same I meant exactly that. maxdocs and > > deleted documents are not the same for the replicas, but the number of > > numdocs is. > > > > Or are you saying that the search is looking at deleted documents > wouldn't > > that be a very significant bug? 
> > > > The four replicas: > > shard1 > > core_node1 > > "numDocs": 383817, > > "maxDocs": 611592, > > "deletedDocs": 227775, > > "size": "2.49 GB", > > "lastModified": "2017-09-07T08:18:03.639Z", > > "current": true, > > "version": 35644, > > "segmentCount": 28 > > > > core_node3 > > "numDocs": 383817, > > "maxDocs": 571737, > > "deletedDocs": 187920, > > "size": "2.85 GB", > > "lastModified": "2017-09-07T08:18:03.634Z", > > "current": false, > > "version": 35562, > > "segmentCount": 36 > > shard2 > > core_node2 > > "numDocs": 385326, > > "maxDocs": 529214, > > "deletedDocs": 143888, > > "size": "2.13 GB", > > "lastModified": "2017-09-07T08:18:03.632Z", > > "current": true, > > "version": 34783, > > "segmentCount": 24 > > core_node4 > > "numDocs": 385326, > > "maxDocs": 488201, > > "deletedDocs": 102875, > > "size": "1.96 GB", > > "lastModified": "2017-09-07T08:18:03.633Z", > > "current": true, > > "version": 34932, > > "segmentCount": 21 > > > > > > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley wrote: > > > > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson < > erickerick...@gmail.com > > > > > > wrote: > > > > bq: and deleted documents are irrelevant to term statistics... > > > > > > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_? > > > > > > One can make it work either way ;-) > > > Whether a document is marked as deleted or not has no effect on term > > > statistics (i.e. irrelevant) > > > OR documents marked for deletion still count in term statistics (i.e. > > > relevant) > > > > > > I guess I used the former because we don't go out of our way to still > > > include deleted documents... it's just a side effect of the index > > > structure that we don't (and can't easily) update statistics when a > > > document is marked as deleted. 
> > > > > > -Yonik > > > > > > > > > > Erick > > > > > > > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley > > wrote: > > > >> Different replicas of the same shard can have different numbers of > > > >> deleted documents (really just marked as deleted), and deleted > > > >> documents are irrelevant to term statistics (like the number of > > > >> documents a term appears in). Documents marked for deletion stop > > > >> contributing to corpus statistics when they are actually removed > (via > > > >> expunge deletes, merges, optimizes). > > > >> -Yonik > > > >> > > > >> > > > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer < >
Re: Consecutive calls to a query give different results
"I am concerned that the same search gives different results after each search. The top document seems to cycle between 3 different documents" if you do debug query on the search, are the scores for the top 3 documents the same or not? you can easily have three documents with the same score, so when you have a result set that is ranked 1-1-1-2-3-4 you can expect 1-1-1 to rotate based on whatever. use a second element like id to your ranking perhaps. On Thu, Sep 7, 2017 at 10:54 AM, Webster Homer wrote: > I am not concerned about deleted documents. I am concerned that the same > search gives different results after each search. The top document seems to > cycle between 3 different documents > > I have an enhanced collections info api call that calls the core admin api > to get the index information for the replica. > When I said the numdocs were the same I meant exactly that. maxdocs and > deleted documents are not the same for the replicas, but the number of > numdocs is. > > Or are you saying that the search is looking at deleted documents wouldn't > that be a very significant bug? 
> > The four replicas: > shard1 > core_node1 > "numDocs": 383817, > "maxDocs": 611592, > "deletedDocs": 227775, > "size": "2.49 GB", > "lastModified": "2017-09-07T08:18:03.639Z", > "current": true, > "version": 35644, > "segmentCount": 28 > > core_node3 > "numDocs": 383817, > "maxDocs": 571737, > "deletedDocs": 187920, > "size": "2.85 GB", > "lastModified": "2017-09-07T08:18:03.634Z", > "current": false, > "version": 35562, > "segmentCount": 36 > shard2 > core_node2 > "numDocs": 385326, > "maxDocs": 529214, > "deletedDocs": 143888, > "size": "2.13 GB", > "lastModified": "2017-09-07T08:18:03.632Z", > "current": true, > "version": 34783, > "segmentCount": 24 > core_node4 > "numDocs": 385326, > "maxDocs": 488201, > "deletedDocs": 102875, > "size": "1.96 GB", > "lastModified": "2017-09-07T08:18:03.633Z", > "current": true, > "version": 34932, > "segmentCount": 21 > > > On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley wrote: > > > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson > > > wrote: > > > bq: and deleted documents are irrelevant to term statistics... > > > > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_? > > > > One can make it work either way ;-) > > Whether a document is marked as deleted or not has no effect on term > > statistics (i.e. irrelevant) > > OR documents marked for deletion still count in term statistics (i.e. > > relevant) > > > > I guess I used the former because we don't go out of our way to still > > include deleted documents... it's just a side effect of the index > > structure that we don't (and can't easily) update statistics when a > > document is marked as deleted. > > > > -Yonik > > > > > > > Erick > > > > > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley > wrote: > > >> Different replicas of the same shard can have different numbers of > > >> deleted documents (really just marked as deleted), and deleted > > >> documents are irrelevant to term statistics (like the number of > > >> documents a term appears in). 
Documents marked for deletion stop > > >> contributing to corpus statistics when they are actually removed (via > > >> expunge deletes, merges, optimizes). > > >> -Yonik > > >> > > >> > > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer > > > wrote: > > >>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 > > >>> replicas (total of 4 nodes). > > >>> > > >>> If I run the query multiple times I see the three different top > scoring > > >>> results. > > >>> No data load is running, all data has been commited > > >>> > > >>> I get these three different hits with their scores: > > >>> copperiinitratehemipentahydrate2325919004194430.61722 > > >>> copperiinitrateoncelite1234598765 > > 432.44238 > > >>> copperiinitratehydrate18756anhydrousbasis13778319 428.
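The tie-breaking advice in the reply above can be sketched in a few lines of Python. This is only an illustration: the document ids and scores are invented, and the two lists stand in for the same tied hits arriving in a different order from two replicas.

```python
# Hypothetical (id, score) hits, invented for illustration only.
# Both lists hold the same documents; only the arrival order differs.
hits_from_replica_a = [("doc-b", 1.0), ("doc-a", 1.0), ("doc-c", 1.0), ("doc-d", 0.5)]
hits_from_replica_b = [("doc-c", 1.0), ("doc-b", 1.0), ("doc-a", 1.0), ("doc-d", 0.5)]


def rank_by_score_only(hits):
    """Sort by score descending; tied hits keep their arrival order."""
    return sorted(hits, key=lambda hit: -hit[1])


def rank_with_tie_breaker(hits):
    """Sort by score descending, then by id ascending to break ties."""
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


if __name__ == "__main__":
    print(rank_by_score_only(hits_from_replica_a))
    print(rank_by_score_only(hits_from_replica_b))
    print(rank_with_tie_breaker(hits_from_replica_a))
    print(rank_with_tie_breaker(hits_from_replica_b))
```

In Solr terms this would correspond to appending a unique field to the sort, for example `sort=score desc,id asc`, where the field name `id` is an assumption about the schema. A tie-breaker only helps when the scores are truly equal; if the scores themselves differ between replicas, it will not stabilize the order.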
Re: Consecutive calls to a query give different results
I am not concerned about deleted documents. I am concerned that the same search gives different results after each search. The top document seems to cycle between 3 different documents. I have an enhanced collections info API call that calls the core admin API to get the index information for each replica. When I said the numDocs were the same I meant exactly that. maxDocs and deletedDocs are not the same for the replicas, but numDocs is. Or are you saying that the search is looking at deleted documents? Wouldn't that be a very significant bug? The four replicas: shard1 core_node1 "numDocs": 383817, "maxDocs": 611592, "deletedDocs": 227775, "size": "2.49 GB", "lastModified": "2017-09-07T08:18:03.639Z", "current": true, "version": 35644, "segmentCount": 28 core_node3 "numDocs": 383817, "maxDocs": 571737, "deletedDocs": 187920, "size": "2.85 GB", "lastModified": "2017-09-07T08:18:03.634Z", "current": false, "version": 35562, "segmentCount": 36 shard2 core_node2 "numDocs": 385326, "maxDocs": 529214, "deletedDocs": 143888, "size": "2.13 GB", "lastModified": "2017-09-07T08:18:03.632Z", "current": true, "version": 34783, "segmentCount": 24 core_node4 "numDocs": 385326, "maxDocs": 488201, "deletedDocs": 102875, "size": "1.96 GB", "lastModified": "2017-09-07T08:18:03.633Z", "current": true, "version": 34932, "segmentCount": 21 On Thu, Sep 7, 2017 at 7:58 AM, Yonik Seeley wrote: > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson > wrote: > > bq: and deleted documents are irrelevant to term statistics... > > > > Did you mean "relevant"? Or do I have to adjust my thinking _again_? > > One can make it work either way ;-) > Whether a document is marked as deleted or not has no effect on term > statistics (i.e. irrelevant) > OR documents marked for deletion still count in term statistics (i.e. > relevant) > > I guess I used the former because we don't go out of our way to still > include deleted documents... 
it's just a side effect of the index > structure that we don't (and can't easily) update statistics when a > document is marked as deleted. > > -Yonik > > > > Erick > > > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote: > >> Different replicas of the same shard can have different numbers of > >> deleted documents (really just marked as deleted), and deleted > >> documents are irrelevant to term statistics (like the number of > >> documents a term appears in). Documents marked for deletion stop > >> contributing to corpus statistics when they are actually removed (via > >> expunge deletes, merges, optimizes). > >> -Yonik > >> > >> > >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer > wrote: > >>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 > >>> replicas (total of 4 nodes). > >>> > >>> If I run the query multiple times I see the three different top scoring > >>> results. > >>> No data load is running, all data has been commited > >>> > >>> I get these three different hits with their scores: > >>> copperiinitratehemipentahydrate2325919004194430.61722 > >>> copperiinitrateoncelite1234598765 > 432.44238 > >>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185 > >>> > >>> How is it that the same search against the same data can give different > >>> responses? 
> >>> I looked at the specific cores they look OK the numdocs for the > replicas in > >>> a shard match > >>> > >>> This is the query: > >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial- > catalog-product/select?defType=edismax&fl=searchmv_ > en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,% > 20search_en_p_pri_name,%20search_pno%20[explain% > 20style=nl]&group.field=id_s&group.limit=30&group=true& > group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op= > OR&q=copper%20nitrate&qf=search_pid > >>> ^500%20search_concat_pno^400%20searchmv_concat_sku^400% > 20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr% > 20searchmv_p_skus_genr%20searchmv_user_term^200% > 20search_lform^190%20searchmv_en_acronym^180%20search_en_ > root_name^170
Re: Consecutive calls to a query give different results
Whew! I haven't been lying to people for _years_.. On Thu, Sep 7, 2017 at 5:58 AM, Yonik Seeley wrote: > On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson > wrote: >> bq: and deleted documents are irrelevant to term statistics... >> >> Did you mean "relevant"? Or do I have to adjust my thinking _again_? > > One can make it work either way ;-) > Whether a document is marked as deleted or not has no effect on term > statistics (i.e. irrelevant) > OR documents marked for deletion still count in term statistics (i.e. > relevant) > > I guess I used the former because we don't go out of our way to still > include deleted documents... it's just a side effect of the index > structure that we don't (and can't easily) update statistics when a > document is marked as deleted. > > -Yonik > > >> Erick >> >> On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote: >>> Different replicas of the same shard can have different numbers of >>> deleted documents (really just marked as deleted), and deleted >>> documents are irrelevant to term statistics (like the number of >>> documents a term appears in). Documents marked for deletion stop >>> contributing to corpus statistics when they are actually removed (via >>> expunge deletes, merges, optimizes). >>> -Yonik >>> >>> >>> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer >>> wrote: I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 replicas (total of 4 nodes). If I run the query multiple times I see the three different top scoring results. No data load is running, all data has been commited I get these three different hits with their scores: copperiinitratehemipentahydrate2325919004194430.61722 copperiinitrateoncelite1234598765 432.44238 copperiinitratehydrate18756anhydrousbasis13778319 428.24185 How is it that the same search against the same data can give different responses? 
I looked at the specific cores they look OK the numdocs for the replicas in a shard match This is the query: http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json -- This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. 
Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.emdgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
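The point quoted above, that documents marked as deleted still count in term statistics until they are merged away, can be illustrated with a small calculation. This is a sketch under stated assumptions: it uses the BM25-style inverse document frequency formula, takes the maxDocs values reported in the thread for the two shard1 replicas as the document counts, and the per-replica document frequencies are invented for illustration.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """BM25-style inverse document frequency for one term."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# max_docs comes from the replica listing in the thread (shard1).
# doc_freq values are hypothetical: the same live matches plus a
# different number of not-yet-merged deleted matches on each replica.
replica_stats = {
    "core_node1": {"max_docs": 611592, "doc_freq": 1600},
    "core_node3": {"max_docs": 571737, "doc_freq": 1450},
}

idf_by_replica = {
    name: bm25_idf(stats["doc_freq"], stats["max_docs"])
    for name, stats in replica_stats.items()
}

if __name__ == "__main__":
    for name, idf in idf_by_replica.items():
        print(f"{name}: idf = {idf:.5f}")
```

Both replicas hold the same live documents (numDocs matches), yet the term weight differs because the statistics include deleted documents. A request routed to a different replica can therefore score the same document slightly differently.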
Re: Consecutive calls to a query give different results
On Thu, Sep 7, 2017 at 12:47 AM, Erick Erickson wrote: > bq: and deleted documents are irrelevant to term statistics... > > Did you mean "relevant"? Or do I have to adjust my thinking _again_? One can make it work either way ;-) Whether a document is marked as deleted or not has no effect on term statistics (i.e. irrelevant) OR documents marked for deletion still count in term statistics (i.e. relevant) I guess I used the former because we don't go out of our way to still include deleted documents... it's just a side effect of the index structure that we don't (and can't easily) update statistics when a document is marked as deleted. -Yonik > Erick > > On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote: >> Different replicas of the same shard can have different numbers of >> deleted documents (really just marked as deleted), and deleted >> documents are irrelevant to term statistics (like the number of >> documents a term appears in). Documents marked for deletion stop >> contributing to corpus statistics when they are actually removed (via >> expunge deletes, merges, optimizes). >> -Yonik >> >> >> On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer wrote: >>> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 >>> replicas (total of 4 nodes). >>> >>> If I run the query multiple times I see the three different top scoring >>> results. >>> No data load is running, all data has been commited >>> >>> I get these three different hits with their scores: >>> copperiinitratehemipentahydrate2325919004194430.61722 >>> copperiinitrateoncelite1234598765 432.44238 >>> copperiinitratehydrate18756anhydrousbasis13778319 428.24185 >>> >>> How is it that the same search against the same data can give different >>> responses? 
>>> I looked at the specific cores they look OK the numdocs for the replicas in >>> a shard match >>> >>> This is the query: >>> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid >>> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json >>> >>> -- >>> >>> >>> This message and any attachment are confidential and may be privileged or >>> otherwise protected from disclosure. If you are not the intended recipient, >>> you must not copy this message or attachment or disclose the contents to >>> any other person. If you have received this transmission in error, please >>> notify the sender immediately and delete the message and any attachment >>> from your system. 
Re: Consecutive calls to a query give different results
bq: and deleted documents are irrelevant to term statistics... Did you mean "relevant"? Or do I have to adjust my thinking _again_? Erick On Wed, Sep 6, 2017 at 7:48 PM, Yonik Seeley wrote: > Different replicas of the same shard can have different numbers of > deleted documents (really just marked as deleted), and deleted > documents are irrelevant to term statistics (like the number of > documents a term appears in). Documents marked for deletion stop > contributing to corpus statistics when they are actually removed (via > expunge deletes, merges, optimizes). > -Yonik > > > On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer wrote: >> I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 >> replicas (total of 4 nodes). >> >> If I run the query multiple times I see the three different top scoring >> results. >> No data load is running, all data has been commited >> >> I get these three different hits with their scores: >> copperiinitratehemipentahydrate2325919004194430.61722 >> copperiinitrateoncelite1234598765 432.44238 >> copperiinitratehydrate18756anhydrousbasis13778319 428.24185 >> >> How is it that the same search against the same data can give different >> responses? 
>> I looked at the specific cores they look OK the numdocs for the replicas in >> a shard match >> >> This is the query: >> http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid >> ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json >> >> -- >> >> >> This message and any attachment are confidential and may be privileged or >> otherwise protected from disclosure. If you are not the intended recipient, >> you must not copy this message or attachment or disclose the contents to >> any other person. If you have received this transmission in error, please >> notify the sender immediately and delete the message and any attachment >> from your system. 
Re: Consecutive calls to a query give different results
Different replicas of the same shard can have different numbers of deleted documents (really just marked as deleted), and deleted documents are irrelevant to term statistics (like the number of documents a term appears in). Documents marked for deletion stop contributing to corpus statistics when they are actually removed (via expunge deletes, merges, optimizes). -Yonik On Wed, Sep 6, 2017 at 5:51 PM, Webster Homer wrote: > I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 > replicas (total of 4 nodes). > > If I run the query multiple times I see the three different top scoring > results. > No data load is running, all data has been commited > > I get these three different hits with their scores: > copperiinitratehemipentahydrate2325919004194430.61722 > copperiinitrateoncelite1234598765 432.44238 > copperiinitratehydrate18756anhydrousbasis13778319 428.24185 > > How is it that the same search against the same data can give different > responses? > I looked at the specific cores they look OK the numdocs for the replicas in > a shard match > > This is the query: > http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid > 
^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20asc,search_pid%20asc&wt=json > > -- > > > This message and any attachment are confidential and may be privileged or > otherwise protected from disclosure. If you are not the intended recipient, > you must not copy this message or attachment or disclose the contents to > any other person. If you have received this transmission in error, please > notify the sender immediately and delete the message and any attachment > from your system. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not accept liability for any omissions or errors in this > message which may arise as a result of E-Mail-transmission or for damages > resulting from any unauthorized changes of the content of this message and > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. 
Consecutive calls to a query give different results
I am using Solr 6.2.0 configured as a solr cloud with 2 shards and 4 replicas (total of 4 nodes). If I run the query multiple times I see the three different top scoring results. No data load is running, all data has been commited I get these three different hits with their scores: copperiinitratehemipentahydrate2325919004194430.61722 copperiinitrateoncelite1234598765 432.44238 copperiinitratehydrate18756anhydrousbasis13778319 428.24185 How is it that the same search against the same data can give different responses? I looked at the specific cores they look OK the numdocs for the replicas in a shard match This is the query: http://ae1c-ecomdev-msc01.sial.com:8983/solr/sial-catalog-product/select?defType=edismax&fl=searchmv_en_keywords,%20searchmv_keywords,searchmv_pno,%20searchmv_en_s_pri_name,%20search_en_p_pri_name,%20search_pno%20[explain%20style=nl]&group.field=id_s&group.limit=30&group=true&group.sort=sort_ds%20asc&indent=on&mm=2%3C-25%25&q.op=OR&q=copper%20nitrate&qf=search_pid ^500%20search_concat_pno^400%20searchmv_concat_sku^400%20searchmv_pno^300%20search_concat_pno_genr^100%20searchmv_pno_genr%20searchmv_p_skus_genr%20searchmv_user_term^200%20search_lform^190%20searchmv_en_acronym^180%20search_en_root_name^170%20searchmv_en_s_pri_name^160%20search_en_p_pri_name^150%20searchmv_en_synonyms^145%20searchmv_en_keywords^140%20search_en_sortkey^120%20searchmv_p_skus^100%20searchmv_chem_comp^90%20searchmv_en_name_suf%20searchmv_cas_number^80%20searchmv_component_cas^70%20search_beilstein^50%20search_color_idx^40%20search_ecnumber^30%20search_egecnumber^30%20search_femanumber^20%20searchmv_isbn^10%20search_mdl_number%20searchmv_en_page_title%20searchmv_en_descriptions%20searchmv_en_attributes%20searchmv_rtecs%20searchmv_lookahead_terms%20searchmv_xref_comparable_pno%20searchmv_xref_comparable_sku%20searchmv_xref_equivalent_pno%20searchmv_xref_exact_pno%20searchmv_xref_exact_sku%20searchmv_component_molform&rows=30&sort=score%20desc,sort_en_name%20asc,sort_ds%20as
c,search_pid%20asc&wt=json
Solr suggester query with quotes produces different results
Hi guys, I have the Suggester configured using the FreeTextLookupFactory. I noticed that if I don't use quotation marks, I only get single-term results. If I use quotation marks around my query, then I only get results that are comprised of multiple terms. There is no configuration that would return both types of results with a single query. Thanks Angel
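One possible client-side workaround for the behaviour described above is to issue the suggester request twice, once unquoted and once quoted, and merge the two result lists. The sketch below only builds the request URLs and merges already-parsed suggestion terms. The base URL and dictionary name are hypothetical, and the `/suggest` handler path is an assumption about the configuration.

```python
from urllib.parse import urlencode


def suggest_url(base_url, dictionary, text, quoted):
    """Build a suggester request URL, optionally wrapping the text in quotes."""
    query_text = f'"{text}"' if quoted else text
    params = {
        "suggest": "true",
        "suggest.dictionary": dictionary,
        "suggest.q": query_text,
        "wt": "json",
    }
    return f"{base_url}?{urlencode(params)}"


def merge_suggestions(*result_lists):
    """Concatenate suggestion lists, dropping duplicates and keeping order."""
    seen = set()
    merged = []
    for results in result_lists:
        for term in results:
            if term not in seen:
                seen.add(term)
                merged.append(term)
    return merged


if __name__ == "__main__":
    # Hypothetical endpoint and dictionary name, for illustration only.
    base = "http://localhost:8983/solr/mycollection/suggest"
    print(suggest_url(base, "freeTextSuggester", "copper ni", quoted=False))
    print(suggest_url(base, "freeTextSuggester", "copper ni", quoted=True))
    print(merge_suggestions(["copper"], ["copper nitrate", "copper"]))
```

This costs two requests per keystroke, so it trades latency for coverage; whether that is acceptable depends on the application.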
Re: Solr Cloud Replica Cores Give different Results for the Same query
ler.component.ShardFieldSortedHitQueue$S >>> hardComparator.sortVal(ShardFieldSortedHitQueue.java:146)\n\tat >>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$ >>> 1.compare(ShardFieldSortedHitQueue.java:167)\n\tat >>> org.apache.solr.handler.component.ShardFieldSortedHitQueue$ >>> 1.compare(ShardFieldSortedHitQueue.java:159)\n\tat >>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.l >>> essThan(ShardFieldSortedHitQueue.java:91)\n\tat >>> org.apache.solr.handler.component.ShardFieldSortedHitQueue.l >>> essThan(ShardFieldSortedHitQueue.java:33)\n\tat >>> org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat >>> org.apache.solr.handler.component.QueryComponent.mergeIds( >>> QueryComponent.java:1098)\n\tat org.apache.solr.handler.compon >>> ent.QueryComponent.handleRegularResponses(QueryComponent.java:758)\n\tat >>> org.apache.solr.handler.component.QueryComponent.handleRespo >>> nses(QueryComponent.java:737)\n\tat org.apache.solr.handler.compon >>> ent.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat >>> org.apache.solr.handler.RequestHandlerBase.handleRequest(Req >>> uestHandlerBase.java:154)\n\tat org.apache.solr.core.SolrCore. 
>>> execute(SolrCore.java:2089)\n\tat org.apache.solr.servlet.HttpSo >>> lrCall.execute(HttpSolrCall.java:652)\n\tat >>> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat >>> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat >>> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte >>> r(ServletHandler.java:1668)\n\tat org.eclipse.jetty.servlet.Serv >>> letHandler.doHandle(ServletHandler.java:581)\n\tat >>> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat >>> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat >>> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat >>> org.eclipse.jetty.server.handler.ContextHandler.doHandle( >>> ContextHandler.java:1160)\n\tat org.eclipse.jetty.servlet.Serv >>> letHandler.doScope(ServletHandler.java:511)\n\tat >>> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat >>> org.eclipse.jetty.server.handler.ContextHandler.doScope( >>> ContextHandler.java:1092)\n\tat org.eclipse.jetty.server.handl >>> er.ScopedHandler.handle(ScopedHandler.java:141)\n\tat >>> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha >>> ndle(ContextHandlerCollection.java:213)\n\tat >>> org.eclipse.jetty.server.handler.HandlerCollection.handle( >>> HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handl >>> er.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat >>> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat >>> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat >>> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat >>> org.eclipse.jetty.io.AbstractConnection$ReadCallback. 
>>> succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io >>> .FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io >>> .SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume >>> .produceAndRun(ExecuteProduceConsume.java:246)\n\tat >>> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume >>> .run(ExecuteProduceConsume.java:156)\n\tat org.eclipse.jetty.util.thread. >>> QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat >>> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat >>> java.lang.Thread.run(Thread.java:745)\n", "code":500}} >>> >>> On Wed, Dec 14, 2016 at 7:41 PM, Erick Erickson >>> wrote: >>> >>>> Let's back up a bit. You say "This seems to cause two replicas to >>>> return different hits depending upon which one is queried." >>>> >>>> OK, _how_ are they different? I've been assuming different numbers of >>>> hits. If you're getting the same number of hits but different document >>>> ordering, that's a completely different issue and may be easily >>>> explainable. If this is true, skip the rest of this message. I only >>>> realized we may be using
Re: Solr Cloud Replica Cores Give different Results for the Same query
ponent.java:758)\n\tat >> org.apache.solr.handler.component.QueryComponent.handleRespo >> nses(QueryComponent.java:737)\n\tat org.apache.solr.handler.compon >> ent.SearchHandler.handleRequestBody(SearchHandler.java:428)\n\tat >> org.apache.solr.handler.RequestHandlerBase.handleRequest(Req >> uestHandlerBase.java:154)\n\tat org.apache.solr.core.SolrCore. >> execute(SolrCore.java:2089)\n\tat org.apache.solr.servlet.HttpSo >> lrCall.execute(HttpSolrCall.java:652)\n\tat >> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)\n\tat >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)\n\tat >> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)\n\tat >> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilte >> r(ServletHandler.java:1668)\n\tat org.eclipse.jetty.servlet.Serv >> letHandler.doHandle(ServletHandler.java:581)\n\tat >> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\n\tat >> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)\n\tat >> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)\n\tat >> org.eclipse.jetty.server.handler.ContextHandler.doHandle( >> ContextHandler.java:1160)\n\tat org.eclipse.jetty.servlet.Serv >> letHandler.doScope(ServletHandler.java:511)\n\tat >> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\n\tat >> org.eclipse.jetty.server.handler.ContextHandler.doScope( >> ContextHandler.java:1092)\n\tat org.eclipse.jetty.server.handl >> er.ScopedHandler.handle(ScopedHandler.java:141)\n\tat >> org.eclipse.jetty.server.handler.ContextHandlerCollection.ha >> ndle(ContextHandlerCollection.java:213)\n\tat >> org.eclipse.jetty.server.handler.HandlerCollection.handle( >> HandlerCollection.java:119)\n\tat org.eclipse.jetty.server.handl >> er.HandlerWrapper.handle(HandlerWrapper.java:134)\n\tat >> org.eclipse.jetty.server.Server.handle(Server.java:518)\n\tat >> 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\n\tat >> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)\n\tat >> org.eclipse.jetty.io.AbstractConnection$ReadCallback. >> succeeded(AbstractConnection.java:273)\n\tat org.eclipse.jetty.io >> .FillInterest.fillable(FillInterest.java:95)\n\tat org.eclipse.jetty.io >> .SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)\n\tat >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume >> .produceAndRun(ExecuteProduceConsume.java:246)\n\tat >> org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume >> .run(ExecuteProduceConsume.java:156)\n\tat org.eclipse.jetty.util.thread. >> QueuedThreadPool.runJob(QueuedThreadPool.java:654)\n\tat >> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)\n\tat >> java.lang.Thread.run(Thread.java:745)\n", "code":500}} >> >> On Wed, Dec 14, 2016 at 7:41 PM, Erick Erickson >> wrote: >> >>> Let's back up a bit. You say "This seems to cause two replicas to >>> return different hits depending upon which one is queried." >>> >>> OK, _how_ are they different? I've been assuming different numbers of >>> hits. If you're getting the same number of hits but different document >>> ordering, that's a completely different issue and may be easily >>> explainable. If this is true, skip the rest of this message. I only >>> realized we may be using a different definition of "different hits" >>> part way through writing this reply. >>> >>> >>> >>> Having the timestamp as a string isn't a problem, you can do something >>> very similar with wildcards and the like if it's a string that sorts >>> the same way the timestamp would. And it's best if it's created >>> upstream anyway that way it's guaranteed to be the same for the doc on >>> all replicas. >>> >>> If the date is in canonical form (-MM-DDTHH:MM:SSZ) then a simple >>> copyfield to a date field would do the trick. 
>>> >>> But there's no real reason to do any of that. Given that you see this >>> when there's no indexing going on then there's no point to those >>> tests, those were just for a way to examine your nodes while there was >>> active indexing. >>> >>> How do you fix this problem when you see it? If it goes away by itself >>> that would gives at least a start on where to look. If you have to >>> manually intervene it would be good to
Re: Solr Cloud Replica Cores Give different Results for the Same query
adPool$3.run(QueuedThreadPool.java:572)\n\tat java.lang.Thread.run(Thread.java:745)\n", "code":500}} On Wed, Dec 14, 2016 at 7:41 PM, Erick Erickson wrote: > Let's back up a bit. You say "This seems to cause two replicas to > return different hits depending upon which one is queried." > > OK, _how_ are they different? I've been assuming different numbers of > hits. If you're getting the same number of hits but different document > ordering, that's a completely different issue and may be easily > explainable. If this is true, skip the rest of this message. I only > realized we may be using a different definition of "different hits" > part way through writing this reply. > > > > Having the timestamp as a string isn't a problem, you can do something > very similar with wildcards and the like if it's a string that sorts > the same way the timestamp would. And it's best if it's created > upstream anyway that way it's guaranteed to be the same for the doc on > all replicas. > > If the date is in canonical form (-MM-DDTHH:MM:SSZ) then a simple > copyfield to a date field would do the trick. > > But there's no real reason to do any of that. Given that you see this > when there's no indexing going on then there's no point to those > tests, those were just for a way to examine your nodes while there was > active indexing. > > How do you fix this problem when you see it? If it goes away by itself > that would gives at least a start on where to look. If you have to > manually intervene it would be good to know what you do. > > The CDCR pattern is docs to from the leader on the source cluster to > the leader on the target cluster. Once the target leader gets the > docs, it's supposed to send the doc to all the replicas. > > To try to narrow down the issue, next time it occurs can you look at > _both_ the source and target clusters and see if they _both_ show the > same discrepancy? What I'm looking for is whether both are > self-consistent. 
That is, all the replicas for shardN on the source > cluster show the same documents (M). All the replicas for shardN on > the target cluster show the same number of docs (N). I'm not as > concerned if M != N at this point. Note I'm looking at the number of > hits here, not say the document ordering. > > To do this you'll have to do the trick I mentioned where you query > each replica separately. > > And are you absolutely sure that your different results are coming > from the _same_ cluster? If you're comparing a query from the source > cluster with a query from the target cluster, that's different than if > the queries come from the same cluster. > > Best, > Erick > > On Wed, Dec 14, 2016 at 2:48 PM, Webster Homer > wrote: > > Thanks for the quick feedback. > > > > We are not doing continuous indexing, we do a complete load once a week > and > > then have a daily partial load for any documents that have changed since > > the load. These partial loads take only a few minutes every morning. > > > > The problem is we see this discrepancy long after the data load > completes. > > > > We have a source collection that uses cdcr to replicate to the target. I > > see the current=false setting in both the source and target collections. > > Only the target collection is being heavily searched so that is where my > > concern is. So what could cause this kind of issue? > > Do we have a configuration problem? > > > > It doesn't happen all the time, so I don't currently have a reproducible > > test case, yet. > > > > I will see about adding the timestamp, we have one, but it was created > as a > > string, and was generated by our ETL job > > > > On Wed, Dec 14, 2016 at 3:42 PM, Erick Erickson > > > wrote: > > > >> The commit points on different replicas will trip at different wall > >> clock times so the leader and replica may return slightly different > >> results depending on whether doc X was included in the commit on one > >> replica but not on the second. 
After the _next_ commit interval (2 > >> seconds in your case), doc X will be committed on the second replica: > >> that is it's not lost. > >> > >> Here's a couple of ways to verify: > >> > >> 1> turn off indexing and wait a few seconds. The replicas should have > >> the exact same documents. "A few seconds" is your autocommit (soft in > >> your case) interval + autowarm time. This last is unknown, but you can > >&g
Re: Solr Cloud Replica Cores Give different Results for the Same query
Let's back up a bit. You say "This seems to cause two replicas to return different hits depending upon which one is queried." OK, _how_ are they different? I've been assuming different numbers of hits. If you're getting the same number of hits but different document ordering, that's a completely different issue and may be easily explainable. If this is true, skip the rest of this message. I only realized we may be using a different definition of "different hits" part way through writing this reply. Having the timestamp as a string isn't a problem; you can do something very similar with wildcards and the like if it's a string that sorts the same way the timestamp would. And it's best if it's created upstream anyway; that way it's guaranteed to be the same for the doc on all replicas. If the date is in canonical form (YYYY-MM-DDTHH:MM:SSZ) then a simple copyField to a date field would do the trick. But there's no real reason to do any of that. Given that you see this when there's no indexing going on, there's no point to those tests; those were just a way to examine your nodes while there was active indexing. How do you fix this problem when you see it? If it goes away by itself, that would give us at least a start on where to look. If you have to manually intervene it would be good to know what you do. The CDCR pattern is that docs go from the leader on the source cluster to the leader on the target cluster. Once the target leader gets the docs, it's supposed to send them to all the replicas. To try to narrow down the issue, next time it occurs can you look at _both_ the source and target clusters and see if they _both_ show the same discrepancy? What I'm looking for is whether both are self-consistent. That is, all the replicas for shardN on the source cluster show the same number of docs (M). All the replicas for shardN on the target cluster show the same number of docs (N). I'm not as concerned if M != N at this point. 
Note I'm looking at the number of hits here, not say the document ordering. To do this you'll have to do the trick I mentioned where you query each replica separately. And are you absolutely sure that your different results are coming from the _same_ cluster? If you're comparing a query from the source cluster with a query from the target cluster, that's different than if the queries come from the same cluster. Best, Erick On Wed, Dec 14, 2016 at 2:48 PM, Webster Homer wrote: > Thanks for the quick feedback. > > We are not doing continuous indexing, we do a complete load once a week and > then have a daily partial load for any documents that have changed since > the load. These partial loads take only a few minutes every morning. > > The problem is we see this discrepancy long after the data load completes. > > We have a source collection that uses cdcr to replicate to the target. I > see the current=false setting in both the source and target collections. > Only the target collection is being heavily searched so that is where my > concern is. So what could cause this kind of issue? > Do we have a configuration problem? > > It doesn't happen all the time, so I don't currently have a reproducible > test case, yet. > > I will see about adding the timestamp, we have one, but it was created as a > string, and was generated by our ETL job > > On Wed, Dec 14, 2016 at 3:42 PM, Erick Erickson > wrote: > >> The commit points on different replicas will trip at different wall >> clock times so the leader and replica may return slightly different >> results depending on whether doc X was included in the commit on one >> replica but not on the second. After the _next_ commit interval (2 >> seconds in your case), doc X will be committed on the second replica: >> that is it's not lost. >> >> Here's a couple of ways to verify: >> >> 1> turn off indexing and wait a few seconds. The replicas should have >> the exact same documents. 
"A few seconds" is your autocommit (soft in >> your case) interval + autowarm time. This last is unknown, but you can >> check your admin/plugins-stats search handler times, it's reported >> there. Now issue your queries. If the replicas don't report the same >> docs A Bad Thing that should be worrying. BTW, with a 2 second soft >> commit interval, which is really aggressive, you _better not_ have >> very large autowarm intervals! >> >> 2> Include a timestamp in your docs when they are indexed. There's an >> automatic way to do that BTW now do your queries and append an FQ >> clause like &fq=timestamp:[* TO some_point_in_the_past]. The replicas >> should have the same counts
Re: Solr Cloud Replica Cores Give different Results for the Same query
Thanks for the quick feedback. We are not doing continuous indexing, we do a complete load once a week and then have a daily partial load for any documents that have changed since the load. These partial loads take only a few minutes every morning. The problem is we see this discrepancy long after the data load completes. We have a source collection that uses cdcr to replicate to the target. I see the current=false setting in both the source and target collections. Only the target collection is being heavily searched so that is where my concern is. So what could cause this kind of issue? Do we have a configuration problem? It doesn't happen all the time, so I don't currently have a reproducible test case, yet. I will see about adding the timestamp, we have one, but it was created as a string, and was generated by our ETL job On Wed, Dec 14, 2016 at 3:42 PM, Erick Erickson wrote: > The commit points on different replicas will trip at different wall > clock times so the leader and replica may return slightly different > results depending on whether doc X was included in the commit on one > replica but not on the second. After the _next_ commit interval (2 > seconds in your case), doc X will be committed on the second replica: > that is it's not lost. > > Here's a couple of ways to verify: > > 1> turn off indexing and wait a few seconds. The replicas should have > the exact same documents. "A few seconds" is your autocommit (soft in > your case) interval + autowarm time. This last is unknown, but you can > check your admin/plugins-stats search handler times, it's reported > there. Now issue your queries. If the replicas don't report the same > docs A Bad Thing that should be worrying. BTW, with a 2 second soft > commit interval, which is really aggressive, you _better not_ have > very large autowarm intervals! > > 2> Include a timestamp in your docs when they are indexed. 
There's an > automatic way to do that BTW now do your queries and append an FQ > clause like &fq=timestamp:[* TO some_point_in_the_past]. The replicas > should have the same counts unless you are deleting documents. I > mention deletes on the off chance that you're deleting documents that > fall in the interval and then the same as above could theoretically > occur. Updates should be fine. > > BTW, I've seen continuous monitoring of this done by automated > scripts. The key is to get the shard URL and ping that with > &distrib=false. It'll look something like > http://host:port/solr/collection_shard1_replica1 People usually > just use *:* and compare numFound. > > Best, > Erick > > > > On Wed, Dec 14, 2016 at 1:10 PM, Webster Homer > wrote: > > We are using Solr Cloud 6.2 > > > > We have been noticing an issue where the index in a core shows as > current = > > false > > > > We have autocommit set for 15 seconds, and soft commit at 2 seconds > > > > This seems to cause two replicas to return different hits depending upon > > which one is queried. > > > > What would lead to the indexes not being "current"? The documentation on > > the meaning of current is vague. > > > > The collections in our cloud have two shards each with two replicas. I > see > > this with several of the collections. > > > > We don't know how they get like this but it's troubling > > > > -- > > > > > > This message and any attachment are confidential and may be privileged or > > otherwise protected from disclosure. If you are not the intended > recipient, > > you must not copy this message or attachment or disclose the contents to > > any other person. If you have received this transmission in error, please > > notify the sender immediately and delete the message and any attachment > > from your system. 
Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not accept liability for any omissions or errors in this > > message which may arise as a result of E-Mail-transmission or for damages > > resulting from any unauthorized changes of the content of this message > and > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > > subsidiaries do not guarantee that this message is free of viruses and > does > > not accept liability for any damages caused by any virus transmitted > > therewith. > > > > Click http://www.merckgroup.com/disclaimer to access the German, French, > > Spanish and Portuguese versions of this disclaimer. > -- This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient,
Re: Solr Cloud Replica Cores Give different Results for the Same query
The commit points on different replicas will trip at different wall clock times, so the leader and replica may return slightly different results depending on whether doc X was included in the commit on one replica but not on the second. After the _next_ commit interval (2 seconds in your case), doc X will be committed on the second replica: that is, it's not lost. Here's a couple of ways to verify: 1> Turn off indexing and wait a few seconds. The replicas should have the exact same documents. "A few seconds" is your autocommit (soft in your case) interval + autowarm time. This last is unknown, but you can check your admin/plugins-stats search handler times; it's reported there. Now issue your queries. If the replicas don't report the same docs, that's A Bad Thing and should be worrying. BTW, with a 2 second soft commit interval, which is really aggressive, you _better not_ have very large autowarm intervals! 2> Include a timestamp in your docs when they are indexed. There's an automatic way to do that, BTW. Now do your queries and append an FQ clause like &fq=timestamp:[* TO some_point_in_the_past]. The replicas should have the same counts unless you are deleting documents. I mention deletes on the off chance that you're deleting documents that fall in the interval, in which case the same discrepancy as above could theoretically occur. Updates should be fine. BTW, I've seen continuous monitoring of this done by automated scripts. The key is to get the shard URL and ping that with &distrib=false. The shard URL will look something like http://host:port/solr/collection_shard1_replica1; people usually just query *:* and compare numFound. Best, Erick On Wed, Dec 14, 2016 at 1:10 PM, Webster Homer wrote: > We are using Solr Cloud 6.2 > > We have been noticing an issue where the index in a core shows as current = > false > > We have autocommit set for 15 seconds, and soft commit at 2 seconds > > This seems to cause two replicas to return different hits depending upon > which one is queried. 
> > What would lead to the indexes not being "current"? The documentation on > the meaning of current is vague. > > The collections in our cloud have two shards each with two replicas. I see > this with several of the collections. > > We don't know how they get like this but it's troubling > > -- > > > This message and any attachment are confidential and may be privileged or > otherwise protected from disclosure. If you are not the intended recipient, > you must not copy this message or attachment or disclose the contents to > any other person. If you have received this transmission in error, please > notify the sender immediately and delete the message and any attachment > from your system. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not accept liability for any omissions or errors in this > message which may arise as a result of E-Mail-transmission or for damages > resulting from any unauthorized changes of the content of this message and > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > subsidiaries do not guarantee that this message is free of viruses and does > not accept liability for any damages caused by any virus transmitted > therewith. > > Click http://www.merckgroup.com/disclaimer to access the German, French, > Spanish and Portuguese versions of this disclaimer.
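The per-replica check described in this reply can be sketched roughly as follows. This is a minimal illustration, not a tested monitoring script: the function names are made up for this example, the core URL follows the http://host:port/solr/collection_shard1_replica1 pattern quoted above, and it assumes the standard JSON response shape (response.numFound) from the /select handler.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def replica_query_url(core_url, q="*:*", fq=None):
    """Build a /select URL that queries one replica core only (distrib=false)."""
    params = [("q", q), ("rows", "0"), ("wt", "json"), ("distrib", "false")]
    if fq:
        # e.g. fq="timestamp:[* TO 2016-12-14T00:00:00Z]" (field name is an assumption)
        params.append(("fq", fq))
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def fetch_num_found(core_url, q="*:*", fq=None):
    """Ask a single replica how many documents match; requires a reachable Solr."""
    with urlopen(replica_query_url(core_url, q, fq)) as resp:
        return json.load(resp)["response"]["numFound"]


def inconsistent_replicas(counts):
    """Given {core_url: numFound} for one shard, return {} when all replicas
    agree, otherwise return the full mapping so it can be logged."""
    return {} if len(set(counts.values())) <= 1 else dict(counts)
```

In use, one would call fetch_num_found once per replica of the same shard, collect the results in a dict, and pass that dict to inconsistent_replicas; any non-empty result is worth investigating once indexing has been idle for longer than the commit interval.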
Solr Cloud Replica Cores Give different Results for the Same query
We are using Solr Cloud 6.2 We have been noticing an issue where the index in a core shows as current = false We have autocommit set for 15 seconds, and soft commit at 2 seconds This seems to cause two replicas to return different hits depending upon which one is queried. What would lead to the indexes not being "current"? The documentation on the meaning of current is vague. The collections in our cloud have two shards each with two replicas. I see this with several of the collections. We don't know how they get like this but it's troubling -- This message and any attachment are confidential and may be privileged or otherwise protected from disclosure. If you are not the intended recipient, you must not copy this message or attachment or disclose the contents to any other person. If you have received this transmission in error, please notify the sender immediately and delete the message and any attachment from your system. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not accept liability for any omissions or errors in this message which may arise as a result of E-Mail-transmission or for damages resulting from any unauthorized changes of the content of this message and any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its subsidiaries do not guarantee that this message is free of viruses and does not accept liability for any damages caused by any virus transmitted therewith. Click http://www.merckgroup.com/disclaimer to access the German, French, Spanish and Portuguese versions of this disclaimer.
Different results for comma and whitespace separated query string using eDisMax Query Parser
Hi, different results are obtained for a query separated by comma and one separated by whitespace, "q":"foo,bar", "q":"foo bar", although solr.StandardTokenizerFactory is utilized. The eDisMax Query Parser is used. Fields of interest are determined by the 'qf' parameter. "defType":"edismax", "qf":"STREET_NAME COMMPART_NAME", The different results are also reflected within the parsedquery debug output: Whitespace: "rawquerystring":"foo bar", "querystring":"foo bar", "parsedquery":"(+(DisjunctionMaxQuery((STREET_NAME:foo | COMMPART_NAME:foo)) DisjunctionMaxQuery((STREET_NAME:bar | COMMPART_NAME:bar))))/no_coord", "parsedquery_toString":"+((STREET_NAME:foo | COMMPART_NAME:foo) (STREET_NAME:bar | COMMPART_NAME:bar))", "explain":{}, "QParser":"ExtendedDismaxQParser", Comma: "rawquerystring":"foo,bar", "querystring":"foo,bar", "parsedquery":"(+DisjunctionMaxQuery(((STREET_NAME:foo STREET_NAME:bar) | (COMMPART_NAME:foo COMMPART_NAME:bar))))/no_coord", "parsedquery_toString":"+((STREET_NAME:foo STREET_NAME:bar) | (COMMPART_NAME:foo COMMPART_NAME:bar))", "explain":{}, "QParser":"ExtendedDismaxQParser", The way I understand the standard tokenizer, both query strings should be split in the same way, treating whitespace and punctuation as delimiters. However, different separators obviously result in different evaluations. In the first case, the score values of both DisjunctionMaxQuery evaluations are added together. In the second case, only one (the maximum) of these score values is returned. Any ideas what I am missing here? I am using Solr 6.2.0. Configuration details: and Thanks and all the best, Frank -- Frank Zirkelbach LEW Verteilnetz GmbH (LVN), GIS/NIS Schaezlerstraße 3, 86150 Augsburg Tel. internal: 71-1379 Tel. external: +49-821-328-1379 Fax external: +49-821-328-1360 mailto:frank.zirkelb...@lew-verteilnetz.de www.lew-verteilnetz.de Chairman of the Supervisory Board: Dr. 
Markus Litpher; Managing Directors: Manfred Lux, Theo Schmidtner, Eugen Wiedemann Registered office: Augsburg; VAT ID DE240432124 Commercial register HRB 20929, Registration court: Amtsgericht Augsburg
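To make the scoring difference between the two parsed queries concrete, here is a small worked example. It is a simplified model only: it assumes a tie-breaker of 0, ignores coord and normalization factors, and the per-field scores are invented numbers, not output from a real index.

```python
# Hypothetical per-term, per-field scores (made up for illustration).
scores = {
    "foo": {"STREET_NAME": 2.0, "COMMPART_NAME": 0.5},
    "bar": {"STREET_NAME": 0.0, "COMMPART_NAME": 3.0},
}


def score_whitespace_form(term_field_scores):
    """+((F1:foo | F2:foo) (F1:bar | F2:bar)): one DisjunctionMaxQuery per term;
    take the best field per term, then add the terms together."""
    return sum(max(fields.values()) for fields in term_field_scores.values())


def score_comma_form(term_field_scores):
    """+((F1:foo F1:bar) | (F2:foo F2:bar)): one DisjunctionMaxQuery over fields;
    add the terms within each field, then keep only the best field."""
    field_names = next(iter(term_field_scores.values())).keys()
    return max(
        sum(fields[name] for fields in term_field_scores.values())
        for name in field_names
    )


whitespace_score = score_whitespace_form(scores)  # 2.0 + 3.0
comma_score = score_comma_form(scores)            # max(2.0 + 0.0, 0.5 + 3.0)
```

With these numbers the whitespace form scores 5.0 and the comma form scores 3.5, which matches the observation above that the first case adds the per-term maxima while the second keeps only the best single field.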
Re: Solr MLT with stream.body returns different results on each shard
: I have a fresh install of Solr 5.2.1 with about 3 million docs freshly : indexed (I can also reproduce this issue on 4.10.0). When I use the Solr : MorelikeThisHandler with content stream I'm getting different results per : shard. I haven't looked at the code recently but i'm 99% certain that the MLT handler in general doesn't work with distributed (ie: sharded) queries (unlike the MLT component and the recently added MLT qparser). I suspect that in the specific case of stream.body, what you are seeing is that the interesting terms are being computed relative to the local tf/idf stats for that shard, and then only local results from that shard are being returned. : I also looked at using a standard MLT query, but I need to be able to : stream in a fairly large block of text for comparison that is not in the : index (different type of document). A standard MLT query Until/unless the MLT parser supports arbitrary text (there's some mention of this in SOLR-7639 but i'm not sure what the status of that is) you might find that just POSTing all of your text as a regular query (q) using dismax or edismax is suitable for your needs -- that's essentially the equivalent of what MLTHandler does with a stream.body, except that MLTHandler tries to focus only on "interesting terms" based on tf/idf. If your fields are all configured with stopword files anyway, then the results and performance may be similar. -Hoss http://www.lucidworks.com/
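A rough sketch of the "POST the text as a regular edismax query" suggestion might look like this. The helper names are invented for this example, the host and collection are taken from the URLs in the original question, and the choice of qf=text mirrors the mlt.flt=text parameter there; whether edismax defaults such as mm suit a large block of text is something to check against your own setup.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_edismax_body(text, qf="text", fl="id,score", rows=10):
    """Form-encode a large block of text as an ordinary edismax query."""
    params = {
        "q": text,
        "defType": "edismax",
        "qf": qf,          # field(s) to search; 'text' is an assumption
        "fl": fl,
        "rows": str(rows),
        "wt": "json",
    }
    return urlencode(params).encode("utf-8")


def build_select_request(select_url, text):
    """Create a POST request so the query text is not limited by URL length."""
    return Request(
        select_url,
        data=build_edismax_body(text),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Example (not sent here): pass the request to urllib.request.urlopen to execute it.
request = build_select_request(
    "http://testsolr1:8983/solr/mega/select",
    "electronics and a much longer block of comparison text",
)
```

Because this goes through the normal /select handler, the query is distributed across shards like any other search, which is the point of the workaround.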
Solr MLT with stream.body returns different results on each shard
I have a fresh install of Solr 5.2.1 with about 3 million docs freshly indexed (I can also reproduce this issue on 4.10.0). When I use the Solr MorelikeThisHandler with content stream I'm getting different results per shard. I also looked at using a standard MLT query, but I need to be able to stream in a fairly large block of text for comparison that is not in the index (different type of document). A standard MLT query http://testsolr2:8983/solr/mega/select?q=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score appears to return consistent results between shards. Any reason why the content stream query would be different between shards? Thank you for your help! Aaron *Content Stream Example:* http://testsolr1:8983/solr/mega/mlt?stream.body=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score *Returns: * 0 3 http://testsolr2:8983/solr/mega/mlt?stream.body=electronics&mlt.flt=text&mlt.mintf=0&fl=id,score *Returns: * 0 1
Solr Clustering component different results than Carrot workbench
Though I am interacting with Dawid (creator of Carrot2) on Carrot2 mailing list however just wanted to post my problem to a wider audience. I am using Solr 4.7 (on both windows and linux) and saved my lingo-attributes.xml file from the workbench which I am using in Solr. Note that for testing I am just having one solr Index and all the queries are getting fired on that. Now the clusters that I am getting are good in the workbench (carrot) but pathetic in Solr. In the logs (jetty) I can see: Loaded Solr resource: clustering/carrot2/lingo-attributes.xml, so that indicates that my attribute file is being loaded. I am really confused what is accounting for the difference in the two outputs (workbench vs Solr). Again to reiterate the data sources are same (just one solr index and same queries with 100 results). This is happening on both Linux and Windows. Given below is my search component and request handler configuration: lingo org.carrot2.clustering.lingo.LingoClusteringAlgorithm 30 clustering/carrot2 true true org.carrot2.clustering.lingo.LingoClusteringAlgorithm clustering/carrot2 film_id description true false 100 clustering
Re: Join and non-Join query give different results
I have figured it out. The reason is simply how joins work in Solr: each join filter query is executed separately, like an independent subquery. So a house that has one available document with discount >= 1 and another available document with (sd_year:2014 AND sd_month:11) will be returned, even though my intention was to apply both conditions to the same available document. In the second case, however, both conditions are applied at the same time to find available documents, and houses would then be returned based on the matching available documents. Since there is no available document that satisfies both conditions, there is no matching house, which gives zero results. It really took some time to figure this out; I hope this will help someone else. -- View this message in context: http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922p4148131.html Sent from the Solr - User mailing list archive at Nabble.com.
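The behaviour explained above can be reproduced with plain Python sets. This is only a model of the semantics, with made-up documents; the start_date range is left out to keep it short, and the field names are the ones used in the queries in this thread.

```python
# Made-up child ("available") documents; house 1 has two of them.
available = [
    {"house_id_fk": 1, "discount": 5, "sd_year": 2014, "sd_month": 10},
    {"house_id_fk": 1, "discount": 0, "sd_year": 2014, "sd_month": 11},
    {"house_id_fk": 2, "discount": 0, "sd_year": 2014, "sd_month": 9},
]


def has_discount(doc):
    return doc["discount"] >= 1          # discount:[1 TO *]


def in_nov_2014(doc):
    return doc["sd_year"] == 2014 and doc["sd_month"] == 11


def join_to_houses(docs, predicate):
    """Model of {!join from=house_id_fk to=house_id}: the set of house ids
    that have at least one child document matching the predicate."""
    return {doc["house_id_fk"] for doc in docs if predicate(doc)}


# Two separate fq joins: each is resolved to houses first, then intersected.
houses_two_joins = join_to_houses(available, has_discount) & join_to_houses(
    available, in_nov_2014
)

# One join whose inner query carries both conditions on the same child doc.
houses_one_join = join_to_houses(
    available, lambda doc: has_discount(doc) and in_nov_2014(doc)
)
```

Here houses_two_joins is {1}, because house 1 has one child satisfying each condition, while houses_one_join is empty. Under this model, putting all conditions inside a single {!join} filter query would be the way to require that one available document satisfies them together.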
Join and non-Join query give different results
Hi everyone,

I am trying to link two types of documents in my Solr index. The parent is named "house" and the child is named "available". I want to return a list of houses that have available documents matching some filters. However, the following query gives me around 18 documents, which is wrong. It should return 0 documents.

q=*:*
&fq={!join from=house_id_fk to=house_id}doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq={!join from=house_id_fk to=house_id}doctype:available AND sd_year:2014 AND sd_month:11

To debug it, I first checked whether any available documents match the given filter queries, using the following query:

q=*:*
&fq=doctype:available AND discount:[1 TO *] AND start_date:[NOW/DAY TO NOW/DAY%2B21DAYS]
&fq=doctype:available AND sd_year:2014 AND sd_month:11

This query gives 0 results, which is correct. As you can see, the two queries are the same; the only difference is the use of the join query parser. I am a bit confused about why the first query returns results. My understanding is that this should not happen, because the second query shows that no available document satisfies the given filter queries.

-- View this message in context: http://lucene.472066.n3.nabble.com/Join-and-non-Join-query-give-different-results-tp4146922.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Debug different Results from different Request Handlers
Thank you Erik (and steffkes, who helped me on the #Solr IRC channel). Sorry for the delay in responding, but I got this to work.

Your suggestion about adding debug=true to the query helped me. Since I was adding this to the Velocity request handler, I could not see the debug results, but when I added wt=xml, i.e. /products?q=hp|lync&debug=true&wt=xml, I could see the parsed query as well as the parser used for each handler.

Thanks also to steffkes, who answered (on IRC) the question from my original post: both of my handlers go through org.apache.solr.servlet.SolrDispatchFilter, and its doFilter() method in particular is what I was looking for. Also, as steffkes pointed out, the /products request handler uses the ExtendedDismaxQParser, whereas the second (/search or /select) request handler uses the LuceneQParser. It seems that these two parsers handle the | sign very differently.

For my limited private installation, I decided to go to the base class of ExtendedDismaxQParser and LuceneQParser, i.e. QParser. There, in the constructor, I strip the | sign out of the qstr parameter. This is probably the dirtiest way to get this to work, but it works for now.

Thanks again to you all.
O. O.

-- View this message in context: http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4142716.html Sent from the Solr - User mailing list archive at Nabble.com.
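In case it helps someone comparing handlers the same way, below is a rough sketch of building the debug request and reading the parser name out of the response. It assumes wt=json instead of wt=xml (only because JSON is easier to pick apart), a made-up base URL, and a hand-written stand-in for the decoded response; the helper names are hypothetical. The QParser and parsedquery keys are the ones Solr reports in its debug section.

```python
from urllib.parse import urlencode


def debug_url(base_url, handler, query):
    # Build a request URL that asks the given handler for debug output as JSON.
    params = urlencode({"q": query, "debug": "true", "wt": "json"})
    return f"{base_url}{handler}?{params}"


def parser_summary(response):
    # Pull the parser name and the parsed query out of a decoded JSON response.
    debug = response.get("debug", {})
    return debug.get("QParser"), debug.get("parsedquery")


# Base URL and core name are assumptions; adjust to the local installation.
url = debug_url("http://localhost:8983/solr/collection1", "/products", "hp|lync")
print(url)

# Hand-written stand-in for a decoded response; the parsedquery is a placeholder.
sample = {"debug": {"QParser": "ExtendedDismaxQParser", "parsedquery": "..."}}
print(parser_summary(sample))
```

Running the same request against both handlers and comparing the two summaries shows at a glance whether different query parsers are involved.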
Re: Debug different Results from different Request Handlers
If you want the two request handlers to have the same behavior, with only the Velocity output being different, then remove everything except echoParams, wt, v.template, v.base_dir, v.layout (and title, if your templates are using it; the default does).

You can see which query parser is being used by adding debug=true to the request (or debugQuery=true, the legacy param).

Erik

On Jun 14, 2014, at 1:47 PM, O. Olson wrote:
> Thank you Erik. I tried /products?q=hp|lync&wt=xml and I show no results i.e. numFound="0", so I think there is something wrong. You are correct, that the VRW is not the problem but the Query Parser. Could you please let me know how to determine the query parser?
> [...]
Re: Debug different Results from different Request Handlers
Thank you Erik. I tried /products?q=hp|lync&wt=xml and I show no results i.e. numFound="0", so I think there is something wrong. You are correct, that the VRW is not the problem but the Query Parser. Could you please let me know how to determine the query parser? For most part I have not changed these request handlers from the Solr examples. The Request Handler that uses Apache Velocity looks like: explicit velocity browse true VMTemplates layout Solritas edismax text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 text 100% *:* 10 *,score text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4 title^10.0 description^5.0 keywords^5.0 author^2.0 resourcename^1.0 text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename 3 on CategoryID on false 5 2 5 true true 5 3 spellcheck And the regular XML handler looks like: explicit Does this show which is the Query Parser? I can post more of my solrconfig.xml if necessary. I am curious where the Query Parser hands over the parameters to the Solr engine that would be common irrespective of Request Handler i.e. I am trying to put debugging statements into the common code so that these can dump out intermediate results to the log. Thanks again Erik. O. O. -- View this message in context: http://lucene.472066.n3.nabble.com/Debug-different-Results-from-different-Request-Handlers-tp4141804p4141859.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Debug different Results from different Request Handlers
Try /products?wt=xml and compare. VRW is just a writer; it doesn't affect the results in any way. Let's see the rest of those handler definitions - different query parser is my hunch. Or maybe your velocity template is not showing the actual results?

Erik

> On Jun 13, 2014, at 22:44, "O. Olson" wrote:
>
> I am seeing different results when I use these two handlers.
> [...]
Debug different Results from different Request Handlers
Hi,

In my solrconfig.xml I have one Request Handler displaying the results using Apache Velocity, and another with regular XML (class="org.apache.solr.handler.component.SearchHandler"). I am seeing different results when I use these two handlers.

Search Query: hp|lync (or on the URL, q=hp%7Clync)

I see 0 results when I use the first handler (Velocity), but I see many results (tens) with the second handler. I am trying to debug why this problem occurs. I am certain the problem is with the first handler, and I would be grateful if anyone can help me debug this. I do not know Solr well enough, so a few pointers could help.

1. First, I would like to know if class="solr.SearchHandler" and class="org.apache.solr.handler.component.SearchHandler" are the same. If not, what does "solr.SearchHandler" refer to?

2. Second, I am working with the source of Solr 4.7 (yes, it is a bit old, but I don’t think it has fundamentally changed). I have put log.debug() statements in the org.apache.solr.response.VelocityResponseWriter.write() method to verify that my query is not getting mangled by the URL encoding, and it is not. So, since I am getting different results for the same queries, I am curious to see what the core Solr engine receives when I run the same query from different handlers. Could someone tell me which class contains the core Solr engine that is used irrespective of which Request Handler makes the request? I am trying to put debug statements into this class to log the value of the query parameter that it receives. The results are different, so I think one or more parameters are different.

Thank you in advance,
O. O.
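A quick way to double-check the percent-encoding of a query like this one, sketched with Python's standard library: the pipe character encodes to %7C, whereas %7E decodes to a tilde.

```python
from urllib.parse import quote, unquote

query = "hp|lync"

# Percent-encode every reserved character, including the pipe.
encoded = quote(query, safe="")
print(encoded)  # hp%7Clync

# Decoding the %7E form shows which character it really stands for.
print(unquote("hp%7Elync"))  # hp~lync
```

If a log line or URL shows %7E where a pipe was intended, the query reaching Solr contains a tilde, which the query parsers treat as fuzzy/proximity syntax.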
Re: Luke and SOLR search giving different results
Thanks Shawn and Jack,

I changed solrconfig to set the default query field (qf) to field content. It works fine now.

Erol Akarsu

On Mon, Dec 3, 2012 at 5:03 PM, Shawn Heisey wrote:
> Your config is set up to search against a field named "text" by default - either by a setting in schema.xml or a "df" parameter in your search handler definition in solrconfig.xml. If you are using (e)dismax, it might be qf/pf parameters instead of df.
> [...]
Re: Luke and SOLR search giving different results
On 12/3/2012 1:44 PM, Erol Akarsu wrote:
> I tried as search query not "baş" but "features:baş" in field "q" in SOLR GUI. And, I got result!
>
> In the one document, I had some fields type of text_eng, text_general and one field features type of text_tr. If I don't specify field name, SOLR use EnglishAnalyzer. If I do, it uses the analyzer specific to field specified in search query string.

Your config is set up to search against a field named "text" by default - either by a setting in schema.xml or a "df" parameter in your search handler definition in solrconfig.xml. If you are using (e)dismax, it might be qf/pf parameters instead of df.

The field named text is not properly set up for this search. Your attachment at the beginning of this thread indicates that either you do not have a text field for this document at all, or that field is not stored. If the text field is a copyField as Jack has mentioned, note that it doesn't matter what analysis you are doing on features -- the copy is done before analysis, so it is completely separate.

Thanks,
Shawn
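To make the default-field point concrete, here is a small sketch of the three request shapes discussed in this thread: the bare term, the bare term with a per-request "df" override, and the explicitly fielded query. The base URL, core name and the select_url helper are assumptions for illustration; "df" itself is the standard default-field parameter mentioned above.

```python
from urllib.parse import urlencode


def select_url(base_url, query, default_field=None):
    # Build a /select URL, optionally overriding the default search field.
    params = {"q": query, "wt": "xml"}
    if default_field is not None:
        params["df"] = default_field
    return f"{base_url}/select?{urlencode(params)}"


# Assumed local core; adjust to the actual installation.
base = "http://localhost:8983/solr/collection1"

print(select_url(base, "baş"))                            # handler's default field
print(select_url(base, "baş", default_field="features"))  # per-request df override
print(select_url(base, "features:baş"))                   # explicit field in the query
```

The second and third forms should both be analyzed with the field type of "features", while the first goes through whatever field the handler defaults to.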
Re: Luke and SOLR search giving different results
As I pointed out in my message, your query is indicating that "text" is your default search field. So, either choose a different default search field, or ensure that the "text" field has the desired field type.

If you want to change the default search field, either use a "df" request parameter or change the "df" default value for the request handler in solrconfig.xml.

-- Jack Krupansky

-----Original Message-----
From: Erol Akarsu
Sent: Monday, December 03, 2012 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack, I see interesting stuff here now. I tried as search query not "baş" but "features:baş" in field "q" in SOLR GUI. And, I got result! In the one document, I had some fields type of text_eng, text_general and one field features type of text_tr. If I don't specify field name, SOLR use EnglishAnalyzer. If I do, it uses the analyzer specific to field specified in search query string. Is this true?

Erol Akarsu

[...]
Re: Luke and SOLR search giving different results
Jack,

I see interesting stuff here now.

I tried as the search query not "baş" but "features:baş" in the "q" field in the SOLR GUI, and I got a result!

In the one document, I had some fields of type text_eng and text_general, and one field, features, of type text_tr. If I don't specify a field name, SOLR uses EnglishAnalyzer. If I do, it uses the analyzer specific to the field specified in the search query string.

Is this true?

Erol Akarsu

On Mon, Dec 3, 2012 at 1:30 PM, Erol Akarsu wrote:
> Jack,
>
> I have these in schema.xml that defines "features" as type of text_tr
> But unfortunately, this fails.
>
> multiValued="true"/>
> positionIncrementGap="100">
> words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
> language="Turkish"/>
> words="lang/stopwords_tr.txt" enablePositionIncrements="true"/>
> language="Turkish"/>
>
> [...]
Re: Luke and SOLR search giving different results
Jack,

I have these in schema.xml, which define "features" as type text_tr.

But unfortunately, this fails.

On Mon, Dec 3, 2012 at 1:15 PM, Jack Krupansky wrote:
> Ah! See where it says "text:baş"? Your query is against the "text" field, which probably doesn't have the Turkish analysis.
>
> There is probably a copyField from "features" to "text". You use the "text_tr" field type for "features", but probably not for the "text" field.
>
> -- Jack Krupansky
>
> [...]
Re: Luke and SOLR search giving different results
Ah! See where it says "text:baş"? Your query is against the "text" field, which probably doesn't have the Turkish analysis. There is probably a copyField from "features" to "text". You use the "text_tr" field type for "features", but probably not for the "text" field. -- Jack Krupansky -Original Message- From: Erol Akarsu Sent: Monday, December 03, 2012 1:06 PM To: solr-user@lucene.apache.org Subject: Re: Luke and SOLR search giving different results Jack, I have already set tomcat server fro UTF-Encoding before. I have added URIEncoding="UTF-8" to all elements in server.xml in Tomcat 7. As you see below, when I search word "baş" with debug mode I can see empty response. But when I search word "baştan", I can get correct response. It seems to me that TurkishAnalyser is not being used in SOLR search because we can make only full word search "baştan" but not the root word "baş". Probably, English Analyzer is being used and could not find the root word. For example, in Luke, if I change "Analyser to use for query parsing" to EnglishAnalyser, then it can not find word "baş" but it can with TurkishAnalyser" only. I guess SOLR is not using TurkishAnalyzer. Is this assumption true? I could not find any other reason 0 58 true baş xml baş baş text:baş text:baş LuceneQParser 38.0 16.0 3.0 0.0 0.0 0.0 0.0 0.0 10.0 0.0 0.0 0.0 0.0 0.0 10.0 0 2 true baştan xml htt://111.a.b1 6H500F0 tr Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300 Maxtor Corp. maxtor electronics hard drive SATA 3.0Gb/s, NCQ 8.5ms seek 16MB cache Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!" diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" 
repliği, sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı, Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!" dedirterek. 350.0 350,USD 6 true 2006-02-13T15:26:37Z 1420300467908378624 baştan baştan text:baştan text:baştan 0.028767452 = (MATCH) weight(text:baştan in 0) [DefaultSimilarity], result of: 0.028767452 = fieldWeight in 0, product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 0.30685282 = idf(docFreq=1, maxDocs=1) 0.09375 = fieldNorm(doc=0) LuceneQParser 2.0 1.0 1.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
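As a side note, the explain figures quoted for the "baştan" match can be checked by hand. The sketch below assumes the classic Lucene DefaultSimilarity formulas (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the fieldNorm value is simply copied from the explain output rather than derived.

```python
import math


def tf(freq):
    # DefaultSimilarity term frequency factor: square root of the raw count.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # DefaultSimilarity inverse document frequency.
    return 1.0 + math.log(max_docs / (doc_freq + 1))


field_norm = 0.09375  # copied from the explain output, not computed here

# One document in the index, and the term occurs once in it.
score = tf(1.0) * idf(doc_freq=1, max_docs=1) * field_norm

print(round(idf(1, 1), 8))  # 0.30685282
print(round(score, 9))      # 0.028767452
```

Both numbers agree with the idf and fieldWeight lines in the explain output, so the "baştan" hit is an ordinary single-term match on the "text" field.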
Re: Luke and SOLR search giving different results
Two points:

1. Possibly an encoding problem with your container? Is UTF-8 encoding enabled?

2. Add &debugQuery=true to your query (from the browser) and see if the parsedquery has the expected term that matches what Luke reports for the index and what Solr Admin Analysis also reports for index analysis.

-- Jack Krupansky

-----Original Message-----
From: Erol Akarsu
Sent: Monday, December 03, 2012 11:35 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Yes.

I expect SOLR should give same search results as Luke does.

Term analyzer gives correct answer in SOLR as expected. But SOLR does not return correct search results.

I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky wrote:

So, does that highlight the problem for you or not? Is the term analyzed as you expected?

-- Jack Krupansky

From: Erol Akarsu
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed data folder of SOLR and indexed this sample doc from scratch, there was no document in SOLR but only one.

When I analysed, I can see stemming is correct and I can see these for words "bul", "baş", "gör" and "umut" in SF row. I attached analyse screens.

Erol Akarsu

On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky wrote:

Have you tried using the Solr Admin Analysis page, using the word and a few words of context for index analysis and the word alone for query analysis?

And be sure to fully reindex if you change ANYTHING in the schema fields or field types.

-- Jack Krupansky

From: Erol Akarsu
Sent: Sunday, December 02, 2012 10:38 PM
To: solr-user@lucene.apache.org
Subject: Luke and SOLR search giving different results

Hi,

I am trying to apply SOLR for Turkish Language for my research.

Instead of using language identification, I manually assigned Turkish language for a sample test document. I have configured SOLR schema.xml, activated the part below. I have added the attached document testTurkishDoc.xml that is inserted to SOLR database.

But searching for raw Lucene index through Luke and SOLR 4.0 search through GUI is giving different results. In picture Selection_006.png, the word "baş" is listed as top term. I search the word "baş" in Luke and I got the result that is only document, shown in Selection_004.png.

But in SOLR GUI, I am getting empty result for word "baş" in picture Selection_002.png.

In the text we have features field, that has word "baştan" that is being derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is doing search different than Luke. I could not figure it out why I could not find it while getting in Luke. The same thing happens for words "umut", "bul" and "gör".

I will appreciate if you can help me to get same results from SOLR UI.

Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!" diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği, sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı, Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!" dedirterek.

Added to schema.xml for SOLR:
Re: Luke and SOLR search giving different results
Jack,

Yes. I expect SOLR to give the same search results as Luke does. The term analyzer gives the correct answer in SOLR, as expected, but SOLR does not return the correct search results. I don't know why.

Erol Akarsu

On Mon, Dec 3, 2012 at 11:21 AM, Jack Krupansky wrote:
> So, does that highlight the problem for you or not? Is the term analyzed
> as you expected?
>
> -- Jack Krupansky
Re: Luke and SOLR search giving different results
So, does that highlight the problem for you or not? Is the term analyzed as you expected?

-- Jack Krupansky

From: Erol Akarsu
Sent: Monday, December 03, 2012 8:44 AM
To: solr-user@lucene.apache.org
Subject: Re: Luke and SOLR search giving different results

Jack,

Thanks for help.

I removed data folder of SOLR and indexed this sample doc from scratch, there was no document in SOLR but only one.

When I analysed, I can see stemming is correct and I can see these for words "bul", "baş", "gör" and "umut" in SF row. I attached analyse screens.

Erol Akarsu
Re: Luke and SOLR search giving different results
Jack, Thanks for help. I removed data folder of SOLR and indexed this sample doc from scratch, there was no document in SOLR but only one. When I analysed , I can see stemming is correct and I can see these for words "bul", "baş" ,"gör" and "umut" in SF row I attached analyse screens Erol Akarsu On Sun, Dec 2, 2012 at 11:00 PM, Jack Krupansky wrote: > Have you tried using the Solr Admin Analysis page, using the word and a > few words of context for index analysis and the word alone for query > analysis? > > And be sure to fully reindex if you change ANYTHING in the schema fields > or field types. > > -- Jack Krupansky > > From: Erol Akarsu > Sent: Sunday, December 02, 2012 10:38 PM > To: solr-user@lucene.apache.org > Subject: Luke and SOLR search giving different results > > Hi, > > I am trying to apply SOLR for Turkish Language for my research. > > Instead of using language identification, I manually assigned Turkish > language for a sample test document. I have configured SOLR schema.xml, > activated the part below. I have added the attached document > testTurkishDoc.xml that is inserted to SOLR database. > > But searching for raw Lucene index through Luke and SOLR 4.0 search though > GUI is giving different results. In picture Selection_006.png, the word > "baş" is listed as top term. I search the word "baş" in Luke and I got the > result result that is only document, shown in Selection_004.png. > > But in SOLR GUI, I am getting empty result for word "baş" in picture > Selection_002.png. > > In the text we have features field, that has word "baştan" that is being > derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is doing > search different than Luke. I could not figure it out why I could not find > it while getting in Luke. The same thing happens for words "umut", "bul" > and "gör". > > I will appreciate if you can help me to get same results from SOLR UI. > > > >Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!" 
> diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan > ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim > firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı > reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği, > sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de > Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı, > Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!" > dedirterek. > > > > > Added to schema.xml for SOLR: > > multiValued="true"/> > positionIncrementGap="100"> > > > > words="lang/stopwords_tr.txt" enablePositionIncrements="true"/> > language="Turkish"/> > > > > > words="lang/stopwords_tr.txt" enablePositionIncrements="true"/> > language="Turkish"/> > > > > >
Re: Luke and SOLR search giving different results
Have you tried using the Solr Admin Analysis page, using the word and a few words of context for index analysis and the word alone for query analysis? And be sure to fully reindex if you change ANYTHING in the schema fields or field types. -- Jack Krupansky From: Erol Akarsu Sent: Sunday, December 02, 2012 10:38 PM To: solr-user@lucene.apache.org Subject: Luke and SOLR search giving different results Hi, I am trying to apply SOLR for Turkish Language for my research. Instead of using language identification, I manually assigned Turkish language for a sample test document. I have configured SOLR schema.xml, activated the part below. I have added the attached document testTurkishDoc.xml that is inserted to SOLR database. But searching for raw Lucene index through Luke and SOLR 4.0 search though GUI is giving different results. In picture Selection_006.png, the word "baş" is listed as top term. I search the word "baş" in Luke and I got the result result that is only document, shown in Selection_004.png. But in SOLR GUI, I am getting empty result for word "baş" in picture Selection_002.png. In the text we have features field, that has word "baştan" that is being derived from root word "baş" in Turkish Grammar. Somehow, SOLR GUI is doing search different than Luke. I could not figure it out why I could not find it while getting in Luke. The same thing happens for words "umut", "bul" and "gör". I will appreciate if you can help me to get same results from SOLR UI. Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir fiyasko. 
Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek. Added to schema.xml for SOLR:
Luke and SOLR search giving different results
Hi,

I am trying to apply SOLR to the Turkish language for my research.

Instead of using language identification, I manually assigned the Turkish language to a sample test document. I have configured the SOLR schema.xml and activated the part below. I have added the attached document testTurkishDoc.xml, which is inserted into the SOLR database.

But searching the raw Lucene index through Luke and searching through the SOLR 4.0 GUI give different results. In picture Selection_006.png, the word "baş" is listed as the top term. I searched for the word "baş" in Luke and got the only document as the result, shown in Selection_004.png.

But in the SOLR GUI, I get an empty result for the word "baş", as shown in picture Selection_002.png.

In the text we have the features field, which contains the word "baştan", derived from the root word "baş" in Turkish grammar. Somehow, the SOLR GUI is searching differently than Luke. I could not figure out why SOLR cannot find it while Luke can. The same thing happens for the words "umut", "bul" and "gör".

I would appreciate it if you could help me get the same results from the SOLR UI.

Firmalarsa "Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!" diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye'ye getirilen Paris Hilton'un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda'nın kabinde papağan gibi tekrarladığı "My darling!" repliği, sonunda Paris'i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris'in ancak 5 kez izledikten sonra anlaşılan "Paris seçti, firma yaptı, Arda bayıldı." sözleriyle kazındı hafızalara, "Keşke unutabilsek!" dedirterek.

Added to schema.xml for SOLR:

htt://111.a.b1 6H500F0 tr Maxtor DiamondMax 11 - hard drive - 500 GB - SATA-300 Maxtor Corp. maxtor electronics hard drive SATA 3.0Gb/s, NCQ 8.5ms seek 16MB cache 350 6 true Firmalarsa “Nasılsa buldum oynatacak ünlüyü, neyleyim senaryoyu!” diyerek baştan savma reklamlarla kotarmaya bakıyor işi. Futbolcu Arda Turan ve büyük umutlarla Türkiye’ye getirilen Paris Hilton’un oynatıldığı giyim firması reklamı da tam bir fiyasko. Birbirinden ünlü bu iki ismin oynadığı reklam Arda’nın kabinde papağan gibi tekrarladığı “My darling!” repliği, sonunda Paris’i görünce anlam veremediğimiz uyduruk bayılma sahnesi, bir de Paris’in ancak 5 kez izledikten sonra anlaşılan “Paris seçti, firma yaptı, Arda bayıldı.” sözleriyle kazındı hafızalara, “Keşke unutabilsek!” dedirterek. 2006-02-13T15:26:37Z
Re: synonyms.txt: different results on admin and on site..
You are right about the wildcards and the analysis... so is there any way of getting wildcard terms analyzed?

- Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3322026.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: synonyms.txt: different results on admin and on site..
Wildcard terms are not analyzed, so the synonym filter (and with it your synonyms.txt) is not applied to them, which may be what you are seeing here. Have you checked the analysis for deniz*?

François

On Sep 7, 2011, at 10:08 PM, deniz wrote:

> well yea you are right... i realised that lack of detail issue here... so
> here it comes...
>
> This is from my schema.xml and basically i have a synonyms.txt file which
> contains
>
> deniz,denis,denise
>
> After posting here, I have checked some stuff that I have faced before,
> while trying to add accented letters to the system... so it seems like same
> or similar stuff... so...
>
> As i want to support partial matches, the search string is modified on php
> side. if user enters deniz, it is sent to solr as deniz*
>
> when i check on solr admin, i was able to make searches with
> deniz,denise,denis and they all return correct results, but when i put the
> wildcard, i get nothing...
>
> so with the above settings;
>
> deniz
> denise
> denis
> works smoothly
>
> deniz*
> denise*
> denis*
> returns nothing...
>
> should i implement some kinda analyzer or tokenizer or any kinda component
> to overtime this thing?
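Since wildcard terms skip the analysis chain in the setup discussed here, anything the query analyzer would normally do (lowercasing, synonym expansion) has to happen on the client before the `*` is appended. A rough sketch of that idea for the PHP-side rewrite described above, written in Python for brevity. The synonym table mirrors the single synonyms.txt line from the thread; the helper name and the field name `name` are hypothetical, since the thread does not say which field is searched.

```python
# Mirrors the single synonyms.txt line quoted in the thread.
SYNONYM_GROUPS = [["deniz", "denis", "denise"]]


def expand_prefix_query(user_input, field="name"):
    """Build an OR query of prefix terms, expanding synonyms on the client.

    `field` is a placeholder; the thread does not name the searched field.
    """
    # Per the reply above, wildcard terms are not analyzed, so lowercase here.
    term = user_input.strip().lower()
    variants = [term]
    for group in SYNONYM_GROUPS:
        if term in group:
            variants = list(group)
            break
    return " OR ".join(f"{field}:{v}*" for v in variants)


print(expand_prefix_query("Deniz"))
print(expand_prefix_query("solr"))
```

This keeps the synonym list in two places (synonyms.txt and the client), which is a real maintenance cost; it is shown only to make the "not analyzed" behaviour concrete.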
Re: synonyms.txt: different results on admin and on site..
Well, yes, you are right... I realised the lack of detail here, so here it comes.

This is from my schema.xml, and basically I have a synonyms.txt file which contains

deniz,denis,denise

After posting here, I checked some issues I had faced before, while trying to add accented letters to the system; this seems to be the same or a similar problem.

As I want to support partial matches, the search string is modified on the PHP side: if the user enters deniz, it is sent to Solr as deniz*

When I check in the Solr admin, I am able to search with deniz, denise and denis, and they all return correct results, but when I add the wildcard I get nothing.

So with the above settings:

deniz
denise
denis
work smoothly

deniz*
denise*
denis*
return nothing...

Should I implement some kind of analyzer, tokenizer or other component to overcome this?

Rob Casson wrote:
> you should probably post your schema.xml and some parts of your
> synonyms.txt. it could be differences between your index and query
> analysis chains, synonym expansion errors, etc, but folks will likely
> need more details to help you out.
>
> cheers,
> rob

- Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318503.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: synonyms.txt: different results on admin and on site..
you should probably post your schema.xml and some parts of your synonyms.txt. it could be differences between your index and query analysis chains, synonym expansion errors, etc, but folks will likely need more details to help you out. cheers, rob On Wed, Sep 7, 2011 at 9:46 PM, deniz wrote: > could it be related with analysis issue about synonyms once again? > > > > - > Zeki ama calismiyor... Calissa yapar... > -- > View this message in context: > http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: synonyms.txt: different results on admin and on site..
could it be related with analysis issue about synonyms once again? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318464.html Sent from the Solr - User mailing list archive at Nabble.com.
synonyms.txt: different results on admin and on site..
Hi all... I have checked the list for the issue in the title, but couldn't find any related info. So my problem is: I change synonyms.txt and then reload the core without restarting the server. The new synonyms work smoothly if I use the Solr admin interface; however, when I use the site, which is written in PHP, I get nothing when I search for one of the synonyms that I have added. Any ideas why this is happening?

- Zeki ama calismiyor... Calissa yapar...
--
View this message in context: http://lucene.472066.n3.nabble.com/synonyms-txt-different-results-on-admin-and-on-site-tp3318338p3318338.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching similar values for same field results in different results
That was it! thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2206087.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching similar values for same field results in different results
You have a problem with the analysis chain. When you do a query, the EnglishPorterFilter is cutting off the last part of your word, but you're not doing the same when indexing. I think that removing that filter from the chain will solve your problem. Remember that there are two different analysis chains, one for indexing time and one for querying time. I think that you didn't see the shortened word in analysis.jsp because you entered the text in the "Field Value (Index)" text box, so it was using the indexing time analysis chain. If you want to see the results of applying the querying time analysis chain, you should enter the text in the "Field Value (Query)" text box. Good luck, Juan Grande On Thu, Jan 6, 2011 at 10:58 AM, PeterKerk wrote: > > @iorixxx: > I ran: http://localhost:8983/solr/db/update/?optimize=true > This is the response: > > >0 >58 > > > > Then I ran: > > http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw > > This is response: > > >366 >153 > 16 > > > > So, it seems that nothing has changed there, and it looks like also before > the optimize operation the results were shown correct? > > when you say http caching, you mean the caching by the browser? Or does > Solr > have some caching by default? If the latter, how can I clear that cache? > > > @Erick: I added debugquery > > For "Strand en Zee" I see this: > > PhraseQuery(themes:"strand en zee") > > > Looks correct. > > > For "Kasteel en Landgoed" I see this: > > PhraseQuery(themes:"kasteel en landgo") > > > Which isnt correct! So it seems herein lies the problem. 
> > Now Im wondering why the value is cut off...this is my schema.xml: > > > > > words="stopwords_dutch.txt"/> > generateWordParts="1" > generateNumberParts="1" catenateWords="1" catenateNumbers="1" > catenateAll="0" splitOnCaseChange="1"/> > > > > > > ignoreCase="true" expand="true"/> > words="stopwords_dutch.txt"/> > generateWordParts="1" > generateNumberParts="1" catenateWords="0" catenateNumbers="0" > catenateAll="0" splitOnCaseChange="1"/> > > protected="protwords.txt"/> > > > > > multiValued="true" /> > multiValued="true"/> > > > I checked analysis.jsp: > filled in Field: "themes" > and Field value: "Kasteel en Landgoed" > > and schema.jsp, but I didnt see any weird results > > Now, Im wondering what else it could be.. > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2205706.html > Sent from the Solr - User mailing list archive at Nabble.com. >
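The mismatch Juan describes can be reproduced with a toy model of the two chains. This is only an illustrative sketch: `toy_stem` is a made-up stand-in that strips a trailing "ed" and is not the real Porter algorithm, and the chains are simplified to lowercasing plus optional stemming.

```python
def lowercase(tokens):
    return [t.lower() for t in tokens]


def toy_stem(tokens):
    """Stand-in for a stemmer: strips a trailing 'ed'. Not the real Porter algorithm."""
    return [t[:-2] if t.endswith("ed") and len(t) > 4 else t for t in tokens]


def analyze(text, filters):
    """Split on whitespace, then run the tokens through each filter in order."""
    tokens = text.split()
    for apply_filter in filters:
        tokens = apply_filter(tokens)
    return tokens


INDEX_CHAIN = [lowercase]            # no stemming at index time
QUERY_CHAIN = [lowercase, toy_stem]  # stemming only at query time

indexed = analyze("Kasteel en Landgoed", INDEX_CHAIN)
queried = analyze("Kasteel en Landgoed", QUERY_CHAIN)
print(indexed)   # terms stored in the index
print(queried)   # terms the query looks for
print(indexed == queried)
```

With these chains the index holds "landgoed" while the query looks for "landgo", so the phrase cannot match. "Strand en Zee" is unaffected because none of its words ends in "ed", which is consistent with only one of the two themes failing.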
Re: Searching similar values for same field results in different results
@iorixxx: I ran:
http://localhost:8983/solr/db/update/?optimize=true

This is the response: 0 58

Then I ran:
http://localhost:8983/solr/db/select/?indent=on&facet=on&q=*:*&facet.field=themes_raw

This is the response: 366 153 16

So it seems that nothing has changed there, and it looks like the results were also shown correctly before the optimize operation?

When you say HTTP caching, do you mean caching by the browser? Or does Solr have some caching by default? If the latter, how can I clear that cache?

@Erick: I added debugQuery.

For "Strand en Zee" I see this:
PhraseQuery(themes:"strand en zee")
Looks correct.

For "Kasteel en Landgoed" I see this:
PhraseQuery(themes:"kasteel en landgo")
Which isn't correct! So it seems herein lies the problem.

Now I'm wondering why the value is cut off... this is my schema.xml:

I checked analysis.jsp (filled in Field: "themes" and Field value: "Kasteel en Landgoed") and schema.jsp, but I didn't see any weird results.

Now I'm wondering what else it could be...
--
View this message in context: http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2205706.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching similar values for same field results in different results
Often adding &debugQuery=on to the URL can show you very useful information that helps pinpoint the problem. I confess I don't see anything amiss in what you've shown though. Also, look at the "schema browser" page off the admin page, and look at your "themes" field to see what is actually in your index, it may surprise you.. Finally, the admin/analysis page (turn debug on) may also help you to see exactly what tokenization is happening when indexing and querying. I'd guess that the behavior isn't exactly what you expect. Best Erick On Wed, Jan 5, 2011 at 10:47 AM, PeterKerk wrote: > > Something weird is happening. > > I have locations that can have 1 or more themes. > A theme can be: "Kasteel en Landgoed", or a theme can be "Strand en Zee" > > I checked in the database, there are many locations that have 1 or more of > these themes assigned to it. > > Also in the response xml when I do a general search I get: > > > > >366 >153<- 153 found >16 <- 16 found > > > > When I request this: > > http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title > I get 16 results. Which is expected. > > When I request this: > > http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title > I get 0 results!!! > > why?!? > > > definition in schema.xml: > > > multiValued="true" /> > multiValued="true"/> > > > > Why are these results differing? > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html > Sent from the Solr - User mailing list archive at Nabble.com. >
Re: Searching similar values for same field results in different results
> uhm...how do I perform an optimize operation? :)

http://localhost:8983/solr/db/update/?optimize=true
Re: Searching similar values for same field results in different results
uhm...how do I perform an optimize operation? :) -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199795.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Searching similar values for same field results in different results
> Something weird is happening.
>
> I have locations that can have 1 or more themes.
> A theme can be: "Kasteel en Landgoed", or a theme can be
> "Strand en Zee"
>
> I checked in the database, there are many locations that
> have 1 or more of these themes assigned to it.
>
> Also in the response xml when I do a general search I get:
>
> 366
> 153 <- 153 found
> 16 <- 16 found
>
> When I request this:
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title
> I get 16 results. Which is expected.
>
> When I request this:
> http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title
> I get 0 results!!!
>
> why?!?

Maybe you deleted those documents? Deleted terms can appear in the facet section until you optimize. Can you run these queries after an optimize operation? What is the output of this after an optimize:

facet=on&q=*:*&facet.field=themes_raw

Also, using a browser to query/test Solr sometimes gives old results due to HTTP caching.
Searching similar values for same field results in different results
Something weird is happening. I have locations that can have 1 or more themes. A theme can be: "Kasteel en Landgoed", or a theme can be "Strand en Zee" I checked in the database, there are many locations that have 1 or more of these themes assigned to it. Also in the response xml when I do a general search I get: 366 153<- 153 found 16 <- 16 found When I request this: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Strand%20en%20Zee%22&q=*:*&fl=id,title I get 16 results. Which is expected. When I request this: http://localhost:8983/solr/db/select/?indent=on&facet=true&fq=themes:%22Kasteel%20en%20Landgoed%22&q=*:*&fl=id,title I get 0 results!!! why?!? definition in schema.xml: Why are these results differing? -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-similar-values-for-same-field-results-in-different-results-tp2199269p2199269.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Different Results..
--- On Wed, 12/22/10, satya swaroop wrote:

> From: satya swaroop
> Subject: Different Results..
> To: solr-user@lucene.apache.org
> Date: Wednesday, December 22, 2010, 10:44 AM
>
> Hi All,
> i am getting different results when i used with some escape keys..
> for example:::
> 1) when i use this request
> http://localhost:8080/solr/select?q=erlang!ericson
> the result obtained is
> start="0">
>
> 2) when the request is
> http://localhost:8080/solr/select?q=erlang/ericson
> the result is
> name="response" numFound="1" start="0">
>
> My query here is, do solr consider both the queries differently and what do
> it consider for !,/ and all other escape characters.

First of all, ! has a special meaning: it means NOT and is part of the query syntax. It is equivalent to the minus (-) operator.

q=erlang!ericson is parsed into:

defaultSearchField:erlang -defaultSearchField:ericson

You can see this by appending &debugQuery=on to your search URL.

So you need to escape ! in your case: q=erlang\!ericson will return the same result set as q=erlang/ericson.

You can see the complete list of special characters here:
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html#Escaping Special Characters
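The escaping described above can be done in client code before the query is sent. A minimal sketch, assuming the special characters of the classic Lucene query parser syntax as listed in the Lucene 2.9 documentation linked in the reply; `escape_query_term` is a hypothetical helper, not part of any Solr client library.

```python
import re

# Characters treated as special by the classic Lucene query parser syntax
# (per the Lucene 2.9 documentation). "&&" and "||" are handled by escaping
# each "&" and "|" individually, which is a simplification.
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\&|])')


def escape_query_term(term):
    """Backslash-escape query-syntax characters in a user-supplied term."""
    return _SPECIAL.sub(r"\\\1", term)


print(escape_query_term("erlang!ericson"))
print(escape_query_term("erlang/ericson"))
```

The escaped string still has to be URL-encoded before it goes into the q parameter; the backslash itself is not safe in a raw URL.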
Re: Different Results..
We need more information about the the analyzers and tokenizers of the default field of your search Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/12/22 satya swaroop > Hi All, > i am getting different results when i used with some escape keys.. > for example::: > 1) when i use this request >http://localhost:8080/solr/select?q=erlang!ericson > the result obtained is > > > 2) when the request is > http://localhost:8080/solr/select?q=erlang/ericson >the result is > > > > My query here is, do solr consider both the queries differently and what do > it consider for !,/ and all other escape characters. > > > Regards, > satya >
Different Results..
Hi All, i am getting different results when i used with some escape keys.. for example::: 1) when i use this request http://localhost:8080/solr/select?q=erlang!ericson the result obtained is 2) when the request is http://localhost:8080/solr/select?q=erlang/ericson the result is My query here is, do solr consider both the queries differently and what do it consider for !,/ and all other escape characters. Regards, satya
Re: different results depending on result format
OK I solved the problem. It turns out that I was connecting to the server using its FQDN (rosen.ifactory.com). When, instead, I connect to it using the name "rosen" (which maps to the same IP using the default domain name configured in my resolver, ifactory.com), I get results back. I am looking into the virtual hosts config in tomcat; it seems as if there must indeed be another solr instance running; in fact I'm now concerned there might be two solr instances running against the same data folder. yargh. -Mike On 10/22/2010 09:05 AM, Mike Sokolov wrote: Yes - I really only have the one solr instance. And I have plenty of other cases where I am getting good results back via solrj. It's really a mystery. Unfortunately I have to catch up on other stuff I have been neglecting, but I'll follow up when I'm able to get a solution... -Mike On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote: strange..are you absolutely sure the two queries are directed to the same Solr instance? I'm running the same query from the admin page (which specifies the xml format) and I get the exact same results as solrj. On 21 October 2010 22:25, Mike Sokolov wrote: quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps? -Mike On 10/21/2010 04:47 PM, Mike Sokolov wrote: I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. 
Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get '&wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions
Re: different results depending on result format
Yes - I really only have the one solr instance. And I have plenty of other cases where I am getting good results back via solrj. It's really a mystery. Unfortunately I have to catch up on other stuff I have been neglecting, but I'll follow up when I'm able to get a solution... -Mike On 10/22/2010 06:58 AM, Savvas-Andreas Moysidis wrote: strange..are you absolutely sure the two queries are directed to the same Solr instance? I'm running the same query from the admin page (which specifies the xml format) and I get the exact same results as solrj. On 21 October 2010 22:25, Mike Sokolov wrote: quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps? -Mike On 10/21/2010 04:47 PM, Mike Sokolov wrote: I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. 
So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get '&wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions
Re: different results depending on result format
strange..are you absolutely sure the two queries are directed to the same Solr instance? I'm running the same query from the admin page (which specifies the xml format) and I get the exact same results as solrj. On 21 October 2010 22:25, Mike Sokolov wrote: > quick follow-up: I also notice that the query from solrj gets version=1, > whereas the admin webapp puts version=2.2 on the query string, although this > param doesn't seem to change the xml results at all. Does this indicate an > older version of solrj perhaps? > > -Mike > > > On 10/21/2010 04:47 PM, Mike Sokolov wrote: > >> I'm experiencing something really weird: I get different results depending >> on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I >> spent quite a while staring at query params to make sure everything else is >> the same, and they do seem to be. At first I thought the problem related to >> the javabin format change that has been talked about recently, but I am >> using solr 1.4.0 and solrj 1.4.0. >> >> Notice in the two entries that the wt param is different and the hits >> result count is different. >> >> Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute >> INFO: [bopp.ba] webapp=/solr path=/select/ >> params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} >> hits=261 status=0 QTime=1 >> Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute >> INFO: [bopp.ba] webapp=/solr path=/select >> params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} >> hits=57 status=0 QTime=0 >> >> >> The xml format results seem to be the correct ones. So one thought I had >> is that I could somehow fall back to using xml format in solrj, but I tried >> SolrQuery.set('wt','xml') and that didn't have the desired effect (I get >> '&wt=javabin&wt=javabin' in the log - ie the param is repeated, but still >> javabin). >> >> >> Am I crazy? Is this a known issue? 
>> >> Thanks for any suggestions >> >>
Re: different results depending on result format
quick follow-up: I also notice that the query from solrj gets version=1, whereas the admin webapp puts version=2.2 on the query string, although this param doesn't seem to change the xml results at all. Does this indicate an older version of solrj perhaps? -Mike On 10/21/2010 04:47 PM, Mike Sokolov wrote: I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get '&wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions
different results depending on result format
I'm experiencing something really weird: I get different results depending on whether I specify wt=javabin, and retrieve using SolrJ, or wt=xml. I spent quite a while staring at query params to make sure everything else is the same, and they do seem to be. At first I thought the problem related to the javabin format change that has been talked about recently, but I am using solr 1.4.0 and solrj 1.4.0. Notice in the two entries that the wt param is different and the hits result count is different. Oct 21, 2010 4:22:19 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=261 status=0 QTime=1 Oct 21, 2010 4:22:28 PM org.apache.solr.core.SolrCore execute INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1} hits=57 status=0 QTime=0 The xml format results seem to be the correct ones. So one thought I had is that I could somehow fall back to using xml format in solrj, but I tried SolrQuery.set('wt','xml') and that didn't have the desired effect (I get '&wt=javabin&wt=javabin' in the log - ie the param is repeated, but still javabin). Am I crazy? Is this a known issue? Thanks for any suggestions -- Michael Sokolov Engineering Director www.ifactory.com @iFactoryBoston PubFactory: the revolutionary e-publishing platform from iFactory
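Rather than comparing the two log entries by eye, the request path and parameters can be diffed mechanically. The following is only a sketch: the class SolrLogParamDiff and its methods are hypothetical helpers (not part of Solr or SolrJ), and it assumes the log line layout shown in the message above (a "path=" token followed by "params={...}").

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
final class SolrLogParamDiff {

    // Extracts path=... and each name=value pair inside params={...} from one log line.
    static Map<String, String> parse(String logLine) {
        Map<String, String> fields = new LinkedHashMap<>();
        int p = logLine.indexOf("path=");
        if (p >= 0) {
            int end = logLine.indexOf(' ', p);
            fields.put("path", logLine.substring(p + 5, end < 0 ? logLine.length() : end));
        }
        int open = logLine.indexOf("params={");
        int close = open < 0 ? -1 : logLine.indexOf('}', open);
        if (close > open) {
            for (String pair : logLine.substring(open + 8, close).split("&")) {
                if (pair.isEmpty()) {
                    continue;
                }
                int eq = pair.indexOf('=');
                String name = eq < 0 ? pair : pair.substring(0, eq);
                String value = eq < 0 ? "" : pair.substring(eq + 1);
                // A repeated parameter (e.g. wt=javabin&wt=javabin) keeps only the last value.
                fields.put("param:" + name, value);
            }
        }
        return fields;
    }

    // Returns only the keys whose values differ, formatted as "valueA -> valueB".
    static Map<String, String> diff(String lineA, String lineB) {
        Map<String, String> a = parse(lineA);
        Map<String, String> b = parse(lineB);
        Set<String> keys = new TreeSet<>(a.keySet());
        keys.addAll(b.keySet());
        Map<String, String> out = new TreeMap<>();
        for (String k : keys) {
            if (!Objects.equals(a.get(k), b.get(k))) {
                out.put(k, a.get(k) + " -> " + b.get(k));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The two log entries quoted in the message above.
        String xml = "INFO: [bopp.ba] webapp=/solr path=/select/ params={wt=xml&rows=20&start=0"
                + "&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}"
                + " hits=261 status=0 QTime=1";
        String javabin = "INFO: [bopp.ba] webapp=/solr path=/select params={wt=javabin&rows=20&start=0"
                + "&facet=true&facet.field=ref_taxid_ms&q=*:*&fl=uri,meta_ss&version=1}"
                + " hits=57 status=0 QTime=0";
        System.out.println(diff(xml, javabin));
    }
}
```

Run against the two entries above, this reports the wt parameter and also the request path (/select/ versus /select). Whether the path difference matters is not established in this thread; it is simply the kind of detail that is easy to miss by eye. The log lines do not show which host name the client used, so a difference there would not appear in such a comparison.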
Re: SolrJ - how separate different results from the same facet query?
I am interested in this as well ... I'm also having trouble telling whether a result has been elevated by the QueryElevation component. It seems like SolrJ would need to know about some type of metadata contained within the docs, but I haven't seen SolrJ dealing w/ payloads specifically yet. I also can't tell whether this would require a feature request on those components or whether it is so custom that it would require writing new components. It sounds like retrieving a document should answer questions like ... "did this document come from a facet query?" "was this document elevated?" Etc. Maybe something the Debug component can handle if it can write payloads back to the results, etc. - Jon On Mar 15, 2010, at 7:56 AM, Saïd Radhouani wrote: > I'm faceting with two different query ranges using addFacetQuery. I > wonder whether it's possible, using SolrJ, to extract the result of each query > range separately. Here is an example: > > addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc. > addFacetQuery("length:[* TO 5]");addFacetQuery("length:[5 TO 10]"); etc. > > When I use getFacetQuery, SolrJ gives me the responses of both query ranges > (prices and lengths) mixed in the same list. I wonder whether it's possible > to tell SolrJ to extract the response of a specific query range, i.e., tell > it to extract the price-based response in a list and the length-based > response in another list. It would be helpful to have something like > getFacetQuery(field=price), getFacetQuery(field=length), etc. > > Any ideas? > > Thanks.
SolrJ - how separate different results from the same facet query?
I'm faceting with two different query ranges using addFacetQuery. I wonder whether it's possible, using SolrJ, to extract the result of each query range separately. Here is an example: addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc. addFacetQuery("length:[* TO 5]");addFacetQuery("length:[5 TO 10]"); etc. When I use getFacetQuery, SolrJ gives me the responses of both query ranges (prices and lengths) mixed in the same list. I wonder whether it's possible to tell SolrJ to extract the response of a specific query range, i.e., tell it to extract the price-based response in a list and the length-based response in another list. It would be helpful to have something like getFacetQuery(field=price), getFacetQuery(field=length), etc. Any ideas? Thanks.
SolrJ - separate different results from the same facet query?
I'm faceting with two different query ranges using addFacetQuery. I wonder whether it's possible, using SolrJ, to extract the result of each query range separately. Here is my example: addFacetQuery("price:[* TO 150]"); addFacetQuery("price:[151 TO 300]"); etc. addFacetQuery("date:[* TO NOW]"); When I use getFacetQuery, SolrJ gives me the responses of both query ranges (prices and dates) mixed in the same list. I wonder whether it's possible to tell SolrJ to extract the response of a specific query range, i.e., tell it to extract the price-based response in a list and the date-based response in another list. It would be helpful to have something like getFacetQuery(field=price). Any ideas? Thanks.
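One possible client-side workaround, sketched below, is to group the map that QueryResponse.getFacetQuery() returns (facet query string to count) by the field name at the start of each query string. FacetQueryGrouper is a hypothetical helper, not a SolrJ class; the counts in main are made up, and the sketch assumes every facet query has the simple form "field:range" with no local params or nested clauses in front of the field name.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of SolrJ.
final class FacetQueryGrouper {

    // Assumes the field name is the text before the first ':' of the facet query.
    // Queries without a ':' are grouped under the empty string.
    static String fieldOf(String facetQuery) {
        int colon = facetQuery.indexOf(':');
        return colon < 0 ? "" : facetQuery.substring(0, colon).trim();
    }

    // Splits one flat "facet query -> count" map into one map per field,
    // preserving the order in which the entries were returned.
    static Map<String, Map<String, Integer>> groupByField(Map<String, Integer> facetQueryCounts) {
        Map<String, Map<String, Integer>> grouped = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : facetQueryCounts.entrySet()) {
            grouped.computeIfAbsent(fieldOf(e.getKey()), k -> new LinkedHashMap<>())
                   .put(e.getKey(), e.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Stand-in for response.getFacetQuery(); the counts are invented for the example.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("price:[* TO 150]", 12);
        counts.put("price:[151 TO 300]", 7);
        counts.put("date:[* TO NOW]", 19);

        Map<String, Map<String, Integer>> byField = groupByField(counts);
        System.out.println("price facets: " + byField.get("price"));
        System.out.println("date facets:  " + byField.get("date"));
    }
}
```

In real code the input map would come from the QueryResponse of the executed query. The prefix heuristic is deliberately naive; if the facet queries are more complex than "field:range", keeping your own list of the query strings added per field and looking each one up in the returned map may be more robust.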
Re: Different results return for capital and small letters.
Tushar, Could you ask on solr-user in the future, please? Your last sentence got cut off. Do you have LowerCaseFilter in both the index and query-time analyzer sections? Perhaps you should just paste that section of the config. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message > From: Tushar_Gandhi > To: solr-...@lucene.apache.org > Sent: Wednesday, December 31, 2008 3:26:32 AM > Subject: Different results return for capital and small letters. > > > Hi, > I am using solr 1.3. > I am facing a problem with the ordering of the results returned by > solr. > Whenever I search for "cats", it gives me the result. Next, whenever I > search for "CATS", I get the same result but the ordering is different. Is > this the behavior of Solr? Is there any priority for searching > depending on case? > I want the same result for both. What should I do if this is the default behavior of > solr? > Is there any problem with my indexing? > Also, I already have LowerCaseFilter configuration for the > Thanks, > Tushar > -- > View this message in context: > http://www.nabble.com/Different-results-return-for-capital-and-small-letters.-tp21228594p21228594.html > Sent from the Solr - Dev mailing list archive at Nabble.com.