Re: SolrCloud query results order master vs replica
Thank you Sir for that confirmation!

Nic

On Wed, 2/5/14, Chris Hostetter wrote:

Subject: Re: SolrCloud query results order master vs replica
To: solr-user@lucene.apache.org
Received: Wednesday, February 5, 2014, 11:33 AM

: Just to make sure I interpret the results correctly:
: - they all have a score of 1.7046129
: - the order they are presented in is therefore not related to the score,
: it is just the order in which the data is internally stored (like an SQL
: SELECT statement without ORDER BY clause)

The order they are presented *is* related to the score -- but since the scores are all identical, and no secondary sort is specified, the behavior is undefined -- and can vary depending on the replica used.

: - If I want to force a sort operation, I should add a sort parameter
: in the query. The first sort will be done by score and then documents
: with the same score will be sorted by my sort=?? parameter?
: - or will my sort parameter overwrite the score sorting?

If you specify a sort param, it should be the full sort you want -- it won't be "appended" to the default score sort ... so if, for example, you wanted to sort by score, with a secondary fallback sort by your "id" field, use something like...

sort=score desc, id asc

-Hoss
http://www.lucidworks.com/
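A small sketch of why the secondary sort matters, for readers who want to see it outside Solr. It does not talk to a Solr server: the two lists only simulate a leader and a replica returning the same tied documents in different internal order. The `/select` path, the `wt` parameter, and the helper names `build_query` and `rank` are assumptions for illustration; the `prod_doc` field, the score, and the document ids come from the explain output in this thread.

```python
from urllib.parse import urlencode


def build_query(term):
    """Build a query string with the full sort spelled out (score, then id)."""
    params = {
        "q": f"prod_doc:{term}",
        "sort": "score desc, id asc",  # complete sort spec, not an add-on to the default
        "wt": "json",                  # assumed response format, not from the thread
    }
    return "/select?" + urlencode(params)


def rank(docs, tie_break=True):
    """Order docs by score descending; optionally break ties by id ascending."""
    if tie_break:
        return sorted(docs, key=lambda d: (-d["score"], d["id"]))
    # Python's sort is stable, so tied docs keep whatever order the node stored them in.
    return sorted(docs, key=lambda d: -d["score"])


# Same documents, same score, different internal order on each node (simulated).
leader = [
    {"id": "563023", "score": 1.7046129},
    {"id": "894824", "score": 1.7046129},
    {"id": "540476", "score": 1.7046129},
]
replica = list(reversed(leader))

print(build_query("tylenol"))
print([d["id"] for d in rank(leader, tie_break=False)])
print([d["id"] for d in rank(replica, tie_break=False)])
print([d["id"] for d in rank(leader)])
print([d["id"] for d in rank(replica)])
```

Without the tie-breaker the two simulated nodes disagree on order; with it, both return the same sequence, which is the behavior a round-robin client needs.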
Re: SolrCloud query results order master vs replica
eq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=15029)\n", "563023": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 15698) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 15698, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=15698)\n", "894824": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 19256) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 19256, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=19256)\n", "540476": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 20843) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 20843, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=20843)\n", "671271": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 23778) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 23778, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=23778)\n", "527929": "\n1.7046129 = (MATCH) weight(prod_doc:tylenol in 25053) [DefaultSimilarity], result of:\n 1.7046129 = fieldWeight in 25053, product of:\n1.0 = tf(freq=1.0), with freq of:\n 1.0 = termFreq=1.0\n 6.8184514 = idf(docFreq=374, maxDocs=126169)\n0.25 = fieldNorm(doc=25053)\n" Just to make sure I interpret the results correctly: - they all have a score of 1.7046129 - the order they are presented in is therefore not related to the score, it is just the order in which the data is internally stored (like an SQL SELECT statement without ORDER BY clause) Follow up question: - If I want to force a sort operation, I should add a sort parameter in the query. 
The first sort will be done by score and then documents with the same score will be sorted by my sort=?? parameter?
- or will my sort parameter overwrite the score sorting?

Thank you again for your help,
Nic.

On Mon, 2/3/14, Erick Erickson wrote:

Subject: Re: SolrCloud query results order master vs replica
To: solr-user@lucene.apache.org
Received: Monday, February 3, 2014, 2:19 PM

This should only be happening if the scores are _exactly_ the same, which is actually quite rare. In that case, the tied scores are broken by the internal Lucene document ID, and the relative order of the docs on the two machines isn't guaranteed to be the same: the internal ID can change during segment merging, which is NOT the same on both machines. But this should be relatively rare. If you're doing *:* queries or other such, then they aren't scored (see ConstantScoreQuery).

So in practical terms, I suspect you're seeing some kind of test artifact. Try adding &debug=all to the query and you'll see how documents are scored.

Best,
Erick

On Mon, Feb 3, 2014 at 6:57 AM, M. Flatterie wrote:
> Greetings,
>
> My setup is:
> - SolrCloud V4.3
> - One collection
> - one shard
> - 1 master, 1 replica
>
> so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split shard the index (yet, until the index gets bigger).
>
> My question:
> - if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents.
> - if I do the same query on the replica, I get the same number of results but the docs are in a different order.
> - I do not specify a sort parameter in my query, simply a q=.
> - obviously if I force a sort order, everything is ok, same results, same order from both instances.
> - am I wrong in expecting the same results, in the SAME order?
>
> Follow up question if the order is not guaranteed:
> - should I force the dev. to use an explicit sort order?
> - if we force the sort, we then bypass the ranking / score order do we not?
> - should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?
>
> Other useful information:
> - the admin page shows same number of documents in both instances.
> - logs are clean, load and replication and queries worked ok.
> - the web application that queries SOLR round robins between the two instances, so getting results in a different order is bad for consistency.
>
> Thank you for your help!
>
> Nic
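The explain output in this message can be checked by hand, which also shows why every hit ties. The sketch below assumes the classic TF-IDF formulas used by Lucene's DefaultSimilarity in the 4.x line (tf = sqrt(freq), idf = 1 + ln(maxDocs / (docFreq + 1))); the function names are made up for illustration, and the inputs are the values printed in the explain output (termFreq=1.0, docFreq=374, maxDocs=126169, fieldNorm=0.25).

```python
import math


def tf(freq):
    """Term-frequency factor: square root of the raw term frequency."""
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    """Inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    """fieldWeight as shown in the explain output: tf * idf * fieldNorm."""
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


# Values copied from the explain output in the thread.
score = field_weight(freq=1.0, doc_freq=374, max_docs=126169, field_norm=0.25)
print(f"idf   = {idf(374, 126169):.7f}")
print(f"score = {score:.7f}")
```

Since idf depends only on the term and the index, any two documents with the same term frequency and the same fieldNorm get exactly the same score. For short product-name fields that is common, so the tie here is less of a rarity than it might be for longer text.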
SolrCloud query results order master vs replica
Greetings,

My setup is:
- SolrCloud V4.3
- One collection
- one shard
- 1 master, 1 replica

so each instance contains the entire index. The index is rather small and the replica is used for robustness. There is no need (IMHO) to split shard the index (yet, until the index gets bigger).

My question:
- if I do a query on a product name (that is what the index is about) on the master I get a certain number of results and the documents.
- if I do the same query on the replica, I get the same number of results but the docs are in a different order.
- I do not specify a sort parameter in my query, simply a q=.
- obviously if I force a sort order, everything is ok, same results, same order from both instances.
- am I wrong in expecting the same results, in the SAME order?

Follow up question if the order is not guaranteed:
- should I force the dev. to use an explicit sort order?
- if we force the sort, we then bypass the ranking / score order do we not?
- should I force all queries to go to the master and fall back on the replica only in the context of a total loss of the master?

Other useful information:
- the admin page shows same number of documents in both instances.
- logs are clean, load and replication and queries worked ok.
- the web application that queries SOLR round robins between the two instances, so getting results in a different order is bad for consistency.

Thank you for your help!

Nic
Re: Migrating from 4.2.1 to 4.3.0
Great it works, I am back on track! Thank you!!!

Nic

From: Shawn Heisey
To: solr-user@lucene.apache.org
Sent: Thursday, May 16, 2013 4:25:09 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0

On 5/16/2013 1:40 PM, M. Flatterie wrote:
> Oops, sorry about that, since it was referring to context I thought it was the
> Tomcat one.
>
> Here is the /home/solradm1/solr.xml file (comments removed!)
>
> host="${host:}" hostPort="8180" hostContext="${hostContext:}"
> zkClientTimeout="${zkClientTimeout:15000}">
> value="/home/solradm1/WebOrder_Collection/data" />
> value="/home/solradm1/WebOrder_Collection/ulog" />

The hostContext attribute needs changing. It should be this instead:

hostContext="${hostContext:/solr}"

Looks like the previous version wasn't taking this attribute from your config, but the new version is. This is probably a bug that was fixed in 4.3.

Thanks,
Shawn
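To illustrate why the empty default mattered, here is a rough sketch of how a `${name:default}` placeholder resolves and how the resulting context shows up in the node name and base URL reported earlier in the thread. This is not Solr's code: `resolve` and `node_identity` are hypothetical helpers that only mimic the observed values (`10.0.2.15:8180_solr` versus `10.0.2.15:8180_`).

```python
import re

# Matches ${name:default}; the default may be empty, as in ${hostContext:}.
_PLACEHOLDER = re.compile(r"\$\{([^:}]+):([^}]*)\}")


def resolve(value, props):
    """Replace each ${name:default} with props[name], or the default if unset."""
    return _PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(2)), value)


def node_identity(host, port, host_context):
    """Mimic the node_name / base_url shapes seen in ZooKeeper in this thread."""
    context = host_context.strip("/")
    node_name = f"{host}:{port}_{context}"
    base_url = f"http://{host}:{port}" + (f"/{context}" if context else "")
    return node_name, base_url


# No hostContext system property is set, so the default after the colon is used.
old_attr = resolve("${hostContext:}", {})
new_attr = resolve("${hostContext:/solr}", {})

print(node_identity("10.0.2.15", "8180", old_attr))
print(node_identity("10.0.2.15", "8180", new_attr))
```

With the empty default the base URL loses its `/solr` suffix, which fits the 404 the recovering replica received when it called the leader.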
Re: Migrating from 4.2.1 to 4.3.0
Oops, sorry about that, since it was referring to context I thought it was the Tomcat one.

Here is the /home/solradm1/solr.xml file (comments removed!)

Note: I configure solr.data.dir and solr.ulog.dir so I can run two instances on the same system and separate the data and ulog directories between the instances.

Nic.

From: Shawn Heisey
To: solr-user@lucene.apache.org
Sent: Thursday, May 16, 2013 3:29:41 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0

On 5/16/2013 12:37 PM, M. Flatterie wrote:
> Afternoon, my solr.xml file is the following (and has not changed from 4.2.1
> to 4.3.0):
>
> docBase="/home/tcatadm1/apache-tomcat-7.0.39/webapps/solr.war" debug="0"
> crossContext="true">
> value="/home/solradm1" override="true"/>

That is not the solr.xml Mark is referring to. This solr.xml configures Tomcat to load Solr. You will have /home/solradm1/solr.xml as well; that is the one we are concerned with.

Thanks,
Shawn
Re: Migrating from 4.2.1 to 4.3.0
Afternoon, my solr.xml file is the following (and has not changed from 4.2.1 to 4.3.0):

From: Mark Miller
To: solr-user@lucene.apache.org
Sent: Thursday, May 16, 2013 2:28:52 PM
Subject: Re: Migrating from 4.2.1 to 4.3.0

Your solr webapp context appears to be "" rather than "solr". There was a JIRA issue in 4.3 that may have affected this, but I only saw it from a distance, so just a guess.

What does it say in solr.xml for the context (an attribute on )

- Mark

On May 16, 2013, at 2:02 PM, "M. Flatterie" wrote:
> Greetings, I just started with Solr a couple weeks ago, with version 4.2.1.
>
> I installed the following setup:
> - ZooKeeper: 3 instances ensemble
> - Solr: on Tomcat, 4 instances
> - WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica
> - other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica
> - other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica
>
> With version 4.2.1 everything works fine. But I do have a problem if I query
> instance 3 for something in the WebOrder_Collection. I found that this is a
> bug in 4.2.1. I must query instances 1 or 2 to get results from
> WebOrder_Collection.
>
> Now that I have upgraded to 4.3.0 I have the following problem. My replicas
> will not recover. The recovery will retry, and retry, ... forever.
>
> Details. If I look at the Zoo, I see that:
> - node_name
>   10.0.2.15:8180_solr in solr 4.2.1
>   10.0.2.15:8180_ in solr 4.3.0
> - base_url
>   http://10.0.2.15:8180/solr in solr 4.2.1
>   http://10.0.2.15:8180 in solr 4.3.0
>
> My solr logs show this:
>
> 8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy –
> Error while trying to recover.
> core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
> at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
> at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
> at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
> at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)
>
> I have not been able to find more info than that. The Solr cloud diagram
> shows instance 1 as active and leader, instance 2 as recovering. My
> solrconfig.xml files are identical, except for the LUCENE_42 or LUCENE_43 tag.
>
> Any idea? I hope that it is a configuration issue on my part...
>
> Thank you for any help, Nic.
Migrating from 4.2.1 to 4.3.0
Greetings, I just started with Solr a couple weeks ago, with version 4.2.1.

I installed the following setup:
- ZooKeeper: 3 instances ensemble
- Solr: on Tomcat, 4 instances
- WebOrder_Collection: instances 1 and 2, 1 shard, 1 master, 1 replica
- other_collectionA: instances 3 and 4, 1 shard, 1 master, 1 replica
- other_CollectionB: instances 3 and 4, 1 shard, 1 master, 1 replica

With version 4.2.1 everything works fine. But I do have a problem if I query instance 3 for something in the WebOrder_Collection. I found that this is a bug in 4.2.1. I must query instances 1 or 2 to get results from WebOrder_Collection.

Now that I have upgraded to 4.3.0 I have the following problem. My replicas will not recover. The recovery will retry, and retry, ... forever.

Details. If I look at the Zoo, I see that:
- node_name
  10.0.2.15:8180_solr in solr 4.2.1
  10.0.2.15:8180_ in solr 4.3.0
- base_url
  http://10.0.2.15:8180/solr in solr 4.2.1
  http://10.0.2.15:8180 in solr 4.3.0

My solr logs show this:

8869687 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy – Error while trying to recover. core=WebOrder_Collection:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Server at http://10.0.2.15:8180 returned non ok status:404, message:Not Found
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:372)
at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:202)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:346)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:223)

I have not been able to find more info than that. The Solr cloud diagram shows instance 1 as active and leader, instance 2 as recovering. My solrconfig.xml files are identical, except for the LUCENE_42 or LUCENE_43 tag.

Any idea? I hope that it is a configuration issue on my part...

Thank you for any help, Nic.