Shawn,

Yes, I tried those two queries with distrib=false. I consistently get 0 results for the first query and 1 result for the second (i.e. server 3 shard 2 replica 2).
However, if I run the same second query (i.e. server 3 shard 2 replica 2) with distrib=true, I sometimes get a result and sometimes not. Shouldn't this query always return a result when it's pointing to a core that seems to have that document, regardless of whether distrib is true or false?

Unfortunately, I don't see anything in the logs that points to a cause.

BTW, you asked me to replace the request handler. I use the select request handler, so I cannot replace it with anything else. Is that a problem?

Thanks.

On Thu, Oct 16, 2014 at 12:05 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/15/2014 9:26 PM, S.L wrote:
>
>> Look at the logging information I provided below , looks like the results
>> are only being returned back for this solrCloud cluster if the request
>> goes to one of the two replicas of a shard.
>>
>> I have verified that numDocs in the replicas for a given shard is same but
>> there is difference in the maxDoc and deletedDocs, does this signal the
>> replicas being out of sync ?
>>
>> Even if the numDocs are same , how do we guarantee that those docs are
>> identical and have the same uniquekeys , is there a way to verify this ? I
>> am suspecting that as the numDocs is same across the replicas , and still
>> only when the request goes to one of the replicas of the shard that I get
>> a result back , the documents with in those replicas with in a shard are
>> not an exact replica set of each other.
>>
>> I suspect the issue I am facing in 4.10.1 cloud is related to
>> https://issues.apache.org/jira/browse/SOLR-4924 .
>>
>> Can anyone please let me know , how to solve this issue of intermittent no
>> results for a query ?
>>
>
> query with no results hits these cores:
> server 2 shard 3 replica1
> server 3 shard 1 replica 1
> server 1 shard 2 replica 1
>
> query with 1 result hits these cores:
> server 2 shard 1 replica 2
> server 3 shard 2 replica 2 (found 1)
> server 1 shard 3 replica 2
>
> Here's some URLs for some testing. They are directed at specific shard
> replicas and are specifically NOT distributed queries:
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false
>
> http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false
>
> If you run these queries (replacing server names and the /select request
> handler as appropriate), do you get 0 results on the first one and 1 result
> on the second one? If you do, then you've definitely got replicas out of
> sync. If you get 1 result on both queries, then something else is
> breaking. If by chance you have taken steps to fix this particular ID,
> pick another one that you know has a problem.
>
> There is no automated way to detect replicas out of sync. You could
> request all docs on both replicas with distrib=false&fl=id&sort=id+asc,
> then compare the two lists. Depending on how many docs you have, those
> queries could take a while to run.
>
> If the replicas are out of sync, are there any ERROR entries in the Solr
> log, especially at the time that the problem docs were indexed?
>
> Thanks,
> Shawn
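
For reference, the manual comparison suggested in the quoted message (fetch all ids from each replica with distrib=false&fl=id&sort=id+asc, then compare the two lists) could be scripted roughly as below. This is only a sketch under assumptions: the unique key field is taken to be "id" as in the URLs above, the helper names fetch_ids and diff_ids are made up for illustration, and plain start/rows paging is used, which may be slow on large cores.

```python
import json
import urllib.parse
import urllib.request


def fetch_ids(core_url, rows=1000):
    """Collect every id from ONE core, bypassing distributed search.

    core_url is the base URL of a specific core, for example
    http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1
    """
    ids = set()
    start = 0
    while True:
        params = urllib.parse.urlencode({
            "q": "*:*",
            "fl": "id",            # assumes the unique key field is "id"
            "sort": "id asc",
            "distrib": "false",    # query this core only
            "wt": "json",
            "start": start,
            "rows": rows,
        })
        with urllib.request.urlopen(f"{core_url}/select?{params}") as resp:
            docs = json.load(resp)["response"]["docs"]
        if not docs:
            return ids
        ids.update(doc["id"] for doc in docs)
        start += rows


def diff_ids(ids_a, ids_b):
    """Return (ids only in a, ids only in b), each as a sorted list."""
    a, b = set(ids_a), set(ids_b)
    return sorted(a - b), sorted(b - a)


if __name__ == "__main__":
    base = "http://{}.mydomain.com:8081/solr/dyCollection1_shard2_{}"
    replica1 = fetch_ids(base.format("server1", "replica1"))
    replica2 = fetch_ids(base.format("server3", "replica2"))
    only_1, only_2 = diff_ids(replica1, replica2)
    print("only in replica1:", only_1)
    print("only in replica2:", only_2)
```

If both lists come back empty, the two replicas hold the same set of ids; any id that shows up in only one list is a candidate for the out-of-sync case described above.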