Shawn,

Yes, I tried those two queries with distrib=false. I consistently get 0
results for the first query and 1 result for the second (i.e. server 3
shard 2 replica 2).

However, if I run the same second query (i.e. server 3 shard 2 replica 2)
with distrib=true, I sometimes get a result and sometimes not. Shouldn't
this query always return a result when it's pointing to a core that seems to
have that document, regardless of whether distrib is true or false?

Unfortunately, I don't see anything in the logs that points to the cause.

BTW, you asked me to replace the request handler. I use the select request
handler, so I cannot replace it with anything else. Is that a problem?

Thanks.

On Thu, Oct 16, 2014 at 12:05 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/15/2014 9:26 PM, S.L wrote:
>
>> Look at the logging information I provided below , looks like the results
>> are only being returned back for this solrCloud cluster  if the request
>> goes to one of the two replicas of a shard.
>>
>> I have verified that numDocs in the replicas for a given shard is same but
>> there is difference in the maxDoc and deletedDocs, does this signal the
>> replicas being out of sync ?
>>
>> Even if the numDocs are the same, how do we guarantee that those docs are
>> identical and have the same uniqueKeys? Is there a way to verify this? I
>> suspect that, since numDocs is the same across the replicas and I still
>> only get a result back when the request goes to one of the replicas of the
>> shard, the documents within the replicas of a shard are not exact copies
>> of each other.
>>
>> I suspect the issue I am facing in 4.10.1 cloud is related to
>> https://issues.apache.org/jira/browse/SOLR-4924  .
>>
>> Can anyone please let me know , how to solve this issue of intermittent no
>> results for a query ?
>>
>
> query with no results hits these cores:
> server 2 shard 3 replica 1
> server 3 shard 1 replica 1
> server 1 shard 2 replica 1
>
> query with 1 result hits these cores:
> server 2 shard 1 replica 2
> server 3 shard 2 replica 2 (found 1)
> server 1 shard 3 replica 2
>
> Here's some URLs for some testing.  They are directed at specific shard
> replicas and are specifically NOT distributed queries:
>
> http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false
>
> http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2/select?q=*:*&fq=id:e8995da8-7d98-4010-93b4-8ff7dffb8bfb&distrib=false
>
> If you run these queries (replacing server names and the /select request
> handler as appropriate), do you get 0 results on the first one and 1 result
> on the second one?  If you do, then you've definitely got replicas out of
> sync.  If you get 1 result on both queries, then something else is
> breaking.  If by chance you have taken steps to fix this particular ID,
> pick another one that you know has a problem.
>
> There is no automated way to detect replicas out of sync.  You could
> request all docs on both replicas with distrib=false&fl=id&sort=id+asc,
> then compare the two lists.  Depending on how many docs you have, those
> queries could take a while to run.
>
> If the replicas are out of sync, are there any ERROR entries in the Solr
> log, especially at the time that the problem docs were indexed?
>
> Thanks,
> Shawn
>
>
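
The manual comparison Shawn describes above (requesting all ids from each
replica with distrib=false&fl=id&sort=id+asc and comparing the two lists)
could be scripted roughly as follows. This is only a sketch, not something
from the thread: the function names are made up for illustration, it assumes
the uniqueKey field is "id", that the cores return JSON with wt=json, and
that plain start/rows paging is acceptable for the index size.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_id_query(core_url, start=0, rows=1000):
    """Build a non-distributed query URL that returns only the id field.

    core_url is the base URL of one replica core (hypothetical helper,
    not part of Solr).
    """
    params = {
        "q": "*:*",
        "fl": "id",            # assumes the uniqueKey field is "id"
        "sort": "id asc",
        "distrib": "false",    # query this core only, no fan-out
        "wt": "json",
        "start": start,
        "rows": rows,
    }
    return core_url.rstrip("/") + "/select?" + urlencode(params)


def fetch_ids(core_url, rows=1000):
    """Page through one core and collect every id it holds."""
    ids = []
    start = 0
    while True:
        with urlopen(build_id_query(core_url, start, rows)) as resp:
            docs = json.load(resp)["response"]["docs"]
        if not docs:
            return ids
        ids.extend(doc["id"] for doc in docs)
        start += rows


def diff_ids(ids_a, ids_b):
    """Return (ids only in A, ids only in B), each sorted."""
    a, b = set(ids_a), set(ids_b)
    return sorted(a - b), sorted(b - a)


# Example use, with the core URLs from this thread:
#   r1 = fetch_ids("http://server1.mydomain.com:8081/solr/dyCollection1_shard2_replica1")
#   r2 = fetch_ids("http://server3.mydomain.com:8081/solr/dyCollection1_shard2_replica2")
#   only_in_r1, only_in_r2 = diff_ids(r1, r2)
```

If both returned lists are empty, the two replicas hold the same set of ids.
Any id that shows up in only one list is a document missing from the other
replica. As noted above, on a large index these queries could take a while.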
