Hi Erick,

Before I tried your suggestion of issuing a commit=true update, I realized that
for each shard there was at least one node whose index directory was named
like index.<timestamp>.
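
A minimal Python sketch for spotting this state on each node. It assumes a hypothetical Solr home where every core keeps its index under `<core>/data/`, and it assumes the timestamp suffix is all digits; check the actual names on your nodes before relying on it:

```python
import re
from pathlib import Path

# Assumption: timestamped index dirs look like "index.<digits>".
TIMESTAMPED_INDEX = re.compile(r"^index\.\d+$")


def find_timestamped_index_dirs(solr_home):
    """Return (data_dir, [timestamped names]) for every core data directory
    under the (hypothetical) solr_home layout that holds an index.<timestamp>
    directory instead of, or next to, a plain 'index' directory."""
    hits = []
    for data_dir in sorted(Path(solr_home).glob("*/data")):
        if not data_dir.is_dir():
            continue
        names = [p.name for p in data_dir.iterdir() if p.is_dir()]
        stamped = sorted(n for n in names if TIMESTAMPED_INDEX.match(n))
        if stamped:
            hits.append((str(data_dir), stamped))
    return hits
```

Run it on each node, e.g. `find_timestamped_index_dirs("/path/to/solr/home")`; an empty list means no core on that node has a timestamped index directory.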

I went ahead, deleted that index directory, and restarted that core. The index
directory then got synced with the other node and is now properly named
'index', without any timestamp attached to it. This now gives me consistent
results for distrib=true through the load balancer, and distrib=false returns
the expected results for a given shard.

The underlying issue appears to be that in every shard the leader and the
replica (follower) were out of sync.

How can I prevent this from happening again?

Thanks for your help!

----- Reply message -----
From: "Erick Erickson" <erickerick...@gmail.com>
To: <solr-user@lucene.apache.org>
Subject: SolrCloud 4.7 not doing distributed search when querying from a load 
balancer.
Date: Fri, Oct 3, 2014 12:56 AM

Hmmmm. Assuming that you aren't re-indexing the doc you're searching for...

Try issuing http://blah blah:8983/solr/collection/update?commit=true.
That'll force all the docs to be searchable. Does <1> still hold for
the document in question? I ask because this is exactly backwards from
what I'd expect. I'd expect, if anything, the replica (I'm trying to call
it the "follower" when a distinction needs to be made, since the leader
is a "replica" too....) to be out of sync. That would still be a Bad
Thing, but the leader gets first crack at indexing things.
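
The same commit can be issued from a script. A minimal Python sketch using only the standard library; the host and collection arguments are placeholders, as in the URL above:

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def commit_url(base_url, collection):
    """Build the explicit-commit URL: <base>/solr/<collection>/update?commit=true."""
    return f"{base_url.rstrip('/')}/solr/{collection}/update?" + urlencode({"commit": "true"})


def issue_commit(base_url, collection, timeout=30):
    """Send the commit request and return the HTTP status code."""
    with urlopen(commit_url(base_url, collection), timeout=timeout) as resp:
        return resp.status
```

For example, `issue_commit("http://solr_server:8983", "collection")` should return 200 if Solr accepted the commit.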

bq: only the replica of the shard that has this key returns the result,
and the leader does not

Just to be sure we're talking about the same thing: when you say
"leader", you mean the shard leader, right? That's the filled-in circle
in the graph view on the admin/cloud page.

And let's see your soft and hard commit settings, please.
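
These live in the `<updateHandler>` section of solrconfig.xml. A minimal Python sketch that pulls out the `<autoCommit>` and `<autoSoftCommit>` children so they can be pasted into a reply; it assumes the standard element layout and a locally available copy of the file:

```python
import xml.etree.ElementTree as ET


def commit_settings(solrconfig_xml):
    """Extract autoCommit / autoSoftCommit children from solrconfig.xml text.

    Returns e.g. {"autoCommit": {"maxTime": "15000", "openSearcher": "false"}}.
    Sections that are missing or commented out are simply absent, because
    ElementTree drops XML comments by default.
    """
    root = ET.fromstring(solrconfig_xml)
    settings = {}
    for tag in ("autoCommit", "autoSoftCommit"):
        node = root.find(f"./updateHandler/{tag}")
        if node is not None:
            settings[tag] = {child.tag: (child.text or "").strip() for child in node}
    return settings
```

For example, `commit_settings(open("solrconfig.xml").read())`. Values are returned as literal text, so a property placeholder such as `${...}` comes back unexpanded.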

Best,
Erick

On Thu, Oct 2, 2014 at 9:48 PM, S.L <simpleliving...@gmail.com> wrote:
> Erick,
>
> 0> The load balancer is out of the picture.
>
> 1> When I query with *distrib=false*, I get consistent results as expected
> for those shards that don't have the key, i.e. I don't get results back
> from those shards. However, I just realized that with *distrib=false* in
> the query for the shard that is supposed to contain the key, only the
> replica of the shard that has this key returns the result, and the leader
> does not. It looks like the replica and the leader do not have the same
> data, and only the replica contains the key in the query for that shard.
>
> 2> By indexing I mean this collection is being populated by a web crawler.
>
> So it looks like 1> above points to the leader and replica being out of
> sync for at least one shard.
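
To check point 1> for every shard rather than one document at a time, a minimal Python sketch: it queries each core of one shard with distrib=false and reports the shard when the replicas disagree on numFound. The per-core URLs are hypothetical, it assumes the JSON response writer (wt=json), and it is only meaningful with indexing paused:

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_local(core_url, params, timeout=30):
    """Query one core with distrib=false and return the parsed JSON response.

    core_url is a hypothetical per-core base URL, e.g.
    http://node1:8983/solr/collection_shard1_replica1
    """
    query = urlencode({**params, "distrib": "false", "wt": "json", "rows": 0})
    with urlopen(f"{core_url.rstrip('/')}/select?{query}", timeout=timeout) as resp:
        return json.load(resp)


def shard_mismatches(responses):
    """Given {core_url: parsed response} for the replicas of ONE shard,
    return {core_url: numFound} if the replicas disagree, else {}."""
    counts = {core: r["response"]["numFound"] for core, r in responses.items()}
    return counts if len(set(counts.values())) > 1 else {}
```

Usage: `shard_mismatches({u: fetch_local(u, {"q": "*:*", "fq": "id:9e78c064-919f-4ef3-b236-dc66351b4acf"}) for u in shard1_cores})`, where `shard1_cores` is a hypothetical list of that shard's core URLs.
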
>
>
>
> On Thu, Oct 2, 2014 at 11:57 PM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> bq: Also, the collection is being actively indexed as I query this, could
>> that be an issue too?
>>
>> Not if the documents you're searching aren't being added as you search
>> (and all your autocommit intervals have expired).
>>
>> I would turn off indexing for testing, it's just one more variable
>> that can get in the way of understanding this.
>>
>> Do note that if the problem were endemic to Solr, there would probably
>> be a _lot_ more noise out there.
>>
>> So to recap:
>> 0> we can take the load balancer out of the picture altogether.
>>
>> 1> when you query each shard individually with &distrib=false, every
>> replica in a particular shard returns the same count.
>>
>> 2> when you query without &distrib=false you get varying counts.
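
Point 2> (the same distributed query returning different counts) can be measured rather than eyeballed. A minimal sketch, where `fetch` is a hypothetical zero-argument callable you supply that runs the query once against one node, without distrib=false, and returns the parsed JSON response:

```python
from collections import Counter


def tally_counts(fetch, runs=10):
    """Call fetch() `runs` times and tally the numFound values it reports.

    More than one key in the result means the same distributed query is
    returning varying counts.
    """
    return Counter(fetch()["response"]["numFound"] for _ in range(runs))
```

A tally such as `Counter({0: 2, 1: 1})` over three runs would match the "once in every 3 tries" pattern from the original report.
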
>>
>> This is very strange and not at all expected. Let's try it again
>> without indexing going on....
>>
>> And what do you mean by "indexing" anyway? How are documents being fed
>> to your system?
>>
>> Best,
>> Erick@PuzzledAsWell
>>
>> On Thu, Oct 2, 2014 at 7:32 PM, S.L <simpleliving...@gmail.com> wrote:
>> > Erick,
>> >
>> > I would like to add that the interesting behavior, i.e. point #2 that I
>> > mentioned in my earlier reply, happens in all the shards. If this were a
>> > distributed search issue, it should not have manifested itself in the
>> > shard that contains the key I am searching for; it looks like the search
>> > is just failing as a whole, intermittently.
>> >
>> > Also, the collection is being actively indexed as I query this, could
>> > that be an issue too?
>> >
>> > Thanks.
>> >
>> > On Thu, Oct 2, 2014 at 10:24 PM, S.L <simpleliving...@gmail.com> wrote:
>> >
>> >> Erick,
>> >>
>> >> Thanks for your reply, I tried your suggestions.
>> >>
>> >> 1. When not using the load balancer, if *I have distrib=false* I get
>> >> consistent results across the replicas.
>> >>
>> >> 2. However, here's the interesting part: while not using the load
>> >> balancer, if I *don't have distrib=false*, then when I query a particular
>> >> node I get the same behaviour as if I were using the load balancer,
>> >> meaning the distributed search from a node works intermittently. Does
>> >> this give any clue?
>> >>
>> >>
>> >>
>> >> On Thu, Oct 2, 2014 at 7:47 PM, Erick Erickson <erickerick...@gmail.com>
>> >> wrote:
>> >>
>> >>> Hmmm, nothing quite makes sense here....
>> >>>
>> >>> Here are some experiments:
>> >>> 1> avoid the load balancer and issue queries like
>> >>> http://solr_server:8983/solr/collection/select?q=whatever&distrib=false
>> >>>
>> >>> The &distrib=false bit will keep SolrCloud from trying to send
>> >>> the queries anywhere else; they'll be served only by the node you address
>> >>> them to.
>> >>> That'll help check whether the nodes are consistent. You should be
>> >>> getting back the same results from each replica in a shard (i.e. 2 of
>> >>> your 6 machines).
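
Experiment 1> can be scripted for all six nodes at once. A minimal Python sketch that only builds the URLs (fetch them with a browser, curl, or urllib); the node addresses are hypothetical, and it assumes, as in this cluster, that each node hosts one core of the collection:

```python
from urllib.parse import urlencode


def local_query_urls(nodes, collection, params):
    """Build one /select URL per node with distrib=false, so each node
    answers only from its own index. `nodes` are hypothetical base URLs."""
    query = urlencode({**params, "distrib": "false"})
    return [f"{node.rstrip('/')}/solr/{collection}/select?{query}" for node in nodes]
```

`urlencode` also takes care of escaping characters such as `*` and `:` in queries like `q=*:*`, which is easy to get wrong when pasting URLs by hand.
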
>> >>>
>> >>> Next, try your failing query the same way.
>> >>>
>> >>> Next, try your failing query from a browser, pointing it at successive
>> >>> nodes.
>> >>>
>> >>> Where is the first place problems show up?
>> >>>
>> >>> My _guess_ is that your load balancer isn't quite doing what you think,
>> >>> or your cluster isn't set up the way you think it is, but those are
>> >>> guesses.
>> >>>
>> >>> Best,
>> >>> Erick
>> >>>
>> >>> On Thu, Oct 2, 2014 at 2:51 PM, S.L <simpleliving...@gmail.com> wrote:
>> >>> > Hi All,
>> >>> >
>> >>> > I am trying to query a 6-node Solr 4.7 cluster with 3 shards and a
>> >>> > replication factor of 2.
>> >>> >
>> >>> > I have fronted these 6 Solr nodes with a load balancer. What I notice
>> >>> > is that every time I do a search of the form
>> >>> > q=*:*&fq=(id:9e78c064-919f-4ef3-b236-dc66351b4acf), it gives me a
>> >>> > result only once in every 3 tries, telling me that the load balancer
>> >>> > is distributing the requests between the 3 shards and SolrCloud only
>> >>> > returns a result if the request goes to the core that has that id.
>> >>> >
>> >>> > However, if I do a simple search like q=*:*, I consistently get the
>> >>> > right aggregated results back for all the documents across all the
>> >>> > shards on every request through the load balancer. Can someone please
>> >>> > let me know what this is symptomatic of?
>> >>> >
>> >>> > Somehow SolrCloud seems to be doing search query distribution and
>> >>> > aggregation for queries of type *:* only.
>> >>> >
>> >>> > Thanks.
>> >>>
>> >>
>> >>
>>
