Re: Upgrading cluster from 4 to 5. Slow replication detected.

2017-04-19 Thread Himanshu Sachdeva
I am guessing that the index somehow got corrupted, so I deleted the data
directory on the slave. It has started copying the index again; I'll report
back here once that completes. If you have any other suggestions in the
meantime, please reply. Thanks.

On Wed, Apr 19, 2017 at 12:21 PM, Himanshu Sachdeva <himan...@limeroad.com>
wrote:

> Hello Shawn,
>
> Thanks for taking the time to help me. I had assigned 45GB to the heap
> as both its starting and maximum size. The logs repeatedly show the
> following two warnings:
>
>    - IndexFetcher: Cannot complete replication attempt because file
>      already exists.
>    - IndexFetcher: Replication attempt was not successful - trying a
>      full index replication reloadCore=false.
>
>
>
> On Tue, Apr 18, 2017 at 6:58 PM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 4/14/2017 2:10 AM, Himanshu Sachdeva wrote:
>> > We're starting to upgrade our Solr cluster to version 5.5. We removed
>> > one slave node from the cluster, installed Solr 5.5.4 on it, and
>> > started Solr, at which point it began copying the index from the
>> > master. However, we noticed a drop in replication speed compared to
>> > the other nodes, which were still running Solr 4. To make a fair
>> > comparison, I removed another slave node from the cluster and disabled
>> > replication on it until the new node had caught up. When both nodes
>> > were at the same index generation, I turned replication back on for
>> > both. It has now been over 15 hours since this exercise, and the new
>> > node has again started lagging behind. Currently, the node with Solr
>> > 5.5 is seven generations behind the other node.
>>
>> Version 5 is capable of replication bandwidth throttling, but unless you
>> actually configure the maxWriteMBPerSec attribute in the replication
>> handler definition, this should not happen by default.
>>
>> One possible problem is that the heap has been left at the default 512MB
>> on the new 5.5.4 install, in which case the machine would be doing
>> constant full garbage collections to free up memory for normal operation,
>> making Solr run EXTREMELY slowly.  Eventually a machine in this state
>> would most likely encounter an OutOfMemoryError.  On non-Windows systems,
>> an OOME will cause a forced halt of the entire Solr instance.
>>
>> The heap might not be the problem ... if it's not, then I do not know
>> what is going on.  Are there any errors or warnings in solr.log?
>>
>> Thanks,
>> Shawn
>>
>>
>
>
> --
> Himanshu Sachdeva
>
>


-- 
Himanshu Sachdeva


Re: Upgrading cluster from 4 to 5. Slow replication detected.

2017-04-19 Thread Himanshu Sachdeva
Hello Shawn,

Thanks for taking the time to help me. I had assigned 45GB to the heap
as both its starting and maximum size. The logs repeatedly show the
following two warnings:

   - IndexFetcher: Cannot complete replication attempt because file
     already exists.
   - IndexFetcher: Replication attempt was not successful - trying a
     full index replication reloadCore=false.
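For reference, a heap like the 45GB one described above is typically set in bin/solr.in.sh on Solr 5.x. This is only a sketch (variable names can differ between versions, and GC flags assume Java 8):

```shell
# bin/solr.in.sh -- give Solr an explicit heap instead of the small default.
# 45g matches the figure mentioned above; adjust for your hardware.
SOLR_JAVA_MEM="-Xms45g -Xmx45g"

# Optional: log GC activity so constant full collections become visible.
GC_LOG_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"
```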



On Tue, Apr 18, 2017 at 6:58 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/14/2017 2:10 AM, Himanshu Sachdeva wrote:
> > We're starting to upgrade our Solr cluster to version 5.5. We removed
> > one slave node from the cluster, installed Solr 5.5.4 on it, and
> > started Solr, at which point it began copying the index from the
> > master. However, we noticed a drop in replication speed compared to
> > the other nodes, which were still running Solr 4. To make a fair
> > comparison, I removed another slave node from the cluster and disabled
> > replication on it until the new node had caught up. When both nodes
> > were at the same index generation, I turned replication back on for
> > both. It has now been over 15 hours since this exercise, and the new
> > node has again started lagging behind. Currently, the node with Solr
> > 5.5 is seven generations behind the other node.
>
> Version 5 is capable of replication bandwidth throttling, but unless you
> actually configure the maxWriteMBPerSec attribute in the replication
> handler definition, this should not happen by default.
>
> One possible problem is that the heap has been left at the default 512MB
> on the new 5.5.4 install, in which case the machine would be doing
> constant full garbage collections to free up memory for normal operation,
> making Solr run EXTREMELY slowly.  Eventually a machine in this state
> would most likely encounter an OutOfMemoryError.  On non-Windows systems,
> an OOME will cause a forced halt of the entire Solr instance.
>
> The heap might not be the problem ... if it's not, then I do not know
> what is going on.  Are there any errors or warnings in solr.log?
>
> Thanks,
> Shawn
>
>
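The maxWriteMBPerSec attribute Shawn refers to above belongs on the master's replication handler in solrconfig.xml. A minimal sketch; the value, confFiles, and exact placement are illustrative, so check your Solr 5.x reference guide before relying on it:

```xml
<!-- solrconfig.xml (master): Solr 5.x only throttles replication when
     maxWriteMBPerSec is configured explicitly; leave it out and transfers
     run at full speed. Values below are illustrative. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <str name="maxWriteMBPerSec">64</str>
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>
```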


-- 
Himanshu Sachdeva


Upgrading cluster from 4 to 5. Slow replication detected.

2017-04-14 Thread Himanshu Sachdeva
Hi,

We're starting to upgrade our Solr cluster to version 5.5. We removed one
slave node from the cluster, installed Solr 5.5.4 on it, and started Solr,
at which point it began copying the index from the master. However, we
noticed a drop in replication speed compared to the other nodes, which were
still running Solr 4. To make a fair comparison, I removed another slave
node from the cluster and disabled replication on it until the new node had
caught up. When both nodes were at the same index generation, I turned
replication back on for both. It has now been over 15 hours since this
exercise, and the new node has again started lagging behind. Currently, the
node with Solr 5.5 is seven generations behind the other node.
Is it because the master is running Solr 4 while this node is running Solr
5? Has anyone else faced a similar problem while upgrading?

-- 
Himanshu Sachdeva


Re: Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-10 Thread Himanshu Sachdeva
Hi Toke,

Thanks for your time and quick response. As you suggested, I changed our
logging level from SEVERE to INFO and indeed found the performance warning
*Overlapping onDeckSearchers=2* in the logs. I am considering limiting the
*maxWarmingSearchers* count in the configuration, but I want to be sure
that nothing breaks in production if simultaneous commits do happen
afterwards.

What would happen if we set *maxWarmingSearchers* to 1 and make
simultaneous commits from different endpoints? I understand that Solr will
prevent opening a new searcher for the second commit, but is that all there
is to it? Does it mean Solr will serve stale data (i.e. send stale data to
the slaves), ignoring the changes from the second commit? Will these
changes be reflected only once a new searcher is initialized, and will they
be ignored till then? Do we even need searchers on the master at all, given
that we will be querying only the slaves? What purpose exactly do the
searchers serve? Your time and guidance will be very much appreciated.
Thank you.
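For reference, the maxWarmingSearchers setting discussed here lives in the <query> section of solrconfig.xml. A minimal sketch with the value being considered:

```xml
<!-- solrconfig.xml: caps the number of searchers warming concurrently
     (the old default is 2). Setting it to 1, as considered above, is a
     sketch -- verify against your commit pattern before production. -->
<query>
  <maxWarmingSearchers>1</maxWarmingSearchers>
</query>
```

As I understand the FAQ linked in Toke's reply, a commit that exceeds this limit fails to open its searcher (logged as an "exceeded limit of maxWarmingSearchers" error), but the documents are still written to the index and become visible the next time a searcher opens successfully.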

On Thu, Apr 6, 2017 at 6:12 PM, Toke Eskildsen <t...@kb.dk> wrote:

> On Thu, 2017-04-06 at 16:30 +0530, Himanshu Sachdeva wrote:
> > We monitored the index size for a few days and found that it varies
> > widely from 11GB to 43GB.
>
> Lucene/Solr indexes consists of segments, each holding a number of
> documents. When a document is deleted, its bytes are not removed
> immediately, only marked. When a document is updated, it is effectively
> a delete and an add.
>
> If you have an index with 3 documents
>   segment-0 (live docs [0, 1, 2], deleted docs [])
> and update documents 0 and 1, you will have
>   segment-0 (live docs [2], deleted docs [0, 1])
>   segment-1 (live docs [0, 1], deleted docs [])
> if you then update document 1 again, you will have
>   segment-0 (live docs [2], deleted docs [0, 1])
>   segment-1 (live docs [0], deleted docs [1])
>   segment-2 (live docs [1], deleted docs [])
>
> for a total of ([2] + [0, 1]) + ([0] + [1]) + ([1] + []) = 6 documents.
>
> The space is reclaimed when segments are merged, but depending on your
> setup and update pattern that may take some time. Furthermore there is a
> temporary overhead of merging, when the merged segment is being written and
> the old segments are still available. 4x the minimum size is fairly large,
> but not unrealistic, with enough index-updates.
>
> > Recently, we started getting a lot of out of memory errors on the
> > master. Everytime, solr becomes unresponsive and we need to restart
> > jetty to bring it back up. At the same we observed the variation in
> > index size. We are suspecting that these two problems may be linked.
>
> Quick sanity check: Look for "Overlapping onDeckSearchers" in your
> solr.log to see if your memory problems are caused by multiple open
> searchers:
> https://wiki.apache.org/solr/FAQ#What_does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
> --
> Toke Eskildsen, Royal Danish Library
>
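Toke's segment arithmetic above can be sketched as a toy simulation. This is purely illustrative (class names are invented, and real Lucene segment/merge behavior is far more involved), but it reproduces the 6-documents-of-space-for-3-live-documents accounting:

```python
# Toy model of Lucene segment behavior under updates (illustrative only).

class Segment:
    def __init__(self, live):
        self.live = set(live)      # doc ids visible to searches
        self.deleted = set()       # doc ids marked deleted; bytes still on disk

    def size(self):                # space used includes deleted docs
        return len(self.live) + len(self.deleted)

class Index:
    def __init__(self, docs):
        self.segments = [Segment(docs)]

    def update(self, docs):
        # An update is a delete (mark in the old segment) plus an add
        # (the document lands in a new segment).
        for seg in self.segments:
            marked = seg.live & set(docs)
            seg.live -= marked
            seg.deleted |= marked
        self.segments.append(Segment(docs))

    def merge(self):
        # Merging rewrites only live docs into one segment,
        # reclaiming the space held by deleted docs.
        live = set().union(*(s.live for s in self.segments))
        self.segments = [Segment(live)]

idx = Index([0, 1, 2])
idx.update([0, 1])
idx.update([1])
total = sum(s.size() for s in idx.segments)
print(total)                                  # 6: ([2]+[0,1]) + ([0]+[1]) + ([1]+[])
idx.merge()
print(sum(s.size() for s in idx.segments))    # 3: only live docs remain
```

Until a merge runs, the index holds space for 6 documents even though only 3 are live, which is how an index can temporarily balloon well past its minimum size.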



-- 
Himanshu Sachdeva


Solr Index size keeps fluctuating, becomes ~4x normal size.

2017-04-06 Thread Himanshu Sachdeva
Hi all,

We use Solr on our website for product search. Currently, we have 2.1
million documents in the products core, and each document has around 350
fields, more than 90% of which are indexed. The master instance of Solr
runs with 15GB RAM and a 200GB drive. We have also configured 10 slaves to
handle reads from the website; the slaves poll the master at an interval of
20 minutes. We monitored the index size for a few days and found that it
varies widely, from 11GB to 43GB.


Recently, we started getting a lot of out-of-memory errors on the master.
Every time, Solr becomes unresponsive and we need to restart Jetty to bring
it back up. Around the same time, we observed the variation in index size,
so we suspect these two problems may be linked.

What could cause the index size to grow to almost 4x its normal size? Why
does it vary so much? Any pointers will be appreciated. If you need more
details on the config, please let me know.
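For context, the 20-minute polling described above corresponds to a slave-side replication handler config along these lines. A sketch only; the master URL and core name are illustrative:

```xml
<!-- solrconfig.xml (each slave): poll the master every 20 minutes,
     matching the interval described above. masterUrl is illustrative. -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master.example.com:8983/solr/products</str>
    <str name="pollInterval">00:20:00</str>
  </lst>
</requestHandler>
```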

-- 
Himanshu Sachdeva