subject:"Recovery Issue \- Solr 6.6.1 and HDFS"

Well, you can  always manually change the ZK nodes, but whether just
setting a node's state to "leader" in ZK then starting the Solr
instance hosting that node would work... I don't know. Do consider
running CheckIndex on one of the replicas in question first though.

Best,
Erick

On Tue, Nov 21, 2017 at 3:06 PM, Joe Obernberger
 wrote:
> One other data point I just saw on one of the nodes.  It has the following
> error:
> 2017-11-21 22:59:48.886 ERROR
> (coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS s:shard14
> r:core_node175 x:UNCLASS_shard14_replica3]
> o.a.s.c.ShardLeaderElectionContext There was a problem trying to register as
> the leader:org.apache.solr.common.SolrException: Leader Initiated Recovery
> prevented leadership
> at
> org.apache.solr.cloud.ShardLeaderElectionContext.checkLIR(ElectionContext.java:521)
> at
> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:424)
> at
> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
> at
> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
> at
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
> at
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
> at
> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
> at
> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
> at
> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
> at
> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
> at
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
> at
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
> at
> org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
> at
> org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
> at
> org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
> at
> org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
> at
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
> at
> org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
>
> This stack trace repeats for a long while; looks like a recursive call.
>
>
> -Joe
>
>
> On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
>>
>> We sometimes also have replicas not recovering. If one replica is left
>> active the easiest is to then to delete the replica and create a new one.
>> When all replicas are down it helps most of the time to restart one of the
>> nodes that contains a replica in down state. If that also doesn't get the
>> replica to recover I would check the logs of the node and also that of the
>> overseer node. I have seen the same issue on Solr using local storage. The
>> main HDFS related issues we had so far was those lock files and if you
>> delete and recreate collections/cores and it sometimes happens that the data
>> was not cleaned up in HDFS and then causes a conflict.
>>
>> Hendrik
>>
>> On 21.11.2017 21:07, Joe Obernberger wrote:
>>>
>>> We've never run an index this size in anything but HDFS, so I have no
>>> comparison.  What we've been doing is keeping two main collections - all
>>> data, and the last 30 days of data.  Then we handle queries based on date
>>> range. The 30 day index is significantly faster.
>>>
>>> My main concern right now is that 6 of the 100 shards are not coming back
>>> because of no leader.  I've never seen this error before.  Any ideas?
>>> ClusterStatus shows all three replicas with state 'down'.
>>>
>>> Thanks!
>>>
>>> -joe
>>>
>>>
>>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:

 We actually also have some performance issue with HDFS at the moment. We
 are doing lots of soft commits for NRT search. Those seem to be slower then
 with local storage. The investigation is however not really far yet.

 We have a setup with 2000 collections, with one shard each and a
 replication factor of 2 or 3. When we restart nodes too fast that causes
 problems with the overseer queue, which can lead to the queue getting out 
 of
 control and Solr pretty much dying. We are still on Solr 6.3. 6.6 has some
 improvements and should handle these actions faster. I would check what you
 see for "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The
 critical part is the "overseer_queue_size" value. If this goes up to about
 1 it is pretty much game over on our setup. In that case it seems to be
 best to stop all nodes, clear the queue in ZK and then restart the nodes 
 one
 by one with a gap of like 5min. That normally recovers pretty well.
>

Re: Recovery Issue - Solr 6.6.1 and HDFS

One other data point I just saw on one of the nodes. It has the
following error:
2017-11-21 22:59:48.886 ERROR
(coreZkRegister-1-thread-1-processing-n:leda:9100_solr) [c:UNCLASS
s:shard14 r:core_node175 x:UNCLASS_shard14_replica3]
o.a.s.c.ShardLeaderElectionContext There was a problem trying to
register as the leader:org.apache.solr.common.SolrException: Leader
Initiated Recovery prevented leadership
at
org.apache.solr.cloud.ShardLeaderElectionContext.checkLIR(ElectionContext.java:521)
at
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:424)
at
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
at
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
at
org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
at
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
at
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
at
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)
at
org.apache.solr.cloud.ShardLeaderElectionContext.rejoinLeaderElection(ElectionContext.java:684)
at
org.apache.solr.cloud.ShardLeaderElectionContext.runLeaderProcess(ElectionContext.java:454)
at
org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:170)
at
org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:135)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:307)
at
org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:216)

This stack trace repeats for a long while; looks like a recursive call.

-Joe

On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
We sometimes also have replicas not recovering. If one replica is left
active the easiest is to then to delete the replica and create a new
one. When all replicas are down it helps most of the time to restart
one of the nodes that contains a replica in down state. If that also
doesn't get the replica to recover I would check the logs of the node
and also that of the overseer node. I have seen the same issue on Solr
using local storage. The main HDFS related issues we had so far was
those lock files and if you delete and recreate collections/cores and
it sometimes happens that the data was not cleaned up in HDFS and then
causes a conflict.

Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no
comparison. What we've been doing is keeping two main collections -
all data, and the last 30 days of data. Then we handle queries based
on date range. The 30 day index is significantly faster.

My main concern right now is that 6 of the 100 shards are not coming
back because of no leader. I've never seen this error before. Any
ideas? ClusterStatus shows all three replicas with state 'down'.

Thanks!

-joe

On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the
moment. We are doing lots of soft commits for NRT search. Those seem
to be slower then with local storage. The investigation is however
not really far yet.

We have a setup with 2000 collections, with one shard each and a
replication factor of 2 or 3. When we restart nodes too fast that
causes problems with the overseer queue, which can lead to the queue
getting out of control and Solr pretty much dying. We are still on
Solr 6.3. 6.6 has some improvements and should handle these actions
faster. I would check what you see for
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The
critical part is the "overseer_queue_size" value. If this goes up to
about 1 it is pretty much game over on our setup. In that case
it seems to be best to stop all nodes, clear the queue in ZK and
then restart the nodes one by one with a gap of like 5min. That
normally recovers pretty well.

regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance
issues with HDFS, and thought that since the block size is 128M,
having a longer hard commit made sense. That was our hypothesis
anyway. Happy to switch it back and see what happens.

I don't know what caused the cluster to go into recovery in the
first place. We had a server die over the weekend, but it's just
one out of ~50. Every shard is 3x replicated (and 3x replicated in
HDFS...so 9 copies). It was at this point th

Re: Recovery Issue - Solr 6.6.1 and HDFS

Hi Hendrick - the shards in question have three replicas. I tried
restarting each one (one by one) - no luck. No leader is found. I
deleted one of the replicas and added a new one, and the new one also
shows as 'down'. I also tried the FORCELEADER call, but that had no
effect. I checked the OVERSEERSTATUS, but there is nothing unusual
there. I don't see anything useful in the logs except the error:

org.apache.solr.common.SolrException: Error getting leader from zk for
shard shard21

at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:996)
at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
at
org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.solr.common.SolrException: Could not get leader props
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)

at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)
... 7 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException:
KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
at
org.apache.zookeeper.KeeperException.create(KeeperException.java:111)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
at
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
at
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
at
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
at
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)

... 9 more

Can I modify zookeeper to force a leader? Is there any other way to
recover from this? Thanks very much!

-Joe

On 11/21/2017 3:24 PM, Hendrik Haddorp wrote:
We sometimes also have replicas not recovering. If one replica is left
active the easiest is to then to delete the replica and create a new
one. When all replicas are down it helps most of the time to restart
one of the nodes that contains a replica in down state. If that also
doesn't get the replica to recover I would check the logs of the node
and also that of the overseer node. I have seen the same issue on Solr
using local storage. The main HDFS related issues we had so far was
those lock files and if you delete and recreate collections/cores and
it sometimes happens that the data was not cleaned up in HDFS and then
causes a conflict.

Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no
comparison. What we've been doing is keeping two main collections -
all data, and the last 30 days of data. Then we handle queries based
on date range. The 30 day index is significantly faster.

My main concern right now is that 6 of the 100 shards are not coming
back because of no leader. I've never seen this error before. Any
ideas? ClusterStatus shows all three replicas with state 'down'.

Thanks!

-joe

On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the
moment. We are doing lots of soft commits for NRT search. Those seem
to be slower then with local storage. The investigation is however
not really far yet.

regards,
Hendrik

I don't know what caused

Re: Recovery Issue - Solr 6.6.1 and HDFS

Thank you Erick.  I've set the RamBufferSize to 1G; perhaps higher would 
be beneficial.  One more data point is that if I restart a node, more 
often than not, it goes into recovery, beats up the network for a while, 
and then goes green.  This happens even if I do no indexing between 
restarts.  Is that expected? Sometimes this can take longer than 20 
minutes.  No new data was added to the index between the restarts.


-Joe


On 11/21/2017 3:43 PM, Erick Erickson wrote:

bq: We are doing lots of soft commits for NRT search...

It's not surprising that this is slower than local storage, especially
if you have any autowarming going on. Opening  new searchers will need
to read data from disk for the new segments, and HDFS may be slower
here.

As far as the commit interval, an under-appreciated event is that when
RAMBufferSizeMB is exceeded (default 100M last I knew) new segments
are written _anyway_, they're just a little invisible. That is, the
segments_n file isn't updated even though they're closed IIUC at
least. So that very long interval isn't helping with that problem I
don't think

Evidence to the contrary trumps my understanding of course.

About starting all these collections up at once and the Overseer
queue. I've seen this in similar situations. There are a _lot_ of
messages flying back and forth for each replica on startup, and the
Overseer processing was very inefficient historically so that queue
could get in the 100s of K, I've seen some pathological situations
where it's over 1M. SOLR-10524 made this a lot better. There are still
a lot of messages written in a case like yours, but at least the
Overseer has a much better chance to keep up Solr 6.6... At that
point bringing up Solr took a very long time.


Erick

On Tue, Nov 21, 2017 at 12:24 PM, Hendrik Haddorp
 wrote:

We sometimes also have replicas not recovering. If one replica is left
active the easiest is to then to delete the replica and create a new one.
When all replicas are down it helps most of the time to restart one of the
nodes that contains a replica in down state. If that also doesn't get the
replica to recover I would check the logs of the node and also that of the
overseer node. I have seen the same issue on Solr using local storage. The
main HDFS related issues we had so far was those lock files and if you
delete and recreate collections/cores and it sometimes happens that the data
was not cleaned up in HDFS and then causes a conflict.

Hendrik


On 21.11.2017 21:07, Joe Obernberger wrote:

We've never run an index this size in anything but HDFS, so I have no
comparison.  What we've been doing is keeping two main collections - all
data, and the last 30 days of data.  Then we handle queries based on date
range.  The 30 day index is significantly faster.

My main concern right now is that 6 of the 100 shards are not coming back
because of no leader.  I've never seen this error before.  Any ideas?
ClusterStatus shows all three replicas with state 'down'.

Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:

We actually also have some performance issue with HDFS at the moment. We
are doing lots of soft commits for NRT search. Those seem to be slower then
with local storage. The investigation is however not really far yet.

We have a setup with 2000 collections, with one shard each and a
replication factor of 2 or 3. When we restart nodes too fast that causes
problems with the overseer queue, which can lead to the queue getting out of
control and Solr pretty much dying. We are still on Solr 6.3. 6.6 has some
improvements and should handle these actions faster. I would check what you
see for "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The
critical part is the "overseer_queue_size" value. If this goes up to about
1 it is pretty much game over on our setup. In that case it seems to be
best to stop all nodes, clear the queue in ZK and then restart the nodes one
by one with a gap of like 5min. That normally recovers pretty well.

regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:

We set the hard commit time long because we were having performance
issues with HDFS, and thought that since the block size is 128M, having a
longer hard commit made sense.  That was our hypothesis anyway.  Happy to
switch it back and see what happens.

I don't know what caused the cluster to go into recovery in the first
place.  We had a server die over the weekend, but it's just one out of ~50.
Every shard is 3x replicated (and 3x replicated in HDFS...so 9 copies).  It
was at this point that we noticed lots of network activity, and most of the
shards in this recovery, fail, retry loop.  That is when we decided to shut
it down resulting in zombie lock files.

I tried using the FORCELEADER call, which completed, but doesn't seem to
have any effect on the shards that have no leader. Kinda out of ideas for
that problem.  If I can get the cluster back up, I'll try a lower hard
commit time.  Thanks again Erick!

-J

Re: Recovery Issue - Solr 6.6.1 and HDFS

bq: We are doing lots of soft commits for NRT search...

It's not surprising that this is slower than local storage, especially
if you have any autowarming going on. Opening  new searchers will need
to read data from disk for the new segments, and HDFS may be slower
here.

As far as the commit interval, an under-appreciated event is that when
RAMBufferSizeMB is exceeded (default 100M last I knew) new segments
are written _anyway_, they're just a little invisible. That is, the
segments_n file isn't updated even though they're closed IIUC at
least. So that very long interval isn't helping with that problem I
don't think

Evidence to the contrary trumps my understanding of course.

About starting all these collections up at once and the Overseer
queue. I've seen this in similar situations. There are a _lot_ of
messages flying back and forth for each replica on startup, and the
Overseer processing was very inefficient historically so that queue
could get in the 100s of K, I've seen some pathological situations
where it's over 1M. SOLR-10524 made this a lot better. There are still
a lot of messages written in a case like yours, but at least the
Overseer has a much better chance to keep up Solr 6.6... At that
point bringing up Solr took a very long time.

Erick

On Tue, Nov 21, 2017 at 12:24 PM, Hendrik Haddorp
 wrote:
> We sometimes also have replicas not recovering. If one replica is left
> active the easiest is to then to delete the replica and create a new one.
> When all replicas are down it helps most of the time to restart one of the
> nodes that contains a replica in down state. If that also doesn't get the
> replica to recover I would check the logs of the node and also that of the
> overseer node. I have seen the same issue on Solr using local storage. The
> main HDFS related issues we had so far was those lock files and if you
> delete and recreate collections/cores and it sometimes happens that the data
> was not cleaned up in HDFS and then causes a conflict.
>
> Hendrik
>
>
> On 21.11.2017 21:07, Joe Obernberger wrote:
>>
>> We've never run an index this size in anything but HDFS, so I have no
>> comparison.  What we've been doing is keeping two main collections - all
>> data, and the last 30 days of data.  Then we handle queries based on date
>> range.  The 30 day index is significantly faster.
>>
>> My main concern right now is that 6 of the 100 shards are not coming back
>> because of no leader.  I've never seen this error before.  Any ideas?
>> ClusterStatus shows all three replicas with state 'down'.
>>
>> Thanks!
>>
>> -joe
>>
>>
>> On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
>>>
>>> We actually also have some performance issue with HDFS at the moment. We
>>> are doing lots of soft commits for NRT search. Those seem to be slower then
>>> with local storage. The investigation is however not really far yet.
>>>
>>> We have a setup with 2000 collections, with one shard each and a
>>> replication factor of 2 or 3. When we restart nodes too fast that causes
>>> problems with the overseer queue, which can lead to the queue getting out of
>>> control and Solr pretty much dying. We are still on Solr 6.3. 6.6 has some
>>> improvements and should handle these actions faster. I would check what you
>>> see for "/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The
>>> critical part is the "overseer_queue_size" value. If this goes up to about
>>> 1 it is pretty much game over on our setup. In that case it seems to be
>>> best to stop all nodes, clear the queue in ZK and then restart the nodes one
>>> by one with a gap of like 5min. That normally recovers pretty well.
>>>
>>> regards,
>>> Hendrik
>>>
>>> On 21.11.2017 20:12, Joe Obernberger wrote:

 We set the hard commit time long because we were having performance
 issues with HDFS, and thought that since the block size is 128M, having a
 longer hard commit made sense.  That was our hypothesis anyway.  Happy to
 switch it back and see what happens.

 I don't know what caused the cluster to go into recovery in the first
 place.  We had a server die over the weekend, but it's just one out of ~50.
 Every shard is 3x replicated (and 3x replicated in HDFS...so 9 copies).  It
 was at this point that we noticed lots of network activity, and most of the
 shards in this recovery, fail, retry loop.  That is when we decided to shut
 it down resulting in zombie lock files.

 I tried using the FORCELEADER call, which completed, but doesn't seem to
 have any effect on the shards that have no leader. Kinda out of ideas for
 that problem.  If I can get the cluster back up, I'll try a lower hard
 commit time.  Thanks again Erick!

 -Joe

 On 11/21/2017 2:00 PM, Erick Erickson wrote:
>
> Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...
>
> I need to back up a bit. Once nodes are in this state it's not
> surprising that the

Re: Recovery Issue - Solr 6.6.1 and HDFS

We sometimes also have replicas not recovering. If one replica is left 
active the easiest is to then to delete the replica and create a new 
one. When all replicas are down it helps most of the time to restart one 
of the nodes that contains a replica in down state. If that also doesn't 
get the replica to recover I would check the logs of the node and also 
that of the overseer node. I have seen the same issue on Solr using 
local storage. The main HDFS related issues we had so far was those lock 
files and if you delete and recreate collections/cores and it sometimes 
happens that the data was not cleaned up in HDFS and then causes a conflict.


Hendrik

On 21.11.2017 21:07, Joe Obernberger wrote:
We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - 
all data, and the last 30 days of data.  Then we handle queries based 
on date range.  The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the moment. 
We are doing lots of soft commits for NRT search. Those seem to be 
slower then with local storage. The investigation is however not 
really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 
1 it is pretty much game over on our setup. In that case it seems 
to be best to stop all nodes, clear the queue in ZK and then restart 
the nodes one by one with a gap of like 5min. That normally recovers 
pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the 
first place.  We had a server die over the weekend, but it's just 
one out of ~50.  Every shard is 3x replicated (and 3x replicated in 
HDFS...so 9 copies).  It was at this point that we noticed lots of 
network activity, and most of the shards in this recovery, fail, 
retry loop.  That is when we decided to shut it down resulting in 
zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't 
seem to have any effect on the shards that have no leader. Kinda out 
of ideas for that problem.  If I can get the cluster back up, I'll 
try a lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully.
The write.lock files are then left in the HDFS as they do not get 
removed

automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should 
be owning
the lock as it is marked in the state stored in ZooKee

Re: Recovery Issue - Solr 6.6.1 and HDFS

We've never run an index this size in anything but HDFS, so I have no 
comparison.  What we've been doing is keeping two main collections - all 
data, and the last 30 days of data.  Then we handle queries based on 
date range.  The 30 day index is significantly faster.


My main concern right now is that 6 of the 100 shards are not coming 
back because of no leader.  I've never seen this error before.  Any 
ideas?  ClusterStatus shows all three replicas with state 'down'.


Thanks!

-joe


On 11/21/2017 2:35 PM, Hendrik Haddorp wrote:
We actually also have some performance issue with HDFS at the moment. 
We are doing lots of soft commits for NRT search. Those seem to be 
slower then with local storage. The investigation is however not 
really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that 
causes problems with the overseer queue, which can lead to the queue 
getting out of control and Solr pretty much dying. We are still on 
Solr 6.3. 6.6 has some improvements and should handle these actions 
faster. I would check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 
1 it is pretty much game over on our setup. In that case it seems 
to be best to stop all nodes, clear the queue in ZK and then restart 
the nodes one by one with a gap of like 5min. That normally recovers 
pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out 
of ~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 
9 copies).  It was at this point that we noticed lots of network 
activity, and most of the shards in this recovery, fail, retry loop.  
That is when we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem 
to have any effect on the shards that have no leader. Kinda out of 
ideas for that problem.  If I can get the cluster back up, I'll try a 
lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully.
The write.lock files are then left in the HDFS as they do not get 
removed

automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should 
be owning
the lock as it is marked in the state stored in ZooKeeper as the 
owner and
is also not willing to retry, which is why you need to restart the 
whole
Solr instance after the cleanup. I added some logic to my Solr 
start up
script which scans the log files in HDFS and compares that with the 
state in
ZooKeeper and then delete all lock files that belong to the node 
that I'm

starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 
6.6.1 using

HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is 
split
across 100 shards with 3 replicas each.  The issue that we're 
running into
is when restarting

Re: Recovery Issue - Solr 6.6.1 and HDFS

We actually also have some performance issue with HDFS at the moment. We 
are doing lots of soft commits for NRT search. Those seem to be slower 
then with local storage. The investigation is however not really far yet.


We have a setup with 2000 collections, with one shard each and a 
replication factor of 2 or 3. When we restart nodes too fast that causes 
problems with the overseer queue, which can lead to the queue getting 
out of control and Solr pretty much dying. We are still on Solr 6.3. 6.6 
has some improvements and should handle these actions faster. I would 
check what you see for 
"/solr/admin/collections?action=OVERSEERSTATUS&wt=json". The critical 
part is the "overseer_queue_size" value. If this goes up to about 1 
it is pretty much game over on our setup. In that case it seems to be 
best to stop all nodes, clear the queue in ZK and then restart the nodes 
one by one with a gap of like 5min. That normally recovers pretty well.


regards,
Hendrik

On 21.11.2017 20:12, Joe Obernberger wrote:
We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, 
having a longer hard commit made sense.  That was our hypothesis 
anyway.  Happy to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out of 
~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 9 
copies).  It was at this point that we noticed lots of network 
activity, and most of the shards in this recovery, fail, retry loop.  
That is when we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem 
to have any effect on the shards that have no leader.  Kinda out of 
ideas for that problem.  If I can get the cluster back up, I'll try a 
lower hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully.
The write.lock files are then left in the HDFS as they do not get 
removed

automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should 
be owning
the lock as it is marked in the state stored in ZooKeeper as the 
owner and
is also not willing to retry, which is why you need to restart the 
whole

Solr instance after the cleanup. I added some logic to my Solr start up
script which scans the log files in HDFS and compares that with the 
state in
ZooKeeper and then delete all lock files that belong to the node 
that I'm

starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using

HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is 
split
across 100 shards with 3 replicas each.  The issue that we're 
running into
is when restarting the solr6 cluster.  The shards go into recovery 
and start
to utilize nearly all of their network interfaces.  If we start too 
many of
the nodes at once, the shards will go into a recovery, fail, and 
retry loop

and never come up.  The errors are related to HDFS not responding fast
enough and warnings from the DFSClient.  If we stop a node when 
this is

happening, the script will force a stop (180 second timeout) and upon
restart, we have lock files (write.lock) inside of HDFS.

The process at this point is to start one node, find o

Re: Recovery Issue - Solr 6.6.1 and HDFS

Unfortunately I can not upload my cleanup code but the steps I'm doing 
are quite easy. I wrote it in Java using the HDFS API and Curator for 
ZooKeeper. Steps are:
    - read out the children of /collections in ZK so you know all the 
collection names

    - read /collections//state.json to get the state
    - find the replicas in the state and filter those out that have a 
"node_name" matching your locale node (the node name is basically a 
combination of your host name and the solr port)
    - if the replica data has "dataDir" set then you basically only 
need to add "index/write.lock" to it and you have the lock location
    - if "dataDir" is not set (not really sure why) then you need to 
construct it yourself: //name>/data/index/write.lock

    - if the lock file exist delete it

I believe there is a small race condition in case you use replica auto 
fail over. So I try to keep the time between checking the state in 
ZooKeeper and deleting the lock file as short, like not first determine 
all lock file locations and only then delete them but do that while 
checking the state.


regards,
Hendrik

On 21.11.2017 19:53, Joe Obernberger wrote:
A clever idea.  Normally what we do when we need to do a restart, is 
to halt indexing, and then wait about 30 minutes.  If we do not wait, 
and stop the cluster, the default scripts 180 second timeout is not 
enough and we'll have lock files to clean up.  We use puppet to start 
and stop the nodes, but at this point that is not working well since 
we need to start one node at a time.  With each one taking hours, this 
is a lengthy process!  I'd love to see your script!


This new error is now coming up - see screen shot.  For some reason 
some of the shards have no leader assigned:


http://lovehorsepower.com/SolrClusterErrors.jpg

-Joe


On 11/21/2017 1:34 PM, Hendrik Haddorp wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully. The write.lock files are then left in the HDFS as they do 
not get removed automatically when the client disconnects like a 
ephemeral node in ZooKeeper. Unfortunately Solr does also not realize 
that it should be owning the lock as it is marked in the state stored 
in ZooKeeper as the owner and is also not willing to retry, which is 
why you need to restart the whole Solr instance after the cleanup. I 
added some logic to my Solr start up script which scans the log files 
in HDFS and compares that with the state in ZooKeeper and then delete 
all lock files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index. The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The 
issue that we're running into is when restarting the solr6 cluster.  
The shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and 
upon restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete 
the write.lock files, and restart.  Usually this second restart is 
faster, but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). 
Should we have not used so many replicas with HDFS?  Is there a 
better way we should have built the solr6 cluster?


Thank you for any insight!

-Joe




---
This email has been checked for viruses by AVG.
http://www.avg.com

Re: Recovery Issue - Solr 6.6.1 and HDFS

We set the hard commit time long because we were having performance 
issues with HDFS, and thought that since the block size is 128M, having 
a longer hard commit made sense.  That was our hypothesis anyway.  Happy 
to switch it back and see what happens.


I don't know what caused the cluster to go into recovery in the first 
place.  We had a server die over the weekend, but it's just one out of 
~50.  Every shard is 3x replicated (and 3x replicated in HDFS...so 9 
copies).  It was at this point that we noticed lots of network activity, 
and most of the shards in this recovery, fail, retry loop.  That is when 
we decided to shut it down resulting in zombie lock files.


I tried using the FORCELEADER call, which completed, but doesn't seem to 
have any effect on the shards that have no leader.  Kinda out of ideas 
for that problem.  If I can get the cluster back up, I'll try a lower 
hard commit time.  Thanks again Erick!


-Joe


On 11/21/2017 2:00 PM, Erick Erickson wrote:

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped gracefully.
The write.lock files are then left in the HDFS as they do not get removed
automatically when the client disconnects like a ephemeral node in
ZooKeeper. Unfortunately Solr does also not realize that it should be owning
the lock as it is marked in the state stored in ZooKeeper as the owner and
is also not willing to retry, which is why you need to restart the whole
Solr instance after the cleanup. I added some logic to my Solr start up
script which scans the log files in HDFS and compares that with the state in
ZooKeeper and then delete all lock files that belong to the node that I'm
starting.

regards,
Hendrik


On 21.11.2017 14:07, Joe Obernberger wrote:

Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
HDFS as the index.  The current index size is about 31TBytes. With 3x
replication that takes up 93TBytes of disk. Our main collection is split
across 100 shards with 3 replicas each.  The issue that we're running into
is when restarting the solr6 cluster.  The shards go into recovery and start
to utilize nearly all of their network interfaces.  If we start too many of
the nodes at once, the shards will go into a recovery, fail, and retry loop
and never come up.  The errors are related to HDFS not responding fast
enough and warnings from the DFSClient.  If we stop a node when this is
happening, the script will force a stop (180 second timeout) and upon
restart, we have lock files (write.lock) inside of HDFS.

The process at this point is to start one node, find out the lock files,
wait for it to come up completely (hours), stop it, delete the write.lock
files, and restart.  Usually this second restart is faster, but it still can
take 20-60 minutes.

The smaller indexes recover much faster (less than 5 minutes). Should we
have not used so many replicas with HDFS?  Is there a better way we should
have built the solr6 cluster?

Thank you for any insight!

-Joe


---
This email has been checked for viruses by AVG.
http://www.avg.com

Re: Recovery Issue - Solr 6.6.1 and HDFS

Frankly with HDFS I'm a bit out of my depth so listen to Hendrik ;)...

I need to back up a bit. Once nodes are in this state it's not
surprising that they need to be forcefully killed. I was more thinking
about how they got in this situation in the first place. _Before_ you
get into the nasty state how are the Solr nodes shut down? Forcefully?

Your hard commit is far longer than it needs to be, resulting in much
larger tlog files etc. I usually set this at 15-60 seconds with local
disks, not quite sure whether longer intervals are helpful on HDFS.
What this means is that you can spend up to 30 minutes when you
restart solr _replaying the tlogs_! If Solr is killed, it may not have
had a chance to fsync the segments and may have to replay on startup.
If you have openSearcher set to false, the hard commit operation is
not horribly expensive, it just fsync's the current segments and opens
new ones. It won't be a total cure, but I bet reducing this interval
would help a lot.

Also, if you stop indexing there's no need to wait 30 minutes if you
issue a manual commit, something like
.../collection/update?commit=true. Just reducing the hard commit
interval will make the wait between stopping indexing and restarting
shorter all by itself if you don't want to issue the manual commit.

Best,
Erick

On Tue, Nov 21, 2017 at 10:34 AM, Hendrik Haddorp
 wrote:
> Hi,
>
> the write.lock issue I see as well when Solr is not been stopped gracefully.
> The write.lock files are then left in the HDFS as they do not get removed
> automatically when the client disconnects like a ephemeral node in
> ZooKeeper. Unfortunately Solr does also not realize that it should be owning
> the lock as it is marked in the state stored in ZooKeeper as the owner and
> is also not willing to retry, which is why you need to restart the whole
> Solr instance after the cleanup. I added some logic to my Solr start up
> script which scans the log files in HDFS and compares that with the state in
> ZooKeeper and then delete all lock files that belong to the node that I'm
> starting.
>
> regards,
> Hendrik
>
>
> On 21.11.2017 14:07, Joe Obernberger wrote:
>>
>> Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
>> HDFS as the index.  The current index size is about 31TBytes. With 3x
>> replication that takes up 93TBytes of disk. Our main collection is split
>> across 100 shards with 3 replicas each.  The issue that we're running into
>> is when restarting the solr6 cluster.  The shards go into recovery and start
>> to utilize nearly all of their network interfaces.  If we start too many of
>> the nodes at once, the shards will go into a recovery, fail, and retry loop
>> and never come up.  The errors are related to HDFS not responding fast
>> enough and warnings from the DFSClient.  If we stop a node when this is
>> happening, the script will force a stop (180 second timeout) and upon
>> restart, we have lock files (write.lock) inside of HDFS.
>>
>> The process at this point is to start one node, find out the lock files,
>> wait for it to come up completely (hours), stop it, delete the write.lock
>> files, and restart.  Usually this second restart is faster, but it still can
>> take 20-60 minutes.
>>
>> The smaller indexes recover much faster (less than 5 minutes). Should we
>> have not used so many replicas with HDFS?  Is there a better way we should
>> have built the solr6 cluster?
>>
>> Thank you for any insight!
>>
>> -Joe
>>
>

Re: Recovery Issue - Solr 6.6.1 and HDFS

A clever idea.  Normally what we do when we need to do a restart, is to 
halt indexing, and then wait about 30 minutes.  If we do not wait, and 
stop the cluster, the default scripts 180 second timeout is not enough 
and we'll have lock files to clean up.  We use puppet to start and stop 
the nodes, but at this point that is not working well since we need to 
start one node at a time.  With each one taking hours, this is a lengthy 
process!  I'd love to see your script!


This new error is now coming up - see screen shot.  For some reason some 
of the shards have no leader assigned:


http://lovehorsepower.com/SolrClusterErrors.jpg

-Joe


On 11/21/2017 1:34 PM, Hendrik Haddorp wrote:

Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully. The write.lock files are then left in the HDFS as they do 
not get removed automatically when the client disconnects like a 
ephemeral node in ZooKeeper. Unfortunately Solr does also not realize 
that it should be owning the lock as it is marked in the state stored 
in ZooKeeper as the owner and is also not willing to retry, which is 
why you need to restart the whole Solr instance after the cleanup. I 
added some logic to my Solr start up script which scans the log files 
in HDFS and compares that with the state in ZooKeeper and then delete 
all lock files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index.  The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The 
issue that we're running into is when restarting the solr6 cluster.  
The shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and upon 
restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete the 
write.lock files, and restart.  Usually this second restart is 
faster, but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). Should 
we have not used so many replicas with HDFS?  Is there a better way 
we should have built the solr6 cluster?


Thank you for any insight!

-Joe




---
This email has been checked for viruses by AVG.
http://www.avg.com

Re: Recovery Issue - Solr 6.6.1 and HDFS


Hi,

the write.lock issue I see as well when Solr is not been stopped 
gracefully. The write.lock files are then left in the HDFS as they do 
not get removed automatically when the client disconnects like a 
ephemeral node in ZooKeeper. Unfortunately Solr does also not realize 
that it should be owning the lock as it is marked in the state stored in 
ZooKeeper as the owner and is also not willing to retry, which is why 
you need to restart the whole Solr instance after the cleanup. I added 
some logic to my Solr start up script which scans the log files in HDFS 
and compares that with the state in ZooKeeper and then delete all lock 
files that belong to the node that I'm starting.


regards,
Hendrik

On 21.11.2017 14:07, Joe Obernberger wrote:
Hi All - we have a system with 45 physical boxes running solr 6.6.1 
using HDFS as the index.  The current index size is about 31TBytes. 
With 3x replication that takes up 93TBytes of disk. Our main 
collection is split across 100 shards with 3 replicas each.  The issue 
that we're running into is when restarting the solr6 cluster.  The 
shards go into recovery and start to utilize nearly all of their 
network interfaces.  If we start too many of the nodes at once, the 
shards will go into a recovery, fail, and retry loop and never come 
up.  The errors are related to HDFS not responding fast enough and 
warnings from the DFSClient.  If we stop a node when this is 
happening, the script will force a stop (180 second timeout) and upon 
restart, we have lock files (write.lock) inside of HDFS.


The process at this point is to start one node, find out the lock 
files, wait for it to come up completely (hours), stop it, delete the 
write.lock files, and restart.  Usually this second restart is faster, 
but it still can take 20-60 minutes.


The smaller indexes recover much faster (less than 5 minutes). Should 
we have not used so many replicas with HDFS?  Is there a better way we 
should have built the solr6 cluster?


Thank you for any insight!

-Joe

Re: Recovery Issue - Solr 6.6.1 and HDFS

Erick - thank you very much for the reply.  I'm still working through 
restarting the nodes one by one.


I'm stopping the nodes with the script, but yes - they are being killed 
forcefully because they are in this recovery, failed, retry loop.  I 
could increase the timeout, but they never seem to recover.


The largest tlog file that I see currently is 222MBytes. Autocommit is 
set to 180 and autoSoftCommit is set to 12. Errors when they are 
in the long recovery are things like:


2017-11-20 21:41:29.755 ERROR 
(recoveryExecutor-3-thread-4-processing-n:frodo:9100_solr 
x:UNCLASS_shard37_replica1 s:shard37 c:UNCLASS r:core_node196) 
[c:UNCLASS s:shard37 r:core_node196 x:UNCLASS_shard37_replica1] 
o.a.s.h.IndexFetcher Error closing file: _8dmn.cfs
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File 
/solr6.6.0/UNCLASS/core_node196/data/index.20171120195705961/_8dmn.cfs 
could only be replicated to 0 nodes instead of minReplication (=1).  
There are 39 datanode(s) running and no node(s) are excluded in this 
operation.
    at 
org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1716)
    at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3385)


Complete log is here for one of the shards that was forcefully stopped.
http://lovehorsepower.com/solr.log

As to what is in the logs when it is recovering for several hours, it's 
many WARN messages from the DFSClient such as:
Abandoning 
BP-1714598269-10.2.100.220-1341346046854:blk_4366207808_1103082741732

and
Excluding datanode 
DatanodeInfoWithStorage[172.16.100.229:50010,DS-5985e40d-830a-44e7-a2ea-fc60bebabc30,DISK]


or from the IndexFetcher:

File _a96y.cfe did not match. expected checksum is 3502268220 and actual 
is checksum 2563579651. expected length is 405 and actual length is 405


Unfortunately, I'm not getting errors from some of the nodes (still 
working through restarting them) about zookeeper:


org.apache.solr.common.SolrException: Could not get leader props
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1043)
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1007)

    at org.apache.solr.cloud.ZkController.getLeader(ZkController.java:963)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:902)
    at org.apache.solr.cloud.ZkController.register(ZkController.java:846)
    at 
org.apache.solr.core.ZkContainer.lambda$registerInZk$0(ZkContainer.java:181)
    at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:229)
    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)

    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
KeeperErrorCode = NoNode for /collections/UNCLASS/leaders/shard21/leader
    at 
org.apache.zookeeper.KeeperException.create(KeeperException.java:111)

    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:357)
    at 
org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:354)
    at 
org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60)
    at 
org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:354)
    at 
org.apache.solr.cloud.ZkController.getLeaderProps(ZkController.java:1021)


Any idea what those could be?  Those shards are not coming back up. 
Sorry so many questions!


-Joe

On 11/21/2017 12:11 PM, Erick Erickson wrote:

How are you stopping Solr? Nodes should not go into recovery on
startup unless Solr was killed un-gracefully (i.e. kill -9 or the
like). If you use the bin/solr script to stop Solr and see a message
about "killing XXX forcefully" then you can lengthen out the time the
script waits for shutdown (there's a sysvar you can set, look in the
script).

Actually I'll correct myself a bit. Shards _do_ go into recovery but
it should be very short in the graceful shutdown case. Basically
shards temporarily go into recovery as part of normal processing just
long enough to see there's no recovery necessary, but that should be
measured in a few seconds.

What it sounds like from this "The shards go into recovery and start
to utilize nearly all of their network" is that your nodes go into
"full recovery" where the entire index is copied down because the
replica thinks it's "too far" out of date. That indicates something
weird about the state when the Solr nodes stopped.

wild-shot-in-the-dark question. How big are your tlogs? If you don't
hard commit very often, the tlogs can replay at startup for a very
long time.

This makes no sense to me, I'm surely missing something:

The proc

Re: Recovery Issue - Solr 6.6.1 and HDFS

How are you stopping Solr? Nodes should not go into recovery on
startup unless Solr was killed un-gracefully (i.e. kill -9 or the
like). If you use the bin/solr script to stop Solr and see a message
about "killing XXX forcefully" then you can lengthen out the time the
script waits for shutdown (there's a sysvar you can set, look in the
script).

Actually I'll correct myself a bit. Shards _do_ go into recovery but
it should be very short in the graceful shutdown case. Basically
shards temporarily go into recovery as part of normal processing just
long enough to see there's no recovery necessary, but that should be
measured in a few seconds.

What it sounds like from this "The shards go into recovery and start
to utilize nearly all of their network" is that your nodes go into
"full recovery" where the entire index is copied down because the
replica thinks it's "too far" out of date. That indicates something
weird about the state when the Solr nodes stopped.

wild-shot-in-the-dark question. How big are your tlogs? If you don't
hard commit very often, the tlogs can replay at startup for a very
long time.

This makes no sense to me, I'm surely missing something:

The process at this point is to start one node, find out the lock
files, wait for it to come up completely (hours), stop it, delete the
write.lock files, and restart.  Usually this second restart is faster,
but it still can take 20-60 minutes.

When you start one node it may take a few minutes for leader electing
to kick in (the default is 180 seconds) but absent replication it
should just be there. Taking hours totally violates my expectations.
What does Solr _think_ it's doing? What's in the logs at that point?
And if you stop solr gracefully, there shouldn't be a problem with
write.lock.

You could also try increasing the timeouts, and the HDFS directory
factory has some parameters to tweak that are a mystery to me...

All in all, this is behavior that I find mystifying.

Best,
Erick

On Tue, Nov 21, 2017 at 5:07 AM, Joe Obernberger
 wrote:
> Hi All - we have a system with 45 physical boxes running solr 6.6.1 using
> HDFS as the index.  The current index size is about 31TBytes.  With 3x
> replication that takes up 93TBytes of disk. Our main collection is split
> across 100 shards with 3 replicas each.  The issue that we're running into
> is when restarting the solr6 cluster.  The shards go into recovery and start
> to utilize nearly all of their network interfaces.  If we start too many of
> the nodes at once, the shards will go into a recovery, fail, and retry loop
> and never come up.  The errors are related to HDFS not responding fast
> enough and warnings from the DFSClient.  If we stop a node when this is
> happening, the script will force a stop (180 second timeout) and upon
> restart, we have lock files (write.lock) inside of HDFS.
>
> The process at this point is to start one node, find out the lock files,
> wait for it to come up completely (hours), stop it, delete the write.lock
> files, and restart.  Usually this second restart is faster, but it still can
> take 20-60 minutes.
>
> The smaller indexes recover much faster (less than 5 minutes). Should we
> have not used so many replicas with HDFS?  Is there a better way we should
> have built the solr6 cluster?
>
> Thank you for any insight!
>
> -Joe
>

Recovery Issue - Solr 6.6.1 and HDFS