[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378850#comment-15378850
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

Thanks for reviewing, Mark, but I already fixed that in the last patch.

I found a test failure in ZkControllerTest because of a thread leak so I may 
post another patch soon.

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, SOLR-9290-debug.patch, 
> SOLR-9290.patch, SOLR-9290.patch, index.sh, setup-solr.sh, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15378846#comment-15378846
 ] 

Mark Miller commented on SOLR-9290:
---

Patch looks okay to me.

{noformat}
+clientConnectionManager.shutdown();
+IOUtils.closeQuietly(defaultClient);
{noformat}

Not that it likely matters, but I'd reverse this and shut down the pool after 
the client.
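
The ordering suggested above (close the client first, then shut down the pool it borrows from) can be illustrated without HttpClient at all. This is only a sketch: `FakePool` and `FakeClient` are hypothetical stand-ins, not HttpClient types. It relies on the fact that Java's try-with-resources closes resources in the reverse of their declaration order, so declaring the pool first means the client is closed first and the pool last.

```java
import java.util.ArrayList;
import java.util.List;

class ShutdownOrderSketch {
    /** Hypothetical stand-in for a connection pool; not an HttpClient class. */
    static final class FakePool implements AutoCloseable {
        private final List<String> log;
        FakePool(List<String> log) { this.log = log; }
        @Override public void close() { log.add("pool"); }
    }

    /** Hypothetical stand-in for a client that borrows connections from the pool. */
    static final class FakeClient implements AutoCloseable {
        private final List<String> log;
        FakeClient(FakePool pool, List<String> log) { this.log = log; }
        @Override public void close() { log.add("client"); }
    }

    /** Returns the names of the resources in the order they were closed. */
    static List<String> closeOrder() {
        List<String> log = new ArrayList<>();
        // Declared pool-first, so try-with-resources closes client-first.
        try (FakePool pool = new FakePool(log);
             FakeClient client = new FakeClient(pool, log)) {
            // Requests would be sent through the client here.
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(closeOrder());
    }
}
```

With explicit calls instead of try-with-resources, the same order would mean closing the client before calling the pool's shutdown method.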




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377991#comment-15377991
 ] 

Mark Miller commented on SOLR-9290:
---

Okay, I didn't catch that you were not removing the stale check.

bq. For reasons I don't understand, 'idle' connections are more likely to 
(exist? | be kept around indefinitely?) when the intra-node communication is 
over SSL.

I think I remember reading that SSL handles the connection lifecycle 
differently, based on the SSL spec.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377355#comment-15377355
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

[~markrmil...@gmail.com] -- But you were trying to remove the stale check and 
disable Nagle's algorithm as well, which exposed you to 
NoHttpResponseExceptions. We aren't trying to do that here. We just want to 
close the idle connections so that they don't keep accumulating in the 
CLOSE_WAIT state.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377345#comment-15377345
 ] 

Mark Miller commented on SOLR-9290:
---

bq. why not just re-use the IdleConnectionEvictor class provided by 
httpcomponents

I've gone down this road. It's not a great solution. This is why we ended up 
changing to the new APIs instead in SOLR-4509. Just having an evictor thread 
is not enough - you also then want the ability to check connections before use 
if they have been sitting in the pool too long, and that requires the 
HttpClient changes they made in the new APIs.
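
The "check connections before use if they have been sitting in the pool too long" idea can be sketched as a toy pool. This is an illustration under stated assumptions, not HttpClient's implementation: `PooledConn`, the injected clock and the `isStale` predicate are all hypothetical names. A connection that was used recently is handed out as-is; one that has been idle past the threshold gets the (more expensive) stale check first and is dropped if it fails.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Predicate;

class ValidateAfterInactivitySketch {
    /** Hypothetical pooled connection: an id plus the time it was last used. */
    static final class PooledConn {
        final int id;
        long lastUsedMillis;
        PooledConn(int id, long lastUsedMillis) {
            this.id = id;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final Deque<PooledConn> idle = new ArrayDeque<>();
    private final long validateAfterMillis;
    private final LongSupplier clock;            // injected so the sketch is testable
    private final Predicate<PooledConn> isStale; // stands in for a real stale check
    private int nextId = 1;
    int staleChecks = 0;

    ValidateAfterInactivitySketch(long validateAfterMillis, LongSupplier clock,
                                  Predicate<PooledConn> isStale) {
        this.validateAfterMillis = validateAfterMillis;
        this.clock = clock;
        this.isStale = isStale;
    }

    /** Hands out a pooled connection, validating only those idle for too long. */
    PooledConn lease() {
        long now = clock.getAsLong();
        PooledConn c;
        while ((c = idle.pollFirst()) != null) {
            if (now - c.lastUsedMillis < validateAfterMillis) {
                return c; // used recently: trust it without a check
            }
            staleChecks++;
            if (!isStale.test(c)) {
                return c; // idle for a while, but still usable
            }
            // Stale: drop it (a real pool would close the socket) and try the next one.
        }
        return new PooledConn(nextId++, now); // nothing reusable: open a new one
    }

    /** Returns a connection to the pool and records when it was last used. */
    void release(PooledConn c) {
        c.lastUsedMillis = clock.getAsLong();
        idle.addFirst(c);
    }
}
```

The design point is that an evictor thread only runs periodically, so a connection can still go bad between sweeps; a per-lease check bounded by an inactivity threshold covers that gap without paying for a check on every request.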






[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15377292#comment-15377292
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

Hmm, you're right Yonik. But we've always had an idle timeout for the http 
connector in jetty set to 50 seconds (I traced this back to SOLR-128). So after 
50 seconds of inactivity, Jetty closes that connection from its end and the 
client's socket goes to CLOSE_WAIT state. As you said, this connection cannot 
be re-used anymore. When httpclient tries to use the connection, it does the 
stale check, sees the CLOSE_WAIT state and terminates the connection and gives 
a new one to Solr.

So all the connections that suddenly stopped showing up in CLOSE_WAIT, which 
we assumed had gone back to the ESTABLISHED state, were actually terminated.

So in summary, our assumption that connections in CLOSE_WAIT are kept around 
because of re-use is wrong but it still doesn't change the solution that I've 
proposed. We could also think of increasing the value of Jetty's idle timeout 
as a separate change but the idle eviction thread would still be necessary.
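
The sequence described in this comment (the server closes an idle connection, the client's socket then sits in CLOSE_WAIT until the client notices and closes it) can be reproduced with plain loopback sockets. This is a sketch only: `looksStale` is a hypothetical helper that approximates a stale check with a short read timeout, it is not HttpClient's code, and the 200 ms timeout is an arbitrary value.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class CloseWaitSketch {
    /**
     * Returns true if the peer has closed its side of the connection.
     * Caveat: read() would consume one byte if the peer had sent data,
     * so this is only suitable for a connection that should be idle.
     */
    static boolean looksStale(Socket socket) {
        try {
            int previous = socket.getSoTimeout();
            socket.setSoTimeout(200);
            try {
                return socket.getInputStream().read() == -1; // -1 means the peer sent FIN
            } catch (SocketTimeoutException e) {
                return false; // nothing to read, but the peer has not closed
            } finally {
                socket.setSoTimeout(previous);
            }
        } catch (IOException e) {
            return true; // reset or otherwise unusable
        }
    }

    /** Returns { stale while both sides are open, stale after the server side closed }. */
    static boolean[] demo() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {
            boolean staleWhileOpen = looksStale(client);
            accepted.close(); // the server closes first, as an idle timeout would
            // From here until client.close(), the client socket is in CLOSE_WAIT.
            boolean staleAfterServerClose = looksStale(client);
            return new boolean[] {staleWhileOpen, staleAfterServerClose};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("stale while open: " + result[0]);
        System.out.println("stale after server close: " + result[1]);
    }
}
```

Running `netstat` or `ss` between the server-side close and the client-side close would show the client socket in CLOSE_WAIT; nothing moves it out of that state except the client closing it.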




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376974#comment-15376974
 ] 

Yonik Seeley commented on SOLR-9290:


I haven't been following this issue, but this caught my eye:

bq. Solr's use of HttpClient for intra-node communication has historically 
always had the potential to result in connections sitting "idle" (ie: in a 
CLOSE_WAIT state) for possible re-use later

It's been a *long* time since I messed around with making sure Solr worked with 
persistent connections (we're talking CNET days... 2004,2005 ;-)
But CLOSE_WAIT is when one side has closed the connection... there's no going 
back to ESTABLISHED from that state (i.e. no reusing that connection).
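
The point about CLOSE_WAIT being a one-way street can be made concrete with a small model of the passive-close path of the TCP state machine (ESTABLISHED, CLOSE_WAIT, LAST_ACK, CLOSED). This is a deliberately simplified sketch: it omits the active-close states (FIN_WAIT_1 and so on) and resets, and the class and method names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.EnumSet;
import java.util.Set;

class TcpCloseStatesSketch {
    /** Only the states on the passive-close path; the full state machine has more. */
    enum State { ESTABLISHED, CLOSE_WAIT, LAST_ACK, CLOSED }

    /** States directly reachable from the given state on the passive-close path. */
    static Set<State> successors(State s) {
        switch (s) {
            case ESTABLISHED: return EnumSet.of(State.CLOSE_WAIT); // peer's FIN arrives
            case CLOSE_WAIT:  return EnumSet.of(State.LAST_ACK);   // local application calls close()
            case LAST_ACK:    return EnumSet.of(State.CLOSED);     // peer ACKs our FIN
            default:          return EnumSet.noneOf(State.class);  // CLOSED is terminal
        }
    }

    /** All states reachable from start by following one or more transitions. */
    static Set<State> reachableFrom(State start) {
        Set<State> seen = EnumSet.noneOf(State.class);
        Deque<State> todo = new ArrayDeque<>(successors(start));
        while (!todo.isEmpty()) {
            State s = todo.pop();
            if (seen.add(s)) {
                todo.addAll(successors(s));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(reachableFrom(State.CLOSE_WAIT));
    }
}
```

In this model the only way out of CLOSE_WAIT is for the local application to close the socket, which is why such a connection can never be handed back to a caller for reuse.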




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376893#comment-15376893
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

Hoss has covered most of the things but just a few comments (note that I'm 
responding to multiple people and comments here):

bq. Why not backport that and avoid the problem entirely? Is it a different 
client version in master or something that makes it not that easy?

We could backport SOLR-4509 to 6.x and deal with the incompatible changes but 
I'd certainly not backport it to 5x because it is just a huge change and I am 
not comfortable releasing that in a minor bug-fix release. I am sure many 
people running 5.x releases would also like a fix to this issue. Adding an idle 
eviction thread is trivial and unlikely to cause any regressions.

{quote}
Shalin Shekhar Mangar: why not just re-use the IdleConnectionEvictor class 
provided by httpcomponents (getting the exact same underlying impl as what 
master gets from HttpClientBuilder.evictIdleConnections) ?
https://hc.apache.org/httpcomponents-client-4.4.x/httpclient/apidocs/org/apache/http/impl/client/IdleConnectionEvictor.html
{quote}

I wasn't aware of this class. But looking deeper, I see that it requires an 
HttpClientConnectionManager instance but the 6.x and 5.x code uses the 
deprecated PoolingClientConnectionManager which extends 
ClientConnectionManager. But now that we know it exists, I can just borrow it 
from the httpclient project instead of writing my own evictor. It is ASLv2 
anyway.
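
A borrowed evictor could look roughly like the sketch below. It is not the patch attached to this issue and not the httpclient class itself. To stay self-contained it is written against a hypothetical `EvictablePool` interface whose two methods mirror `closeExpiredConnections()` and `closeIdleConnections(long, TimeUnit)`, which, as far as I know, both `ClientConnectionManager` and `HttpClientConnectionManager` expose; wiring either manager in through a small adapter is an assumption of this sketch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class IdleEvictorSketch implements AutoCloseable {
    /** Hypothetical minimal view of a connection manager. */
    interface EvictablePool {
        void closeExpiredConnections();
        void closeIdleConnections(long idleTime, TimeUnit unit);
    }

    private final ScheduledExecutorService scheduler;

    IdleEvictorSketch(EvictablePool pool, long sleepMillis, long maxIdleMillis) {
        // Daemon thread, so a forgotten evictor cannot keep the JVM alive.
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-evictor");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleWithFixedDelay(() -> {
            pool.closeExpiredConnections();
            pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
        }, sleepMillis, sleepMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops the background thread; skipping this leaks the thread. */
    @Override public void close() {
        scheduler.shutdownNow();
    }

    /** Demo helper: runs an evictor until it has swept `wanted` times or the timeout passes. */
    static int countSweeps(int wanted, long timeoutMillis) {
        AtomicInteger sweeps = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(wanted);
        EvictablePool pool = new EvictablePool() {
            @Override public void closeExpiredConnections() { }
            @Override public void closeIdleConnections(long idleTime, TimeUnit unit) {
                sweeps.incrementAndGet();
                done.countDown();
            }
        };
        try (IdleEvictorSketch evictor = new IdleEvictorSketch(pool, 10, 50_000)) {
            done.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sweeps.get();
    }

    public static void main(String[] args) {
        System.out.println("sweeps: " + countSweeps(3, 5000));
    }
}
```

Whoever owns the pool also has to own the evictor's lifecycle: the evictor needs to be closed when the pool is shut down, otherwise its thread outlives the component that created it.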

bq. Somebody sanity check my understanding / summary description of the root 
issue...

That sounds about right to me Hoss. Thanks for the summary!

bq. For reasons I don't understand, 'idle' connections are more likely to 
(exist? | be kept around indefinitely?) when the intra-node communication is 
over SSL.

Perhaps the SSL setup/teardown overhead adds some latency such that concurrent 
requests end up opening more connections overall? I am just guessing here.

bq. Which begs the question: why are there 15 CLOSE_WAIT connections that last 
forever on branch_6x even with this patch?

As Shai said, this is likely the HttpShardHandler's pool. The overseer 
collection processor invokes a core admin create for each replica in parallel 
so you get 15 connections for 15 replicas that were created by the collection 
API.

I'm working on a new patch which applies on branch_6x that incorporates Shai's 
comments as well. We can then backport it to 5x.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15376474#comment-15376474
 ] 

Shai Erera commented on SOLR-9290:
--

bq. Which begs the question: why are there 15 CLOSE_WAIT connections that last 
forever on branch_6x even with this patch?

I think Shalin's patch only adds this monitor thread to {{UpdateShardHandler}}, 
but not to {{HttpShardHandlerFactory}} so these 15 could be from it?




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375885#comment-15375885
 ] 

Hoss Man commented on SOLR-9290:


Somebody sanity check my understanding / summary description of the root 
issue...

* Solr's use of HttpClient for intra-node communication has historically always 
had the potential to result in connections sitting "idle" (ie: in a CLOSE_WAIT 
state) for possible re-use later -- but these connections are kept open 
indefinitely. 
** For reasons I don't understand, 'idle' connections are more likely to 
(exist? | be kept around indefinitely?) when the intra-node communication is 
over SSL.
* {{maxUpdateConnections}} and {{maxUpdateConnectionsPerHost}} have always set 
hard upper limits on the number of connections that could ever be created -- 
let alone sit idle in a CLOSE_WAIT state.
* Prior to SOLR-8533, the default values for these limits were relatively low, 
making it unlikely that users could ever observe an extreme # of idle / 
CLOSE_WAIT connections -- you were more likely to have your Solr cluster crash 
from deadlocks than notice any serious OS level problem with too many idle 
connections
* After SOLR-8533, the increased default values of these limits made the 
problem much more noticeable
* SOLR-4509's changes included use of a new option which results in a 
background thread checking for existing idle connections on the master branch
* This issue addresses the problem for branch_6x (and older branches) via a 
similar background thread




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375821#comment-15375821
 ] 

Hoss Man commented on SOLR-9290:


I'm no expert but...

bq. I don't understand why the preferred approach here is to just have a thread 
that is trying to close connections. Is the problem that these connections 
would never otherwise be closed? ...

...my understanding is yes: In a situation where indexing load spikes up, you 
can get a lot of connections which are never completely closed. (even if they 
are never needed anymore)

bq. ...If that is the case, why can't we solve the problem of them not being 
closed as a part of their normal usage? ...

again, IIUC: because they are pooled connections maintained by the HTTP layer.  
Per the docs shalin quoted, clients are required to call 
ClientConnectionManager#closeExpiredConnections() if they (ie: "we") want to 
ensure those connections get closed properly.

bq. It sounds like master doesn't have this problem because of different client 
settings? ... Why not backport that and avoid the problem entirely? Is it a 
different client version in master or something that makes it not that easy?

master & branch_6x (and earlier) use completely different http client APIs (see 
SOLR-4509) ... the {{HttpClientBuilder.evictIdleConnections}} method shalin 
referred to being used on master is on a class ({{HttpClientBuilder}}) that is 
not used at all in branch_6x.

The docs of that method describe it doing virtually the exact same thing 
on the (private connection pool for the) HttpClient as what Shalin's patch does 
(on the pool in the shared ClientConnectionManager) ...

{noformat}
Makes this instance of HttpClient proactively evict idle connections from the 
connection pool using a background thread. 
{noformat}

Which makes me wonder...

[~shalinmangar]: why not just re-use the {{IdleConnectionEvictor}} class 
provided by httpcomponents (getting the exact same underlying impl as what 
master gets from  {{HttpClientBuilder.evictIdleConnections}}) ?

https://hc.apache.org/httpcomponents-client-4.4.x/httpclient/apidocs/org/apache/http/impl/client/IdleConnectionEvictor.html




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Scott Lindner (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375727#comment-15375727
 ] 

Scott Lindner commented on SOLR-9290:
-

I would like to add something, too. The problem must stem from some sort of 
OS-level setting. In our environment I've noticed that when a given IP+PORT 
combo reaches ~28k connections in a CLOSE_WAIT state, the OS itself cannot 
open any more connections to that IP+PORT combo (i.e. even curl fails to that 
combo, while other combos, including other ports on that same host, work just 
fine). I mention this because the problem seems tied to whatever settings we 
configure Solr to use, and you really must change these settings in 
combination for them to make sense, or you risk hitting this problem at some 
point. Admittedly, with the bg thread it wouldn't be permanent like it is for 
us today.





[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread David A. Bradley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375666#comment-15375666
 ] 

David A. Bradley commented on SOLR-9290:


I don't understand why the preferred approach here is to just have a thread 
that is trying to close connections. Is the problem that these connections 
would never otherwise be closed? If that is the case, why can't we solve the 
problem of them not being closed as a part of their normal usage? It sounds 
like master doesn't have this problem because of different client settings:
"Also, I think the reason this wasn't reproducible on master is because 
SOLR-4509 enabled eviction of idle connections by calling 
HttpClientBuilder#evictIdleConnections with a 50 second limit."

Why not backport that and avoid the problem entirely? Is it a different client 
version in master or something that makes it not that easy?

"So we must periodically close such connections once they're idle to avoid the 
number of such connections increasing to absurd limits." It seems from the 
discussion here that the problem is hitting a high number of connections, which 
is only allowed to be so high because we asked for it. What if this thread lags 
behind enough that the connections get too high? It sounds like the purpose of 
this thread is to race to prevent Solr from doing what we asked it to do.

The idea of having a thread to deal with any connections that end up in a bad 
state unexpectedly makes sense, but is the cause of all these CLOSE_WAIT 
connections really from unexpected behavior?

I feel like I must be missing something.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375601#comment-15375601
 ] 

Shai Erera commented on SOLR-9290:
--

Oh I see. So we didn't experience the problem because we run w/ 2 replicas (and 
one shard currently) and with 5.4.1's settings the math for us results in a low 
number of connections. But someone running a larger Solr deployment could 
already hit that problem prior to 5.5. Thanks for the clarification!




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375589#comment-15375589
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

bq. I was asking about 5.3.2 – how could CLOSE_WAITs get high in 5.3.2 when 
maxConnectionsPerHost was the same as in 5.4.1?

5.3.2 has maxConnectionsPerHost=100 for updates and maxConnectionsPerHost=20 
for queries. So on a leader you may have 
100*replicationFactor+20*numShards*replicationFactor connections. For a large 
cluster with many shards and replicas, the overall number of such connections 
can be quite high.
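
The arithmetic above can be sketched as a small calculation. This is only an 
illustration of the formula in this comment; the class and method names are 
hypothetical, and the second cluster size is an assumed example rather than a 
reported deployment.

```java
public class LeaderConnectionEstimate {

    // Upper bound from the comment above:
    //   updateMaxPerHost * replicationFactor
    // + queryMaxPerHost * numShards * replicationFactor
    public static long maxLeaderConnections(int updateMaxPerHost,
                                            int queryMaxPerHost,
                                            int numShards,
                                            int replicationFactor) {
        return (long) updateMaxPerHost * replicationFactor
             + (long) queryMaxPerHost * numShards * replicationFactor;
    }

    public static void main(String[] args) {
        // 5.3.2 limits quoted above: 100 for updates, 20 for queries.
        // One shard with two replicas:
        System.out.println(maxLeaderConnections(100, 20, 1, 2));  // prints 240
        // Assumed larger cluster: 50 shards, replicationFactor 3.
        System.out.println(maxLeaderConnections(100, 20, 50, 3)); // prints 3300
    }
}
```

With one shard and two replicas the bound stays in the low hundreds, while 
the assumed 50-shard cluster already allows a few thousand connections per 
leader under the same per-host limits.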




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375582#comment-15375582
 ] 

Shai Erera commented on SOLR-9290:
--

Regarding the patch, the monitor looks good. A few comments:

* I prefer that we name it {{IdleConnectionsMonitor}} (w/ 's', plural 
connections). That goes for the class, field and thread name.
* Do you intend to keep all the log statements around?
* Do you think we should make the polling interval (10s) and 
idle-connections-time (50s) configurable? Perhaps through system properties?
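
To make the last point concrete, here is a minimal sketch of a monitor whose 
polling interval and idle time come from system properties, defaulting to the 
10s and 50s mentioned above. It is not the code from the patch: the property 
names and the {{IdleClosable}} interface are hypothetical, the latter standing 
in for a connection manager that offers a closeIdleConnections(long, TimeUnit) 
style call.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class IdleConnectionsMonitorSketch {

    /** Hypothetical stand-in for the connection pool being monitored. */
    public interface IdleClosable {
        void closeIdleConnections(long idleTime, TimeUnit unit);
    }

    // Hypothetical property names, chosen only for this illustration.
    public static final String POLL_PROP = "solr.idleConnectionsMonitor.pollIntervalMs";
    public static final String IDLE_PROP = "solr.idleConnectionsMonitor.maxIdleMs";

    /** Polling interval in ms; falls back to 10s when the property is unset. */
    public static long pollIntervalMs() {
        return Long.getLong(POLL_PROP, 10_000L);
    }

    /** Idle time in ms after which a connection is closed; falls back to 50s. */
    public static long maxIdleMs() {
        return Long.getLong(IDLE_PROP, 50_000L);
    }

    /** Starts a daemon thread that periodically closes idle connections. */
    public static ScheduledExecutorService start(IdleClosable pool) {
        ScheduledExecutorService exec =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "IdleConnectionsMonitor");
                t.setDaemon(true); // must not keep the JVM alive on shutdown
                return t;
            });
        long poll = pollIntervalMs();
        long idle = maxIdleMs();
        exec.scheduleWithFixedDelay(
            () -> pool.closeIdleConnections(idle, TimeUnit.MILLISECONDS),
            poll, poll, TimeUnit.MILLISECONDS);
        return exec;
    }
}
```

The returned executor has to be shut down together with the pool it watches, 
otherwise the monitor thread would outlive it (the kind of thread leak a test 
framework would flag).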




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375571#comment-15375571
 ] 

Shai Erera commented on SOLR-9290:
--

bq. Do you have only two replicas? Perhaps the maxConnectionsPerHost limit of 
100 is kicking in?

Yes, we do have only 2 replicas and I get why the CLOSE_WAITs stop at 100. I 
was asking about 5.3.2 -- how could CLOSE_WAITs get high in 5.3.2 when 
maxConnectionsPerHost was the same as in 5.4.1?




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375560#comment-15375560
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

bq. I didn't see the monitor in the latest patch, only the log printouts. Did 
you forget to add it?

Sorry [~shaie], I noticed that after uploading. I have uploaded the right patch 
now. Please review.

bq. (1) Can/Should we have a similar monitor for HttpShardHandlerFactory?

I think so. This patch was only for my tests.

bq. Any reason why the two don't share the same HttpClient instance?

Hmm. I think originally the idea was to keep the pools for indexing and 
querying separate but now that the limit (for updates) is so high, I wonder if 
it still makes sense. I mean, yes you can deadlock a distributed search because 
of high indexing and vice-versa if you share the pool but if you ever reach the 
high limit of 100,000 connections, you have more serious problems in the 
cluster anyway.

bq. I wonder why we don't see the problem with 5.4.1. I mean, we do see 
CLOSE_WAITs piling, but stop at ~100 (200 for the leader)

Do you have only two replicas? Perhaps the maxConnectionsPerHost limit of 100 
is kicking in?




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375552#comment-15375552
 ] 

Shai Erera commented on SOLR-9290:
--

bq. I thought that hypothesis holds only after SOLR-8533. Are you saying you 
also saw it on 5.3.2? If so, what are the values that are set for these 
properties there? We definitely do not see the problem with 5.4.1, but we 
didn't test prior versions.

We posted at the same time; I have read your answer above. I wonder why we 
don't see the problem with 5.4.1. I mean, we do see CLOSE_WAITs piling up, but 
they stop at ~100 (200 for the leader).




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375548#comment-15375548
 ] 

Shai Erera commented on SOLR-9290:
--

Thanks [~shalinmangar]. A few questions:

bq. Also, I think the reason this wasn't reproducible on master is because 
SOLR-4509 enabled eviction of idle connections by calling 
HttpClientBuilder#evictIdleConnections with a 50 second limit.

Is this something we can apply to 5x/6x too?

bq. This patch adds a monitor thread for the pool created in UpdateShardHandler 
and with this applied

I didn't see the monitor in the latest patch, only the log printouts. Did you 
forget to add it?

bq. There are still a few connections in CLOSE_WAIT at steady state but I 
verified that they belong to a different HttpClient instance in 
HttpShardHandlerFactory and other places.

(1) Can/Should we have a similar monitor for HttpShardHandlerFactory?
(2) Any reason why the two don't share the same HttpClient instance?

bq. This patch applies on 5.3.2
bq. We have a large limit for maxConnections and maxConnectionsPerHost

I thought that hypothesis holds only after SOLR-8533. Are you saying you also 
saw it on 5.3.2? If so, what are the values that are set for these properties 
there? We definitely *do not* see the problem with 5.4.1, but we didn't test 
prior versions.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375541#comment-15375541
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

{quote}
Shalin Shekhar Mangar mentioned that he's able to reproduce this in 5.3.2 as 
well, which was without SOLR-8533 so we certainly need to look at this more.

Shalin, can you confirm if you were running your tests in stock Solr ?
{quote}

Actually it is 5.3.2 with some kerberos patches but the client which originally 
reported the issue was using stock 5.3.2. I don't think the changes are 
relevant.

I believe this was a problem all along. It just got amplified with SOLR-8533 in 
5.5.x because now the limit is higher.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375534#comment-15375534
 ] 

Anshum Gupta commented on SOLR-9290:


[~shalinmangar] mentioned that he's able to reproduce this in 5.3.2 as well, 
which was without SOLR-8533 so we certainly need to look at this more.

Shalin, can you confirm if you were running your tests in stock Solr?




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375500#comment-15375500
 ] 

Shai Erera commented on SOLR-9290:
--

Thanks [~yo...@apache.org], I'll read the issue.

I agree with what you write in general, but we do hit an issue with these 
settings. The fact that it reproduces easily with SSL enabled suggests that the 
issue may not be in Solr code at all, but I wonder if we shouldn't perhaps pick 
smaller default values if SSL is enabled? (Our guess at the moment is that HC 
keeps more connections in the pool when SSL is enabled because they are more 
expensive to initiate, but it's just a guess).

And maybe the proper solution would be what [~shalinmangar] wrote above -- have 
a bg monitor which closes idle/expired connections. I actually wonder why it 
can't be a property of {{ClientConnectionManager}} that you can set to auto 
close idle/expired connections after a period of time. We can potentially have 
that monitor act only if SSL is enabled (or at least until non-SSL exhibits the 
same problems too).

> TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled
> --
>
> Key: SOLR-9290
> URL: https://issues.apache.org/jira/browse/SOLR-9290
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 5.5.1, 5.5.2
>Reporter: Anshum Gupta
>Priority: Critical
> Attachments: SOLR-9290-debug.patch, setup-solr.sh
>
>
> Heavy indexing on Solr with SSL leads to a lot of connections in CLOSE_WAIT 
> state. 
> At my workplace, we have seen this issue only with 5.5.1 and could not 
> reproduce it with 5.4.1 but from my conversation with Shalin, he knows of 
> users with 5.3.1 running into this issue too. 
> Here's an excerpt from the email [~shaie] sent to the mailing list about 
> what we see:
> {quote}
> 1) It consistently reproduces on 5.5.1, but *does not* reproduce on 5.4.1
> 2) It does not reproduce when SSL is disabled
> 3) Restarting the Solr process (sometimes both need to be restarted), the
> count drops to 0, but if indexing continues, they climb up again
> When it does happen, Solr seems stuck. The leader cannot talk to the
> replica, or vice versa, the replica is usually put in DOWN state and
> there's no way to fix it besides restarting the JVM.
> {quote}
> Here's the mail thread: 
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201607.mbox/%3c46cc66220a8143dc903fa34e79205...@vp-exc01.dips.local%3E
> Creating this issue so we could track this and have more people comment on 
> what they see. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375243#comment-15375243
 ] 

Yonik Seeley commented on SOLR-9290:


bq. if you have a link to a discussion about why it may lead to a distributed 
deadlock, I'd be happy to read it.
SOLR-683

The same logic applies to any general-purpose internal thread pools or connection 
pools / connection limits.  Think of acquiring a thread like acquiring a lock.  
If there is going to be a limited number of resources, then one needs to be 
very careful about the circumstances under which those resources can be acquired.
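
To make the lock analogy concrete, here is a minimal single-JVM sketch (not Solr code; the class and method names are made up for illustration). An "outer" request holds a worker from a bounded pool while waiting for an "inner" request that needs a worker from the same pool. With too small a limit the outer request can never finish, which is the local analogue of the distributed deadlock described above.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class BoundedPoolDeadlockSketch {

  /**
   * Runs one "outer" request that needs a second worker from the same pool
   * to finish. Returns true if the outer request completed within the timeout.
   */
  static boolean outerRequestCompletes(int poolSize, long timeoutMillis) {
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    try {
      Future<String> outer = pool.submit(() -> {
        // The outer task holds one worker while it waits for an inner task
        // that can only run if another worker is free.
        Future<String> inner = pool.submit(() -> "done");
        return inner.get();
      });
      try {
        outer.get(timeoutMillis, TimeUnit.MILLISECONDS);
        return true;
      } catch (TimeoutException e) {
        return false; // every worker is waiting on work that cannot start
      } catch (ExecutionException e) {
        throw new IllegalStateException(e.getCause());
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException(e);
      }
    } finally {
      pool.shutdownNow(); // interrupt any stuck worker so the JVM can exit
    }
  }

  public static void main(String[] args) {
    System.out.println("pool of 1 completes: " + outerRequestCompletes(1, 500));
    System.out.println("pool of 2 completes: " + outerRequestCompletes(2, 500));
  }
}
```

With a pool of one worker the outer request times out; with two workers it completes. Raising the limit well above the expected nesting and fan-out is one way to avoid the stall, at the cost of allowing many more open resources.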




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375184#comment-15375184
 ] 

Shai Erera commented on SOLR-9290:
--

Also [~markrmil...@gmail.com], for education purposes, if you have a link to a 
discussion about why it may lead to a distributed deadlock, I'd be happy to 
read it.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375151#comment-15375151
 ] 

Shai Erera commented on SOLR-9290:
--

Thanks [~markrmil...@gmail.com]. In that case, what's your take on the issue at 
hand?




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375117#comment-15375117
 ] 

Mark Miller commented on SOLR-9290:
---

The defaults need to be very high to avoid distributed deadlock.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375082#comment-15375082
 ] 

Shai Erera commented on SOLR-9290:
--

An update -- I've modified our solr.xml (which is basically the vanilla 
solr.xml) with these added props (under the {{solrcloud}} element) and I do not 
see the connections spike anymore:

{noformat}
1
100
{noformat}

Those changes were part of SOLR-8533. [~markrmil...@gmail.com], on that issue 
you didn't explain why the defaults need to be set that high. Is there perhaps 
an email thread you can link to that includes more details? I ask because one 
thing I've noticed is that if I query {{solr/admin/info/system}}, the 
{{system.openFileDescriptorCount}} is very high when there are many 
CLOSE_WAITs. Such a change in a Solr default probably needs to be accompanied by 
an OS-level setting too, no?

I am still running tests with those props set in solr.xml, on top of 5.5.1. 
[~mbjorgan] would you mind testing in your environment too?

[~hoss...@fucit.org], sorry I completely missed your questions. Our solr.xml is 
the vanilla one, we didn't modify anything in it. We did uncomment the SSL 
props in solr.in.sh as the ref guide says, but aside from the key name and 
password, we didn't change any settings.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374779#comment-15374779
 ] 

Shai Erera commented on SOLR-9290:
--

bq. Interestingly, the number of connections stuck in CLOSE_WAIT decreases 
during indexing and increases again about 10 or so seconds after the indexing is 
stopped.

I've observed that too and it's not that they decrease, but rather that the 
connections change their state from CLOSE_WAIT to ESTABLISHED, then when 
indexing is done to TIME_WAIT and then finally to CLOSE_WAIT again. I believe 
this aligns with what the HC documentation says -- the connections are not 
necessarily released, but kept in the pool. When you re-index again, they are 
reused and go back to the pool.

bq. However, this commit only increases the limits on how many update 
connections that can be open

That's interesting and might be a temporary workaround for the problem, which I 
intend to test shortly. After 5.4.1 they were both raised to 100,000:

{noformat}
-  public static final int DEFAULT_MAXUPDATECONNECTIONS = 1;
-  public static final int DEFAULT_MAXUPDATECONNECTIONSPERHOST = 100;
+  public static final int DEFAULT_MAXUPDATECONNECTIONS = 100000;
+  public static final int DEFAULT_MAXUPDATECONNECTIONSPERHOST = 100000;
{noformat}

This can explain why we run into trouble with 5.5.1 but not with 5.4.1. Even 
in 5.4.1 there are a few hundred CLOSE_WAIT connections, but with 5.5.1 they 
reach (in our case) the order of 35-40K, at which point Solr becomes useless, 
unable to talk to the replica or pretty much anything else.

I see these can be defined in solr.xml, though it's not documented how, so I'm 
going to give it a try and will report back here.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-13 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15374478#comment-15374478
 ] 

Mads Tomasgård Bjørgan commented on SOLR-9290:
--

I performed a bisect, marking a commit from 5.4.1 as good and a commit from 
5.5.3 as bad. It identified commit 
ad9b87a7285e444cd61fffb83c0aee06c8f7cef0 as the first bad commit. However, 
this commit only increases the limits on how many update connections can 
be open. Thus this problem affects version 5.4.1 as well, but it is harder to 
see because Solr isn't allowed to use that many connections when updating.
I built Solr on top of the last commit from branch_5_4 
(7d52c2523c7a4ff70612742b76b934a12b493331), applied the commit that was 
identified as bad, and ended up with the same CLOSE_WAIT leak.




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-12 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373452#comment-15373452
 ] 

Shalin Shekhar Mangar commented on SOLR-9290:
-

It is very easily reproducible on stock Solr with SSL enabled. My test setup 
creates two SSL-enabled Solr instances with a 5 shard x 2 replica collection 
and runs a short indexing program (just 9 update requests with 1 document each 
and a commit at the end). Keep running the indexing program repeatedly and 
the number of connections in the CLOSE_WAIT state gradually increases.

Interestingly, the number of connections stuck in CLOSE_WAIT decreases during 
indexing and increases again about 10 or so seconds after the indexing is 
stopped.

I can reproduce the problem on 6.1, 6.0, 5.5.1, 5.3.2. I am not able to 
reproduce this on master although I don't see anything relevant that has 
changed since 6.1 -- I tried this only once so it may have just been bad timing?

When the connections show in CLOSE_WAIT state, the recv-q buffer always has 
exactly 70 bytes.
{code}
netstat -tonp | grep CLOSE_WAIT | grep java
tcp   70  0 127.0.0.1:56538 127.0.1.1:8983  CLOSE_WAIT  21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:47995 127.0.1.1:8984  CLOSE_WAIT  21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:47477 127.0.1.1:8984  CLOSE_WAIT  21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:47996 127.0.1.1:8984  CLOSE_WAIT  21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:56644 127.0.1.1:8983  CLOSE_WAIT  21654/java   off (0.00/0/0)
tcp   70  0 127.0.0.1:56533 127.0.1.1:8983  CLOSE_WAIT  21654/java   off (0.00/0/0)
...
{code}

If I run the same steps with SSL disabled then the connections in CLOSE_WAIT 
state have just 1 byte in recv-q. I don't see the number of such connections 
increasing with indexing over time but I know for a fact (from a client) that 
eventually more and more connections pile up in this state even without SSL.
{code}
tcp   1  0 127.0.0.1:41723 127.0.1.1:8983  CLOSE_WAIT  2522/java    off (0.00/0/0)
tcp   1  0 127.0.0.1:41780 127.0.1.1:8983  CLOSE_WAIT  2640/java    off (0.00/0/0)
...
{code}
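
For anyone who wants to track the recv-q pattern over time, the following is a rough sketch of how the netstat output above could be summarized. It assumes each connection is on a single line with the column order shown above (proto, recv-q, send-q, local, foreign, state, ...); the class name is made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class CloseWaitSummary {

  /** Counts CLOSE_WAIT sockets per Recv-Q value from `netstat -tonp` lines. */
  static Map<Integer, Integer> countByRecvQ(List<String> netstatLines) {
    Map<Integer, Integer> counts = new TreeMap<>();
    for (String line : netstatLines) {
      String[] cols = line.trim().split("\\s+");
      // Expected columns: proto, recv-q, send-q, local, foreign, state, ...
      if (cols.length < 6 || !cols[0].startsWith("tcp") || !"CLOSE_WAIT".equals(cols[5])) {
        continue;
      }
      int recvQ;
      try {
        recvQ = Integer.parseInt(cols[1]);
      } catch (NumberFormatException e) {
        continue; // skip headers or malformed lines
      }
      counts.merge(recvQ, 1, Integer::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<String> sample = Arrays.asList(
        "tcp 70 0 127.0.0.1:56538 127.0.1.1:8983 CLOSE_WAIT 21654/java off (0.00/0/0)",
        "tcp 70 0 127.0.0.1:47995 127.0.1.1:8984 CLOSE_WAIT 21654/java off (0.00/0/0)",
        "tcp 1 0 127.0.0.1:41723 127.0.1.1:8983 CLOSE_WAIT 2522/java off (0.00/0/0)");
    System.out.println(countByRecvQ(sample)); // prints {1=1, 70=2}
  }
}
```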

I enabled debug logging for PoolingHttpClientConnectionManager (used in 6.x) 
and PoolingClientConnectionManager (used in 5.x.x). After running the 
indexing program and verifying that some connections were in CLOSE_WAIT, I 
grepped the logs for connections leased vs released. The two counts always 
match, which means that the connections are always given back to the pool.
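
The leased-vs-released comparison can be sketched as a small tally over the debug log. This is only an illustration: the marker phrases "Connection leased" and "Connection released" are an assumption about what the connection manager's debug output contains, and the sample lines below are made up, so adjust both to match your actual logs.

```java
import java.util.Arrays;
import java.util.List;

final class LeaseReleaseTally {

  /** Returns {leased, released} counts for lines containing the assumed markers. */
  static int[] tally(List<String> logLines) {
    int leased = 0;
    int released = 0;
    for (String line : logLines) {
      if (line.contains("Connection leased")) {
        leased++;
      } else if (line.contains("Connection released")) {
        released++;
      }
    }
    return new int[] {leased, released};
  }

  public static void main(String[] args) {
    // Made-up sample lines; the real layout depends on the logging config.
    List<String> sample = Arrays.asList(
        "DEBUG Connection leased: [id: 1]",
        "DEBUG Connection released: [id: 1]",
        "DEBUG Connection leased: [id: 2]");
    int[] counts = tally(sample);
    System.out.println("leased=" + counts[0] + " released=" + counts[1]);
  }
}
```

Equal counts suggest that callers return every connection to the pool, so any lingering CLOSE_WAIT sockets would belong to pooled (idle) connections rather than leaked ones.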

Now some connections hanging around in CLOSE_WAIT are to be expected because of 
the following (quoted from the httpclient documentation):
{quote}
One of the major shortcomings of the classic blocking I/O model is that the 
network socket can react to I/O events only when blocked in an I/O operation. 
When a connection is released back to the manager, it can be kept alive however 
it is unable to monitor the status of the socket and react to any I/O events. 
If the connection gets closed on the server side, the client side connection is 
unable to detect the change in the connection state (and react appropriately by 
closing the socket on its end).
HttpClient tries to mitigate the problem by testing whether the connection is 
'stale', that is no longer valid because it was closed on the server side, 
prior to using the connection for executing an HTTP request. The stale 
connection check is not 100% reliable. The only feasible solution that does not 
involve a one thread per socket model for idle connections is a dedicated 
monitor thread used to evict connections that are considered expired due to a 
long period of inactivity. The monitor thread can periodically call 
ClientConnectionManager#closeExpiredConnections() method to close all expired 
connections and evict closed connections from the pool. It can also optionally 
call ClientConnectionManager#closeIdleConnections() method to close all 
connections that have been idle over a given period of time.
{quote}

I'm going to try adding such a monitor thread and see if this is still a 
problem.
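
A rough sketch of what such a monitor thread could look like follows. To keep it self-contained, {{EvictableConnectionPool}} is a hypothetical stand-in that only mirrors the two eviction methods named in the quoted documentation; it is not the HttpClient interface itself, and the period and idle-time values are arbitrary.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the two eviction methods quoted above. */
interface EvictableConnectionPool {
  void closeExpiredConnections();
  void closeIdleConnections(long idleTime, TimeUnit unit);
}

final class IdleConnectionMonitor implements AutoCloseable {
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "idle-connection-monitor");
        t.setDaemon(true); // never keep the JVM alive just for eviction
        return t;
      });

  IdleConnectionMonitor(EvictableConnectionPool pool, long periodMillis, long maxIdleMillis) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        pool.closeExpiredConnections();
        pool.closeIdleConnections(maxIdleMillis, TimeUnit.MILLISECONDS);
      } catch (RuntimeException e) {
        // Swallow so one failed sweep does not cancel all future sweeps.
      }
    }, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
  }

  /** Stops the monitor thread; call this wherever the pool itself is shut down. */
  @Override
  public void close() {
    scheduler.shutdownNow();
  }

  public static void main(String[] args) throws InterruptedException {
    EvictableConnectionPool fakePool = new EvictableConnectionPool() {
      @Override public void closeExpiredConnections() {
        System.out.println("closing expired connections");
      }
      @Override public void closeIdleConnections(long idleTime, TimeUnit unit) {
        System.out.println("closing connections idle for more than " + idleTime + " " + unit);
      }
    };
    try (IdleConnectionMonitor monitor = new IdleConnectionMonitor(fakePool, 100, 5000)) {
      Thread.sleep(350); // let a few sweeps run before closing the monitor
    }
  }
}
```

The explicit {{close()}} matters: a monitor that is never stopped would outlive the pool it watches and show up as a leaked thread.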


[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-11 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371758#comment-15371758
 ] 

Hoss Man commented on SOLR-9290:


Questions specifically for [~shaie], following up on comments made in the mailing 
list thread mentioned in the issue summary...

{quote}
When it does happen, the number of CLOSE_WAITS climb very high, to the order of 
30K+ entries in 'netstat'.
...
When I say it does not reproduce on 5.4.1 I really mean the numbers
don't go as high as they do in 5.5.1. Meaning, when running without
SSL, the number of CLOSE_WAITs is smallish, usually less than a 10 (I
would separately like to understand why we have any in that state at
all). When running with SSL and 5.4.1, they stay low at the order of
hundreds the most.
{quote}

* Does this only reproduce in your application, with your customized configs of 
Solr, or can you reproduce it using something trivial like "modify 
bin/solr.in.sh to point at an SSL cert, then run {{bin/solr -noprompt 
-cloud}}"?
* Does the problem manifest solely with indexing, or with queries as well? 
i.e. ...
** assuming a pre-built collection, and then all nodes restarted, does 
hammering the cluster with read-only queries manifest the problem?
** assuming a virgin cluster with no docs, does hammering the cluster w/updates, 
but never any queries, manifest the problem?
* Assuming you start by bringing up a virgin cluster and then begin hammering 
it with whatever sequences of requests are needed to manifest the problem, how 
long do you have to wait before the number of CLOSE_WAITS spikes high enough 
that you are reasonably confident the problem has occurred?

The last question is a prerequisite to asking whether we can just git bisect to 
identify where/when the problem originated.  

Even if writing a (reliable) bash automation script (one that starts the cluster, 
triggers requests, monitors the CLOSE_WAITS to see if they go 
over a specified threshold within a specified time limit, and shuts 
everything down cleanly) is too cumbersome to trust for an 
automated {{git bisect run test.sh}}, we could still consider doing some 
manually driven git bisection to try to track this down, as long as each 
iteration doesn't take very long.

Specifically: {{git merge-base}} says ffadf9715c4a511178183fc1411b18c1701b9f1d 
is the common ancestor for 5.4.1 and 5.5.1, and {{git log}} says there are 487 
commits between that point and the 5.5.1 tag.  Statistically speaking, it should 
only take ~10 iterations of a binary search over those commits to find the first 
problematic one.

Assuming there is a manual process someone can run on a clean git checkout of 
5.4.1 that takes under 10 minutes to get from "ant clean server" to an obvious 
spike in CLOSE_WAITS, someone with some CPU cycles to spare who doesn't mind a 
lot of context switching while they do their day job could be...
# running a command to spin up the cluster & client hammering code
# setting a 10 minute timer
# when the timer goes off, check the results of another command to count the 
CLOSE_WAITS
# {{git bisect good/bad}}
# repeat
...and within ~2-3 hours should almost certainly have tracked down when/where 
the problem started.
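
As a quick sanity check on the iteration estimate, here is a small sketch that computes the worst-case number of steps for a plain binary search over a commit range. It is only a back-of-the-envelope model: git bisect's own estimate can differ slightly depending on the shape of the history, and the class name is made up for illustration.

```java
final class BisectEstimate {

  /** Smallest number of halvings that narrows `commits` candidates down to one. */
  static int worstCaseSteps(int commits) {
    if (commits < 1) {
      throw new IllegalArgumentException("commits must be >= 1");
    }
    int steps = 0;
    for (long range = 1; range < commits; range *= 2) {
      steps++;
    }
    return steps;
  }

  public static void main(String[] args) {
    int commits = 487;        // from the git log count above
    int minutesPerStep = 10;  // timer only; build and setup time are extra
    int steps = worstCaseSteps(commits);
    // prints "9 steps, at least 90 minutes of waiting"
    System.out.println(steps + " steps, at least " + steps * minutesPerStep + " minutes of waiting");
  }
}
```

For 487 commits this gives 9 steps, in line with the ~10 iterations estimated above; adding build and setup time per step is what pushes the total toward the 2-3 hour range.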




[jira] [Commented] (SOLR-9290) TCP-connections in CLOSE_WAIT spikes during heavy indexing when SSL is enabled

2016-07-07 Thread Johannes Meyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-9290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15367313#comment-15367313
 ] 

Johannes Meyer commented on SOLR-9290:
--

We have the same issue on Solr 6.1.0
