[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-28 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752699#comment-13752699 ]

Mike Schrag commented on SOLR-5081:
---

I think we tracked this down on our side. While testing another part of the 
system, we noticed SYN flood warnings in the system logs. I believe the 
kernel was blocking traffic to the Solr port once it decided that Hadoop was 
attacking it. After turning off net.ipv4.tcp_syncookies and increasing 
net.ipv4.tcp_max_syn_backlog, the problem seems to have gone away. This also 
explains why I could still connect to Solr and insert from another machine 
even when access from the Hadoop cluster had died.
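
For reference, on a typical Linux box that change amounts to something like the 
following (the backlog value here is only an illustration, not necessarily the 
one we used):

sysctl -w net.ipv4.tcp_syncookies=0
sysctl -w net.ipv4.tcp_max_syn_backlog=4096

The same two keys can be set in /etc/sysctl.conf to survive a reboot.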

 Highly parallel document insertion hangs SolrCloud
 --

 Key: SOLR-5081
 URL: https://issues.apache.org/jira/browse/SOLR-5081
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.3.1
Reporter: Mike Schrag
 Attachments: threads.txt


 If I do a highly parallel document load using a Hadoop cluster into an 
 18-node SolrCloud cluster, I can deadlock Solr every time.
 The ulimits on the nodes are:
 core file size  (blocks, -c) 0
 data seg size   (kbytes, -d) unlimited
 scheduling priority (-e) 0
 file size   (blocks, -f) unlimited
 pending signals (-i) 1031181
 max locked memory   (kbytes, -l) unlimited
 max memory size (kbytes, -m) unlimited
 open files  (-n) 32768
 pipe size (512 bytes, -p) 8
 POSIX message queues (bytes, -q) 819200
 real-time priority  (-r) 0
 stack size  (kbytes, -s) 10240
 cpu time   (seconds, -t) unlimited
 max user processes  (-u) 515590
 virtual memory  (kbytes, -v) unlimited
 file locks  (-x) unlimited
 The open file count is only around 4000 when this happens.
 If I bounce all the servers, things start working again, which makes me think 
 this is Solr and not ZK.
 I'll attach the stack trace from one of the servers.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-28 Thread Erick Erickson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13752814#comment-13752814 ]

Erick Erickson commented on SOLR-5081:
--

Mike:

Thanks for letting us know! This is a tricky one.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-22 Thread Kevin Osborn (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13748047#comment-13748047 ]

Kevin Osborn commented on SOLR-5081:


I may have this issue as well. I am posting batches of 1000 documents through 
SolrJ. I have autoCommit set to 15000 with openSearcher=false, and 
autoSoftCommit set to 3. During my initial testing, I was able to recreate it 
after just a couple of updates. I then raised the open file limit for the 
process from 4096 to 15000. This seemed to help, but only to a point.
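
For context, that commit configuration corresponds to a solrconfig.xml stanza 
roughly like the following (autoSoftCommit takes the same maxTime form):

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>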

If all my updates are at once, it seems to succeed. But if I have pauses 
between updates, it seems to have problems. I have also only seen this error 
when I have more than 1 node in my SolrCloud cluster.

I also took a look at netstat. There seemed to be a lot of connections between 
my two nodes. Could the frequency of my updates be overwhelming the 
connection from the leader to the replica?
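
For anyone wanting to check the same thing, a one-liner along these lines (8983 
being the Solr port here) breaks the inter-node sockets down by TCP state:

netstat -ant | grep :8983 | awk '{print $6}' | sort | uniq -c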

Deletes also fail, but queries still seem to work.

Restarting the nodes fixes the problem.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-06 Thread Yago Riveiro (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730995#comment-13730995 ]

Yago Riveiro commented on SOLR-5081:


I have this problem too, but in my case Solr hangs and I can't do any more 
insertions without restarting the nodes.

I do the insertion using a curl POST in JSON format, like:

curl http://127.0.0.1:8983/solr/collection/update --data-binary @data -H 'Content-type:application/json'




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-05 Thread Hoss Man (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13730269#comment-13730269 ]

Hoss Man commented on SOLR-5081:


bq. I actually did this exact test when I was in this state originally, and the 
insert worked, which totally confused the situation for me.

ok ... hold up ... basically what you're saying is: the first time I saw this 
problem (SolrCloud hangs and is deadlocked under heavy document insertion 
load), I tried to insert a single document and it worked.

...which makes no sense to me, because if that's the case, then what exactly do 
you mean by hangs and deadlocked?

So let's back up:

* what do you observe about your system that leads you to believe there is a 
problem?
* what aspect of your observations doesn't match what you expect?
* what do you expect to observe?
* how are you making these observations?

Wild shot in the dark: what does your indexing code look like? Is it possible 
that your indexing code is encountering some deadlock of its own, 
independent of anything happening in Solr? If you are using SolrJ, can you get 
thread dumps from your indexing client apps when you observe this deadlock 
situation (again: this info is useless unless we have a better understanding of 
what exactly you are observing that you think indicates a problem).








[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-01 Thread Noble Paul (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726746#comment-13726746 ]

Noble Paul commented on SOLR-5081:
--

[~mikeschrag] Could you get any more thread dumps?




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-01 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726801#comment-13726801 ]

Mike Schrag commented on SOLR-5081:
---

I grabbed more and they all look basically the same as the attached, which is 
to say, it sort of looks like Solr isn't doing ANYTHING. I'm going to look into 
whether I'm crushing ZooKeeper, and maybe my requests aren't even getting to 
Solr.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-01 Thread Erick Erickson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726831#comment-13726831 ]

Erick Erickson commented on SOLR-5081:
--

Yeah, that is odd. The stack traces you sent basically showed no deadlocks, 
nothing interesting at all. I suspect pursuing whether anything is getting to 
Solr or not is a good idea.

Hmmm, a blunt-instrument test for when the cluster is hung: what happens if you, 
say, submit a query directly to one of the nodes? Does it respond, and do you see 
anything in the Solr log on that node? Tip: adding distrib=false to the 
_query_ keeps Solr from sending sub-queries to other shards.

And I wonder what happens if you, say, use post.jar (comes with the example) to 
try to send a doc to Solr when it's hung, anything?
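
Concretely (host, port, and collection name below are placeholders), something 
like:

curl 'http://localhost:8983/solr/collection1/select?q=*:*&distrib=false'

java -Durl=http://localhost:8983/solr/collection1/update -jar post.jar mydoc.xml

would tell you whether a single node still answers queries and still accepts a 
document while the cluster appears hung.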

Clearly I'm grasping at straws here, but I'm kind of out of good ideas.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-08-01 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726848#comment-13726848 ]

Mike Schrag commented on SOLR-5081:
---

I actually did this exact test when I was in this state originally, and the 
insert _worked_, which totally confused the situation for me. However, in light 
of seeing nothing in the traces, it supports the theory that the cluster isn't 
hung, but rather that I'm somehow not even getting that far from the Hadoop 
cluster. ZK was my best guess as something that could fail at an earlier stage, 
but even that I would expect to have hung the test insert. So I need to do a 
little more forensics here and see if I can get a better picture of wtf is 
going on.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-30 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723853#comment-13723853 ]

Mark Miller commented on SOLR-5081:
---

This is likely the same issue that has come up before - and it has nothing to 
do with CloudSolrServer - it's more likely how we limit the number of threads 
that are used to forward on updates: the nodes can talk back and forth to 
each other, run out of threads, and deadlock. It's similar to the distrib 
deadlock issue. It's been a known issue for many months; we just have not had a 
chance to look into it closely yet.
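
To make that concrete, here is a toy sketch in plain java.util.concurrent - not 
Solr's actual forwarding code - of how two nodes with a bounded number of 
forwarding slots can end up in a circular wait:

import java.util.concurrent.Semaphore;

public class ForwardDeadlock {
    // each "node" has a bounded pool of forwarding slots (1 for clarity)
    static final Semaphore permitsA = new Semaphore(1);
    static final Semaphore permitsB = new Semaphore(1);

    // forwarding an update needs a slot on both sides: the local
    // outbound slot plus a handler slot on the peer
    static void forward(Semaphore local, Semaphore remote)
            throws InterruptedException {
        local.acquire();
        Thread.sleep(100);  // widen the race window so the demo hangs reliably
        remote.acquire();   // blocks forever if the peer did the same first
        remote.release();
        local.release();
    }

    public static void main(String[] args) {
        new Thread(new Runnable() { public void run() {
            try { forward(permitsA, permitsB); } catch (InterruptedException e) { }
        }}).start();
        new Thread(new Runnable() { public void run() {
            try { forward(permitsB, permitsA); } catch (InterruptedException e) { }
        }}).start();
        // both threads now hold their local permit and wait on the other
        // side's: a circular wait, i.e. the deadlock described above
    }
}

Once every forwarding thread on each node is parked like that, nothing can free 
a slot, which matches the "bounce the servers and it works again" behavior.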




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-30 Thread Erick Erickson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723856#comment-13723856 ]

Erick Erickson commented on SOLR-5081:
--

Agreed, although we should be able to see the deadlock on the semaphore that we 
saw before in SolrCmdDistributor somewhere in here, and it's not in the stack 
traces we've seen so far.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-30 Thread Mark Miller (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723878#comment-13723878 ]

Mark Miller commented on SOLR-5081:
---

bq. the stack trace we've seen so far.

Those traces are suspect for the problem described, I think. Regardless, for 
this type of thing it would be great to get the traces from a couple of 
machines rather than just one.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-30 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723996#comment-13723996 ]

Mike Schrag commented on SOLR-5081:
---

I'll kill it again today and grab traces from a few of the nodes.




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-29 Thread Noble Paul (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723378#comment-13723378 ]

Noble Paul commented on SOLR-5081:
--

Can you please shed some more light on the system:

# numShards
# Replication factor
# maxShardsPerNode (I guess it is 1)
# Average size per doc 
# VM startup params (-Xmx -Xms, GC params etc)
# How are you indexing? Are you using SolrJ and the CloudSolrServer? How many 
clients are used to index the data?




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-29 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13723402#comment-13723402 ]

Mike Schrag commented on SOLR-5081:
---

1. numShards=20
2. RF=3
3. maxShardsPerNode=1000 (aka just a big number .. we overcommit shards in 
this environment)
4. not very big ... maybe 0.5-1k
5. -Xms10g -Xmx10g -XX:MaxPermSize=1G -XX:+UseConcMarkSweepGC 
-XX:+CMSIncrementalMode -XX:CMSInitiatingOccupancyFraction=60 
-XX:-OmitStackTraceInFastThrow
6. SolrJ + CloudSolrServer + when you say clients, do you mean threads, or 
actual client JVM instances? Talking more generically in terms of threads, I 
know it works at around 15-20 threads, but 100 threads makes it go sadfaced
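
A minimal SolrJ sketch of the kind of loader described above - the ZooKeeper 
ensemble, collection, field names, and document counts are all made up for 
illustration:

import java.util.UUID;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class ParallelLoad {
    public static void main(String[] args) throws Exception {
        final CloudSolrServer server =
                new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");
        server.setDefaultCollection("collection1");

        final int threads = 100;  // ~15-20 works; 100 triggers the hang
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(new Runnable() {
                public void run() {
                    for (int i = 0; i < 10000; i++) {
                        try {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", UUID.randomUUID().toString());
                            doc.addField("body_t", "small doc, roughly 0.5-1k");
                            server.add(doc);  // concurrent adds from every thread
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.DAYS);
        server.shutdown();
    }
}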




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-27 Thread Erick Erickson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721756#comment-13721756 ]

Erick Erickson commented on SOLR-5081:
--

Can you do a jstack on one or more of your Solr servers? There's a 
distributed-deadlock possibility, and it would be good to see whether this is 
the same problem.
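
For reference, the usual incantation is just the following, run while the 
cluster is hung (ideally on several nodes at once; pid is the Solr JVM's 
process id):

jstack <pid> > threads.txt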




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-27 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721761#comment-13721761 ]

Mike Schrag commented on SOLR-5081:
---

I attached a jstack of one of them (threads.txt) during this event. Do you want 
me to reproduce and grab more?




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-27 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721764#comment-13721764 ]

Mike Schrag commented on SOLR-5081:
---

btw, I dropped the Hadoop cluster to doing single-record batches in the run 
corresponding to that stack dump. I saw a note somewhere (I think from you?) 
that suggested increasing the semaphore permits, which I was about to test, 
too. It's not clear what a reasonable value is, but I jacked it up from 16 to 
1024 and figured I'd go for broke :)




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-27 Thread Erick Erickson (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721767#comment-13721767 ]

Erick Erickson commented on SOLR-5081:
--

You really want to go for broke? Try SOLR-4816 (note: assuming you're indexing 
from SolrJ). The deadlock I've seen has to do with intra-shard routing: 
forwarding the packets to other shards, when there are enough packets, can 
lead to this situation. That JIRA is about having SolrJ send the documents 
straight to the correct leader so it will not have to route the docs to other 
shards. We'd be really interested to see if that works in the real world...

NOTE: I'm not sure what the current state of that patch is; I think it was 
ready to rock-n-roll but just missed the cut for 4.4.






[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-27 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721781#comment-13721781 ]

Mike Schrag commented on SOLR-5081:
---

(that's with the latest SOLR-4816 patch applied)




[jira] [Commented] (SOLR-5081) Highly parallel document insertion hangs SolrCloud

2013-07-27 Thread Mike Schrag (JIRA)

[ https://issues.apache.org/jira/browse/SOLR-5081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13721780#comment-13721780 ]

Mike Schrag commented on SOLR-5081:
---

No luck :(  Whatever this hang is, it doesn't appear to be the same as that.
