Hi Koji,

I reckon the best option would be to raise this issue on the Solr user list. I'm not sure you can get much more help with it here.
Andor

> On 2019. Dec 14., at 1:09, Kojo <rbsnk...@gmail.com> wrote:
>
> Shawn,
> unfortunately, these ulimit values are for the solr user. I already checked
> for the zk user; we set the same values there.
> There is no constraint on process creation.
>
> This box has 128 GB of RAM, and Solr starts with a 32 GB heap. There is only
> one small collection, ~400k documents.
>
> I see no resource constraints.
> I see nothing at the application level (Python) doing anything wrong.
>
> I am looking for any clue to solve this problem.
>
> Would it be useful to start Solr set up to take a memory dump in case of a
> crash?
>
> -
>
> /opt/solr-6.6.2/bin/solr -m 32g -e cloud -z localhost:2181 -a
> "-XX:+HeapDumpOnOutOfMemoryError" -a
> "-XX:HeapDumpPath=/opt/solr-6.6.2/example/cloud/node1/logs/archived"
>
>
> Thank you,
> Koji
>
>
> On Fri, Dec 13, 2019 at 18:37, Shawn Heisey <apa...@elyograg.org>
> wrote:
>
>> On 12/13/2019 11:01 AM, Kojo wrote:
>>> We had already changed the OS configuration before the last crash, so I
>>> think that the problem is not there.
>>>
>>> ulimit -a
>>> core file size          (blocks, -c) 0
>>> data seg size           (kbytes, -d) unlimited
>>> scheduling priority             (-e) 0
>>> file size               (blocks, -f) unlimited
>>> pending signals                 (-i) 257683
>>> max locked memory       (kbytes, -l) 64
>>> max memory size         (kbytes, -m) unlimited
>>> open files                      (-n) 65535
>>> pipe size            (512 bytes, -p) 8
>>> POSIX message queues     (bytes, -q) 819200
>>> real-time priority              (-r) 0
>>> stack size              (kbytes, -s) 8192
>>> cpu time               (seconds, -t) unlimited
>>> max user processes              (-u) 65535
>>> virtual memory          (kbytes, -v) unlimited
>>> file locks                      (-x) unlimited
>>
>> Are you running this ulimit command as the same user that is running
>> your Solr process? It must be the same user to learn anything useful.
>> This output indicates that the user running the ulimit command is
>> allowed to start 64K processes, which I would think should be enough.
>>
>> My best guess here is that the user actually running Solr does *NOT*
>> have its limits increased. It may be a different user than the one
>> you're using to run the ulimit command.
>>
>>> When does Solr try to delete a znode? I'm sorry, but I understand
>>> nothing about this process, and it is the only point that seems
>>> suspicious to me. Do you think it could cause an inconsistency leading
>>> to the OOM problem?
>>
>> OOME isn't caused by inconsistencies at the application level. It's a
>> low-level problem, an indication that Java tried to do something
>> required to run the program that it couldn't do.
>>
>> I assume that it's Solr trying to delete the znode, because the node
>> path has solr in it. It will be the ZK client running inside Solr
>> that's actually trying to do the work, but Solr code probably
>> initiated it.
>>
>>> Just after this INFO message above, the ZK log starts to log thousands
>>> of the block of lines below, where it seems that ZK creates and closes
>>> thousands of sessions.
>>
>> I responded to this thread because I have some knowledge of Solr. I
>> really have no idea what these additional ZK server logs might mean.
>> The one that you quoted before was pretty straightforward, so I was
>> able to understand it.
>>
>> Anything that gets logged after an OOME is suspect and may be useless.
>> The execution of a Java program after an OOME is unpredictable, because
>> whatever was being run when the OOME was thrown did NOT successfully
>> execute.
>>
>> Thanks,
>> Shawn
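
Following up on Shawn's point about per-user limits: you can check the
limits that apply to the running Solr process itself, rather than to
whatever shell you happen to run ulimit from. On Linux, /proc/<pid>/limits
shows the limits actually in effect for a process. A minimal sketch (the
pgrep pattern is a guess; adjust it to match your install):

    # Find the Solr JVM's PID (verify the match with ps before trusting it)
    SOLR_PID=$(pgrep -f "solr" | head -n 1)

    # Which user actually owns the process?
    ps -o user= -p "$SOLR_PID"

    # Limits in effect for that exact process, regardless of shell settings
    cat "/proc/$SOLR_PID/limits"

If "Max processes" or "Max open files" there differ from your ulimit -a
output, then the limits for the user running Solr were never actually
raised.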
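
As for the heap dump question: -XX:+HeapDumpOnOutOfMemoryError and
-XX:HeapDumpPath are standard HotSpot flags, so capturing a dump on crash
is a reasonable next step. Two caveats, both assumptions about your setup
rather than things I can see from here: the dump directory must already
exist and be writable by the user running Solr, and a dump of a 32 GB heap
can need roughly that much free disk space. For example (the 'solr' user
name is an assumption; substitute your own):

    # HeapDumpPath must exist and be writable by the Solr user, or the
    # JVM cannot write the .hprof file when the OOME hits
    mkdir -p /opt/solr-6.6.2/example/cloud/node1/logs/archived
    chown solr: /opt/solr-6.6.2/example/cloud/node1/logs/archived

Also, I believe some versions of bin/solr keep only the last -a argument,
so it may be safer to pass both flags in a single quoted -a, and then check
the running process's command line with ps to confirm both took effect:

    # Combine the JVM flags into one -a argument to be safe
    /opt/solr-6.6.2/bin/solr -m 32g -e cloud -z localhost:2181 \
        -a "-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/solr-6.6.2/example/cloud/node1/logs/archived"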