Re: How do I track down a painfully long pause in a small web app?
You can use

    jstat -gcutil <pid> 2000

to print the GC statistics every 2 seconds: http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstat.html

If the long pause comes from GC, the FGC (number of full GC events) and FGCT (full GC time) values will be large. If you think it's a swap issue, you may want to run

    vmstat 1

and watch the si/so columns over several samples (the first report only shows averages since boot).

What are your JVM arguments? Too small a heap may be the issue.

2014-09-24 9:47 GMT+08:00 larry google groups lawrencecloj...@gmail.com:

> I'm guessing that strace is showing me userland threads?
Re: How do I track down a painfully long pause in a small web app?
I am intrigued by this article, as the problem sounds the same as mine:

http://corner.squareup.com/2014/09/logging-can-be-tricky.html

> No significant amount of resources appeared to be in use -- disk I/O, network I/O, CPU, and memory all looked fairly tame. Furthermore, the bulk of queries being served were all performing as expected.

So I tried to follow their example regarding strace, but I have never worked with strace before. I used grep to find the PID and then ran:

    sudo strace -c -f -p 19363

and I got:

    Process 19363 attached with 45 threads

Then I ran our health check, which is a series of functional tests that ping our actual app (a live environment rather than a test environment). I got nothing out of strace except that these 2 lines appeared:

    Process 20973 attached
    Process 20974 attached

What does this mean? I had the impression that the JVM ran in 1 process. Does strace show me userland threads (like htop does), or are these child processes?

On Monday, September 15, 2014 12:15:14 AM UTC-4, larry google groups wrote:

> I have an embarrassing problem. I convinced my boss that I could use Clojure to build a RESTful API. I was successful in so far as that went, but now I face the issue that every once in a while the program pauses for a painfully long time -- sometimes 30 seconds, which causes some requests to the API to time out. We are still in testing, so there is no real load on the app, just the frontenders writing JavaScript and making Ajax calls to the service.
>
> The service seems like a basic Clojure web app. I use Jetty as the webserver, and the libraries in use are:
>
> Ring
> Compojure
> Liberator
> Monger
> Timbre
> Lamina
> Dire
>
> When someone complains about the pauses, I will go test the service, and I can hit it with 40 requests in 10 seconds and it has great performance. The pauses actually seem to come after periods of inactivity, which made me think that this had something to do with garbage collection, except that the pauses are so extreme -- like I said, sometimes as much as 30 seconds, causing requests to time out. When the app does finally start to respond again, it goes very fast, and responds to those pending requests very fast.
>
> But I have to find a way to fix these pauses.
>
> Right now I packaged the app as an Uberjar and put it on the server, spun it up on port 24000 and proxied it through Apache. I put a script in /etc/init.d to start the app using start-stop-daemon.
>
> Possible things that could be going wrong:
>
> Maybe Jetty needs more threads, or maybe fewer threads? How would I test that?
>
> Maybe the link to MongoDB sometimes dies? (Mongo is on another server at Amazon.) How would I test that?
>
> Maybe it is garbage collection? How would I test that?
>
> Maybe I have some code that somehow blocks the whole app? Seems unlikely, but I'm trying to keep an open mind.
>
> Maybe the thread pool managed by Lamina sometimes gets overwhelmed? How would I test that?
>
> Maybe when Timbre writes to the log file it causes things to pause? (But I believe Timbre does this in its own thread?) How do I test that?
>
> This is a small app: only about 1,100 lines of code.
>
> I don't have much experience debugging problems on the JVM, so I welcome any suggestions.

--
You received this message because you are subscribed to the Google Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your first post.
To unsubscribe from this group, send email to clojure+unsubscr...@googlegroups.com
For more options, visit this group at http://groups.google.com/group/clojure?hl=en
---
You received this message because you are subscribed to the Google Groups Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
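On the strace question in the message above (threads or child processes?): on Linux the JVM still runs as one process, but each of its threads is a kernel-scheduled task with its own ID, and `strace -f` attaches to each of them, which is why older strace versions print one "Process N attached" line per thread. A minimal sketch for listing those thread IDs, assuming a Linux /proc filesystem; the `proc_root` parameter is only there so the function can be pointed at a different directory.

```python
import os


def thread_ids(pid, proc_root="/proc"):
    """Return the sorted kernel thread IDs (TIDs) of process `pid`.

    On Linux every thread of a process has a directory under
    /proc/<pid>/task, named after its thread ID.
    """
    task_dir = os.path.join(proc_root, str(pid), "task")
    return sorted(int(name) for name in os.listdir(task_dir) if name.isdigit())
```

Calling `thread_ids(19363)` on the server while the app is running should return roughly the same IDs that strace reports when it attaches and detaches. The two extra "attached" lines seen during the health check would then simply be threads created while strace was watching.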
Re: How do I track down a painfully long pause in a small web app?
I'm guessing that strace is showing me userland threads? When I quit strace I see:

    ^CProcess 19363 detached
    Process 19364 detached
    Process 19365 detached
    Process 19366 detached
    Process 19367 detached
    Process 19368 detached
    Process 19369 detached
    Process 19370 detached
    Process 19371 detached
    Process 19372 detached
    Process 19377 detached
    Process 19378 detached
    Process 19379 detached
    Process 19380 detached
    Process 19381 detached
    Process 19382 detached
    Process 19383 detached
    Process 19384 detached
    Process 19385 detached
    Process 19386 detached
    Process 19387 detached
    Process 19388 detached
    Process 19389 detached
    Process 19390 detached
    Process 19391 detached
    Process 19392 detached
    Process 19393 detached
    Process 19394 detached
    Process 19395 detached
    Process 19396 detached
    Process 19397 detached
    Process 19398 detached
    Process 19399 detached
    Process 19400 detached
    Process 19401 detached
    Process 19402 detached
    Process 19403 detached
    Process 19404 detached
    Process 19405 detached
    Process 19406 detached
    Process 19407 detached
    Process 19408 detached
    Process 19409 detached
    Process 19410 detached
    Process 19606 detached
    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     90.06   40.973072        1363     30059     10449 futex
      4.23    1.926411         819      2353           epoll_wait
      3.02    1.373282      114440        12         6 restart_syscall
      1.24    0.563107       93851         6           accept
      0.99    0.449988          12     36909           gettimeofday
      0.35    0.156992           5     29410           clock_gettime
      0.05    0.021064          67       316           recvfrom
      0.02    0.010338          30       347           write
      0.01    0.005117          24       209           sendto
      0.01    0.004369          24       180           poll
      0.01    0.002683          24       112        22 read
      0.01    0.002563          24       108         6 epoll_ctl
      0.00    0.001618          14       112           open
      0.00    0.001189           5       230           fcntl
      0.00    0.001132           8       142           mprotect
      0.00    0.000969           8       118           close
      0.00    0.000806          38        21           writev
      0.00    0.000685           6       109           ioctl
      0.00    0.000655           6       110           fstat
      0.00    0.000229          13        17           mmap
      0.00    0.000216          36         6           shutdown
      0.00    0.000197          33         6           dup2
      0.00    0.000092          46         2           madvise
      0.00    0.000061           5        12           setsockopt
      0.00    0.000057          14         4           munmap
      0.00    0.000056           5        12           getsockname
      0.00    0.000035           4         8           rt_sigprocmask
      0.00    0.000018           5         4           sched_getaffinity
      0.00    0.000010           5         2           clone
      0.00    0.000009           9         1           rt_sigreturn
      0.00    0.000009           5         2           uname
      0.00    0.000009           5         2           set_robust_list
      0.00    0.000008           4         2           gettid
    ------ ----------- ----------- --------- --------- ----------------
    100.00   45.497046                100943     10483 total

On Tuesday, September 23, 2014 9:44:52 PM UTC-4, larry google groups wrote:

> I am intrigued by this article, as the problem sounds the same as mine: http://corner.squareup.com/2014/09/logging-can-be-tricky.html
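For anyone who wants to compare several `strace -c` runs (say, one taken during a pause and one taken during normal operation), here is a minimal sketch that parses the summary table into rows. SAMPLE reuses three rows from the summary above; the function names are my own. Keep in mind that futex is what parked, idle threads typically wait on, so a large futex share is not necessarily a sign of trouble by itself.

```python
SAMPLE = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 90.06   40.973072        1363     30059     10449 futex
  4.23    1.926411         819      2353           epoll_wait
  3.02    1.373282      114440        12         6 restart_syscall
------ ----------- ----------- --------- --------- ----------------
100.00   45.497046                100943     10483 total
"""


def parse_strace_summary(text):
    """Parse the table printed by `strace -c` into a list of dicts."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(parts) not in (5, 6) or parts[-1] == "total":
            continue
        try:
            pct, seconds = float(parts[0]), float(parts[1])
            usecs, calls = int(parts[2]), int(parts[3])
            errors = int(parts[4]) if len(parts) == 6 else 0
        except ValueError:
            continue  # separator line made of dashes
        rows.append({
            "pct": pct,
            "seconds": seconds,
            "usecs_per_call": usecs,
            "calls": calls,
            "errors": errors,
            "syscall": parts[-1],
        })
    return rows


def error_ratio(row):
    """Fraction of calls to this syscall that returned an error."""
    return row["errors"] / row["calls"] if row["calls"] else 0.0


if __name__ == "__main__":
    for row in sorted(parse_strace_summary(SAMPLE), key=lambda r: -r["seconds"]):
        print(f'{row["syscall"]:<16} {row["seconds"]:>10.3f}s '
              f'{row["calls"]:>7} calls  {error_ratio(row):.0%} errors')
```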
Re: How do I track down a painfully long pause in a small web app?
If you turn on verbose GC for the JVM you could at least rule out GC pauses.

Hmm, exactly how do you route the requests through the Apache server? It almost sounds like your application is restarted every now and then; IIRC Apache only serves a limited number of requests per server thread. If this somehow started a new JVM per Apache thread, things would go strange.

What does

    ps ax --forest

say?

/Linus

On 15 Sep 2014 06:44, Shantanu Kumar kumar.shant...@gmail.com wrote:

> A few things to consider:
>
> 1. Which API calls pause? If only certain calls pause, then you probably have something specific to suspect. Try adding a dummy REST call and see if that call pauses while others do.
>
> 2. Is any of your services running on a t1.micro or a burst-oriented EC2 instance on AWS? Try changing the instance type in that case.
>
> 3. Can you mock out the components that you suspect could be a problem? Begin by mocking out everything you suspect, then replace the mocks with the actual implementation one component at a time until you isolate the problematic component.
>
> 4. Have you tried running a profiler?
>
> 5. Have you tried printing GC info? Maybe this could be useful: http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html
>
> Shantanu
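The "dummy REST call" idea above can be combined with the observation that the pauses seem to follow periods of inactivity. Here is a minimal sketch of a prober that waits for an idle gap, issues one request, and records how long it took. The URL is hypothetical (an assumed `/ping` route on the local port from the Apache config quoted elsewhere in this thread), and the one-second threshold is arbitrary.

```python
import time
import urllib.request


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.monotonic()
    result = fn()
    return result, time.monotonic() - start


def classify(elapsed, slow_after=1.0):
    """Label a response time; the threshold is an arbitrary assumption."""
    return "SLOW" if elapsed > slow_after else "ok"


def probe(url, timeout=60):
    """Fetch url once and return (HTTP status, elapsed seconds)."""
    def fetch():
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    return timed(fetch)


def watch(url, idle_seconds, rounds):
    """Sleep for idle_seconds, probe once, print the result; repeat."""
    for _ in range(rounds):
        time.sleep(idle_seconds)
        status, elapsed = probe(url)
        print(f"{time.strftime('%H:%M:%S')} status={status} "
              f"{elapsed:.3f}s {classify(elapsed)}")


# Example (hypothetical endpoint): probe after 10-minute idle gaps.
# watch("http://127.0.0.1:34002/ping", idle_seconds=600, rounds=6)
```

If the dummy route, which touches neither MongoDB nor Lamina, also stalls after an idle gap, the cause is more likely in the JVM, the OS, or the proxy than in application code. Note that `urlopen` raises an exception on HTTP error statuses and timeouts, which this sketch does not catch.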
Re: How do I track down a painfully long pause in a small web app?
GC would be the first suspect, but then it could also be combined with a swap issue, or a JVM bug. Have a look at this article, which ends with a concrete list of things to do: https://blogs.oracle.com/poonam/entry/troubleshooting_long_gc_pauses
Re: How do I track down a painfully long pause in a small web app?
Use the jvisualvm tool that comes with the JDK; you should be able to connect to the Clojure process.

Look at the memory usage graphs: if the heap size is banging against the max heap size, then you might just be using too small a heap, so try upping it. You can also install the visualgc plugin for jvisualvm to get more info on timings.

Alternatively, go to the Threads pane and click Thread Dump during the 30 second pause. You should be able to confirm what code is actually running at that point, which might give a clue to what is going on.

If you have a memory leak, the Heap Dump button on the Monitor tab lets you interactively explore all memory in the JVM. If there is a lot of something, that might be the thing that is leaking.

On Mon, Sep 15, 2014 at 7:55 AM, François Rey fmj...@gmail.com wrote:

> GC would be the first suspect, but then it could also be combined with a swap issue, or a JVM bug. Have a look at this article, which ends with a concrete list of things to do: https://blogs.oracle.com/poonam/entry/troubleshooting_long_gc_pauses
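If no GUI is available on the server, a thread dump like the one described above can also be captured as text with the JDK's `jstack <pid>` (my substitution, not something suggested in the thread). Here is a minimal sketch that counts thread states in such a dump and lists the threads in a given state. The SAMPLE dump is invented and abbreviated, and the `example.Logger` frame is purely illustrative.

```python
import re
from collections import Counter

# Invented, abbreviated thread dump in the HotSpot text format.
SAMPLE = """\
"qtp-24" prio=10 tid=0x01 nid=0x4ba3 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

"qtp-25" prio=10 tid=0x02 nid=0x4ba4 waiting on condition
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)

"worker-1" prio=10 tid=0x03 nid=0x4ba5 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at example.Logger.write(Logger.java:42)
"""

STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)
NAME_RE = re.compile(r'\s*"([^"]+)"')


def thread_states(dump_text):
    """Count how many threads are in each java.lang.Thread.State."""
    return Counter(STATE_RE.findall(dump_text))


def threads_in_state(dump_text, state):
    """Return the names of the threads whose state equals `state`."""
    names = []
    for block in dump_text.split("\n\n"):  # one blank line between threads
        state_match = STATE_RE.search(block)
        name_match = NAME_RE.match(block)
        if state_match and name_match and state_match.group(1) == state:
            names.append(name_match.group(1))
    return names


if __name__ == "__main__":
    print(dict(thread_states(SAMPLE)))
    print("BLOCKED:", threads_in_state(SAMPLE, "BLOCKED"))
```

Taking two or three dumps a few seconds apart during a pause and comparing which threads stay BLOCKED, and on which frame, is usually more telling than a single snapshot.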
Re: How do I track down a painfully long pause in a small web app?
On Monday, September 15, 2014 12:44:54 AM UTC-4, Shantanu Kumar wrote:

> 1. Which API calls pause? If only certain calls pause, then you probably have something specific to suspect. Try adding a dummy REST call and see if that call pauses while others do.

I will add a dummy REST call, although this pause does not seem specific to a particular API call.

> 2. Is any of your services running on a t1.micro or a burst-oriented EC2 instance on AWS? Try changing the instance type in that case.

We started on a small instance but recently we moved up to a reasonably powered machine with 4 gigs of RAM.

> Have you tried printing GC info?

No, but I will. Thank you.
Re: How do I track down a painfully long pause in a small web app?
On Monday, September 15, 2014 2:45:13 AM UTC-4, Linus Ericsson wrote:

> Hmm, exactly how do you route the requests through the Apache server? It almost sounds like your application is restarted every now and then; IIRC Apache only serves a limited number of requests per server thread.

Interesting if true, but I assume there would be an error if 2 instances of the app both tried to grab the same port.

In:

    /etc/apache2/sites-available/000-default.conf

which is the only config file we are using right now, I have this:

    ProxyPass /user/ http://127.0.0.1:34002/
    ProxyPassReverse /user/ http://127.0.0.1:34002/
    ProxyPass /user http://127.0.0.1:34002/
    ProxyPassReverse /user http://127.0.0.1:34002/

This works fine. The requests proxy through Apache and back. I repeated the lines to deal with the trailing /.
Re: How do I track down a painfully long pause in a small web app?
If this somehow started a new JVM per apache thread things would go strange. What does $ ps ax --forest say?

That is a good thought, but I only see it once.

On Monday, September 15, 2014 2:45:13 AM UTC-4, Linus Ericsson wrote:

If you turn on verbose GC for the JVM you could at least rule out GC pauses.

Hmm, exactly how do you route the requests through the Apache server? It almost sounds like your application is restarted every now and then; IIRC Apache only serves a limited number of requests per server thread. If this somehow started a new JVM per Apache thread, things would go strange. What does $ ps ax --forest say?

/Linus
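As a complement to the verbose GC suggestion above (the -verbose:gc JVM flag), GC activity can also be read from inside the running JVM through the standard java.lang.management API. The following is only a minimal sketch: the class name GcReport and its methods are hypothetical, and it assumes the code is called from within the app's own JVM (for example via Clojure interop), since run standalone it only reports on its own process.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper, not part of the app discussed in this thread.
final class GcReport {

    // One line per collector: name, number of collections, accumulated time in ms.
    static String snapshot() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(" count=").append(gc.getCollectionCount())
              .append(" timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    // Sum of accumulated GC time over all collectors; undefined values (-1) are skipped.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        // Print a snapshot every 2 seconds, five times, then exit.
        for (int i = 0; i < 5; i++) {
            System.out.print(snapshot());
            System.out.println("total GC ms so far: " + totalGcMillis());
            Thread.sleep(2_000);
        }
    }
}
```

If the total GC time barely moves across a 30-second stall, GC is probably not the cause and the other suspects (swap, the proxy, a blocked thread) deserve a closer look.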
Re: How do I track down a painfully long pause in a small web app?
Okay, I will dig into jvisualvm. Thanks.

On Monday, September 15, 2014 5:53:34 AM UTC-4, David Powell wrote:

Use the jvisualvm tool that comes with the JDK; you should be able to connect to the Clojure process.

Look at the memory usage graphs: if the heap size is banging against the max heap size, then you might just be using too small a heap, so try upping it. You can also install the Visual GC plugin for jvisualvm to get more info on timings.

Alternatively, go to the Threads pane and click Thread Dump during the 30-second pause. You should be able to confirm what code is actually running at that point, which might give a clue to what is going on.

If you have a memory leak, the Heap Dump button on the Monitor tab lets you interactively explore all memory in the JVM. If there is a lot of something, that might be the thing that is leaking.

On Mon, Sep 15, 2014 at 7:55 AM, François Rey fmj...@gmail.com wrote:

GC would be the first suspect, but then it could also be combined with a swap issue, or a JVM bug. Have a look at this article, which ends with a concrete list of things to do: https://blogs.oracle.com/poonam/entry/troubleshooting_long_gc_pauses
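The two checks described above (heap usage against the max heap, and a thread dump taken during the pause) can also be approximated programmatically with the standard java.lang.management API. This is a rough sketch, not a replacement for jvisualvm: the class name JvmSnapshot is hypothetical, and it assumes the code runs inside the app's JVM. Note that if the stall is a stop-the-world GC, this code is paused along with everything else, so an external tool remains the more reliable option.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper, not part of the app discussed in this thread.
final class JvmSnapshot {

    // Heap in use as a fraction of the max heap, or -1 if the max is undefined.
    static double heapFraction() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        long max = heap.getMax();
        return max <= 0 ? -1.0 : (double) heap.getUsed() / max;
    }

    // Name, state and stack trace of every live thread in this JVM.
    static String threadDump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip lock details to keep the dump cheap.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("heap used / max: " + heapFraction());
        System.out.print(threadDump());
    }
}
```

A heap fraction that stays close to 1.0 points toward too small a heap; many threads in BLOCKED or WAITING state on the same frame point toward a lock or an exhausted pool.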
Re: How do I track down a painfully long pause in a small web app?
I don't have any experience configuring Clojure apps on the JVM yet, but it may be that increasing the RAM on the server does not increase the RAM allocated to the JVM instance Clojure is running on.

Aria Media Sagl
Via Rompada 40
6987 Caslano
Switzerland
+41 (0)91 600 9601
+41 (0)76 303 4477 cell
skype: ariamedia

On Mon, Sep 15, 2014 at 4:04 PM, larry google groups lawrencecloj...@gmail.com wrote:

1. Which API calls pause? If only certain calls pause, then probably you have something specific to suspect. Try adding a dummy REST call - see if that call pauses while others do.

I will add a dummy REST call, although this pause does not seem specific to a particular API call.

2. Is any of your services running on a t1.micro or a burst-oriented EC2 instance on AWS? Try changing the instance type in that case.

We started on a small instance but recently we moved up to a reasonably powered machine with 4 gigs of RAM.

Have you tried printing GC info?

No, but I will. Thank you.
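The point above about server RAM versus JVM memory can be checked directly: the JVM's maximum heap is set when the process starts (for example with the -Xmx flag) and does not grow just because the machine has more RAM. A minimal sketch, assuming it is run with the same JVM options as the app; the class name HeapLimits is hypothetical.

```java
// Hypothetical helper, not part of the app discussed in this thread.
final class HeapLimits {

    private static final long MIB = 1024L * 1024L;

    // Upper bound the JVM will try to use for the heap, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / MIB;
    }

    // Heap currently reserved by the JVM, in MiB.
    static long committedHeapMiB() {
        return Runtime.getRuntime().totalMemory() / MIB;
    }

    // Unused part of the currently reserved heap, in MiB.
    static long freeHeapMiB() {
        return Runtime.getRuntime().freeMemory() / MIB;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("committed heap (MiB): " + committedHeapMiB());
        System.out.println("free in heap (MiB):   " + freeHeapMiB());
    }
}
```

If the reported max heap is far below the 4 GB on the machine, passing an explicit -Xmx value in the init script is the usual way to raise it.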
How do I track down a painfully long pause in a small web app?
I have an embarrassing problem. I convinced my boss that I could use Clojure to build a RESTful API. I was successful in so far as that went, but now I face the issue that every once in a while the program pauses for a painfully long time, sometimes 30 seconds, which causes some requests to the API to time out. We are still in testing, so there is no real load on the app, just the frontenders writing JavaScript and making Ajax calls to the service.

The service seems like a basic Clojure web app. I use Jetty as the webserver, and the libraries in use are:

Ring
Compojure
Liberator
Monger
Timbre
Lamina
Dire

When someone complains about the pauses, I go test the service, and I can hit it with 40 requests in 10 seconds and it has great performance. The pauses actually seem to come after periods of inactivity, which made me think that this had something to do with garbage collection, except that the pauses are so extreme: like I said, sometimes as much as 30 seconds, causing requests to time out. When the app does finally start to respond again, it goes very fast, and responds to those pending requests very fast.

But I have to find a way to fix these pauses.

Right now I have packaged the app as an uberjar and put it on the server, spun it up on port 24000 and proxied it through Apache. I put a script in /etc/init.d to start the app using start-stop-daemon.

Possible things that could be going wrong:

Maybe Jetty needs more threads, or maybe fewer threads? How would I test that?

Maybe the link to MongoDB sometimes dies? (Mongo is on another server at Amazon.) How would I test that?

Maybe it is garbage collection? How would I test that?

Maybe I have some code that somehow blocks the whole app? Seems unlikely but I'm trying to keep an open mind.

Maybe the thread pool managed by Lamina sometimes gets overwhelmed? How would I test that?

Maybe when Timbre writes to the log file it causes things to pause? (But I believe Timbre does this in its own thread?) How do I test that?

This is a small app: only about 1,100 lines of code.

I don't have much experience debugging problems on the JVM, so I welcome any suggestions.
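One way to approach the "how would I test that?" questions above is to first establish whether the whole JVM stalls or only some request paths do. A common technique is a watchdog thread that sleeps for a short, fixed interval and records how much longer than requested the sleep took; a large overshoot means the process itself was frozen (GC, swapping, a suspended VM) rather than a single handler being slow. The sketch below is an illustration under assumptions: PauseDetector is a hypothetical name, and the 100 ms interval and 1-second threshold are arbitrary choices.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the app discussed in this thread.
final class PauseDetector {

    // Sleeps intervalMs at a time for the given number of rounds and returns
    // the largest overshoot (actual sleep minus requested sleep) in ms.
    static long worstOvershootMillis(long intervalMs, int rounds) {
        long worst = 0;
        for (int i = 0; i < rounds; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                break;
            }
            long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            worst = Math.max(worst, elapsedMs - intervalMs);
        }
        return worst;
    }

    public static void main(String[] args) {
        long thresholdMs = 1_000; // arbitrary: report stalls longer than one second
        // Six windows of roughly 10 seconds each (100 rounds of 100 ms).
        for (int window = 0; window < 6; window++) {
            long worst = worstOvershootMillis(100, 100);
            String verdict = worst > thresholdMs ? "STALL" : "ok";
            System.out.println("window " + window + ": worst overshoot " + worst + " ms " + verdict);
        }
    }
}
```

Run inside the app's JVM (for example on a daemon thread), this sees GC and OS-level stalls alike; run as a separate process on the same server, it only sees stalls that affect the whole machine, which helps separate the two cases.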
Re: How do I track down a painfully long pause in a small web app?
A few things to consider:

1. Which API calls pause? If only certain calls pause, then you probably have something specific to suspect. Try adding a dummy REST call and see whether that call pauses while the others do.

2. Are any of your services running on a t1.micro or another burst-oriented EC2 instance on AWS? Try changing the instance type in that case.

3. Can you mock out the components that you suspect could be a problem? Begin by mocking out everything you suspect, then replace each mock with the actual implementation one component at a time until you isolate the problematic component.

4. Have you tried running a profiler?

5. Have you tried printing GC info? Maybe this could be useful: http://blog.ragozin.info/2011/09/hotspot-jvm-garbage-collection-options.html

Shantanu
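The dummy REST call from point 1 is most useful when something polls it regularly and logs the latency, so that a stall on the dummy call can be compared against stalls on the real calls. The following is a sketch under assumptions: the class LatencyProbe and the /ping path are hypothetical (only port 24000 comes from the original post), and the timeouts are arbitrary.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the app discussed in this thread.
final class LatencyProbe {

    // Wall-clock time taken by the action, in ms.
    static long timeMillis(Runnable action) {
        long start = System.nanoTime();
        action.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    // HTTP status of a GET to the URL, or -1 if the request failed or timed out.
    static int fetchStatus(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(5_000);
            conn.setReadTimeout(60_000); // longer than the 30-second pauses reported
            return conn.getResponseCode();
        } catch (IOException e) {
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical endpoint; pass the real dummy-call URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:24000/ping";
        for (int i = 0; i < 10; i++) {
            int[] status = new int[1];
            long ms = timeMillis(() -> status[0] = fetchStatus(url));
            System.out.println(url + " status=" + status[0] + " took " + ms + " ms");
            Thread.sleep(1_000);
        }
    }
}
```

Pointing one probe at the Jetty port directly and another at the Apache-proxied URL would also show whether the stall happens in the app or in the proxy layer.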