Hi Kevin,

Thanks for your reply. That is what I assumed, but some of the posts I read on Stack Overflow (e.g., the one that I referenced in my mail) suggested otherwise. I was just curious whether others had experienced OOM problems that weren't logged, or whether there were other common culprits.
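One way to narrow this down would be to have the test harness record the exit code of the Cassandra child process. The following is only a minimal sketch, assuming a Linux host where a process killed by a signal is reported with exit code 128 + the signal number. The class and method names (ChildExitCheck, runAndWait, describe) are hypothetical and are not part of Cassandra or the DataStax driver.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

// Hypothetical helper for an integration-test harness; not a Cassandra API.
class ChildExitCheck {

    // Starts the command, writes its stdout and stderr to logFile,
    // and blocks until the child exits. Returns the child's exit code.
    static int runAndWait(List<String> command, File logFile)
            throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        builder.redirectOutput(logFile);   // Java 7+: send child output to a file
        Process process = builder.start();
        return process.waitFor();
    }

    // Interprets an exit code using the common Unix convention
    // (assumption: codes above 128 mean "terminated by signal code - 128").
    static String describe(int exitCode) {
        if (exitCode == 0) {
            return "clean exit";
        }
        if (exitCode > 128) {
            int signal = exitCode - 128;
            if (signal == 9) {
                return "killed by signal 9 (SIGKILL); JVM shutdown hooks do not run";
            }
            if (signal == 15) {
                return "terminated by signal 15 (SIGTERM); JVM shutdown hooks run";
            }
            return "terminated by signal " + signal;
        }
        return "exited with error code " + exitCode;
    }
}
```

A harness could log `ChildExitCheck.describe(exitCode)` when the daemon stops unexpectedly. Since the quoted log shows StorageServiceShutdownHook running, a SIGKILL seems less likely than a SIGTERM or an explicit exit, but that is a guess until the exit code is known.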
Best regards,
Clint

On Tue, Aug 5, 2014 at 9:29 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> If there is an oom it will be in the logs.
>
> On Aug 5, 2014 8:17 PM, "Clint Kelly" <clint.ke...@gmail.com> wrote:
>>
>> Hi everyone,
>>
>> For some integration tests, we start up a CassandraDaemon in a
>> separate process (using the Java 7 ProcessBuilder API). All of my
>> integration tests run beautifully on my laptop, but one of them fails
>> on our Jenkins cluster.
>>
>> The failing integration test does around 10k writes to different rows
>> and then 10k reads. After running some number of reads, the job dies
>> with this error:
>>
>> com.datastax.driver.core.exceptions.NoHostAvailableException: All
>> host(s) tried for query failed (tried: /127.0.0.10:58209
>> (com.datastax.driver.core.exceptions.DriverException: Timeout during
>> read))
>>
>> This error appears to have occurred because the Cassandra process has
>> stopped. The logs for the Cassandra process show some warnings during
>> batch writes (the batches are too big), no activity for a few minutes
>> (I assume this is because all of the read operations were proceeding
>> smoothly), and then look like the following:
>>
>> INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,903
>> ThriftServer.java (line 141) Stop listening to thrift clients
>> INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,920 Server.java
>> (line 182) Stop listening for CQL clients
>> INFO [StorageServiceShutdownHook] 2014-08-05 19:14:51,930
>> Gossiper.java (line 1279) Announcing shutdown
>> INFO [StorageServiceShutdownHook] 2014-08-05 19:14:53,930
>> MessagingService.java (line 683) Waiting for messaging service to
>> quiesce
>> INFO [ACCEPT-/127.0.0.10] 2014-08-05 19:14:53,931
>> MessagingService.java (line 923) MessagingService has terminated the
>> accept() thread
>>
>> Does anyone have any ideas about how to debug this? Looking around on
>> google I found some threads suggesting that this could occur from an
>> OOM error
>> (http://stackoverflow.com/questions/23755040/cassandra-exits-with-no-errors).
>> Wouldn't such an error be logged, however?
>>
>> The test that fails is a test of our MapReduce Hadoop InputFormat and
>> as such it does some pretty big queries across multiple rows (over a
>> range of partitioning key tokens). The default fetch size I believe
>> is 5000 rows, and the values in the rows I am fetching are just simple
>> strings, so I would not think the amount of data in a single read
>> would be too big.
>>
>> FWIW I don't see any log messages about garbage collection for at
>> least 3min before the process shuts down (and no GC messages after the
>> test stops doing writes and starts doing reads).
>>
>> I'd greatly appreciate any help before my team kills me for breaking
>> our Jenkins build so consistently! :)
>>
>> Best regards,
>> Clint