Running on the current 0.7.0 beta2 #3.
On my dev workstation (unubtu 10.4) I accidentally let cassandra have 4GB and free memory got down to <100 MB . I was running some tests that added a few (< 100) rows to the CF's in Keyspace1, then Truncated the CF's for the next tests.
I got the following error in the cassandra system.log
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 CassandraServer.java (line 954) truncating Standard1 in Keyspace1
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 StorageProxy.java (line 759) Starting a blocking truncate operation on keyspace Keyspace1, CF
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,166 StorageProxy.java (line 774) Starting to send truncate messages to hosts [localhost/127.0.0.1]
DEBUG [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 TruncateVerbHandler.java (line 50) Applying Truncation(keyspace='Keyspace1', cf='Standard1')
DEBUG [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 Tablejava (line 630) Truncating...
INFO [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 ColumnFamilyStore.java (line 459) switching in a fresh Memtable for Standard1 at CommitLogContext(file='/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log', position=10022050)
INFO [MUTATION_STAGE:25] 2010-09-29 16:44:39,166 ColumnFamilyStore.java (line 771) Enqueuing flush of memtable-standa...@1701009673(40104 bytes, 1152 operations)
DEBUG [pool-1-thread-44] 2010-09-29 16:44:39,167 StorageProxy.java (line 780) Sent all truncate messages, now waiting for 1 responses
INFO [FLUSH-WRITER-POOL:1] 2010-09-29 16:44:39,167 Memtable.java (line 150) Writing memtable-standa...@1701009673(40104 bytes, 1152 operations)
INFO [FLUSH-WRITER-POOL:1] 2010-09-29 16:44:39,268 Memtable.java (line 157) Completed flushing /local1/junkbox/cassandra/data/Keyspace1/Standard1-e-10-Data.db
DEBUG [CompactionExecutor:1] 2010-09-29 16:44:39,268 CompactionManager.java (line 95) Checking to see if compaction of Standard1 would be useful
DEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:44:39,268 CommitLog.java (line 422) discard completed log segments for CommitLogContext(file='/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log', position=10022050), column family 1000.
DEBUG [COMMIT-LOG-WRITER] 2010-09-29 16:44:39,268 CommitLog.java (line 448) Marking replay position 10022050 on commit log CommitLogSegment(/local1/junkbox/cassandra/commitlog/CommitLog-1285728080403.log)
ERROR [MUTATION_STAGE:25] 2010-09-29 16:44:39,341 AbstractCassandraDaemon.java (line 88) Fatal exception in thread Thread[MUTATION_STAGE:25,5,main]
java.io.IOError: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1368)
at org.apache.cassandra.db.ColumnFamilyStore.truncate(ColumnFamilyStore.java:1511)
at org.apache.cassandra.db.Table.truncate(Table.java:633)
at org.apache.cassandra.db.TruncateVerbHandler.doVerb(TruncateVerbHandler.java:54)
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:50)
at javautil.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: Cannot run program "ln": java.io.IOException: error=12, Cannot allocate memory
at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
at org.apache.cassandra.io.util.FileUtils.createHardLinkWithExec(FileUtils.java:263)
at org.apache.cassandra.io.util.FileUtils.createHardLink(FileUtils.java:229)
at org.apache.cassandra.db.ColumnFamilyStore.snapshot(ColumnFamilyStore.java:1360)
... 7 more
Caused by: java.io.IOException: java.io.IOException: error=12, Cannot allocate memory
at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
at java.lang.ProcessImpl.start(ProcessImpl.java:65)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
... 10 more
DEBUG [GC inspection] 2010-09-29 16:44:39,539 GCInspector.java (line 131) GC for ParNew: 43 ms, 29106728 reclaimed leaving 3567200784 used; max is 5498404864
DEBUG [pool-1-thread-44] 2010-09-29 16:44:49,189 ClientState.java (line 107) logged out: #<User allow_all groups=[]>
Fair enough, though it appeared as a timeout on the client (with a 10 second timeout)
File "/tech/home//git_home/trojan/trojan/cassandra/Cassandrapy", line 846, in truncate
self.recv_truncate()
File "/tech/home//git_home/trojan/trojan/cassandra/Cassandra.py", line 857, in recv_truncate
(fname, mtype, rseqid) = self._iprot.readMessageBegin()
File "/tech/home//git_home/trojan/trojan/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
sz = self.readI32()
<snip>
chunk = self.read(sz-have)
File "/tech/home//git_home/trojan/trojan/thrift/transport/TSocket.py", line 92, in read
buff = self.handle.recv(sz)
timeout: timed out
Here's the output of ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 20
file size (blocks, -f) unlimited
pending signals (-i) 16382
max locked memory (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files (-n) 65536
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Two questions:
1) Is the memory error just a result of me letting my machine run stupidly low on memory?
2) Should it have returned an ApplicationError or some such in this case? The code in ColumnFamilyStore:1368 is catching the IOException from the call to FileUtils.createHardLink and wrapping it in an IOError. However the code in TruncateVerbHandler:56 is looking for the IOException.
Cheers
Aaron