I tried the manual copy you suggested, but SystemTable.checkHealth()
complains that it can't load the system files. The log follows; I will gather
some more info and create a ticket as soon as possible.

 INFO [main] 2011-05-26 18:25:36,147 AbstractCassandraDaemon.java Logging initialized
 INFO [main] 2011-05-26 18:25:36,172 AbstractCassandraDaemon.java Heap size: 4277534720/4277534720
 INFO [main] 2011-05-26 18:25:36,174 CLibrary.java JNA not found. Native methods will be disabled.
 INFO [main] 2011-05-26 18:25:36,190 DatabaseDescriptor.java Loading settings from file:/C:/Cassandra/conf/hscassandra9170.yaml
 INFO [main] 2011-05-26 18:25:36,344 DatabaseDescriptor.java DiskAccessMode 'auto' determined to be mmap, indexAccessMode is mmap
 INFO [main] 2011-05-26 18:25:36,532 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2746
 INFO [main] 2011-05-26 18:25:36,577 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2729
 INFO [main] 2011-05-26 18:25:36,590 SSTableReader.java Opening G:\Cassandra\data\system\Schema-f-2745
 INFO [main] 2011-05-26 18:25:36,599 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-2167
 INFO [main] 2011-05-26 18:25:36,600 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-2131
 INFO [main] 2011-05-26 18:25:36,602 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-1041
 INFO [main] 2011-05-26 18:25:36,603 SSTableReader.java Opening G:\Cassandra\data\system\Migrations-f-1695
ERROR [main] 2011-05-26 18:25:36,634 AbstractCassandraDaemon.java Fatal exception during initialization
org.apache.cassandra.config.ConfigurationException: Found system table files, but they couldn't be loaded. Did you change the partitioner?
        at org.apache.cassandra.db.SystemTable.checkHealth(SystemTable.java:236)
        at org.apache.cassandra.service.AbstractCassandraDaemon.setup(AbstractCassandraDaemon.java:127)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:314)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:79)


On 5/26/2011 6:04 PM, Jonathan Ellis wrote:
Sounds like a legitimate bug, although looking through the code I'm
not sure what would cause a tight retry loop on migration
announce/rectify. Can you create a ticket at
https://issues.apache.org/jira/browse/CASSANDRA ?

As a workaround, I would try manually copying the Migrations and
Schema sstable files from the system keyspace of the live node, then
restart the recovering one.
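
For illustration only, the manual copy could be scripted roughly as below. This is a sketch, not a Cassandra tool: the function name and the example directories are hypothetical, it assumes the sstable component files all start with "Migrations-" or "Schema-" (as the file names in the log suggest), and it assumes the recovering node is stopped while the files are copied.

```python
import glob
import os
import shutil


def copy_schema_sstables(src_dir, dst_dir, prefixes=("Migrations-", "Schema-")):
    """Copy every file whose name starts with one of the prefixes.

    src_dir: system keyspace directory of the live node (assumed reachable).
    dst_dir: system keyspace directory of the recovering node.
    Returns the names of the copied files.
    """
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for prefix in prefixes:
        pattern = os.path.join(src_dir, prefix + "*")
        for path in sorted(glob.glob(pattern)):
            if os.path.isfile(path):
                shutil.copy2(path, dst_dir)  # copy2 also keeps timestamps
                copied.append(os.path.basename(path))
    return copied


# Hypothetical usage; both paths are placeholders, not taken from the thread:
# copy_schema_sstables(r"\\livenode\cassandra\data\system",
#                      r"G:\Cassandra\data\system")
```

Other files in the system keyspace are deliberately left alone, since only the Migrations and Schema sstables are mentioned here.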

On Thu, May 26, 2011 at 9:27 AM, Flavio Baronti
<f.baro...@list-group.com>  wrote:
I can't seem to recover a failed node on a database where I made
many updates to the schema.

I have a small cluster with 2 nodes, around 1000 CF (I know it's a lot, but
it can't be changed right now), and ReplicationFactor=2.
I shut down a node and cleaned its data entirely, then tried to bring it
back up. The node starts fetching schema updates from the live node, but the
operation fails halfway with an OOME.
After some investigation, what I found is that:

- I have a lot of schema updates (there are 2067 rows in the system.Schema
CF).
- The live node loads migrations 1-1000, and sends them to the recovering
node (Migration.getLocalMigrations())
- Soon afterwards, the live node checks the schema version on the recovering
node and finds it has moved by a little - say it has applied the first 3
migrations. It then loads migrations 3-1003, and sends them to the node.
- This process is repeated very quickly (sends migrations 6-1006, 9-1009,
etc).
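
To make the overlap concrete, here is a rough model of the resend pattern in the list above. It is not Cassandra code: the function and the constant names are made up for illustration, and the assumption that the recovering node applies exactly 3 migrations between two version checks is just the example figure used above.

```python
# Hypothetical model of the resend pattern; not Cassandra code.
TOTAL_MIGRATIONS = 2067   # rows in system.Schema, as reported above
BATCH_SIZE = 1000         # migrations per message, as observed
APPLIED_PER_CHECK = 3     # assumed progress between two version checks


def batches_sent(total, batch_size, step):
    """Return the (first, last) migration range of each message built."""
    batches = []
    applied = 0
    while applied < total:
        first = max(applied, 1)
        last = min(applied + batch_size, total)
        batches.append((first, last))
        applied += step  # the recovering node has moved on a little
    return batches


batches = batches_sent(TOTAL_MIGRATIONS, BATCH_SIZE, APPLIED_PER_CHECK)
serialized = sum(last - first + 1 for first, last in batches)
print("first messages:", batches[:4])
print("messages built:", len(batches))
print("migrations serialized in total:", serialized)
```

Under these assumptions almost every migration is serialized hundreds of times, because consecutive messages overlap in all but a few entries.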

Analyzing the memory dump and the logs, it looks like each of these
1000-migration blocks is composed into a single message and put on the
OutboundTcpConnection queue. However, since the schema is big, the messages
occupy a lot of space, and are built faster than the connection can send
them. Therefore, they accumulate in OutboundTcpConnection.queue, until
memory is completely filled.
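
The accumulation can be pictured with a toy producer/consumer model. This is only a sketch of the general effect, not of OutboundTcpConnection itself: the message size, the build and send rates, and the heap limit below are invented numbers, and a "tick" is an arbitrary unit of time.

```python
from collections import deque


def ticks_until_full(message_bytes, built_per_tick, sent_per_tick, heap_bytes):
    """Return the tick after which the queued bytes exceed heap_bytes.

    Returns None when messages are sent at least as fast as they are
    built, because the backlog then never grows.
    """
    if built_per_tick <= sent_per_tick:
        return None
    queue = deque()
    queued_bytes = 0
    tick = 0
    while queued_bytes <= heap_bytes:
        tick += 1
        # Producer: new messages are built and queued.
        for _ in range(built_per_tick):
            queue.append(message_bytes)
            queued_bytes += message_bytes
        # Consumer: the connection drains what it can in this tick.
        for _ in range(min(sent_per_tick, len(queue))):
            queued_bytes -= queue.popleft()
    return tick


# Invented figures: 50 MB messages, 5 built and 1 sent per tick, 4 GB heap.
print(ticks_until_full(50 * 1024 * 1024, 5, 1, 4 * 1024 ** 3))
```

Whatever the real numbers are, an unbounded queue fills the heap whenever the build rate stays above the send rate; only the time it takes changes.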

Any suggestions? Can I change something to make this work, apart from
reducing the number of CFs?

Flavio




