[
https://issues.apache.org/jira/browse/CASSANDRA-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Florent Clairambault updated CASSANDRA-4481:
Comment: was deleted
(was: It doesn't work; it failed again a week ago on a 1.1.5 node that had
only been running for a short while.
First of all, it's a commit log writing and/or reading issue, so if you flush
your data frequently (every hour, and in the stop command of the rc.d script)
you reduce the risk of large data losses. You can lose days of data if you
don't do that. Restarting cassandra and finding yourself 2 days in the past is
a very unpleasant situation.
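As a rough illustration of the flushing advice above, here is a minimal sketch of a helper that could be called from an hourly cron job and from the stop branch of an init script. The names NODETOOL, CASSANDRA_HOST and flush_all are assumptions made for this example, not part of Cassandra; only `nodetool flush` itself is the real command.

```shell
#!/bin/sh
# Hypothetical helper (names are assumptions for illustration).
# Flushes memtables to SSTables so that less data depends on commit log replay.

# Both variables can be overridden from the environment.
NODETOOL="${NODETOOL:-nodetool}"
CASSANDRA_HOST="${CASSANDRA_HOST:-localhost}"

flush_all() {
    # With no keyspace argument, `nodetool flush` flushes every keyspace.
    "$NODETOOL" -h "$CASSANDRA_HOST" flush
}
```

Calling flush_all right before the process is stopped should keep the amount of unflushed data small, although it does not address the replay problem itself.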
So here is the new process I applied to fix my data (which is in fact
restarting from scratch [except we keep the data]):
- Export the keyspace's schema
{code}
cassandra-cli -k ks > schema.txt <<EOF
show schema;
exit;
EOF
{code}
- Simplify the export (every CF with key_validation_class set to AsciiType, and
default_validation_class set to UTF8Type for most CFs, except the one that
contains binary data, where I used BytesType).
I simplified an export like this:
{code}
create column family User
with column_type = 'Standard'
and comparator = 'AsciiType'
and default_validation_class = 'UTF8Type'
and key_validation_class = 'AsciiType'
and read_repair_chance = 0.1
and dclocal_read_repair_chance = 0.0
and gc_grace = 864000
and min_compaction_threshold = 4
and max_compaction_threshold = 32
and replicate_on_write = true
and compaction_strategy =
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'
and caching = 'KEYS_ONLY'
and column_metadata = [
{column_name : 'domain',
validation_class : UTF8Type,
index_name : 'User_domain_idx',
index_type : 0},
{column_name : 'username',
validation_class : UTF8Type,
index_name : 'User_username',
index_type : 0}]
and compression_options = {'sstable_compression' :
'org.apache.cassandra.io.compress.SnappyCompressor'};
{code}
To something like this:
{code}
create column family User
with column_type = 'Standard'
and key_validation_class = 'AsciiType'
and comparator = 'AsciiType'
and default_validation_class = 'UTF8Type'
and column_metadata = [
{column_name : 'domain', validation_class : UTF8Type, index_type : 0},
{column_name : 'username', validation_class : UTF8Type, index_type : 0}];
{code}
During this simplification process, I discovered that some
default_validation_class settings had an incorrect type, so maybe the problem
comes from that. It seems strange that we could confuse cassandra this way, but
this problem is indeed very strange...
- Stop cassandra
- Move the keyspace folder to somewhere else (mkdir backup; mv ks backup)
- Start cassandra (Not having a keyspace folder is like not having any data,
it's not a problem).
- Delete the keyspace (I know deletion creates snapshots and moving is
unnecessary, but it's easier to use sstableloader that way)
- Recreate the keyspace with the exported and simplified schema
- Use sstableloader to import data:
{code}
cd backup; find ks -type d -exec sstableloader -d localhost {} \;
{code}
NOTE: Don't try to replay your commit logs against the new schema: the
column families won't have the same IDs.
Any empty cassandra instance startup does at least 1 mutation replay because of
the system keyspace. So I still think 0 replayed mutations should never occur
and if they do, we should have some warning with them. And if it's indeed a CF
that doesn't fully exist, it should be reported at startup.
I hope we can find a way to reproduce it.)
Commitlog not replayed after restart - data lost
Key: CASSANDRA-4481
URL: https://issues.apache.org/jira/browse/CASSANDRA-4481
Project: Cassandra
Issue Type: Bug
Affects Versions: 1.1.2
Environment: Single node cluster on 64Bit CentOS
Reporter: Ivo Meißner
Priority: Critical
When data is written to the commitlog and I restart the machine, all committed
data that has not been flushed to disk is lost.
In the startup logs it says that it replays the commitlog successfully, but
the data is not available afterwards.
When I open the commitlog file in an editor I can see the added data, but
after the restart it cannot be fetched from cassandra.
{code}
INFO 09:59:45,362 Replaying
/var/myproject/cassandra/commitlog/CommitLog-83203377067.log
INFO 09:59:45,476 Finished reading
/var/myproject/cassandra/commitlog/CommitLog-83203377067.log
INFO 09:59:45,476 Log replay complete, 0 replayed mutations
{code}
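Until the server reports this condition itself, the replay summary line shown above can be checked from outside. The following is only a sketch: check_replay is a hypothetical function name, and the log file to pass in depends on the installation.

```shell
#!/bin/sh
# Hypothetical check (function name is an assumption for illustration).
# Usage: check_replay LOGFILE
# Returns 1 and prints a warning if the log reports a replay of zero mutations.
check_replay() {
    logfile="$1"
    # Matches the exact summary line; "10 replayed mutations" does not match.
    if grep -q 'Log replay complete, 0 replayed mutations' "$logfile"; then
        echo "WARNING: commit log replay applied 0 mutations (see $logfile)" >&2
        return 1
    fi
    return 0
}
```

Running it against the startup log right after a restart gives an early signal that unflushed writes may not have been restored.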