Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread aaron morton
Interesting, but as we are dealing with keys it should not matter, as they are treated as byte buffers. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 5 May 2011, at 04:53, Daniel Doubleday wrote: This is a bit of a wild guess but
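The "byte buffers" point can be sketched as follows (illustrative Python, not Cassandra source): when keys are opaque byte buffers, equality and ordering are byte-wise, regardless of what text the bytes happen to spell in any charset.

```python
# Sketch: opaque byte-buffer keys compare by raw bytes only.
k1 = "foo".encode("utf-8")
k2 = bytes([0x66, 0x6F, 0x6F])
assert k1 == k2                  # same bytes => same key
assert b"\x00abc" < b"\xe6\x95"  # ordering is unsigned byte comparison
```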

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread aaron morton
I take it back, the problem started in 0.6 where keys were strings. Looking into how 0.6 did its thing - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 5 May 2011, at 22:36, aaron morton wrote: Interesting but as we are dealing with

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Henrik Schröder
Yes, the keys were written to 0.6, but when I looked through the Thrift client code for 0.6, it explicitly converts all string keys to UTF-8 before sending them over to the server, so the encoding *should* be right, and after the upgrade to 0.7.5, sstablekeys prints out the correct byte values for
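The conversion described above can be sketched like this (hedged illustration; `encode_key` is an invented name, not an actual function of the 0.6 Thrift client):

```python
# Sketch of a client converting a string key to UTF-8 bytes
# before it goes on the wire, as the 0.6 client is described doing.
def encode_key(key: str) -> bytes:
    return key.encode("utf-8")

assert encode_key("foo").hex() == "666f6f"
assert encode_key("数時間").hex() == "e695b0e69982e99693"
```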

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Henrik Schröder
Yeah, I've seen that one, and I'm guessing that it's the root cause of my problems, something something encoding error, but that doesn't really help me. :-) However, I've done all my tests with 0.7.5, I'm gonna try them again with 0.7.4, just to see how that version reacts. /Henrik On Wed, May

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread aaron morton
The hardcore way to fix the data is to export to JSON with sstable2json, hand edit, and then json2sstable it back. Also, to confirm: this only happens when data is written in 0.6 and then read back in 0.7? And what partitioner are you using? You can still see the keys? Can you
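The "hand edit" step in the middle could look roughly like this (hypothetical sketch: the JSON shape, file contents, and key values are assumptions for illustration, not taken from the thread):

```python
# Sketch: rewrite a mangled row key in sstable2json-style output
# before feeding it back to json2sstable. Sample data is invented.
import json

exported = '{"3f3f3f": [["col", "val", 1]]}'   # mangled key bytes "???"
rows = json.loads(exported)
good_key = "数時間".encode("utf-8").hex()        # the key it should be
rows[good_key] = rows.pop("3f3f3f")             # swap in the fixed key
```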

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Daniel Doubleday
That's UTF-8, not UTF-16. On May 5, 2011, at 1:57 PM, aaron morton wrote: The hard core way to fix the data is export to json with sstable2json, hand edit, and then json2sstable it back. Also to confirm, this only happens when data is written in 0.6 and then tried to read back in 0.7?
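The distinction matters because the same key encodes to entirely different byte sequences under the two encodings, e.g.:

```python
# The hex key quoted in this thread is the UTF-8 encoding, not UTF-16.
key = "数時間"
utf8 = key.encode("utf-8").hex()       # e695b0e69982e99693
utf16 = key.encode("utf-16-be").hex()  # 657066429593
assert utf8 != utf16
```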

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Daniel Doubleday
Don't know if that helps you, but since we had the same SSTable corruption I have been looking into that very code the other day. If you could afford to drop these rows and are able to recognize them, the easiest way would be patching SSTableScanner:162: public IColumnIterator next() {
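The control flow of the patch idea can be sketched language-neutrally (the real patch would be Java inside SSTableScanner.next(); this Python illustration only shows the skip-on-bad-key shape, with UTF-8 decoding as a stand-in for whatever check recognizes the corrupt rows):

```python
# Sketch: skip rows whose keys cannot be read, instead of failing
# the whole scan.
def scan_rows(raw_keys):
    for raw_key in raw_keys:
        try:
            raw_key.decode("utf-8")   # stand-in for key validation
        except UnicodeDecodeError:
            continue                  # drop the unreadable row
        yield raw_key

rows = [b"foo", b"\xe6\x95", b"bar"]  # middle key is truncated UTF-8
assert list(scan_rows(rows)) == [b"foo", b"bar"]
```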

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Henrik Schröder
I can't run sstable2json on the datafiles from 0.7, it throws the same "Keys must be written in ascending order." error as compaction. I can run sstable2json on the 0.6 datafiles, but when I tested that, the unicode characters in the keys got completely mangled since it outputs keys in string format,

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-05 Thread Henrik Schröder
Thanks, but patching or losing keys is not an option for us. :-/ /Henrik On Thu, May 5, 2011 at 15:00, Daniel Doubleday daniel.double...@gmx.net wrote: Don't know if that helps you but since we had the same SSTable corruption I have been looking into that very code the other day: If you

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-04 Thread Henrik Schröder
My two keys that I send in my test program are 0xe695b0e69982e99693 and 0x666f6f, which decode to 数時間 and foo respectively. So I ran my tests again: I started with a clean 0.6.13, wrote two rows with those two keys, drained, shut down, started 0.7.5, and imported my keyspace. In my test
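The decode stated above checks out, as a quick verification shows:

```python
# The two hex keys are exactly the UTF-8 encodings of the two strings.
assert bytes.fromhex("e695b0e69982e99693").decode("utf-8") == "数時間"
assert bytes.fromhex("666f6f").decode("utf-8") == "foo"
```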

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-04 Thread Henrik Schröder
Those few hundred duplicated rows turned out to be a HUGE problem: the automatic compaction started throwing "Keys must be written in ascending order." errors, and successive failing compactions started to fill the disk until it ran out of disk space, which was a bit sad. Right now we're trying
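The invariant that compaction error enforces can be sketched as follows (illustrative, not Cassandra source): row keys in an SSTable must be strictly ascending, so a re-written duplicate of an existing key fails the ordering check.

```python
# Sketch: a duplicate key violates strict ascending order.
def strictly_ascending(keys):
    return all(a < b for a, b in zip(keys, keys[1:]))

assert strictly_ascending([b"abc", b"abd", b"foo"])
assert not strictly_ascending([b"abc", b"abc", b"foo"])  # duplicate
```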

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-04 Thread Daniel Doubleday
This is a bit of a wild guess but Windows and encoding and 0.7.5 sounds like https://issues.apache.org/jira/browse/CASSANDRA-2367 On May 3, 2011, at 5:15 PM, Henrik Schröder wrote: Hey everyone, We did some tests before upgrading our Cassandra cluster from 0.6 to 0.7, just to make sure
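The failure mode that ticket describes can be illustrated like this (hedged sketch: cp1252 is an assumed stand-in for whatever the Windows JVM's default charset was, not a detail from the thread):

```python
# Sketch: encoding a key with a platform-default charset instead of
# UTF-8 destroys non-ASCII keys.
key = "数時間"
utf8 = key.encode("utf-8")
mangled = key.encode("cp1252", errors="replace")  # unmappable -> b'???'
assert utf8 != mangled
assert mangled == b"???"
```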

Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-03 Thread Henrik Schröder
Hey everyone, We did some tests before upgrading our Cassandra cluster from 0.6 to 0.7, just to make sure that the change in how keys are encoded wouldn't cause us any data loss. Unfortunately, it seems that rows stored under a unicode key couldn't be retrieved after the upgrade. We're running

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-03 Thread Henrik Schröder
The way we solved this problem is that it turned out we had only a few hundred rows with unicode keys, so we simply extracted them, upgraded to 0.7, and wrote them back. However, this means that among the rows, there are a few hundred weird duplicate rows with identical keys. Is this going to be

Re: Unicode key encoding problem when upgrading from 0.6.13 to 0.7.5

2011-05-03 Thread aaron morton
Can you provide some details of the data returned when you do the get_range()? It will be interesting to see the raw bytes returned for the keys. The likely culprit is a change in the encoding. Can you also try to grab the bytes sent for the key when doing the single select that fails.
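A minimal way to do the comparison suggested here is to hex-dump the key bytes the client sends and the key bytes get_range() returns, then compare them side by side (`dump` is an illustrative helper, not part of any client API):

```python
# Sketch: hex-dump key bytes so sent and returned keys can be compared.
def dump(key: bytes) -> str:
    return key.hex()

sent = "数時間".encode("utf-8")
assert dump(sent) == "e695b0e69982e99693"
```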