For now I'm giving up, but I'll have to refresh this thread in future ;-)
The last thing I found out is that the entry I marked in the previous mail
as LAST VALID KEY/VALUE PAIR is the problem - it is fine by itself, but
it breaks the stream somehow. Removing it fixes the problem, but I
still don't
Sounds like a nasty heisenbug; can you replace or rebuild the machine?
Heisenbug :D
(never heard this name before :-) )
I thought so too, but I finally managed to reproduce it locally (it
requires 3 nodes; one of them needs to have a specific token assigned,
the rest just have to be present)
Hmmm... In general it seems that for some reason Cassandra reads an invalid
value when trying to get the key length (it should be ~100-150, but it gets
2048), then based on this value it reads too much data, and when trying
to read the next key's length it again reads some garbage, translating it to
a
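To illustrate the failure mode being described - not Cassandra's actual stream code, just a minimal hypothetical sketch of a length-prefixed record stream - once one length prefix is misread, every subsequent read starts at the wrong offset and the next "length" is whatever bytes happen to sit there:

```java
import java.io.*;

public class StreamDesyncDemo {
    public static void main(String[] args) throws IOException {
        // Serialize two records as <keyLen><keyBytes> pairs,
        // mimicking a length-prefixed key/value stream.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        for (String key : new String[] { "row-key-one", "row-key-two" }) {
            byte[] k = key.getBytes("UTF-8");
            out.writeShort(k.length);
            out.write(k);
        }
        byte[] stream = buf.toByteArray();

        // Healthy reader: lengths match what was written, keys come back intact.
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(stream));
        int len1 = in.readUnsignedShort();   // 11 ("row-key-one".length)
        in.skipBytes(len1);
        int len2 = in.readUnsignedShort();   // 11 again

        // Now corrupt the first length prefix to 2048 (0x0800), as in the
        // reported symptom, and read again:
        stream[0] = 0x08;
        stream[1] = 0x00;
        DataInputStream bad = new DataInputStream(new ByteArrayInputStream(stream));
        int badLen = bad.readUnsignedShort(); // 2048
        // Honoring this length would read far past the record boundary,
        // so the following "key length" read returns garbage.
        System.out.println("good: " + len1 + ", " + len2 + "; corrupted: " + badLen);
    }
}
```

Running it prints `good: 11, 11; corrupted: 2048` - the same shape of desynchronization as a ~100-150 byte key being read with length 2048.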
I've finally had some time to experiment a bit with this problem (it
occurred twice again) and here's what I found:
1. So far (three occurrences in total), *when* it happened, it happened
only for streaming to *one* specific C* node (but it works on this node
too 99.9% of the time)
2. It
Strange things happen.
It wasn't a single row, but one single part file of the Hadoop input
that failed - we didn't manage to find a specific row that causes the
problem. However, it keeps failing only on production, where we can't
experiment with it much. We tried to reproduce it in a few
but yesterday one of 600 mappers failed
:)
From what I can understand by looking into the C* source, it seems to me that
the problem is caused by an empty (or unexpectedly exhausted?) input buffer (?)
causing the token to be set to -1, which is invalid for RandomPartitioner:
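For context, a hedged sketch (a simplified stand-in, not Cassandra's source): RandomPartitioner tokens are non-negative BigIntegers in roughly [0, 2^127], so a token of -1 produced from a bad buffer read fails the range validation. The shape of that check:

```java
import java.math.BigInteger;

public class TokenCheckDemo {
    // Simplified bounds for RandomPartitioner's MD5-based token space;
    // the point is only that negative tokens like -1 are rejected.
    static final BigInteger MIN = BigInteger.ZERO;
    static final BigInteger MAX = BigInteger.valueOf(2).pow(127);

    static boolean isValidToken(BigInteger token) {
        return token.compareTo(MIN) >= 0 && token.compareTo(MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidToken(BigInteger.valueOf(-1))); // false
        System.out.println(isValidToken(BigInteger.ZERO));        // true
    }
}
```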
Yes, there is a
We're streaming data to Cassandra directly from a MapReduce job using
BulkOutputFormat. It's been working for more than a year without any
problems, but yesterday one of 600 mappers failed and we got a
strange-looking exception on one of the C* nodes.
IMPORTANT: It happens on one node and on one