MR-hook for sstable compaction?

Yang Sun, 19 Jun 2011 00:01:19 -0700

I realize that the SSTable flush/compaction process is essentially equivalent to the reduce stage of MapReduce, since entries with the same key are grouped together.
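To make the analogy concrete, here is a minimal sketch. It is not Cassandra's compaction code. It assumes each SSTable is modeled as a list of (row_key, column, timestamp, value) tuples sorted by row key, and `merge_sstables` is a hypothetical name. A k-way merge of the sorted inputs followed by grouping on the key delivers every fragment of a row, from every input file, together. That is the same shape of input a reduce step receives after the shuffle.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def merge_sstables(sstables):
    """Yield (row_key, {column: value}) once per key across all inputs.

    Assumed model: each SSTable is a list of (row_key, column, timestamp,
    value) tuples already sorted by row_key.
    """
    # k-way merge of the sorted inputs, ordered by row key only.
    merged = heapq.merge(*sstables, key=itemgetter(0))
    # groupby collects every fragment of one row, regardless of which
    # input it came from: this is the "grouped by key" reduce input.
    for row_key, fragments in groupby(merged, key=itemgetter(0)):
        latest = {}
        for _, column, ts, value in fragments:
            # Last-write-wins: keep the entry with the highest timestamp.
            if column not in latest or ts > latest[column][0]:
                latest[column] = (ts, value)
        yield row_key, {col: val for col, (_, val) in latest.items()}
```

For example, merging an older and a newer file:

```python
old = [("alice", "salary", 1, 100), ("bob", "salary", 1, 200)]
new = [("alice", "salary", 2, 150), ("carol", "salary", 2, 300)]
print(list(merge_sstables([old, new])))
# [('alice', {'salary': 150}), ('bob', {'salary': 200}), ('carol', {'salary': 300})]
```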


We have felt the need to run MR-style jobs on the data already stored in Cassandra, so it would be very useful to provide a hook into the compaction process where the reduce job can be done. Examples include jobs as simple as dumping out all the keys in the system, or, for a CF keyed by userId with salary as a column, calculating the total salary.
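One possible shape for such a hook, sketched under stated assumptions: the hook proposed here does not exist in Cassandra, so `CompactionHook`, `on_row`, `finish`, and `run_hooks` are all hypothetical names. The idea is that the compaction loop calls `on_row` once per merged row, and each of the two example jobs becomes a small reducer.

```python
from typing import Iterable, List, Mapping, Tuple

# A merged row as compaction would emit it: (row_key, {column: value}).
Row = Tuple[str, Mapping[str, object]]


class CompactionHook:
    """Hypothetical reduce-style hook, called once per merged row."""

    def on_row(self, key: str, columns: Mapping[str, object]) -> None:
        raise NotImplementedError

    def finish(self):
        """Called after the last row; returns the job's result."""
        raise NotImplementedError


class DumpKeys(CompactionHook):
    """Job 1: collect every row key the compaction sees."""

    def __init__(self):
        self.keys = []

    def on_row(self, key, columns):
        self.keys.append(key)

    def finish(self):
        return self.keys


class TotalSalary(CompactionHook):
    """Job 2: rows keyed by userId with a salary column; sum the salaries."""

    def __init__(self, column="salary"):
        self.column = column
        self.total = 0

    def on_row(self, key, columns):
        # Rows that lack the column contribute nothing to the total.
        self.total += columns.get(self.column, 0)

    def finish(self):
        return self.total


def run_hooks(rows: Iterable[Row], hooks: List[CompactionHook]):
    """Stand-in for the compaction loop: feed each merged row to every hook."""
    for key, columns in rows:
        for hook in hooks:
            hook.on_row(key, columns)
    return [hook.finish() for hook in hooks]
```

For example:

```python
rows = [("alice", {"salary": 150}), ("bob", {"salary": 200}), ("carol", {"name": "C"})]
print(run_hooks(rows, [DumpKeys(), TotalSalary()]))
# [['alice', 'bob', 'carol'], 350]
```

One design caveat: a compaction runs locally on one node and may merge only some of the SSTables holding a key, and with replication each row lives on several nodes, so per-node results would still need to be combined and de-duplicated to get a cluster-wide answer.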

This is different from what BRISK does, since BRISK only uses a CF as physical block storage and does not utilize the data already stored in Cassandra, which is already grouped by key.

It is possible to come up with some sort of framework that scrapes SSTables to carry out the MR jobs, but given existing systems, the compaction hook seems an easier and faster way to get this done.

yang
