I realize that the SSTable flush/compaction process is essentially equivalent to the reduce stage of Map-Reduce, since entries with the same key are grouped together.
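To make the analogy concrete, here is a toy sketch in Python. It is not Cassandra code. It assumes each SSTable can be modeled as a key-sorted list of `(key, column, value, timestamp)` tuples, and the names `compact`, `reducers`, and `TotalSalary` are hypothetical. Merging the sorted runs brings all entries for a key together, which is the point where a reduce-style callback could be invoked.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def compact(sstables, reducers=()):
    """Merge key-sorted runs into one run, calling each reducer once per key.

    Each run is a list of (key, column, value, timestamp) tuples sorted by key.
    This models only the grouping step, not Cassandra's real compaction.
    """
    merged = heapq.merge(*sstables, key=itemgetter(0))
    out = []
    for key, group in groupby(merged, key=itemgetter(0)):
        columns = {}
        for _, name, value, ts in group:
            # Assumed conflict rule: the newest timestamp wins per column.
            if name not in columns or ts > columns[name][1]:
                columns[name] = (value, ts)
        row = {name: value for name, (value, _) in columns.items()}
        for reducer in reducers:
            reducer(key, row)  # hypothetical hook: sees one merged row per key
        out.extend((key, n, v, ts) for n, (v, ts) in sorted(columns.items()))
    return out


class TotalSalary:
    """Example reducer: collects every key and sums a 'salary' column."""

    def __init__(self):
        self.keys = []
        self.total = 0

    def __call__(self, key, row):
        self.keys.append(key)
        self.total += row.get("salary", 0)


sstable_a = [("alice", "salary", 100, 1), ("bob", "salary", 200, 1)]
sstable_b = [("alice", "salary", 150, 2), ("carol", "salary", 300, 1)]

job = TotalSalary()
compacted = compact([sstable_a, sstable_b], reducers=[job])
print(job.keys, job.total)
```

One caveat with this sketch: the reducer only sees the runs passed to `compact`, so a job that needs every key would have to run during a compaction that covers all SSTables of the CF.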
We have felt the need to run MR-style jobs on data already stored in Cassandra, so it would be very useful to provide a hook into the compaction process through which the reduce job can be done. Examples include jobs as simple as dumping all the keys in a system or, for a CF keyed by userId with salary as a column, calculating the total salary.

This is different from what BRISK does, since BRISK only uses a CF as physical block storage and does not take advantage of the data already stored in Cassandra, which has already been grouped by key. It is possible to come up with some sort of framework that scrapes SSTables to carry out the MR jobs, but a compaction hook seems an easier and faster way to get this done, given the existing systems.

Yang