> > 2) is more expensive than 1).
>
> I'm wondering if we could use Compaction Coprocessor for 2)? HBaseHUT
> needs to be able to grab N rows and merge them into 1, delete those N rows,
> and just write that 1 new row. This N could be several thousand rows.
> Could Compaction Coprocessor really be used for that?

It would depend on the details. If you're simply aggregating the data into
one row, and:

* the thousands of rows are contiguous in the scan
* you can somehow incrementally update or emit the new row that you want to
  create, so that you don't need to retain all the old rows in memory
* the new row you want to emit would sort sequentially into the same position

then overriding the scanner used for compaction could be a good solution.
This would allow you to transform the cells emitted during compaction,
including dropping the cells from the old rows and emitting new
(transformed) cells for the new row.

> Also, would that come into play during minor or major compactions or both?

You can distinguish between them in your coprocessor hooks based on
ScanType, so it's up to you.
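To make the "overriding the scanner" idea concrete, here is a minimal toy
model in Python (illustrative only, not the HBase API; in HBase you would
do this from a RegionObserver compaction hook by returning a wrapped
InternalScanner). It shows the shape of the transform: the wrapper sees the
sorted cell stream during compaction, drops the cells from each contiguous
group of old rows, and emits one new aggregated row per group while keeping
only a running total in memory. The row-key layout and helper names below
are hypothetical.

```python
def merging_scanner(cells, group_key, merge_row_key):
    """Wrap a sorted stream of (row, value) cells: drop each contiguous
    group of old rows and emit one merged cell per group. Only a running
    aggregate is retained, never the whole group of thousands of rows."""
    current = None
    total = 0
    for row, value in cells:
        key = group_key(row)
        if current is not None and key != current:
            # Group boundary reached: emit the single new (merged) row
            # in place of the old ones, which are simply not re-emitted.
            yield (merge_row_key(current), total)
            total = 0
        current = key
        total += value  # incremental update, no buffering of old rows
    if current is not None:
        yield (merge_row_key(current), total)


# Usage: contiguous per-event rows collapse into one summary row whose key
# shares the same prefix, so it sorts into the same position in the scan.
cells = [("2024010112#e1", 3), ("2024010112#e2", 4), ("2024010113#e1", 5)]
merged = list(merging_scanner(
    cells,
    group_key=lambda r: r.split("#")[0],
    merge_row_key=lambda k: k + "#merged",
))
# merged == [("2024010112#merged", 7), ("2024010113#merged", 5)]
```

The same stream-transform works regardless of whether the compaction is
minor or major; in a real coprocessor you would consult the ScanType passed
to the hook if you wanted to behave differently in the two cases.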
