I have a custom co-processor endpoint that aggregates various statistics for each region; the per-region stats are then merged into the final result. Sometimes a region holds so much data that aggregating it takes longer than the exec timeout. When that happens, the client compounds the problem by retrying up to 10 times.
I haven't been able to find any supported API for getting around this, so I intend to modify my co-processor to stop itself after N seconds and include in its result the row key where it should resume. I can then invoke HTable.coprocessorExec repeatedly until every region reports that it has finished its aggregation, but each subsequent call to HTable.coprocessorExec will hit all regions, even those that have already completed their work.
The only way I can see to invoke my co-processor efficiently on only the servers with work remaining is to write my own code to manage the co-processor proxy objects. I haven't found any documentation on the thread safety of a proxy instance, or on which thread pool is used.
Can anyone shed some light on this strategy? Perhaps you've encountered the same issue; if so, how did you solve it?
Thanks in advance!
--Tom
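P.S. To make the plan concrete, here is a rough, HBase-free sketch of the control flow I have in mind. Everything in it is hypothetical: `Partial`, `aggregate`, `aggregateAll`, and `sampleRegions` are illustrative names, a `NavigableMap` stands in for a region, and a row budget stands in for the N-second deadline so that the example is deterministic. In the real thing, `aggregate` would be the endpoint method and the inner loop would be a per-region call that I would still have to work out how to route.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical, HBase-free model of the resumable aggregation described above.
final class ResumableAggregation {

    /** One call's partial result: the sum so far and where to resume (null once the region is done). */
    static final class Partial {
        final long sum;
        final String resumeKey;
        Partial(long sum, String resumeKey) { this.sum = sum; this.resumeKey = resumeKey; }
        boolean finished() { return resumeKey == null; }
    }

    /** Stand-in for the endpoint: sums rows from startKey onward, stopping once maxRows rows are read. */
    static Partial aggregate(NavigableMap<String, Long> region, String startKey, int maxRows) {
        if (maxRows < 1) {
            throw new IllegalArgumentException("maxRows must be at least 1 so every call makes progress");
        }
        long sum = 0;
        int seen = 0;
        for (Map.Entry<String, Long> row : region.tailMap(startKey, true).entrySet()) {
            if (seen == maxRows) {
                return new Partial(sum, row.getKey()); // budget spent: report the first unread row
            }
            sum += row.getValue();
            seen++;
        }
        return new Partial(sum, null); // reached the end of the region
    }

    /** Client loop: each round calls only the regions that still have a resume key. */
    static long aggregateAll(Map<String, NavigableMap<String, Long>> regions, int maxRows, List<String> callLog) {
        long total = 0;
        Map<String, String> pending = new TreeMap<>();
        for (String name : regions.keySet()) {
            pending.put(name, ""); // "" sorts before every row key, so the first call starts at the top
        }
        while (!pending.isEmpty()) {
            Map<String, String> next = new TreeMap<>();
            for (Map.Entry<String, String> p : pending.entrySet()) {
                callLog.add(p.getKey()); // record which region was invoked
                Partial r = aggregate(regions.get(p.getKey()), p.getValue(), maxRows);
                total += r.sum;
                if (!r.finished()) {
                    next.put(p.getKey(), r.resumeKey); // only unfinished regions are called again
                }
            }
            pending = next;
        }
        return total;
    }

    /** Two made-up regions: A has five rows (sum 15), B has one row (sum 6). */
    static Map<String, NavigableMap<String, Long>> sampleRegions() {
        NavigableMap<String, Long> a = new TreeMap<>();
        for (long i = 1; i <= 5; i++) {
            a.put("a" + i, i);
        }
        NavigableMap<String, Long> b = new TreeMap<>();
        b.put("b1", 6L);
        Map<String, NavigableMap<String, Long>> regions = new TreeMap<>();
        regions.put("A", a);
        regions.put("B", b);
        return regions;
    }

    public static void main(String[] args) {
        List<String> callLog = new ArrayList<>();
        long total = aggregateAll(sampleRegions(), 2, callLog);
        System.out.println("total=" + total + " calls=" + callLog); // prints: total=21 calls=[A, B, A, A]
    }
}
```

With a budget of two rows per call, region B finishes in the first round and is never called again, while region A is resumed twice from the key it reported. That is the behavior I'd like to get from the real client instead of hitting every region on every round.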
