Hello, I am learning about coprocessors and would like to know more about how to choose between coprocessors and MapReduce.
First, I thought coprocessors needed a restart, but it seems the shell can be used to add or remove them without one. However, at the moment coprocessors are defined within a jar and cannot be created dynamically. Could you confirm that? (I am thinking of the Cascading approach, where the implementation is created on the client, then serialized, sent and executed.)

Second, I didn't see any way to pass parameters to coprocessors. Is that really the case? If not, how are parameters handled?

Third, I assume coprocessors run in the process/threads of the region server. Does that mean that, if multiple blocks need to be processed, MapReduce should be more efficient? Are there other ways to decide whether coprocessors or MapReduce should be chosen?

Fourth, I know this is a really broad question, but how would you compare coprocessors to YARN? I still have a lot to learn about both subjects, but I feel that the concepts are not totally unrelated.

Lastly, this is an implementation detail, but how does the client side wait for the results? Is it possible to perform early aggregation, or does the client need to receive all the information before doing anything else?

Regards
Bertrand

PS: My two sources on this subject, both for HBase 0.92, are:
* https://blogs.apache.org/hbase/entry/coprocessor_introduction
* HBase: The Definitive Guide
