Coprocessors vs MapReduce?

Bertrand Dechoux Tue, 24 Jul 2012 08:00:06 -0700

Hello,

I am learning about coprocessors and would like to know more about how to
choose between coprocessors and MapReduce.


First, I thought coprocessors needed a restart, but it seems the shell can be
used to add/remove them without requiring one. However, at the moment
coprocessors are defined within a jar and cannot be created dynamically.
Could you confirm that? (I am thinking of the Cascading way of creating
the implementation, which is then serialized, sent and executed.)

Second, I didn't see any way to give parameters to coprocessors. Is that
really the case? If not, how would the parameters be handled?

Third, I assume coprocessors use the process/threads of the region
server. Does that mean that, if multiple blocks need to be processed,
MapReduce should be more efficient? Are there other ways to know whether
coprocessors or MapReduce should be chosen?

Fourth, I know this is a really broad question, but how would you compare
coprocessors to YARN? I still have a lot to learn about both subjects, but I
feel that the concepts are not totally unrelated.

Lastly, this is an implementation detail, but how does the client side wait
for the results? Is it possible to perform early aggregation, or does the
client need to receive all the information before doing anything else?
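
To make the last question concrete, here is a minimal sketch of what I mean
by early aggregation: the client folds each partial result into a running
total as soon as it arrives, instead of collecting everything first. This is
plain Python with threads standing in for regions. The names
count_rows_in_region and aggregate_early are hypothetical and nothing here
is HBase API.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def count_rows_in_region(region_rows):
    # Hypothetical stand-in for the work a coprocessor would do in one region.
    return len(region_rows)


def aggregate_early(regions):
    # Fold each partial count into the total as soon as it is available,
    # rather than waiting for every region to finish before combining.
    total = 0
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(count_rows_in_region, rows) for rows in regions]
        for future in as_completed(futures):
            total += future.result()
    return total


# Three made-up "regions" holding row keys.
regions = [["r1", "r2"], ["r3"], ["r4", "r5", "r6"]]
total = aggregate_early(regions)
```

The alternative I would like to rule out is the client having to buffer
every partial result and combine them only at the end.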

Regards

Bertrand


PS: My two sources on this subject, both for HBase 0.92, are:
* https://blogs.apache.org/hbase/entry/coprocessor_introduction
* HBase The Definitive Guide.
