If you're going to be running this weekly, I would suggest that you stick with the M/R job.
Is there any reason why you need to be worried about the time it takes to do the deletes?

On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Hi Mike,
>
> I'm expecting to run the job weekly. I initially thought about using
> endpoints because I found HBASE-6942, which was a good example for my
> needs.
>
> I'm fine with the Put part of the MapReduce, but I'm not sure about
> the delete. That's why I looked at coprocessors. Then I figured that I
> could also do the Put on the coprocessor side.
>
> In an M/R job, can I delete the row I'm dealing with based on some
> criteria like its timestamp? If I do that, I will not be doing bulk
> deletes; I will be deleting the rows one by one, right? That might be
> very slow.
>
> If in the future I want to run the job daily, might that be an issue?
>
> Or should I go with the initial idea of doing the Put with the M/R job
> and the delete with HBASE-6942?
>
> Thanks,
>
> JM
>
>
> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>> Hi,
>>
>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>
>> The Map/Reduce (map job only) is the simplest and least prone to failure.
>>
>> I'm not sure why you would want to do this using coprocessors.
>>
>> How often are you running this job? It sounds like it's going to be
>> sporadic.
>>
>> -Mike
>>
>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
>> wrote:
>>
>>> Hi,
>>>
>>> Can someone please help me understand the pros and cons of these two
>>> options for the following use case?
>>>
>>> I need to transfer all the rows between two timestamps to another table.
>>>
>>> My first idea was to run a MapReduce job to map the rows and store them
>>> in another table, and then delete them using an endpoint coprocessor.
>>> But the more I look into it, the more I think the MapReduce is not a
>>> good idea and I should use a coprocessor instead.
>>>
>>> BUT... The MapReduce framework guarantees me that it will run against
>>> all the regions. I tried stopping a region server while the job was
>>> running. The region moved, and the MapReduce restarted the task from
>>> the new location. Will a coprocessor do the same thing?
>>>
>>> Also, I found the web console for MapReduce with the number of jobs,
>>> their status, etc. Is there the same thing for coprocessors?
>>>
>>> Do all coprocessors run at the same time on all regions, which means
>>> we could have 100 of them running on a region server at a time? Or do
>>> they run like MapReduce jobs, based on some configured values?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>
>>
>
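[Editor's note] To make the approach discussed above concrete, here is a minimal sketch (not from the thread) of the map-only job Mike recommends: scan the source table restricted to the time window with Scan.setTimeRange(), write a Put for each row to the target table, and a Delete back to the source table, using MultiTableOutputFormat so one mapper can write to both. The table names, class names, and timestamp window are placeholders, and the code assumes the HBase 0.94-era client API this 2012 thread would have used.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;

    public class MoveRowsJob {

      // Placeholder table names; MultiTableOutputFormat routes each
      // mutation to the table named by the output key.
      static final ImmutableBytesWritable SOURCE =
          new ImmutableBytesWritable(Bytes.toBytes("source_table"));
      static final ImmutableBytesWritable TARGET =
          new ImmutableBytesWritable(Bytes.toBytes("target_table"));

      static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          // Copy every cell the scan returned into the target table.
          Put put = new Put(row.get());
          for (KeyValue kv : value.raw()) {
            put.add(kv);
          }
          context.write(TARGET, put);
          // Delete the row from the source, one Delete per row, as JM
          // suspected. Caveat: this removes the whole row, including any
          // cells whose timestamps fall outside the scanned window.
          context.write(SOURCE, new Delete(row.get()));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "move-rows-between-timestamps");
        job.setJarByClass(MoveRowsJob.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // don't pollute the block cache on a full scan
        // Only cells with timestamps in [minStamp, maxStamp) are returned.
        scan.setTimeRange(1349913600000L, 1350518400000L); // example window

        TableMapReduceUtil.initTableMapperJob("source_table", scan,
            MoveMapper.class, ImmutableBytesWritable.class, Writable.class, job);
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setNumReduceTasks(0); // map-only, per Mike's KISS advice

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The deletes here go through the output format's buffered client writes rather than a server-side bulk delete, which is exactly the trade-off the thread debates: if weekly delete latency is acceptable, this single job is the simple option; if not, one could drop the Delete from the mapper and run the HBASE-6942 bulk-delete endpoint as a second pass.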