If you're going to be running this weekly, I would suggest that you stick with the M/R job.
Is there any reason why you need to be worried about the time it takes to do the deletes?

On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> Hi Mike,
>
> I'm expecting to run the job weekly. I initially thought about using
> endpoints because I found HBASE-6942, which was a good example for my
> needs.
>
> I'm fine with the Put part of the MapReduce, but I'm not sure about
> the delete. That's why I looked at coprocessors. Then I figured that I
> could also do the Put on the coprocessor side.
>
> In an M/R job, can I delete the row I'm dealing with based on some
> criteria like its timestamp? If I do that, I will not be doing bulk
> deletes; I will be deleting the rows one by one, right? That might be
> very slow.
>
> If in the future I want to run the job daily, might that be an issue?
>
> Or should I go with the initial idea of doing the Put with the M/R job
> and the delete with HBASE-6942?
>
> Thanks,
>
> JM
>
>
> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>> Hi,
>>
>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>
>> The Map/Reduce (map job only) is the simplest and least prone to failure.
>>
>> I'm not sure why you would want to do this using coprocessors.
>>
>> How often are you running this job? It sounds like it's going to be
>> sporadic.
>>
>> -Mike
>>
>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org>
>> wrote:
>>
>>> Hi,
>>>
>>> Can someone please help me understand the pros and cons of these two
>>> options for the following use case?
>>>
>>> I need to transfer all the rows between two timestamps to another table.
>>>
>>> My first idea was to run a MapReduce job to map the rows and store them
>>> in another table, and then delete them using an endpoint coprocessor.
>>> But the more I look into it, the more I think the MapReduce is not a
>>> good idea and I should use a coprocessor instead.
>>>
>>> BUT... The MapReduce framework guarantees me that it will run against
>>> all the regions. I tried stopping a region server while the job was
>>> running. The region moved, and the MapReduce restarted the task from
>>> the new location. Will a coprocessor do the same thing?
>>>
>>> Also, I found the web console for MapReduce with the number of jobs,
>>> their status, etc. Is there the same thing for coprocessors?
>>>
>>> Do all coprocessors run at the same time on all regions, which means
>>> we could have 100 of them running on a region server at a time? Or do
>>> they run like MapReduce jobs, based on some configured values?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>
>>
>
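[Editor's note] To make the approach discussed above concrete, here is a minimal sketch (not from the thread) of the map-only job Mike recommends: scan the source table restricted to the time window with Scan.setTimeRange(), write a Put for each row to the target table, and a Delete back to the source table, using MultiTableOutputFormat so one mapper can write to both. The table names, class names, and timestamp window are placeholders, and the code assumes the HBase 0.94-era client API this 2012 thread would have used.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;

    public class MoveRowsJob {

      // Placeholder table names; MultiTableOutputFormat routes each
      // mutation to the table named by the output key.
      static final ImmutableBytesWritable SOURCE =
          new ImmutableBytesWritable(Bytes.toBytes("source_table"));
      static final ImmutableBytesWritable TARGET =
          new ImmutableBytesWritable(Bytes.toBytes("target_table"));

      static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
          // Copy every cell the scan returned into the target table.
          Put put = new Put(row.get());
          for (KeyValue kv : value.raw()) {
            put.add(kv);
          }
          context.write(TARGET, put);
          // Delete the row from the source, one Delete per row, as JM
          // suspected. Caveat: this removes the whole row, including any
          // cells whose timestamps fall outside the scanned window.
          context.write(SOURCE, new Delete(row.get()));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "move-rows-between-timestamps");
        job.setJarByClass(MoveRowsJob.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // don't pollute the block cache on a full scan
        // Only cells with timestamps in [minStamp, maxStamp) are returned.
        scan.setTimeRange(1349913600000L, 1350518400000L); // example window

        TableMapReduceUtil.initTableMapperJob("source_table", scan,
            MoveMapper.class, ImmutableBytesWritable.class, Writable.class, job);
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setNumReduceTasks(0); // map-only, per Mike's KISS advice

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The deletes here go through the output format's buffered client writes rather than a server-side bulk delete, which is exactly the trade-off the thread debates: if weekly delete latency is acceptable, this single job is the simple option; if not, one could drop the Delete from the mapper and run the HBASE-6942 bulk-delete endpoint as a second pass.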