Run your weekly job in a low-priority fair-scheduler or capacity-scheduler queue.
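
Something like this is usually all it takes (these are the MR1-era property names and they differ by scheduler and Hadoop version; "lowpri" is just a placeholder for whatever queue/pool your cluster actually defines):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;

    public class WeeklyJobQueueSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // "lowpri" is a placeholder queue/pool name. On YARN the queue
        // property is mapreduce.job.queuename instead.
        conf.set("mapred.job.queue.name", "lowpri");     // capacity scheduler queue
        conf.set("mapred.fairscheduler.pool", "lowpri"); // fair scheduler pool
        conf.set("mapred.job.priority", "LOW");          // coarse priority hint
        Job job = new Job(conf, "weekly-row-move");
        // ... configure the mapper/scan as usual, then job.waitForCompletion(true)
      }
    }

That way the weekly job mostly picks up the slots your other jobs aren't using.
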
Maybe it's just me, but I look at coprocessors as a similar structure to RDBMS triggers and stored procedures: you need to show restraint and use them sparingly, otherwise you end up creating performance issues. Just IMHO.

-Mike

On Oct 17, 2012, at 8:44 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:

> I don't have any concern about the time it's taking. It's more about the load it's putting on the cluster. I have other jobs that I need to run (secondary index, data processing, etc.). So the more time this new job takes, the less CPU the others will have.
>
> I tried the M/R and I really liked the way it's done. So my only concern will really be the performance of the delete part.
>
> That's why I'm wondering what the best practice is to move a row to another table.
>
> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>> If you're going to be running this weekly, I would suggest that you stick with the M/R job.
>>
>> Is there any reason why you need to be worried about the time it takes to do the deletes?
>>
>> On Oct 17, 2012, at 8:19 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>
>>> Hi Mike,
>>>
>>> I'm expecting to run the job weekly. I initially thought about using endpoints because I found HBASE-6942, which was a good example for my needs.
>>>
>>> I'm fine with the Put part of the Map/Reduce, but I'm not sure about the delete. That's why I looked at coprocessors. Then I figured that I could also do the Put on the coprocessor side.
>>>
>>> In an M/R job, can I delete the row I'm dealing with based on some criteria like a timestamp? If I do that, I will not be doing bulk deletes, but deleting the rows one by one, right? Which might be very slow.
>>>
>>> If in the future I want to run the job daily, might that be an issue?
>>>
>>> Or should I go with the initial idea of doing the Put with the M/R job and the delete with HBASE-6942?
>>>
>>> Thanks,
>>>
>>> JM
>>>
>>> 2012/10/17, Michael Segel <michael_se...@hotmail.com>:
>>>> Hi,
>>>>
>>>> I'm a firm believer in KISS (Keep It Simple, Stupid).
>>>>
>>>> The Map/Reduce (map job only) is the simplest and least prone to failure.
>>>>
>>>> Not sure why you would want to do this using coprocessors.
>>>>
>>>> How often are you running this job? It sounds like it's going to be sporadic.
>>>>
>>>> -Mike
>>>>
>>>> On Oct 17, 2012, at 7:11 PM, Jean-Marc Spaggiari <jean-m...@spaggiari.org> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Can someone please help me understand the pros and cons between these 2 options for the following use case?
>>>>>
>>>>> I need to transfer all the rows between 2 timestamps to another table.
>>>>>
>>>>> My first idea was to run a MapReduce job to map the rows and store them in another table, and then delete them using an endpoint coprocessor. But the more I look into it, the more I think the MapReduce is not a good idea and I should use a coprocessor instead.
>>>>>
>>>>> BUT... the MapReduce framework guarantees me that it will run against all the regions. I tried to stop a regionserver while the job was running. The region moved, and the MapReduce restarted the task from the new location. Will the coprocessor do the same thing?
>>>>>
>>>>> Also, I found the web console for MapReduce with the number of jobs, the status, etc. Is there the same thing for coprocessors?
>>>>>
>>>>> Are all coprocessors running at the same time on all regions, which means we could have 100 of them running on a regionserver at a time? Or are they scheduled like the MapReduce jobs, based on some configured values?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> JM
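
On the delete question buried in the quotes above: yes, the map task can simply emit a Delete for every row it copies, which keeps the whole thing one map-only job. Here's a rough, untested sketch against the 0.94-era APIs (the table names and the time-range arguments are placeholders), using MultiTableOutputFormat so the Puts to the archive table and the Deletes against the source table go out from the same mapper:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.MultiTableOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.hbase.mapreduce.TableMapper;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.io.Writable;
    import org.apache.hadoop.mapreduce.Job;

    public class MoveRowsJob {

      // Placeholder table names -- substitute your own.
      static final byte[] SOURCE = Bytes.toBytes("source_table");
      static final byte[] ARCHIVE = Bytes.toBytes("archive_table");

      static class MoveMapper extends TableMapper<ImmutableBytesWritable, Writable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result result, Context context)
            throws IOException, InterruptedException {
          Put put = new Put(row.get());
          Delete delete = new Delete(row.get());
          for (KeyValue kv : result.raw()) {
            // Copy the cell to the archive table...
            put.add(kv);
            // ...and delete exactly that cell version from the source, so
            // anything outside the scanned time range is left alone.
            delete.deleteColumn(kv.getFamily(), kv.getQualifier(), kv.getTimestamp());
          }
          context.write(new ImmutableBytesWritable(ARCHIVE), put);
          context.write(new ImmutableBytesWritable(SOURCE), delete);
        }
      }

      public static void main(String[] args) throws Exception {
        long minTs = Long.parseLong(args[0]);
        long maxTs = Long.parseLong(args[1]);

        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "move-rows-" + minTs + "-" + maxTs);
        job.setJarByClass(MoveRowsJob.class);

        Scan scan = new Scan();
        scan.setCaching(500);
        scan.setCacheBlocks(false);      // recommended for M/R scans
        scan.setMaxVersions();           // every version inside the range
        scan.setTimeRange(minTs, maxTs); // the two timestamps driving the move

        TableMapReduceUtil.initTableMapperJob(Bytes.toString(SOURCE), scan,
            MoveMapper.class, ImmutableBytesWritable.class, Writable.class, job);
        job.setOutputFormatClass(MultiTableOutputFormat.class);
        job.setNumReduceTasks(0); // map-only, as suggested earlier in the thread
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

The deletes are emitted one per row, but the output format writes through buffered HTable instances, so they shouldn't turn into one RPC per row.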
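
And on whether all coprocessors run at once: for endpoints (the HBASE-6942 style), nothing runs until a client invokes them, and then, as far as I can tell, the client fans one call out to every region in the requested key range. So the work does run region-side in parallel, bounded by the client-side pool and the regionservers' RPC handler count rather than by map slots, and with nothing like the M/R web console to watch it. The invocation pattern looks roughly like this (0.94 endpoint API; TimeRangeDeleteProtocol is a made-up interface standing in for an HBASE-6942-style bulk-delete endpoint, not something that ships with HBase):

    import java.io.IOException;
    import java.util.Map;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.coprocessor.Batch;
    import org.apache.hadoop.hbase.ipc.CoprocessorProtocol;

    public class EndpointInvocationSketch {

      // Hypothetical endpoint interface; the implementation would be deployed
      // on the regionservers alongside the table.
      public interface TimeRangeDeleteProtocol extends CoprocessorProtocol {
        long deleteRows(long minTs, long maxTs) throws IOException;
      }

      public static void main(String[] args) throws Throwable {
        final long minTs = Long.parseLong(args[0]);
        final long maxTs = Long.parseLong(args[1]);
        HTable table = new HTable(HBaseConfiguration.create(), "source_table"); // placeholder

        // One call per region in the requested range (here: the whole table);
        // results come back in a map with one entry per region.
        Map<byte[], Long> deletedPerRegion = table.coprocessorExec(
            TimeRangeDeleteProtocol.class,
            HConstants.EMPTY_START_ROW, HConstants.EMPTY_END_ROW,
            new Batch.Call<TimeRangeDeleteProtocol, Long>() {
              public Long call(TimeRangeDeleteProtocol instance) throws IOException {
                return instance.deleteRows(minTs, maxTs);
              }
            });

        long total = 0;
        for (Long count : deletedPerRegion.values()) {
          total += count;
        }
        System.out.println("Deleted " + total + " rows across "
            + deletedPerRegion.size() + " regions");
        table.close();
      }
    }

That's more moving parts than the map-only job for the same result, which is really the KISS argument above.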