Hey Lars,

Sorry if I have misled you. The current Coprocessor infrastructure is at _Region_ level, not at _RegionServer_ level. All these batch operations ultimately end up at some rows in some Regions, where you have hooked your CPs.
I am not able to follow your example. If you end up with a batch of 200 Puts at a RS, then what? You need to execute this at Region level now, right? A RS just hosts Regions, and these Regions are mobile too. I think that is the reason one should nail down a specific Region while doing any CRUD operations here, and also a reason why current CPs are at Region level.

That's my pov though! :)

Himanshu

On Thu, Aug 11, 2011 at 11:56 AM, lars hofhansl <[email protected]> wrote:

> Thanks Himanshu,
>
> but that is not quite what I meant.
>
> Yes, a batch operation is broken up in "chunks" per regionserver and then
> the chunks are shipped to the individual regionservers.
> But then there is no way to interact with those chunks at the regionserver
> through coprocessors (as a whole).
>
> What I want to do is to look at the entire chunk at each regionserver and
> then do some other bulk operation based on that.
> Currently I only get pre/post hooks for single rows, and no way to group
> these together later (other than just waiting for a little bit and letting
> work accumulate).
>
> Say I have a client request with (say) 1000 puts, and let's also say that
> there are 5 region servers, and each happens to host exactly 1/5th of the
> rowkeys, so each region server gets a chunk of 200 puts.
> Now a coprocessor might have logic that affects another table (for example
> for naive 2ndary indexing). At the coprocessor level I can get an
> HTableInterface from the environment and now I want to do a batch put of
> 200 rows (of course those will be broken up per region server again, etc).
> Currently I can't do that, because there are only single "row" pre/post
> hooks, and no way to determine when all operations of a request are done.
> The end result is that I have to do 200 single row puts, one in each call to
> pre or post hooks.
>
> Does that make sense?
>
> -- Lars
>
> ________________________________
> From: Himanshu Vashishtha <[email protected]>
> To: [email protected]; lars hofhansl <[email protected]>
> Sent: Wednesday, August 10, 2011 11:21 PM
> Subject: Re: Coprocessors and batch processing
>
> Client side batch processing is done at RegionServer level, i.e., all Action
> objects are grouped together on a per-RS basis and sent in one RPC. Once the
> batch arrives at a RS, it gets distributed across the corresponding Regions,
> and these Action objects are processed, one by one. This includes
> Coprocessor's Exec objects too.
> So, a coprocessor is working at a "Region" level granularity.
>
> If you want to take some action (process a bunch of rows of another table
> from a CP), one can get a HTable instance from the Environment instance of a
> Coprocessor, and use the same mechanism as used by the client side.
> Will that help in your use-case?
>
> Thanks,
> Himanshu
>
>
> On Wed, Aug 10, 2011 at 11:46 PM, lars hofhansl <[email protected]> wrote:
>
> > Here's another coprocessor question...
> >
> > From the client we batch operations in order to reduce the number of round
> > trips.
> > Currently there is no way (that I can find) to make use of those batches in
> > coprocessors.
> >
> > This is an issue when, for example, sets of puts and gets are (partially)
> > forwarded to another table by the coprocessor.
> > Right now this would need to use many single puts/deletes/gets from the
> > various {pre|post}{put|delete|get} hooks.
> >
> > There is no useful demarcation, other than maybe waiting a few milliseconds,
> > which is awkward.
> >
> > Of course this forwarding could be done directly from the client, but then
> > what's the point of coprocessors?
> >
> > I guess there could either be a {pre|post}Multi on RegionObserver (although
> > HRegionServer.multi does a lot of munging).
> > Or maybe a general {pre|post}Request with no arguments - in which case it
> > would be at least possible to write code in the coprocessor
> > to collect the puts/deletes/etc through the normal single
> > prePut/preDelete/etc hooks and then batch-process them in postRequest().
> >
> > -- Lars
> >
>
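
For illustration, the collect-then-flush idea at the bottom of the thread (buffer rows in the single-row hooks, then batch-process them in a no-argument postRequest()) could look roughly like the sketch below. This is only a model of the proposal, not HBase code: IndexTable, RecordingTable, CollectingObserver and the "idx-" key prefix are all made-up names, and postRequest() is the hook being proposed in the thread, not one that the thread says exists. Nothing here calls the real coprocessor API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: none of these types are HBase classes.
class BatchingObserverSketch {

    /** Stand-in for a handle on the other (index) table; hypothetical. */
    interface IndexTable {
        void putBatch(List<String> rows);
    }

    /** Records every batch it receives so the behaviour can be inspected. */
    static class RecordingTable implements IndexTable {
        final List<List<String>> batches = new ArrayList<>();

        @Override
        public void putBatch(List<String> rows) {
            // Copy the list, because the caller clears its buffer afterwards.
            batches.add(new ArrayList<>(rows));
        }
    }

    /** Buffers per-row callbacks and forwards them once per request. */
    static class CollectingObserver {
        private final IndexTable indexTable;
        // Plain list for brevity; not thread-safe. A real coprocessor would
        // need some form of per-request state instead (assumption).
        private final List<String> pending = new ArrayList<>();

        CollectingObserver(IndexTable indexTable) {
            this.indexTable = indexTable;
        }

        /** Models the existing single-row hook: only remember the row. */
        void prePut(String rowKey) {
            pending.add("idx-" + rowKey);
        }

        /** Models the proposed no-argument hook: flush everything at once. */
        void postRequest() {
            if (!pending.isEmpty()) {
                indexTable.putBatch(pending);
                pending.clear();
            }
        }
    }

    public static void main(String[] args) {
        RecordingTable table = new RecordingTable();
        CollectingObserver observer = new CollectingObserver(table);
        // One chunk of 200 puts arriving at a region server, as in the example above.
        for (int i = 0; i < 200; i++) {
            observer.prePut("row" + i);
        }
        observer.postRequest();
        System.out.println(table.batches.size() + " batch(es), first has "
                + table.batches.get(0).size() + " rows");
        // prints "1 batch(es), first has 200 rows"
    }
}
```

With such a hook the 200 forwarded rows go out as one batch instead of 200 single-row puts, which is the demarcation the thread is asking for. Where the buffer would live (per request, per region, or per handler thread) is left open here.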
