Andrew Purtell <apurtell@...> writes:

> Yes this is correct.
>
> Coprocessors / RegionObservers and bulk loading have been developing separately in parallel.
>
> Now that bulk loading changes are settling down, I've been considering adding CP hooks into the bulk load
> process, at the HRegion level, without complicating atomicity. A simple and straightforward course of
> action is to give the CP the option of rewriting the submitted store file(s) before the regionserver
> attempts to validate and move them into the store. This is similar to how CPs are hooked into compaction.
> Would this be sufficient for what you want to do?
>
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein (via Tom White)
>
> >________________________________
> > From: Stanislav Barton <stanislav.barton <at> internetmemory.net>
> >To: user@...
> >Sent: Wednesday, January 11, 2012 6:47 AM
> >Subject: bulk loading and RegionObservers
> >
> >Hello,
> >
> >I tried to find the information in the documentation but it is still
> >not clear to me. I do a lot of bulk loading using the MapReduce job
> >whose output is HFiles that are automatically loaded to HBase and I
> >was wondering whether this way (my guess is that it is so) I do bypass
> >the RegionObserver mechanisms. Meaning that such defined coprocessors
> >won't get fired up when the new data is loaded in HBase. Is my
> >assumption correct?
> >
> >Stan

I think the people asking for this kind of access would want to be able to trigger the action at the row level (that is, whenever a Put with new values arrives). But I do not think that would scale: wouldn't it take a long time to scan the new region and fire a prePut() call on the RegionObserver for every row? For reference, I run bulk load steps of about 30 GB into a pre-split table to keep throughput as high and overhead as low as possible (on a fairly small cluster of about 10 small machines).

-- Stan
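
To make the scaling concern concrete, here is a minimal, self-contained sketch that contrasts a file-level hook (one call per bulk load, as proposed above) with a row-level hook (one call per loaded row). Everything in it is an assumption made for illustration: SketchObserver, preBulkLoad and the two load methods are hypothetical names, a "store file" is modelled as a plain list of row keys, and none of this is the actual HBase RegionObserver API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BulkLoadHookSketch {

    /** Hypothetical observer interface for this sketch; not the HBase RegionObserver API. */
    public interface SketchObserver {
        /** File-level hook: called once per bulk load, may return rewritten files. */
        List<List<String>> preBulkLoad(List<List<String>> storeFiles);

        /** Row-level hook: called once per row, as a hook on the Put path would be. */
        void prePut(String rowKey);
    }

    /** Observer that only counts how often each hook fires. */
    public static class CountingObserver implements SketchObserver {
        public int fileHookCalls = 0;
        public int rowHookCalls = 0;

        @Override
        public List<List<String>> preBulkLoad(List<List<String>> storeFiles) {
            fileHookCalls++;
            return storeFiles; // this sketch does not rewrite anything
        }

        @Override
        public void prePut(String rowKey) {
            rowHookCalls++;
        }
    }

    /** Option A: a single hook call, then the files are added to the store as a whole. */
    public static List<String> loadWithFileHook(List<List<String>> storeFiles,
                                                SketchObserver observer) {
        List<String> store = new ArrayList<>();
        for (List<String> file : observer.preBulkLoad(storeFiles)) {
            store.addAll(file);
        }
        return store;
    }

    /** Option B: every row of every file is scanned and passed to the row-level hook. */
    public static List<String> loadWithRowHook(List<List<String>> storeFiles,
                                               SketchObserver observer) {
        List<String> store = new ArrayList<>();
        for (List<String> file : storeFiles) {
            for (String rowKey : file) {
                observer.prePut(rowKey);
                store.add(rowKey);
            }
        }
        return store;
    }

    public static void main(String[] args) {
        // Two toy "store files" holding five row keys in total.
        List<List<String>> files = Arrays.asList(
                Arrays.asList("row1", "row2", "row3"),
                Arrays.asList("row4", "row5"));

        CountingObserver fileLevel = new CountingObserver();
        loadWithFileHook(files, fileLevel);

        CountingObserver rowLevel = new CountingObserver();
        loadWithRowHook(files, rowLevel);

        System.out.println("file-level hook calls: " + fileLevel.fileHookCalls);
        System.out.println("row-level hook calls:  " + rowLevel.rowHookCalls);
    }
}
```

With the toy input the file-level hook fires once and the row-level hook fires five times. The number of row-level calls grows with the number of rows loaded, while the file-level hook stays at one call per load, which is the overhead I am worried about for large loads.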
