Andrew Purtell <apurtell@...> writes:

> 
> Yes this is correct.
> 
> Coprocessors / RegionObservers and bulk loading have been developing
> separately in parallel.
>
> Now that bulk loading changes are settling down, I've been considering adding
> CP hooks into the bulk load process, at the HRegion level, without
> complicating atomicity. A simple and straightforward course of action is to
> give the CP the option of rewriting the submitted store file(s) before the
> regionserver attempts to validate and move them into the store. This is
> similar to how CPs are hooked into compaction.
> Would this be sufficient for what you want to do?
>  
> Best regards,
> 
>        - Andy
> 
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
> 
> >________________________________
> > From: Stanislav Barton <stanislav.barton <at> internetmemory.net>
> >To: user@... 
> >Sent: Wednesday, January 11, 2012 6:47 AM
> >Subject: bulk loading and RegionObservers
> > 
> >Hello,
> >
> >I tried to find the information in the documentation but it is still
> >not clear to me. I do a lot of bulk loading using the MapReduce job
> >whose output is HFiles that are automatically loaded to HBase and I
> >was wondering whether this way (my guess is that it is so) I do bypass
> >the RegionObserver mechanisms. Meaning that such defined coprocessors
> >won't get fired up when the new data is loaded in HBase. Is my
> >assumption correct?
> >
> >Stan
> >
> >
> >


I think that people asking for this kind of access would want to trigger the
action at the row level (that is, whenever a Put with new values arrives). But
I do not think that would scale: wouldn't it take a long time to scan the new
region and fire a prePut() call on the RO for every row in it? In my
experience, I run bulk load steps of 30 GB into a pre-split table in order to
maintain the highest throughput and keep overhead as low as possible (on a
fairly small cluster of ~10 small machines).
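
To make the scaling concern concrete, here is a minimal sketch comparing how
often a hook would fire under the two approaches. It is a toy model only and
does not use the HBase API: a "store file" is just a list of row keys, and the
names BulkLoadHookSketch, replayPerRow and rewritePerFile are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

// Toy model (assumption, not HBase code): a store file is a list of row keys.
class BulkLoadHookSketch {

    // Row-level approach: every row of every loaded file goes through a
    // prePut-like callback. Returns the number of hook invocations.
    static long replayPerRow(List<List<String>> storeFiles, Consumer<String> rowHook) {
        long calls = 0;
        for (List<String> file : storeFiles) {
            for (String row : file) {
                rowHook.accept(row);
                calls++;
            }
        }
        return calls;
    }

    // File-level approach: the hook sees each submitted file once and may
    // return a rewritten file, which is collected into 'out'.
    // Returns the number of hook invocations.
    static long rewritePerFile(List<List<String>> storeFiles,
                               UnaryOperator<List<String>> fileHook,
                               List<List<String>> out) {
        long calls = 0;
        for (List<String> file : storeFiles) {
            out.add(fileHook.apply(file));
            calls++;
        }
        return calls;
    }

    public static void main(String[] args) {
        List<List<String>> files = Arrays.asList(
                Arrays.asList("r1", "r2", "r3"),
                Arrays.asList("r4", "r5", "r6"));

        long perRow = replayPerRow(files, row -> { /* no-op hook */ });
        long perFile = rewritePerFile(files, f -> f, new ArrayList<>());

        System.out.println("per-row hook calls: " + perRow);   // 6
        System.out.println("per-file hook calls: " + perFile); // 2
    }
}
```

The per-row count grows with the number of rows loaded, while the per-file
count grows only with the number of files submitted, which is why a file-level
hook looks like the cheaper fit for large bulk loads.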

--

Stan

