Anoop Sam John created HBASE-15214:
--------------------------------------

             Summary: Valid mutate Ops fail with RPC Codec in use and region moves across
                 Key: HBASE-15214
                 URL: https://issues.apache.org/jira/browse/HBASE-15214
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.98.0
            Reporter: Anoop Sam John
            Assignee: Anoop Sam John
            Priority: Critical


Test failures in HBASE-15198 led to this bug. Until now we have not been using
cell blocks (codec usage) for write requests (client -> server). Once we enabled
Codec usage by default, we saw this issue.
A multi request came to the RS with mutations for different regions. One of the
regions hosted by this RS had just become unavailable. In RsRpcServices#multi,
we fail that entire RegionAction (with N mutations in it) in that MultiRequest
and then continue with the remaining RegionActions, whose regions might still be
available. (The failed RegionAction will be retried from the client after it
fetches the latest region location.) This all works fine in the pure PB
requests world.

When a Codec is used, we don't convert the Mutation Cells to PB Cells and pack
them in the PB message. Instead we pass all Cells serialized into one byte[]
cellblock, and at the server side we iterate over these cells using a Decoder.
Each Mutation PB knows only the number of cells associated with it. In the case
above, when an entire RegionAction is skipped, the N Mutations under it may have
corresponding Cells in the cellblock, but we are not skipping those Cells in the
iterator. As a result, later Mutations (for other regions) pick up Cells that
belong to the skipped Mutations and try to put them into a different region.
HRegion#checkRow() then throws WrongRegionException, which is treated as a
sanity check failure, so a DNRIOE (DoNotRetryIOException) is thrown back to the
client and the op fails for the user code.
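A toy Java model may make the skip requirement concrete. It is a sketch under stated assumptions, not the actual HBase code or the committed fix: `Mutation`, `RegionAction`, `multi` and `rowInRegion` are hypothetical names, cells are reduced to their row keys, and `rowInRegion` is an assumed stand-in for HRegion#checkRow() in which a region owns every row whose key starts with the region's name.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of the bug, NOT HBase code: all names and types are illustrative.
class CellBlockSkipDemo {

    // Like a Mutation PB with a codec: it only knows how many cells it owns in the shared cellblock.
    record Mutation(int cellCount) {}

    // All mutations for one region, like a RegionAction inside a MultiRequest.
    record RegionAction(String region, List<Mutation> mutations) {}

    // Assumed stand-in for HRegion#checkRow(): a region owns rows whose key starts with its name.
    static boolean rowInRegion(String region, String row) {
        return row.startsWith(region);
    }

    // Processes actions in order, pulling each mutation's cells from ONE shared iterator
    // over the decoded cellblock (each cell is represented only by its row key).
    static List<String> multi(List<RegionAction> actions, List<String> cellBlock,
                              Set<String> unavailable, boolean skipCellsOfFailedAction) {
        Iterator<String> cells = cellBlock.iterator();
        List<String> results = new ArrayList<>();
        for (RegionAction action : actions) {
            if (unavailable.contains(action.region())) {
                // The whole RegionAction fails; the client retries it after relocating the region.
                results.add("RETRY " + action.region());
                if (skipCellsOfFailedAction) {
                    // The fix: advance the iterator past every cell owned by the failed action,
                    // using only the per-mutation cell counts.
                    for (Mutation m : action.mutations()) {
                        for (int i = 0; i < m.cellCount(); i++) {
                            cells.next();
                        }
                    }
                }
                continue;
            }
            for (Mutation m : action.mutations()) {
                for (int i = 0; i < m.cellCount(); i++) {
                    String row = cells.next();
                    results.add(rowInRegion(action.region(), row)
                            ? "OK " + row
                            : "WrongRegion " + row + " -> " + action.region());
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Region A (unavailable) has two single-cell mutations; region B has one.
        List<RegionAction> actions = List.of(
                new RegionAction("A", List.of(new Mutation(1), new Mutation(1))),
                new RegionAction("B", List.of(new Mutation(1))));
        List<String> cellBlock = List.of("A1", "A2", "B1");
        Set<String> unavailable = Set.of("A");

        System.out.println(multi(actions, cellBlock, unavailable, false)); // [RETRY A, WrongRegion A1 -> B]
        System.out.println(multi(actions, cellBlock, unavailable, true));  // [RETRY A, OK B1]
    }
}
```

Without the skip, region B's mutation consumes region A's first cell and the row check fails, mirroring the WrongRegionException described above; with the skip, B gets its own cell. Note that the skip needs nothing beyond the per-mutation cell counts that the Mutation PBs already carry.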




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)