Hi all:
    I have a question about the epoch field in the SD_OP_READ_PEER request
header. I'm not sure whether I misunderstand the code or whether it is a bug.
    When we recover an erasure-coded object during recovery, we first need to
read the remaining replicas to rebuild the lost replica. In
read_erasure_object(), we initialize the SD_OP_READ_PEER request header with
the following code:


        sd_init_req(&hdr, SD_OP_READ_PEER);
        hdr.epoch = epoch;
        hdr.flags = SD_FLAG_CMD_RECOVERY;
        hdr.data_length = rlen;
        hdr.obj.oid = oid;
        hdr.obj.tgt_epoch = tgt_epoch;
        hdr.obj.ec_index = idx;



    I think hdr.epoch is the current epoch of the cluster and
hdr.obj.tgt_epoch is the historical epoch from which we want to read the stale
replica. The target node calls peer_read_obj() to process the SD_OP_READ_PEER
request. peer_read_obj() sets iocb.epoch = hdr->epoch and then passes iocb to
sd_store->read(). In default_read(), we use iocb->epoch < sys_epoch() to judge
whether the request is against an older epoch, in which case the replica has
to be read from the stale directory. I think we use the wrong epoch here: we
should use hdr.obj.tgt_epoch rather than hdr.epoch to make that judgement. Can
anyone confirm whether this is a bug or whether I have misread the code?
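
    To make the question concrete, here is a minimal sketch of the two checks
I am comparing. The struct and the two helper functions are hypothetical
simplifications written for this mail, not the real sheepdog definitions; only
the fields being compared come from the code path described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the two header fields discussed
 * above. This is NOT the real sheepdog request header layout. */
struct peer_read_req {
    uint32_t epoch;     /* models hdr.epoch: epoch set by the sender */
    uint32_t tgt_epoch; /* models hdr.obj.tgt_epoch: epoch of the wanted replica */
};

/* The check as I read it today: only hdr.epoch (copied into iocb.epoch)
 * is compared against the system epoch. */
static inline bool is_stale_by_epoch(const struct peer_read_req *req,
                                     uint32_t sys_epoch)
{
    return req->epoch < sys_epoch;
}

/* The check I am proposing: compare the target epoch instead. */
static inline bool is_stale_by_tgt_epoch(const struct peer_read_req *req,
                                         uint32_t sys_epoch)
{
    return req->tgt_epoch < sys_epoch;
}
```

    With epoch = 5, tgt_epoch = 3 and a system epoch of 5, the first check
reports "not stale" while the second reports "stale", which is the case I am
worried about.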


Thanks.
Bingpeng
-- 
sheepdog mailing list
[email protected]
https://lists.wpkg.org/mailman/listinfo/sheepdog
