Bingpeng is right.
I am sure that default_init() calls unlink() to delete the old .tmp files, but the argument it passes is the bare filename, not the full path. This should be a bug.

From: sheepdog [mailto:[email protected]] On Behalf Of Bingpeng Zhu
Sent: September 18, 2014 16:31
To: Hitoshi Mitake; Ruoyu
Cc: sheepdog
Subject: Re: [sheepdog] question about replica recovery failure caused by oid.tmp file

Thank you for the advice. default_init() of sheep/store.c already has the logic for unlinking oid.tmp files. I'm not sure why the oid.tmp file still exists in the system.

------------------ Original ------------------
From: "Hitoshi Mitake" <[email protected]>
Date: Sep 18, 2014
To: "Ruoyu" <[email protected]>
Cc: "Bingpeng Zhu" <[email protected]>; "sheepdog" <[email protected]>
Subject: Re: [sheepdog] question about replica recovery failure caused by oid.tmp file

At Tue, 16 Sep 2014 10:10:32 +0800,
Ruoyu wrote:
>
> Thanks Bingpeng.
> I also encountered this problem.
> I suggest sheep should scan oid.tmp files and remove them when it is
> being started.

I agree with Ruoyu's opinion. .tmp files should be deleted at
initialization time. e.g. default_init() of sheep/store.c would be a
good place for it.

Thanks,
Hitoshi

>
> On 2014-09-15 00:14, Bingpeng Zhu wrote:
> > Hi, all:
> > I have a problem in using sheepdog. I create an erasure-coded VDI and
> > write some data to it. Then, I unplug a disk and stop/restart one sheep
> > in a short time. After recovery is completed in the latest epoch, I find
> > some replicas are lost and only the corresponding oid.tmp file exists in
> > the data directory. I tried to rebuild the replica using "dog vdi check",
> > but it didn't work. I think it is caused by the oid.tmp file. I had to
> > delete the oid.tmp file manually, and then "dog vdi check" successfully
> > recovered the lost replica.
> > In function default_create_and_write() of sheep/plain_store.c, it
> > returns success directly if the oid.tmp file exists. I have read the
> > comment in this function carefully; it says the gateway and recovery
> > thread may try to write the SAME data, so it is okay to simply return
> > success here. To solve this problem, I want to change the code of
> > default_create_and_write() so that replica data will be written even if
> > the oid.tmp file exists. If oid.tmp exists, the function should
> > overwrite it.
> > I am not sure if this change will work well in all scenarios. In
> > particular, I wonder whether this change will lead to old data
> > overwriting new data, but I haven't thought of any scenario that will
> > lead to old data overwriting new data. Can someone give me some advice
> > to solve this problem?
> >
> --
> sheepdog mailing list
> [email protected]
> http://lists.wpkg.org/mailman/listinfo/sheepdog
--
sheepdog mailing list
[email protected]
http://lists.wpkg.org/mailman/listinfo/sheepdog
