D2945: state: add a magicheader to each state file
This revision now requires changes to proceed. baymax added a comment. baymax requested changes to this revision. There seems to have been no activities on this Diff for the past 3 Months. By policy, we are automatically moving it out of the `need-review` state. Please, move it back to `need-review` without hesitation if this diff should still be discussed. :baymax:need-review-idle: REPOSITORY rHG Mercurial CHANGES SINCE LAST ACTION https://phab.mercurial-scm.org/D2945/new/ REVISION DETAIL https://phab.mercurial-scm.org/D2945 To: pulkit, #hg-reviewers, baymax Cc: jeffpc, yuja, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2945: state: add a magicheader to each state file
jeffpc added a comment. In https://phab.mercurial-scm.org/D2945#58586, @pulkit wrote: > There is a suggestion from @jeffpc to have a plain text header in statefiles like `HGStateFile` so that user can grep for them or look into files and know that they are state files. How does that sound? > (@jeffpc I am not sure I explained this correctly, it will be great if you can chime in.) I don't really care about the actual string value, I just think it is a good storage design practice to have every on-disk structure contain a magic value of some sort. (Beware, I have a storage background and therefore opinions about these things... ;) ) This makes it easier to do all sort of data recovery should the need arise. E.g., - if you know that a random file (e.g., from lost+found) is a hg state file, you can ignore/delete it and just reclone your repos - hg can detect some corruption better, and depending on the state file regenerate it or error out - hg developers can more easily sort out what's what when looking at buffers/files/etc. - as an added bonus, the magic can be used for format versioning FWIW, the three major things that I consider essential to good on-disk structures: 1. magic numbers - see above 2. checksums - possibly two checksums: one for the header and one for the payload 3. back-pointers - see below Back-pointers allow you to tie the file/buffer into the bigger context - be it which state file it is, or which repository it belongs to. In this case a state file's back-pointer could be something like the repo-id + state-file-name stored in the state file's header. (I realize that there is no repo-id, but in general some approximation of it tends to be better than nothing.) This should make it painfully obvious what data one is dealing with and how to parse/interpret it just by looking at the byte-stream without any other context. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2945 To: pulkit, #hg-reviewers Cc: jeffpc, yuja, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2945: state: add a magicheader to each state file
pulkit added a subscriber: jeffpc. pulkit added a comment. There is a suggestion from @jeffpc to have a plain text header in statefiles like `HGStateFile` so that user can grep for them or look into files and know that they are state files. How does that sound? (@jeffpc I am not sure I explained this correctly, it will be great if you can chime in.) REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2945 To: pulkit, #hg-reviewers Cc: jeffpc, yuja, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2945: state: add a magicheader to each state file
yuja added a comment. > There are some old state files, which don't have a version header on top > of them like graftstate. I need suggestion on how to handle them. In that case, we would have to either do some heuristic (like histedit), or maintain old/new state files (like mergestate.) Perhaps graft can do the latter? REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2945 To: pulkit, #hg-reviewers Cc: yuja, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2945: state: add a magicheader to each state file
pulkit added a comment. In https://phab.mercurial-scm.org/D2945#47831, @yuja wrote: > I think the magic has to vary depending on the current state file format. > The state file must be backward/forward compatible with the older/newer formats. > > If the current magic is `2\n` (and is parsed as `int(f.readline())`) for example, the > CBOR preamble would have to be `3\n`. Otherwise, old hg client cant determine > whether the state file is corrupted or written by newer client. There are some old state files, which don't have a version header on top of them like graftstate. I need suggestion on how to handle them. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2945 To: pulkit, #hg-reviewers Cc: yuja, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2945: state: add a magicheader to each state file
yuja added a comment. I think the magic has to vary depending on the current state file format. The state file must be backward/forward compatible with the older/newer formats. If the current magic is `2\n` (and is parsed as `int(f.readline())`) for example, the CBOR preamble would have to be `3\n`. Otherwise, old hg client cant determine whether the state file is corrupted or written by newer client. FWIW, last time I reviewed `simplekeyvaluefile`, I insisted that a parser should take a file stream instead of a filename, so the caller can handle compatibility thingy. with open("statefile", "rb") as fp: try: version = int(fp.readline()) except ValueError: raise some nicer exception if version == 2: return readold(fp) elif version == 3: return readcbor(fp) REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2945 To: pulkit, #hg-reviewers Cc: yuja, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2945: state: add a magicheader to each state file
pulkit created this revision. Herald added a subscriber: mercurial-devel. Herald added a reviewer: hg-reviewers. REVISION SUMMARY This patch adds a magic header to each state file written using CBOR format so that we can distinguish between old state files and new one even if old state files get successfully parsed by cbor. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2945 AFFECTED FILES mercurial/state.py CHANGE DETAILS diff --git a/mercurial/state.py b/mercurial/state.py --- a/mercurial/state.py +++ b/mercurial/state.py @@ -51,6 +51,7 @@ self.opts = {} else: self.opts = opts +self.opts['magicheader'] = 'HGStateFile' def __nonzero__(self): return self.exists() To: pulkit, #hg-reviewers Cc: mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel