D2593: state: add logic to parse the state file in old way if cbor fails
pulkit abandoned this revision. pulkit added a comment. Thanks for all the reviews and helping me improving this. The improved version of state file which is already pushed to core should raise error.CorruptedState() if an old version of state file is found and caller should catch it and parse the old way. I don't think having a registrar is required anymore. Abandoning this differential. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: yuja, martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
yuja added inline comments. INLINE COMMENTS > martinvonz wrote in state.py:84 > Oh, and I should clarify that I think the root of the problem is that you're > mixing the class for reading with state itself (but only the top-level items > of it). Let's say someone realizes they want to iterate over the items in the > state, would we also add a .items() then? (I don't think that's a likely > scenario, but it shows what I mean.) Sorry if it's already discussed, but what's the benefit of `cmdstate` not being a plain dict? I think plain load/save functions are much simpler than a dict-like object backed by file, i.e. state = statemod.load(repo, fname) state = {'version': 1, 'nodes': nodelines} statemod.save(repo, fname, state) (statemod.load/save could be a class for serialization/deserialization) The state dict would be wrapped anyway depending on application logic. If we can find out a common behavior between "state"s, we can extract it to a state class, but that will still be separated from the storage format. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: yuja, martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
martinvonz added inline comments. INLINE COMMENTS > martinvonz wrote in state.py:84 > If it's up to me, I'd definitely say that you should change it. I don't see > any benefits of the current proposal. I'd be happy to hear what other > reviewers think. Oh, and I should clarify that I think the root of the problem is that you're mixing the class for reading with state itself (but only the top-level items of it). Let's say someone realizes they want to iterate over the items in the state, would we also add a .items() then? (I don't think that's a likely scenario, but it shows what I mean.) REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
martinvonz added inline comments. INLINE COMMENTS > pulkit wrote in state.py:84 > I don't feel convinced on what can be the benefit of it and my opinion can be > influenced by how I used it in evolve extension. But I can change it the > other way around too, let me know if you want me to change. If it's up to me, I'd definitely say that you should change it. I don't see any benefits of the current proposal. I'd be happy to hear what other reviewers think. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit added inline comments. INLINE COMMENTS > martinvonz wrote in state.py:84 > > I don't have a strong reason. I just found this an easy way. > > It seems the other way is still slightly easier :) You'd replace this: > > def __getitem__(self, key): > return self.opts[key] > > def __setitem__(self, key, value): > updates = {key: value} > self.opts.update(updates) > > by this: > > def get(self): > return self.state > > def set(self, state): > self.state = state > > I'm not even sure we'd need those two methods. It might be enough to make > load() return the state and make save() accept (and require) a new state. > > The constructor would change from: > > if not opts: > self.opts = {} > else: > self.opts = opts > > to: > > self.state = state > > The graft code for reading would change from: > > cmdstate.load() > if cmdstate['version'] < 2: > nodes = cmdstate['nodes'] > revs = [repo[node].rev() for node in nodes] > else: > # state-file is written by a newer mercurial than what we are > # using > raise error.Abort(_("unable to read to read the state file")) > > to: > > state = cmdstate.load() > if state['version'] < 2: > nodes = state['nodes'] > revs = [repo[node].rev() for node in nodes] > else: > # state-file is written by a newer mercurial than what we are > # using > raise error.Abort(_("unable to read to read the state file")) > > The graft code for writing would change from: > > cmdstate.addopts({'version': 1, 'nodes': nodelines}) > cmdstate.save() > > to: > > cmdstate.set({'version': 1, 'nodes': nodelines}) > cmdstate.save() > > > > > The one benefit is that we can pass the state object everywhere, lookup for > > values and update values on fly. > > We can still pass the "cmdstate" instance around if we wanted to do that. I > think it's a small plus that we can pass *just* the state around, though. > That means we can pass the state into a function and we can look at the call > site and know that it won't be calling .save() on it. I don't feel convinced on what can be the benefit of it and my opinion can be influenced by how I used it in evolve extension. But I can change it the other way around too, let me know if you want me to change. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
martinvonz added inline comments. INLINE COMMENTS > pulkit wrote in state.py:84 > I don't have a strong reason. I just found this an easy way. I was going to > add __setitem__() too so that we can store more new values or change existing > values. > > For graft case, it won't be just a list of nodes, we are going to store info > about `--no-commit` flag in graftstate. Moreover we need `graft --abort` too > which needs to store more things. > > The one benefit is that we can pass the state object everywhere, lookup for > values and update values on fly. > I don't have a strong reason. I just found this an easy way. It seems the other way is still slightly easier :) You'd replace this: def __getitem__(self, key): return self.opts[key] def __setitem__(self, key, value): updates = {key: value} self.opts.update(updates) by this: def get(self): return self.state def set(self, state): self.state = state I'm not even sure we'd need those two methods. It might be enough to make load() return the state and make save() accept (and require) a new state. The constructor would change from: if not opts: self.opts = {} else: self.opts = opts to: self.state = state The graft code for reading would change from: cmdstate.load() if cmdstate['version'] < 2: nodes = cmdstate['nodes'] revs = [repo[node].rev() for node in nodes] else: # state-file is written by a newer mercurial than what we are # using raise error.Abort(_("unable to read to read the state file")) to: state = cmdstate.load() if state['version'] < 2: nodes = state['nodes'] revs = [repo[node].rev() for node in nodes] else: # state-file is written by a newer mercurial than what we are # using raise error.Abort(_("unable to read to read the state file")) The graft code for writing would change from: cmdstate.addopts({'version': 1, 'nodes': nodelines}) cmdstate.save() to: cmdstate.set({'version': 1, 'nodes': nodelines}) cmdstate.save() > The one benefit is that we can pass the state object everywhere, lookup for > values and update values on fly. We can still pass the "cmdstate" instance around if we wanted to do that. I think it's a small plus that we can pass *just* the state around, though. That means we can pass the state into a function and we can look at the call site and know that it won't be calling .save() on it. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit updated this revision to Diff 7306. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D2593?vs=7300=7306 REVISION DETAIL https://phab.mercurial-scm.org/D2593 AFFECTED FILES mercurial/state.py CHANGE DETAILS diff --git a/mercurial/state.py b/mercurial/state.py --- a/mercurial/state.py +++ b/mercurial/state.py @@ -83,8 +83,17 @@ def _read(self): """reads the state file and returns a dictionary which contain data in the same format as it was before storing""" -with self._repo.vfs(self.fname, 'rb') as fp: -return cbor.load(fp) +try: +with self._repo.vfs(self.fname, 'rb') as fp: +ret = cbor.load(fp) +if (not isinstance(ret, dict) or +ret.get('magicheader') != 'HGStateFile'): +raise cbor.CBORDecodeError +return ret +except cbor.CBORDecodeError: # cbor raises an Exception +with self._repo.vfs(self.fname, 'rb') as fp: +oldfn = oldstatefilefns[self.fname] +return oldfn(fp) def delete(self): """drop the state file if exists""" To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit updated this revision to Diff 7300. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D2593?vs=6759=7300 REVISION DETAIL https://phab.mercurial-scm.org/D2593 AFFECTED FILES mercurial/state.py CHANGE DETAILS diff --git a/mercurial/state.py b/mercurial/state.py --- a/mercurial/state.py +++ b/mercurial/state.py @@ -83,8 +83,17 @@ def _read(self): """reads the state file and returns a dictionary which contain data in the same format as it was before storing""" -with self._repo.vfs(self.fname, 'rb') as fp: -return cbor.load(fp) +try: +with self._repo.vfs(self.fname, 'rb') as fp: +ret = cbor.load(fp) +if (not isinstance(ret, dict) or +ret.get('magicheader') != 'HGStateFile'): +raise cbor.CBORDecodeError +return ret +except cbor.CBORDecodeError: # cbor raises an Exception +with self._repo.vfs(self.fname, 'rb') as fp: +oldfn = oldstatefilefns[self.fname] +return oldfn(fp) def delete(self): """drop the state file if exists""" To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit added a comment. In https://phab.mercurial-scm.org/D2593#47511, @martinvonz wrote: > In https://phab.mercurial-scm.org/D2593#44291, @indygreg wrote: > > > Not sure where to record this comment in this series. So I'll pick this commit. > > > > I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial. > > > Also, it doesn't seem like CBOR defines any magic bytes to start the top-level object with, so it's not obvious to me that an old state file (e.g. on containing just a nodeid) could not be parsed as a valid CBOR file. Perhaps cmdstate should help with that? We could make it always add a first item that's just "CBOR" or something (it seem very unlikely that we'd have that in an old state file), and we could fail when reading a state file that doesn't have that. Or would could require any new state files to have a new name than the old ones? I can add another key value pair 'magicstring: CBOR' which should be present in the cbor format state files, or we can start storing statefiles in .hg/state/ folder. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit added inline comments. INLINE COMMENTS > martinvonz wrote in state.py:84 > How about not requiring it to be a dict? I imagine practically all callers > will want to pass a dict, but why does this class have to enforce it? I think > the API would be simpler if it was an opaque object. In the simple case of > graft, the state is simply a list of nodes. However, for more complex states, > we could have nested structures. I don't see why cmdstate should be involved > in lookups into the top-level structure. The difference is subtle, but here's > an example: > > # With dict-aware cmdstate > cmdstate = ... > cmdstate.load() > version = cmdstate['version'] > for car in cmdstate['cars']: > for wheel in car['wheels']: > # whatever > > # With agnostic cmdstate > cmdstate = ... > parking = cmdstate.load() > version = parking['version'] > for car in parking['cars']: > for wheel in car['wheels']: > # whatever I don't have a strong reason. I just found this an easy way. I was going to add __setitem__() too so that we can store more new values or change existing values. For graft case, it won't be just a list of nodes, we are going to store info about `--no-commit` flag in graftstate. Moreover we need `graft --abort` too which needs to store more things. The one benefit is that we can pass the state object everywhere, lookup for values and update values on fly. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
martinvonz added a comment. In https://phab.mercurial-scm.org/D2593#44291, @indygreg wrote: > Not sure where to record this comment in this series. So I'll pick this commit. > > I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial. Also, it doesn't seem like CBOR defines any magic bytes to start the top-level object with, so it's not obvious to me that an old state file (e.g. on containing just a nodeid) could not be parsed as a valid CBOR file. Perhaps cmdstate should help with that? We could make it always add a first item that's just "CBOR" or something (it seem very unlikely that we'd have that in an old state file), and we could fail when reading a state file that doesn't have that. Or would could require any new state files to have a new name than the old ones? REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
martinvonz added a comment. INLINE COMMENTS > durin42 wrote in state.py:84 > Probably treat not-a-dict as corrupt and fall back to the other format? How about not requiring it to be a dict? I imagine practically all callers will want to pass a dict, but why does this class have to enforce it? I think the API would be simpler if it was an opaque object. In the simple case of graft, the state is simply a list of nodes. However, for more complex states, we could have nested structures. I don't see why cmdstate should be involved in lookups into the top-level structure. The difference is subtle, but here's an example: # With dict-aware cmdstate cmdstate = ... cmdstate.load() version = cmdstate['version'] for car in cmdstate['cars']: for wheel in car['wheels']: # whatever # With agnostic cmdstate cmdstate = ... parking = cmdstate.load() version = parking['version'] for car in parking['cars']: for wheel in car['wheels']: # whatever > state.py:82 > +with self._repo.vfs(self.fname, 'rb') as fp: > +ret = cbor.load(fp) > +if not isinstance(ret, dict): bad ident REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: martinvonz, indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
indygreg added a comment. Not sure where to record this comment in this series. So I'll pick this commit. I think we want an explicit version header in the state files so clients know when they may be reading a file in an old format. For example, if I start a merge in one terminal window, I sometimes move to another terminal window to resolve parts of it. The multiple windows may be running different Mercurial versions. For example, sometimes one of the shells has a virtualenv activated and that virtualenv is running an older Mercurial. We don't want the older Mercurial trampling on state needed by the new Mercurial. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: indygreg, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit updated this revision to Diff 6759. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D2593?vs=6618=6759 REVISION DETAIL https://phab.mercurial-scm.org/D2593 AFFECTED FILES mercurial/state.py CHANGE DETAILS diff --git a/mercurial/state.py b/mercurial/state.py --- a/mercurial/state.py +++ b/mercurial/state.py @@ -77,8 +77,16 @@ def _read(self): """reads the evolvestate file and returns a dictionary which contain data in the same format as it was before storing""" -with self._repo.vfs(self.fname, 'rb') as fp: -return cbor.load(fp) +try: +with self._repo.vfs(self.fname, 'rb') as fp: +ret = cbor.load(fp) +if not isinstance(ret, dict): +raise cbor.CBORDecodeError +return ret +except cbor.CBORDecodeError: # cbor raises an Exception +with self._repo.vfs(self.fname, 'rb') as fp: +oldfn = oldstatefilefns[self.fname] +return oldfn(fp) def delete(self): """drop the evolvestate file if exists""" To: pulkit, #hg-reviewers Cc: durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit added inline comments. INLINE COMMENTS > durin42 wrote in state.py:86 > Ugh. Yeah, maybe see if they'd take a patch upstream to raise a more explicit > exception type. @durin42 I feel sorry about adding a TODO but I did that because I won't be able to work for a week or so dur to laptop issue. If you are not okay with that, I am fine with these lying here. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit updated this revision to Diff 6618. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D2593?vs=6444=6618 REVISION DETAIL https://phab.mercurial-scm.org/D2593 AFFECTED FILES mercurial/state.py tests/test-check-code.t CHANGE DETAILS diff --git a/tests/test-check-code.t b/tests/test-check-code.t --- a/tests/test-check-code.t +++ b/tests/test-check-code.t @@ -11,6 +11,7 @@ > -X contrib/python-zstandard \ > -X hgext/fsmonitor/pywatchman \ > -X mercurial/thirdparty \ + > -X mercurial/state.py \ > | sed 's-\\-/-g' | "$check_code" --warnings --per-file=0 - || false Skipping i18n/polib.py it has no-che?k-code (glob) Skipping mercurial/statprof.py it has no-che?k-code (glob) diff --git a/mercurial/state.py b/mercurial/state.py --- a/mercurial/state.py +++ b/mercurial/state.py @@ -22,6 +22,7 @@ from .thirdparty import cbor from . import ( +error, util, ) @@ -77,8 +78,18 @@ def _read(self): """reads the evolvestate file and returns a dictionary which contain data in the same format as it was before storing""" -with self._repo.vfs(self.fname, 'rb') as fp: -return cbor.load(fp) +try: +with self._repo.vfs(self.fname, 'rb') as fp: +ret = cbor.load(fp) +if not isinstance(ret, dict): +raise error.ParseError("cbor parsed it wrong") +return ret +except: # cbor raises an Exception +# TODO: patch upstream cbor library to raise a particular exception +# and remove this file from test-check-code.t +with self._repo.vfs(self.fname, 'rb') as fp: +oldfn = oldstatefilefns[self.fname] +return oldfn(fp) def delete(self): """drop the evolvestate file if exists""" To: pulkit, #hg-reviewers Cc: durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit added inline comments. INLINE COMMENTS > state.py:84 > +if not isinstance(ret, dict): > +raise Exception("cbor parsed it wrong") > +return ret I am not confident about this part. Like to have some suggestions here. > state.py:86 > +return ret > +except: # cbor raises an Exception > +with self._repo.vfs(self.fname, 'rb') as fp: Should I change thirdparty/cbor/ to raise specific errors? REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 To: pulkit, #hg-reviewers Cc: mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D2593: state: add logic to parse the state file in old way if cbor fails
pulkit created this revision. Herald added a subscriber: mercurial-devel. Herald added a reviewer: hg-reviewers. REVISION SUMMARY With updating the format of statefiles to use cbor, there can be cases when a user updates hg in between a unresolved graft, rebase, histedit, cbor load function in new hg won't be able to parse that properly. So let's fallback to reading the file old way if cbor fails or does not returns a dictionary as cbor may read that data without any error. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D2593 AFFECTED FILES mercurial/state.py CHANGE DETAILS diff --git a/mercurial/state.py b/mercurial/state.py --- a/mercurial/state.py +++ b/mercurial/state.py @@ -77,8 +77,16 @@ def _read(self): """reads the evolvestate file and returns a dictionary which contain data in the same format as it was before storing""" -with self._repo.vfs(self.fname, 'rb') as fp: -return cbor.load(fp) +try: +with self._repo.vfs(self.fname, 'rb') as fp: +ret = cbor.load(fp) +if not isinstance(ret, dict): +raise Exception("cbor parsed it wrong") +return ret +except: # cbor raises an Exception +with self._repo.vfs(self.fname, 'rb') as fp: +oldfn = oldstatefilefns[self.fname] +return oldfn(fp) def delete(self): """drop the evolvestate file if exists""" To: pulkit, #hg-reviewers Cc: mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel