Hi,

The fix is available at:
http://jp-andre.pagesperso-orange.fr/redo-vcn.zip

Jean-Pierre

Jean-Pierre André wrote:
Hi,

Gil Barash via ntfs-3g-devel wrote:
Hey,

Downloading the latest did the trick. Thanks.

I agree that the data replacement operations are easier, but the
bigger problems are when an attribute is added/deleted, because those
operations shift the data for the following redo operations.
I would love to hear the rationale behind this extremely
complicated journal scheme from Microsoft, but that will probably
never happen :-)

I asked this because I am trying to debug a partition I created when
crashing a Windows10 machine (you can find it here:
https://s3-eu-west-1.amazonaws.com/gilbucket1/ntfs-disks/ntfs_shutdown_win10.raw).


You can see that operation 239b94 fails the recovery process.

The undo data of operation 239b94 is "0f", which differs from the "a0"
found at 0x360. I think the operation may have been targeting the "0f"
at 0x3d8.

Yes, this was meant to replace "0f" by "1f" to record a
fifth cluster being added to the index.

Note that an earlier operation for the same inode (45), operation
229f2a (DeleteIndexEntryRoot), is not executed because the undo data
doesn't match. It has a length of 0x78, which is exactly 0x3d8-0x360.

I tried to trace back to see why operation 229f2a finds the "wrong"
data (perhaps an earlier operation also failed to run), but I haven't
found anything definitive yet.

Yes. The offending earlier operation is 0x22705f which
puts the vcn at a wrong location. In redo_update_root_vcn()
there is some logic which remains to be understood.

In the situations I met so far, I had to add 16 to the
offset in order to get correct behavior, but for this
specific action the correct value to add is 0x70.
When forcing this value, the full log can be processed
thus restoring the partition to a consistent state.

Comparing two different situations leads to a possible
explanation: the attribute_offset (0x40) tells where
the index entry begins, and the vcn to insert is the last
field of the entry, so just add the length of the entry (0x78)
minus 8.
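
To make the arithmetic of this hypothesis concrete, here is a small
sketch in Python (not the ntfs-3g code). The function name and the
VCN_SIZE constant are made up for illustration; the only inputs taken
from this thread are the entry length 0x78 and the earlier value 16.

```python
VCN_SIZE = 8  # assumption: the vcn is the last 8 bytes of the index entry


def vcn_offset_in_entry(entry_length: int) -> int:
    """Offset of the trailing vcn field, relative to the start of the entry."""
    if entry_length < VCN_SIZE:
        raise ValueError("entry too short to hold a vcn")
    return entry_length - VCN_SIZE


# Entry from this log: length 0x78, so the value to add is 0x70.
offset_this_log = vcn_offset_in_entry(0x78)

# Under the same hypothesis, the earlier fixed value of 16 would
# correspond to an entry whose length is 0x18.
offset_short_entry = vcn_offset_in_entry(0x18)

print(hex(offset_this_log), offset_short_entry)
```

If the hypothesis holds, both the value 16 seen before and the 0x70
needed here come out of the same rule, with no special case.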

I need some time to check this theory (which probably also
applies to redo_update_vcn()).

Jean-Pierre


Regards,
Gil

On Fri, Jun 16, 2017 at 10:59 AM, Jean-Pierre André
<jean-pierre.an...@wanadoo.fr> wrote:

Hi,

Update...

Jean-Pierre André wrote:

Gil Barash via ntfs-3g-devel wrote:

Hey,

I have two follow up issues:


[...]


--- 2 --
General question about redo process:
In change_resident_expect (called by redo_update_root_vcn, for
example) we chose not to apply the redo data if the current buffer
state doesn't match the undo data. Note that the operation is
considered successful in this case.
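
For readers following along, the rule described above can be sketched
as follows. This is a simplified Python model, not the actual
change_resident_expect() from ntfs-3g; the function name and the
two-byte buffer are invented, with byte values borrowed from the
"a0"/"0f"/"1f" example elsewhere in this thread.

```python
def apply_if_undo_matches(buf: bytearray, offset: int,
                          undo: bytes, redo: bytes) -> bool:
    """Write redo at offset only if the buffer currently holds undo there.

    Returns True if the redo data was written, False if it was skipped.
    In the behavior described above, a skip is still counted as success.
    """
    if len(undo) != len(redo):
        raise ValueError("this sketch only handles same-length replacement")
    end = offset + len(undo)
    if bytes(buf[offset:end]) != undo:
        return False  # current state differs from undo data: do nothing
    buf[offset:end] = redo
    return True


buf = bytearray.fromhex("a00f")
# Wrong target: the byte there is "a0", not the expected "0f", so skip.
applied_wrong = apply_if_undo_matches(buf, 0, b"\x0f", b"\x1f")
# Right target: "0f" is found and replaced by "1f".
applied_right = apply_if_undo_matches(buf, 1, b"\x0f", b"\x1f")
print(applied_wrong, applied_right, buf.hex())
```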


I think the reasoning is as follows:

- the old state was A
- an update is made, the state is now B
- a second update is made, the state is now C
- C is synced to disk, but a failure occurs before
    the syncing is recorded in the log.

When restarting, both updates have to be replayed,
and when applying the first one, the state on disk
is not the undo state (which is A), so it is correct
to apply the update (whose result will be overwritten
by the second update).

Avoiding the updates if the undo data does not match
the state on disk probably leads to the same result,
but IMHO it is safer to make sure the final state is
what the redo data says. Other updates which overlap
could be intertwined and they would be processed
incorrectly when not applied to the same state as in
the initial execution.
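
The A/B/C scenario can be played through with a toy model. The sketch
below is Python with abstract states instead of real buffers, and both
function names are hypothetical. It only covers this simple
non-overlapping case, where the two policies agree on the final state.

```python
from typing import List, Tuple

Update = Tuple[str, str]  # (undo state, redo state)


def replay_always(state: str, log: List[Update]) -> str:
    """Apply every update, whatever the current state is."""
    for _undo, redo in log:
        state = redo
    return state


def replay_if_undo_matches(state: str, log: List[Update]) -> str:
    """Apply an update only when the current state equals its undo state."""
    for undo, redo in log:
        if state == undo:
            state = redo
    return state


log = [("A", "B"), ("B", "C")]

# Whatever was synced to disk before the failure (A, B or C),
# both policies end in state C for this log.
results = {start: (replay_always(start, log),
                   replay_if_undo_matches(start, log))
           for start in ("A", "B", "C")}
print(results)
```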


Well, actually the current code is doing the opposite
(that is applying the update if the current state matches
the undo data).

I have run my tests again after reversing the rule (so
applying the update if the current state does not match
the redo data), and the tests fail when the second
update destroys the attribute (e.g. deleting a file).
In this situation there is nothing the redo data can
be compared against (more exactly the current state is
meaningless and should not be compared to redo data).
There is not enough information to rebuild the intermediate
state, the only possible action is doing nothing, and
some criterion is needed to go this way.

Jean-Pierre


Can you please provide some small explanation or direct me to some
form of documentation as to why it is OK to skip an operation?


I would also be interested to get some form of
documentation...

Regards

Jean-Pierre

I suspect it might cause some kind of "chain-reaction" where future
operations on this "cluster" would also be skipped because they expect
to see the data as it should have been after applying the redo
operation.
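
A toy replay shows what such a chain reaction would look like. This is
an illustrative Python sketch, not ntfs-3g code: the replay() helper,
the one-byte buffers and the byte values are all made up, and it
assumes the "apply only if the undo data matches" rule with
same-length undo and redo data.

```python
from typing import List, Tuple

Op = Tuple[int, bytes, bytes]  # (offset, undo data, redo data)


def replay(buf: bytearray, log: List[Op]) -> List[int]:
    """Replay the log in place and return the indexes of skipped operations."""
    skipped = []
    for index, (offset, undo, redo) in enumerate(log):
        end = offset + len(undo)
        if bytes(buf[offset:end]) == undo:
            buf[offset:end] = redo  # same length assumed in this sketch
        else:
            skipped.append(index)
    return skipped


# Two operations on the same byte: 0e -> 0f, then 0f -> 1f.
log = [(0, b"\x0e", b"\x0f"), (0, b"\x0f", b"\x1f")]

clean = bytearray(b"\x0e")  # state the first operation expects
stale = bytearray(b"\x0d")  # unexpected state, e.g. after an earlier miss

skipped_clean = replay(clean, log)
skipped_stale = replay(stale, log)
print(skipped_clean, clean.hex(), skipped_stale, stale.hex())
```

In the second run the first skip leaves the byte unchanged, so the
following operation does not find its undo data either and is skipped
as well.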

Thanks,
Gil





_______________________________________________
ntfs-3g-devel mailing list
ntfs-3g-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/ntfs-3g-devel
