Comparing snapshots?

2011-02-25 Thread Arvin Schnell
Hi,

for a backup program I have to find all differing files
(including metadata) in two snapshots taken from the same
subvolume.

Having looked at the find-new command I thought about this
process:

1. Get the two transids when the two snapshots were created.

2. Query modifications to the original subvolume between the two
   transids.
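As a rough sketch of step 1 (an illustration, not something from the btrfs tool documentation quoted in this thread): `btrfs subvolume find-new <subvolume> <last_gen>` ends its output with a line of the form `transid marker was N`. Assuming that line format, a small helper could extract the generation from captured output. The function name and the sample text are hypothetical.

```python
import re

# Assumed format of the final line printed by `btrfs subvolume find-new`.
_MARKER = re.compile(r"^transid marker was (\d+)$", re.MULTILINE)

def parse_transid_marker(find_new_output: str) -> int:
    """Return the generation reported on the 'transid marker was N' line."""
    match = _MARKER.search(find_new_output)
    if match is None:
        raise ValueError("no transid marker line found")
    return int(match.group(1))

# Made-up sample output, for illustration only.
sample = "transid marker was 42\n"
print(parse_transid_marker(sample))
```

The captured text would normally come from running the command against each snapshot, for example via `subprocess.run(..., capture_output=True, text=True)`.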

Is the general process correct, or have I overlooked something?

AFAIS the btrfs tool does not provide the required
information/commands. Would it be possible to add those?

Thanks in advance,
  Arvin

-- 
Arvin Schnell, aschn...@suse.de
Senior Software Engineer, Research & Development
SUSE LINUX Products GmbH, GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe linux-btrfs in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Comparing snapshots?

2011-02-25 Thread João Eduardo Luís
Hello,

Please note that my experience with btrfs is both recent and, above all, very
limited. However, I've been wondering about the same issue for a different
purpose, and your question intrigues me.

I may be off-base here, but I don't think this would be trivial to achieve.

Even if one could diff the metadata changes between the two snapshots, the
problem of finding the changed data would remain. It would be possible to
check for changed extents, at least by comparing extent checksums, but I don't
think it would be trivial to discover where (exactly) the extent was modified.

I would recommend using the generation fields, wherever applicable, but I
believe these are private to each subvolume/snapshot.


Anyway, I wonder whether keeping a data structure (I would go with a tree)
containing metadata about the changed files, within the file system, could
be a plausible solution, but I'm in no position (btrfs-knowledge-wise) to make
such a statement.


Cheers.

---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu 







Re: Comparing snapshots?

2011-02-25 Thread Goffredo Baroncelli
On 02/25/2011 10:59 AM, Arvin Schnell wrote:
> Hi,
>
> for a backup program I have to find all differing files
> (including metadata) in two snapshots taken from the same
> subvolume.
>
> Having looked at the find-new command I thought about this
> process:
>
> 1. Get the two transids when the two snapshots were created.
>
> 2. Query modifications to the original subvolume between the two
>    transids.
>
> Is the general process correct, or have I overlooked something?

I suppose that you are thinking of something like:

- record the last trans-id (trans-id1)
- update the file-system
- [...]
- record the last trans-id (trans-id2)
- update the file-system
- [...]
- Back up all the objects which have a trans-id in the interval (trans-id1, trans-id2]

This may miss two kinds of operations:
1) a file deletion
2) a file changed two times, the first time after the first snapshot,
and the second time after the second snapshot.

In the first case you would not be able to find any key update between
the two trans-ids, because the keys simply no longer exist.

In the second case the trans-id associated with the object is later than trans-id2.
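The two missed cases can be illustrated with a toy model. This is only a sketch: the dictionary below stands in for "last modifying trans-id per file" in the live subvolume, and all file names and generation numbers are made up.

```python
def changed_between(generations, gen1, gen2):
    """Paths whose last-modified generation lies in (gen1, gen2]."""
    return sorted(p for p, g in generations.items() if gen1 < g <= gen2)

# Hypothetical state of the live subvolume after both snapshots were taken.
# Snapshot 1 was taken at generation 10, snapshot 2 at generation 20.
live_subvolume = {
    "a.txt": 5,    # untouched since before snapshot 1
    "b.txt": 15,   # changed once, between the two snapshots
    "c.txt": 25,   # changed at 15, then again at 25 (after snapshot 2)
    # "d.txt" was deleted at generation 18, so it has no entry at all
}

print(changed_between(live_subvolume, 10, 20))  # ['b.txt']
```

Only `b.txt` is reported: `c.txt` is missed because its trans-id has moved past the second snapshot, and `d.txt` is missed because there is no key left to inspect.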

To solve point two you must change "Query modifications to the
original subvolume" into "Query modifications to the second snapshot".
This means that the second snapshot must exist (it is not sufficient to
know the trans-id).

To solve point one, it is necessary to
a) track changes not only to files but also to directories (if
you remove a file, the timestamp of the directory inode is updated).

b) compare the updated directories with the original ones. This means
that the first snapshot must exist (it is not sufficient to know the
trans-id).
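Step b) could be sketched as follows, assuming both snapshots are mounted as ordinary directories and that the list of changed directories (step a) has already been obtained by some other means. The function name, its parameters, and the example paths are hypothetical.

```python
import os

def deleted_entries(first_snapshot, second_snapshot, changed_dirs):
    """For each changed directory (path relative to the snapshot root),
    list the names present in the first snapshot but not in the second."""
    deleted = []
    for rel in changed_dirs:
        old_dir = os.path.join(first_snapshot, rel)
        new_dir = os.path.join(second_snapshot, rel)
        # A directory that is missing on one side contributes no names.
        old_names = set(os.listdir(old_dir)) if os.path.isdir(old_dir) else set()
        new_names = set(os.listdir(new_dir)) if os.path.isdir(new_dir) else set()
        for name in sorted(old_names - new_names):
            deleted.append(os.path.normpath(os.path.join(rel, name)))
    return deleted
```

A call might look like `deleted_entries("/mnt/snap1", "/mnt/snap2", [".", "home/user"])`. Only the directories flagged as changed are listed, so the cost is proportional to the amount of change rather than to the size of the tree.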

I have to point out that for backup purposes it would be sufficient to
track the changed files (and not the deleted ones).

I started to develop a tool for comparing two snapshots. But I stopped
when I discovered that the ioctl BTRFS_IOC_TREE_SEARCH was not robust
enough for that: when I tried to find the changed inodes, attributes,
extended attributes... I discovered that the ioctl BTRFS_IOC_TREE_SEARCH
doesn't work well in some corner cases [*].

I even tried to propose a patch to mitigate the problem. But at the time
the development efforts were (are) oriented to other issues, and the patch
was not merged.

However, if you want to start developing something, I can go deeper into
the problem.


[*] see the thread "Bug in the design of the tree search ioctl API ?",
http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg07523.html



Re: Comparing snapshots?

2011-02-25 Thread Goffredo Baroncelli
On 02/25/2011 08:32 PM, João Eduardo Luís wrote:
> Hello,
>
> Please note that my experience with btrfs is both recent and, above
> all, very limited. However, I've been wondering about the same issue
> for a different purpose, and your question intrigues me.
>
> I may be off-base here, but I don't think this would be trivial
> to achieve.
>
> Even if one could diff the metadata changes between the two
> snapshots, the problem of finding the changed data would remain.
> It would be possible to check for changed extents, at
> least by comparing extent checksums, but I don't think it would be
> trivial to discover where (exactly) the extent was modified.

Look at the find-new command. It also reports which part of the file has
changed. I don't remember the details very well, but the data is also
stored in a tree, like the metadata. Using the same strategy of
comparing the keys and revid lets you discover which part of the file has
changed with minimal effort (no checksum comparison is needed).
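To make this concrete, here is a minimal sketch that groups the byte ranges reported by find-new per file. It assumes each extent line has the form `inode N file offset N len N disk start N offset N gen N flags F path`; this format is an assumption and should be checked against the installed btrfs-progs version. The function name and the sample values are made up.

```python
import re
from collections import defaultdict

# Assumed layout of one extent line in `btrfs subvolume find-new` output.
_EXTENT = re.compile(
    r"^inode (?P<inode>\d+) file offset (?P<offset>\d+) len (?P<len>\d+) "
    r"disk start \d+ offset \d+ gen (?P<gen>\d+) flags (?P<flags>\S+) (?P<path>.*)$"
)

def changed_ranges(find_new_output):
    """Map each path to a list of (file offset, length) pairs that changed."""
    ranges = defaultdict(list)
    for line in find_new_output.splitlines():
        m = _EXTENT.match(line)
        if m:  # lines that do not match (e.g. the transid marker) are skipped
            ranges[m["path"]].append((int(m["offset"]), int(m["len"])))
    return dict(ranges)

# Made-up sample output, for illustration only.
sample = (
    "inode 257 file offset 0 len 4096 disk start 12582912 offset 0 gen 12 flags NONE dir/a.txt\n"
    "inode 257 file offset 8192 len 4096 disk start 12587008 offset 0 gen 12 flags NONE dir/a.txt\n"
    "transid marker was 12\n"
)
print(changed_ranges(sample))
```

A backup tool could then copy only the listed ranges instead of whole files.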

 


Fwd: Comparing snapshots?

2011-02-25 Thread Joao Luis
I had rich text enabled by default, and the ml bounced back the email.
Apparently, HTML equals spam and/or virus. :-)

Here goes the plain-text version.


-- Forwarded message --
From: João Eduardo Luís jecl...@gmail.com
Date: 2011/2/25
Subject: Re: Comparing snapshots?
To: kreij...@inwind.it
Cc: linux-btrfs@vger.kernel.org


On Feb 25, 2011, at 8:08 PM, Goffredo Baroncelli wrote:
> On 02/25/2011 08:32 PM, João Eduardo Luís wrote:
>
>> Hello,
>>
>> Please note that my experience with btrfs is both recent and, above
>> all, very small. However, I've been wondering about the same issue
>> for a different purpose and your question intrigues me.
>>
>> However, and I may be off-base here, I think that wouldn't be trivial
>> to achieve.
>>
>> Even if one would be able to differ the metadata changes between both
>> snapshots, the problem would still be present regarding finding the
>> changed data. It would be possible to check for changed extents, at
>> least by comparing extent checksums, but I don't think it would be
>> trivial to discover where (exactly) the extent was modified.
>
> Look at the find-new command. It returns also which part of the file is
> changed. I don't remember very well the details, but also the data is
> stored in a tree like the metadata. Using the same strategies of
> comparing the keys and revid leads to discover which part of the file is
> changed, with minimum effort (no checksums comparing is needed).


You are right. I just took a peek at the code, and it seems the
generation id (which IIRC is the same as the id of the last modifying
transaction) is shared file-system-wide, instead of being snapshot- or
subvolume-specific.
I should have checked the code before replying.

Cheers.
---
João Eduardo Luís
gpg key: 477C26E5 from pool.keyserver.eu