Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-26 Thread Ilya Dryomov
On Mon, Sep 26, 2016 at 11:13 AM, Ilya Dryomov  wrote:
> On Mon, Sep 26, 2016 at 8:39 AM, Nikolay Borisov  wrote:
>>
>>
>> On 09/22/2016 06:36 PM, Ilya Dryomov wrote:
>>> On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov  wrote:
 On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov  wrote:
>
> [snipped]
>
> cat /sys/bus/rbd/devices/47/client_id
> client157729
> cat /sys/bus/rbd/devices/1/client_id
> client157729
>
> Client client157729 is alxc13, based on correlation by the ip address
> shown by the rados -p ... command. So it's the only client where the rbd
> images are mapped.

 Well, the watches are there, but cookie numbers indicate that they may
 have been re-established, so that's inconclusive.

 My suggestion would be to repeat the test and do repeated freezes to
 see if snapshot continues to follow HEAD.

 Further, to rule out a missed snap context update, repeat the test, but
 stick

 # echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh

 after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE
 would be 47).
>>>
>>> Hi Nikolay,
>>>
>>> Any news on this?
>>
>> Hello,
>>
>> I was on holiday hence the radio silence. Here is the latest set of
>> tests that were run:
>>
>> Results:
>>
>> c11579 (100GB - used: 83GB):
>> root@alxc13:~# rbd showmapped |grep c11579
>> 47  rbd  c11579 -    /dev/rbd47
>> root@alxc13:~# fsfreeze -f /var/lxc/c11579
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 686.382 s, 156 MB/s
>> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>> root@alxc13:~# rbd snap create rbd/c11579@snap_test
>> root@alxc13:~# rbd map c11579@snap_test
>> /dev/rbd1
>> root@alxc13:~# echo 1 >/sys/bus/rbd/devices/47/refresh
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 915.225 s, 117 MB/s
>> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 863.464 s, 143 MB/s
>> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>> root@alxc13:~# file -s /dev/rbd1
>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files)
>> (huge files)
>> root@alxc13:~# fsfreeze -u /var/lxc/c11579
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 730.243 s, 147 MB/s
>> 65294ce9eae5694a56054ec4af011264  /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 649.373 s, 165 MB/s
>> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>>
>> 30min later:
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 648.328 s, 166 MB/s
>> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>>
>>
>>
>> c12607 (30GB - used: 4GB):
>> root@alxc13:~# rbd showmapped |grep c12607
>> 39  rbd  c12607 -    /dev/rbd39
>> root@alxc13:~# fsfreeze -f /var/lxc/c12607
>> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 228.2 s, 141 MB/s
>> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
>> root@alxc13:~# rbd snap create rbd/c12607@snap_test
>> root@alxc13:~# rbd map c12607@snap_test
>> /dev/rbd21
>> root@alxc13:~# rbd snap protect rbd/c12607@snap_test
>> root@alxc13:~# echo 1 >/sys/bus/rbd/devices/39/refresh
>> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 217.138 s, 148 MB/s
>> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 212.254 s, 152 MB/s
>> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
>> root@alxc13:~# file -s /dev/rbd21
>> /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
>> (huge files)
>> root@alxc13:~# fsfreeze -u /var/lxc/c12607
>> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 322.964 s, 99.7 MB/s
>> 71c5efc24162452473cda50155cd4399  /dev/fd/63
>> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
>> 3840+0 records in
>> 3840+0 records out
>> 32212254720 bytes (32 GB) copied, 326.273 s, 98.7 MB/s
>> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
>> root@alxc13:~# file -s /dev/rbd21
>> /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
>> (huge files)
>> root@alxc13:~#

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-26 Thread Ilya Dryomov
On Mon, Sep 26, 2016 at 8:39 AM, Nikolay Borisov  wrote:
>
>
> On 09/22/2016 06:36 PM, Ilya Dryomov wrote:
>> On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov  wrote:
>>> On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov  wrote:

 [snipped]

 cat /sys/bus/rbd/devices/47/client_id
 client157729
 cat /sys/bus/rbd/devices/1/client_id
 client157729

 Client client157729 is alxc13, based on correlation by the ip address
 shown by the rados -p ... command. So it's the only client where the rbd
 images are mapped.
>>>
>>> Well, the watches are there, but cookie numbers indicate that they may
>>> have been re-established, so that's inconclusive.
>>>
>>> My suggestion would be to repeat the test and do repeated freezes to
>>> see if snapshot continues to follow HEAD.
>>>
>>> Further, to rule out a missed snap context update, repeat the test, but
>>> stick
>>>
>>> # echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh
>>>
>>> after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE
>>> would be 47).
>>
>> Hi Nikolay,
>>
>> Any news on this?
>
> Hello,
>
> I was on holiday hence the radio silence. Here is the latest set of
> tests that were run:
>
> Results:
>
> c11579 (100GB - used: 83GB):
> root@alxc13:~# rbd showmapped |grep c11579
> 47  rbd  c11579 -    /dev/rbd47
> root@alxc13:~# fsfreeze -f /var/lxc/c11579
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 686.382 s, 156 MB/s
> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
> root@alxc13:~# rbd snap create rbd/c11579@snap_test
> root@alxc13:~# rbd map c11579@snap_test
> /dev/rbd1
> root@alxc13:~# echo 1 >/sys/bus/rbd/devices/47/refresh
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 915.225 s, 117 MB/s
> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 863.464 s, 143 MB/s
> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
> root@alxc13:~# file -s /dev/rbd1
> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files)
> (huge files)
> root@alxc13:~# fsfreeze -u /var/lxc/c11579
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 730.243 s, 147 MB/s
> 65294ce9eae5694a56054ec4af011264  /dev/fd/63
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 649.373 s, 165 MB/s
> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>
> 30min later:
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 648.328 s, 166 MB/s
> f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
>
>
>
> c12607 (30GB - used: 4GB):
> root@alxc13:~# rbd showmapped |grep c12607
> 39  rbd  c12607 -    /dev/rbd39
> root@alxc13:~# fsfreeze -f /var/lxc/c12607
> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
> 3840+0 records in
> 3840+0 records out
> 32212254720 bytes (32 GB) copied, 228.2 s, 141 MB/s
> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
> root@alxc13:~# rbd snap create rbd/c12607@snap_test
> root@alxc13:~# rbd map c12607@snap_test
> /dev/rbd21
> root@alxc13:~# rbd snap protect rbd/c12607@snap_test
> root@alxc13:~# echo 1 >/sys/bus/rbd/devices/39/refresh
> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
> 3840+0 records in
> 3840+0 records out
> 32212254720 bytes (32 GB) copied, 217.138 s, 148 MB/s
> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
> 3840+0 records in
> 3840+0 records out
> 32212254720 bytes (32 GB) copied, 212.254 s, 152 MB/s
> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
> root@alxc13:~# file -s /dev/rbd21
> /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
> (huge files)
> root@alxc13:~# fsfreeze -u /var/lxc/c12607
> root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
> 3840+0 records in
> 3840+0 records out
> 32212254720 bytes (32 GB) copied, 322.964 s, 99.7 MB/s
> 71c5efc24162452473cda50155cd4399  /dev/fd/63
> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
> 3840+0 records in
> 3840+0 records out
> 32212254720 bytes (32 GB) copied, 326.273 s, 98.7 MB/s
> e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
> root@alxc13:~# file -s /dev/rbd21
> /dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
> (huge files)
> root@alxc13:~#
>
> 30min later:
> root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
> 3840+0 records in
> 3840+0 records out
> 32212254720 bytes (32 GB) copied, 359.917 s, 89.5 MB/s
> 

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-26 Thread Nikolay Borisov


On 09/22/2016 06:36 PM, Ilya Dryomov wrote:
> On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov  wrote:
>> On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov  wrote:
>>>
>>> [snipped]
>>>
>>> cat /sys/bus/rbd/devices/47/client_id
>>> client157729
>>> cat /sys/bus/rbd/devices/1/client_id
>>> client157729
>>>
>>> Client client157729 is alxc13, based on correlation by the ip address
>>> shown by the rados -p ... command. So it's the only client where the rbd
>>> images are mapped.
>>
>> Well, the watches are there, but cookie numbers indicate that they may
>> have been re-established, so that's inconclusive.
>>
>> My suggestion would be to repeat the test and do repeated freezes to
>> see if snapshot continues to follow HEAD.
>>
>> Further, to rule out a missed snap context update, repeat the test, but
>> stick
>>
>> # echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh
>>
>> after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE
>> would be 47).
> 
> Hi Nikolay,
> 
> Any news on this?

Hello,

I was on holiday hence the radio silence. Here is the latest set of
tests that were run:

Results:

c11579 (100GB - used: 83GB):
root@alxc13:~# rbd showmapped |grep c11579
47  rbd  c11579 -    /dev/rbd47
root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 686.382 s, 156 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd1
root@alxc13:~# echo 1 >/sys/bus/rbd/devices/47/refresh
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 915.225 s, 117 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 863.464 s, 143 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63
root@alxc13:~# file -s /dev/rbd1
/dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files)
(huge files)
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 730.243 s, 147 MB/s
65294ce9eae5694a56054ec4af011264  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 649.373 s, 165 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63

30min later:
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 648.328 s, 166 MB/s
f2edb5abb100de30c1301b0856e595aa  /dev/fd/63



c12607 (30GB - used: 4GB):
root@alxc13:~# rbd showmapped |grep c12607
39  rbd  c12607 -    /dev/rbd39
root@alxc13:~# fsfreeze -f /var/lxc/c12607
root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 228.2 s, 141 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# rbd snap create rbd/c12607@snap_test
root@alxc13:~# rbd map c12607@snap_test
/dev/rbd21
root@alxc13:~# rbd snap protect rbd/c12607@snap_test
root@alxc13:~# echo 1 >/sys/bus/rbd/devices/39/refresh
root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 217.138 s, 148 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 212.254 s, 152 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# file -s /dev/rbd21
/dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
(huge files)
root@alxc13:~# fsfreeze -u /var/lxc/c12607
root@alxc13:~# md5sum <(dd if=/dev/rbd39 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 322.964 s, 99.7 MB/s
71c5efc24162452473cda50155cd4399  /dev/fd/63
root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 326.273 s, 98.7 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63
root@alxc13:~# file -s /dev/rbd21
/dev/rbd21: Linux rev 1.0 ext4 filesystem data (extents) (large files)
(huge files)
root@alxc13:~#

30min later:
root@alxc13:~# md5sum <(dd if=/dev/rbd21 iflag=direct bs=8M)
3840+0 records in
3840+0 records out
32212254720 bytes (32 GB) copied, 359.917 s, 89.5 MB/s
e6ce3ea688a778b9c732041164b4638c  /dev/fd/63

Everything seems consistent, but when an rsync was initiated from the
snapshot it failed again. Unfortunately I consider these results rather
unreliable, because they contradict the ones I showed you earlier, where
the checksums differed.
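
For what it's worth, a minimal set of things worth capturing right after such
an rsync failure (a sketch only; device names are the ones from the transcript
above, the mount point /mnt/snap_test is made up):

umount /mnt/snap_test 2>/dev/null                # release the snapshot device if it is mounted
fsck.ext4 -fn /dev/rbd1                          # read-only check of the snapshot filesystem
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)     # has the snapshot drifted since the last read?
dmesg | tail -n 100                              # EXT4-fs / rbd errors around the failure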


> 
> Thanks,
> 
> Ilya
> 

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-22 Thread Ilya Dryomov
On Thu, Sep 15, 2016 at 3:18 PM, Ilya Dryomov  wrote:
> On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov  wrote:
>>
>> [snipped]
>>
>> cat /sys/bus/rbd/devices/47/client_id
>> client157729
>> cat /sys/bus/rbd/devices/1/client_id
>> client157729
>>
>> Client client157729 is alxc13, based on correlation by the ip address
>> shown by the rados -p ... command. So it's the only client where the rbd
>> images are mapped.
>
> Well, the watches are there, but cookie numbers indicate that they may
> have been re-established, so that's inconclusive.
>
> My suggestion would be to repeat the test and do repeated freezes to
> see if snapshot continues to follow HEAD.
>
> Further, to rule out a missed snap context update, repeat the test, but
> stick
>
> # echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh
>
> after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE
> would be 47).

Hi Nikolay,

Any news on this?

Thanks,

Ilya


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-15 Thread Ilya Dryomov
On Thu, Sep 15, 2016 at 2:43 PM, Nikolay Borisov  wrote:
>
> [snipped]
>
> cat /sys/bus/rbd/devices/47/client_id
> client157729
> cat /sys/bus/rbd/devices/1/client_id
> client157729
>
> Client client157729 is alxc13, based on correlation by the ip address
> shown by the rados -p ... command. So it's the only client where the rbd
> images are mapped.

Well, the watches are there, but cookie numbers indicate that they may
have been re-established, so that's inconclusive.

My suggestion would be to repeat the test and do repeated freezes to
see if snapshot continues to follow HEAD.

Further, to rule out a missed snap context update, repeat the test, but
stick

# echo 1 >/sys/bus/rbd/devices/<ID_OF_THE_ORIG_DEVICE>/refresh

after "rbd snap create" (for today's test, ID_OF_THE_ORIG_DEVICE
would be 47).

Thanks,

Ilya
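
Spelled out as a single sequence (a sketch, using device id 47 and image
rbd/c11579 from this thread; adjust to the image actually under test):

fsfreeze -f /var/lxc/c11579
md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)    # checksum of HEAD while frozen
rbd snap create rbd/c11579@snap_test
rbd map c11579@snap_test                         # e.g. /dev/rbd1
echo 1 >/sys/bus/rbd/devices/47/refresh          # rule out a missed snap context update
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)     # should match the HEAD checksum
fsfreeze -u /var/lxc/c11579
# ...let the container run for a while, then repeat:
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)     # a snapshot must not change over time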


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-15 Thread Nikolay Borisov


On 09/15/2016 03:15 PM, Ilya Dryomov wrote:
> On Thu, Sep 15, 2016 at 12:54 PM, Nikolay Borisov  wrote:
>>
>>
>> On 09/15/2016 01:24 PM, Ilya Dryomov wrote:
>>> On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov
>>>  wrote:


 On 09/15/2016 09:22 AM, Nikolay Borisov wrote:
>
>
> On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
>> On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov  wrote:
>>>
>>>
>>> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
 On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  
 wrote:
>
>
> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>
>> I found I could ignore the XFS issues and just mount it with the 
>> appropriate options (below from my backup scripts):
>>
>> #
>> # Mount with nouuid (conflicting XFS) and norecovery (ro 
>> snapshot)
>> #
>> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; 
>> then
>> echo "FAILED: Unable to mount snapshot $DATESTAMP of 
>> $FS - cleaning up"
>> rbd unmap $SNAPDEV
>> rbd snap rm ${RBDPATH}@${DATESTAMP}
>> exit 3;
>> fi
>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>
>> It's impossible without clones to do it without norecovery.
>
> But shouldn't freezing the fs and doing a snapshot constitute a "clean
> unmount" hence no need to recover on the next mount (of the snapshot) 
> -
> Ilya?

 I *thought* it should (well, except for orphan inodes), but now I'm not
 sure.  Have you tried reproducing with loop devices yet?
>>>
>>> Here is what the checksum tests showed:
>>>
>>> fsfreeze -f  /mountpoit
>>> md5sum /dev/rbd0
>>> f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
>>> rbd snap create xx@xxx && rbd snap protect xx@xxx
>>> rbd map xx@xxx
>>> md5sum /dev/rbd1
>>> 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1
>>>
>>> where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
>>> different, worrying.
>>
>> Sorry, for the filesystem device you should do
>>
>> md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
>>
>> to get what's actually on disk, so that it's apples to apples.
>
> root@alxc13:~# rbd showmapped  |egrep "device|c11579"
> id  pool image  snap  device
> 47  rbd  c11579 - /dev/rbd47
> root@alxc13:~# fsfreeze -f /var/lxc/c11579
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63  <--- Check sum after 
> freeze
> root@alxc13:~# rbd snap create rbd/c11579@snap_test
> root@alxc13:~# rbd map c11579@snap_test
> /dev/rbd1
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63 <--- Check sum of 
> snapshot
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63<--- Check sum of 
> original device, not changed - GOOD
> root@alxc13:~# file -s /dev/rbd1
> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) 
> (huge files)
> root@alxc13:~# fsfreeze -u /var/lxc/c11579
> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
> 92b7182591d7d7380435cfdea79a8897  /dev/fd/63   <--- After unfreeze 
> checksum is different - OK
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
> bc3b68f0276c608d9435223f89589962  /dev/fd/63 <--- Why the heck the 
> checksum of the snapshot is different after unfreeze? BAD?
> root@alxc13:~# file -s /dev/rbd1
> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) 
> (extents) (large files) (huge files)
> root@alxc13:~#
>

 And something even more peculiar - taking an md5sum some hours after the
 above test produced this:

 root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
 12800+0 records in
 12800+0 records out
 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
 

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-15 Thread Ilya Dryomov
On Thu, Sep 15, 2016 at 12:54 PM, Nikolay Borisov  wrote:
>
>
> On 09/15/2016 01:24 PM, Ilya Dryomov wrote:
>> On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov
>>  wrote:
>>>
>>>
>>> On 09/15/2016 09:22 AM, Nikolay Borisov wrote:


 On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
> On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov  wrote:
>>
>>
>> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
>>> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  
>>> wrote:


 On 09/14/2016 09:55 AM, Adrian Saul wrote:
>
> I found I could ignore the XFS issues and just mount it with the 
> appropriate options (below from my backup scripts):
>
> #
> # Mount with nouuid (conflicting XFS) and norecovery (ro 
> snapshot)
> #
> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; 
> then
> echo "FAILED: Unable to mount snapshot $DATESTAMP of 
> $FS - cleaning up"
> rbd unmap $SNAPDEV
> rbd snap rm ${RBDPATH}@${DATESTAMP}
> exit 3;
> fi
> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>
> It's impossible without clones to do it without norecovery.

 But shouldn't freezing the fs and doing a snapshot constitute a "clean
 unmount" hence no need to recover on the next mount (of the snapshot) -
 Ilya?
>>>
>>> I *thought* it should (well, except for orphan inodes), but now I'm not
>>> sure.  Have you tried reproducing with loop devices yet?
>>
>> Here is what the checksum tests showed:
>>
>> fsfreeze -f  /mountpoit
>> md5sum /dev/rbd0
>> f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
>> rbd snap create xx@xxx && rbd snap protect xx@xxx
>> rbd map xx@xxx
>> md5sum /dev/rbd1
>> 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1
>>
>> where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
>> different, worrying.
>
> Sorry, for the filesystem device you should do
>
> md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
>
> to get what's actually on disk, so that it's apples to apples.

 root@alxc13:~# rbd showmapped  |egrep "device|c11579"
 id  pool image  snap  device
 47  rbd  c11579 - /dev/rbd47
 root@alxc13:~# fsfreeze -f /var/lxc/c11579
 root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
 12800+0 records in
 12800+0 records out
 107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63  <--- Check sum after 
 freeze
 root@alxc13:~# rbd snap create rbd/c11579@snap_test
 root@alxc13:~# rbd map c11579@snap_test
 /dev/rbd1
 root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
 12800+0 records in
 12800+0 records out
 107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63 <--- Check sum of snapshot
 root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
 12800+0 records in
 12800+0 records out
 107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63<--- Check sum of original 
 device, not changed - GOOD
 root@alxc13:~# file -s /dev/rbd1
 /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) 
 (huge files)
 root@alxc13:~# fsfreeze -u /var/lxc/c11579
 root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
 12800+0 records in
 12800+0 records out
 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
 92b7182591d7d7380435cfdea79a8897  /dev/fd/63   <--- After unfreeze 
 checksum is different - OK
 root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
 12800+0 records in
 12800+0 records out
 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
 bc3b68f0276c608d9435223f89589962  /dev/fd/63 <--- Why the heck the 
 checksum of the snapshot is different after unfreeze? BAD?
 root@alxc13:~# file -s /dev/rbd1
 /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) 
 (extents) (large files) (huge files)
 root@alxc13:~#

>>>
>>> And something even more peculiar - taking an md5sum some hours after the
>>> above test produced this:
>>>
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
>>> e68e41616489d41544cd873c73defb08  /dev/fd/63
>>>
>>> Meaning the read-only snapshot somehow has "mutated". E.g. it wasn't
>>> recreated, just the same old snapshot. Is this normal?
>>
>> Hrm, I wonder if it missed a 

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-15 Thread Nikolay Borisov


On 09/15/2016 01:24 PM, Ilya Dryomov wrote:
> On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov
>  wrote:
>>
>>
>> On 09/15/2016 09:22 AM, Nikolay Borisov wrote:
>>>
>>>
>>> On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
 On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov  wrote:
>
>
> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
>> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:
>>>
>>>
>>> On 09/14/2016 09:55 AM, Adrian Saul wrote:

 I found I could ignore the XFS issues and just mount it with the 
 appropriate options (below from my backup scripts):

 #
 # Mount with nouuid (conflicting XFS) and norecovery (ro 
 snapshot)
 #
 if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
 echo "FAILED: Unable to mount snapshot $DATESTAMP of 
 $FS - cleaning up"
 rbd unmap $SNAPDEV
 rbd snap rm ${RBDPATH}@${DATESTAMP}
 exit 3;
 fi
 echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"

 It's impossible without clones to do it without norecovery.
>>>
>>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>>> unmount" hence no need to recover on the next mount (of the snapshot) -
>>> Ilya?
>>
>> I *thought* it should (well, except for orphan inodes), but now I'm not
>> sure.  Have you tried reproducing with loop devices yet?
>
> Here is what the checksum tests showed:
>
> fsfreeze -f  /mountpoit
> md5sum /dev/rbd0
> f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
> rbd snap create xx@xxx && rbd snap protect xx@xxx
> rbd map xx@xxx
> md5sum /dev/rbd1
> 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1
>
> where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
> different, worrying.

 Sorry, for the filesystem device you should do

 md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)

 to get what's actually on disk, so that it's apples to apples.
>>>
>>> root@alxc13:~# rbd showmapped  |egrep "device|c11579"
>>> id  pool image  snap  device
>>> 47  rbd  c11579 - /dev/rbd47
>>> root@alxc13:~# fsfreeze -f /var/lxc/c11579
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
>>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63  <--- Check sum after 
>>> freeze
>>> root@alxc13:~# rbd snap create rbd/c11579@snap_test
>>> root@alxc13:~# rbd map c11579@snap_test
>>> /dev/rbd1
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
>>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63 <--- Check sum of snapshot
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
>>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63<--- Check sum of original 
>>> device, not changed - GOOD
>>> root@alxc13:~# file -s /dev/rbd1
>>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge 
>>> files)
>>> root@alxc13:~# fsfreeze -u /var/lxc/c11579
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
>>> 92b7182591d7d7380435cfdea79a8897  /dev/fd/63   <--- After unfreeze checksum 
>>> is different - OK
>>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>>> 12800+0 records in
>>> 12800+0 records out
>>> 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
>>> bc3b68f0276c608d9435223f89589962  /dev/fd/63 <--- Why the heck the checksum 
>>> of the snapshot is different after unfreeze? BAD?
>>> root@alxc13:~# file -s /dev/rbd1
>>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) 
>>> (extents) (large files) (huge files)
>>> root@alxc13:~#
>>>
>>
>> And something even more peculiar - taking an md5sum some hours after the
>> above test produced this:
>>
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
>> e68e41616489d41544cd873c73defb08  /dev/fd/63
>>
>> Meaning the read-only snapshot somehow has "mutated". E.g. it wasn't
>> recreated, just the same old snapshot. Is this normal?
> 
> Hrm, I wonder if it missed a snapshot context update.  Please pastebin
> entire dmesg for that boot.

The machine has been up more than 2 and the dmesg has been rewritten
several times for that time. Also the node is rather busy so there's
plenty 

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-15 Thread Ilya Dryomov
On Thu, Sep 15, 2016 at 10:22 AM, Nikolay Borisov
 wrote:
>
>
> On 09/15/2016 09:22 AM, Nikolay Borisov wrote:
>>
>>
>> On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
>>> On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov  wrote:


 On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:
>>
>>
>> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>>
>>> I found I could ignore the XFS issues and just mount it with the 
>>> appropriate options (below from my backup scripts):
>>>
>>> #
>>> # Mount with nouuid (conflicting XFS) and norecovery (ro 
>>> snapshot)
>>> #
>>> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
>>> echo "FAILED: Unable to mount snapshot $DATESTAMP of 
>>> $FS - cleaning up"
>>> rbd unmap $SNAPDEV
>>> rbd snap rm ${RBDPATH}@${DATESTAMP}
>>> exit 3;
>>> fi
>>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>>
>>> It's impossible without clones to do it without norecovery.
>>
>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>> unmount" hence no need to recover on the next mount (of the snapshot) -
>> Ilya?
>
> I *thought* it should (well, except for orphan inodes), but now I'm not
> sure.  Have you tried reproducing with loop devices yet?

 Here is what the checksum tests showed:

 fsfreeze -f  /mountpoit
 md5sum /dev/rbd0
 f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
 rbd snap create xx@xxx && rbd snap protect xx@xxx
 rbd map xx@xxx
 md5sum /dev/rbd1
 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1

 where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
 different, worrying.
>>>
>>> Sorry, for the filesystem device you should do
>>>
>>> md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
>>>
>>> to get what's actually on disk, so that it's apples to apples.
>>
>> root@alxc13:~# rbd showmapped  |egrep "device|c11579"
>> id  pool image  snap  device
>> 47  rbd  c11579 - /dev/rbd47
>> root@alxc13:~# fsfreeze -f /var/lxc/c11579
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63  <--- Check sum after freeze
>> root@alxc13:~# rbd snap create rbd/c11579@snap_test
>> root@alxc13:~# rbd map c11579@snap_test
>> /dev/rbd1
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63 <--- Check sum of snapshot
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
>> 2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63<--- Check sum of original 
>> device, not changed - GOOD
>> root@alxc13:~# file -s /dev/rbd1
>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge 
>> files)
>> root@alxc13:~# fsfreeze -u /var/lxc/c11579
>> root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
>> 92b7182591d7d7380435cfdea79a8897  /dev/fd/63   <--- After unfreeze checksum 
>> is different - OK
>> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
>> 12800+0 records in
>> 12800+0 records out
>> 107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
>> bc3b68f0276c608d9435223f89589962  /dev/fd/63 <--- Why the heck the checksum 
>> of the snapshot is different after unfreeze? BAD?
>> root@alxc13:~# file -s /dev/rbd1
>> /dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) 
>> (extents) (large files) (huge files)
>> root@alxc13:~#
>>
>
> And something even more peculiar - taking an md5sum some hours after the
> above test produced this:
>
> root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
> 12800+0 records in
> 12800+0 records out
> 107374182400 bytes (107 GB) copied, 636.836 s, 169 MB/s
> e68e41616489d41544cd873c73defb08  /dev/fd/63
>
> Meaning the read-only snapshot somehow has "mutated". E.g. it wasn't
> recreated, just the same old snapshot. Is this normal?

Hrm, I wonder if it missed a snapshot context update.  Please pastebin
entire dmesg for that boot.

Have those devices been remapped or alxc13 rebooted since then?  If
not, what's the output of

$ rados -p rbd listwatchers $(rbd info c11579 | grep block_name_prefix
| awk '{ print $2 }' | sed 's/rbd_data/rbd_header/')

and can you check whether that snapshot is continuing to mutate as the
image is mutated - freeze 
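
The same watcher check unrolled into separate steps (the prefix value in the
comments is only illustrative):

prefix=$(rbd info c11579 | awk '/block_name_prefix/ { print $2 }')  # e.g. rbd_data.1234abcd
header=${prefix/rbd_data/rbd_header}                                # -> rbd_header.1234abcd
rados -p rbd listwatchers "$header"                                 # clients watching the image header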

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-15 Thread Nikolay Borisov


On 09/14/2016 05:53 PM, Ilya Dryomov wrote:
> On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov  wrote:
>>
>>
>> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
>>> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:


 On 09/14/2016 09:55 AM, Adrian Saul wrote:
>
> I found I could ignore the XFS issues and just mount it with the 
> appropriate options (below from my backup scripts):
>
> #
> # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
> #
> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
> echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS 
> - cleaning up"
> rbd unmap $SNAPDEV
> rbd snap rm ${RBDPATH}@${DATESTAMP}
> exit 3;
> fi
> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>
> It's impossible without clones to do it without norecovery.

 But shouldn't freezing the fs and doing a snapshot constitute a "clean
 unmount" hence no need to recover on the next mount (of the snapshot) -
 Ilya?
>>>
>>> I *thought* it should (well, except for orphan inodes), but now I'm not
>>> sure.  Have you tried reproducing with loop devices yet?
>>
>> Here is what the checksum tests showed:
>>
>> fsfreeze -f  /mountpoit
>> md5sum /dev/rbd0
>> f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
>> rbd snap create xx@xxx && rbd snap protect xx@xxx
>> rbd map xx@xxx
>> md5sum /dev/rbd1
>> 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1
>>
>> where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
>> different, worrying.
> 
> Sorry, for the filesystem device you should do
> 
> md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)
> 
> to get what's actually on disk, so that it's apples to apples.

root@alxc13:~# rbd showmapped  |egrep "device|c11579"
id  pool image  snap  device
47  rbd  c11579 - /dev/rbd47
root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 617.815 s, 174 MB/s
2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63  <--- Check sum after freeze
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd1
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 610.043 s, 176 MB/s
2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63 <--- Check sum of snapshot
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 592.164 s, 181 MB/s
2ddc99ce1b3ef51da1945d9da25ac296  /dev/fd/63<--- Check sum of original 
device, not changed - GOOD
root@alxc13:~# file -s /dev/rbd1
/dev/rbd1: Linux rev 1.0 ext4 filesystem data (extents) (large files) (huge 
files)
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# md5sum <(dd if=/dev/rbd47 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 647.01 s, 166 MB/s
92b7182591d7d7380435cfdea79a8897  /dev/fd/63   <--- After unfreeze checksum is 
different - OK
root@alxc13:~# md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)
12800+0 records in
12800+0 records out
107374182400 bytes (107 GB) copied, 590.556 s, 182 MB/s
bc3b68f0276c608d9435223f89589962  /dev/fd/63 <--- Why the heck the checksum of 
the snapshot is different after unfreeze? BAD?
root@alxc13:~# file -s /dev/rbd1
/dev/rbd1: Linux rev 1.0 ext4 filesystem data (needs journal recovery) 
(extents) (large files) (huge files)
root@alxc13:~# 
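
A couple of quick cross-checks that could be run right after the unfreeze above
(a sketch; nothing here is destructive):

blockdev --getro /dev/rbd1                       # expect 1 - a mapped snapshot is read-only
cat /sys/bus/rbd/devices/1/client_id             # same client as the /dev/rbd47 mapping?
md5sum <(dd if=/dev/rbd1 iflag=direct bs=8M)     # repeat over time; the value must stay constant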


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Ilya Dryomov
On Wed, Sep 14, 2016 at 3:30 PM, Nikolay Borisov  wrote:
>
>
> On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
>> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:
>>>
>>>
>>> On 09/14/2016 09:55 AM, Adrian Saul wrote:

 I found I could ignore the XFS issues and just mount it with the 
 appropriate options (below from my backup scripts):

 #
 # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
 #
 if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
 echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - 
 cleaning up"
 rbd unmap $SNAPDEV
 rbd snap rm ${RBDPATH}@${DATESTAMP}
 exit 3;
 fi
 echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"

 It's impossible without clones to do it without norecovery.
>>>
>>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>>> unmount" hence no need to recover on the next mount (of the snapshot) -
>>> Ilya?
>>
>> I *thought* it should (well, except for orphan inodes), but now I'm not
>> sure.  Have you tried reproducing with loop devices yet?
>
> Here is what the checksum tests showed:
>
> fsfreeze -f  /mountpoit
> md5sum /dev/rbd0
> f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
> rbd snap create xx@xxx && rbd snap protect xx@xxx
> rbd map xx@xxx
> md5sum /dev/rbd1
> 6f702740281874632c73aeb2c0fcf34a  /dev/rbd1
>
> where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
> different, worrying.

Sorry, for the filesystem device you should do

md5sum <(dd if=/dev/rbd0 iflag=direct bs=8M)

to get what's actually on disk, so that it's apples to apples.
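
A throwaway helper for that, so both devices are always read the same way (the
function name is made up):

rbdsum() { md5sum <(dd if="$1" iflag=direct bs=8M 2>/dev/null); }
rbdsum /dev/rbd0     # image HEAD
rbdsum /dev/rbd1     # snapshot - should match while the filesystem is frozen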

Thanks,

Ilya


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Nikolay Borisov


On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:
>>
>>
>> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>>
>>> I found I could ignore the XFS issues and just mount it with the 
>>> appropriate options (below from my backup scripts):
>>>
>>> #
>>> # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
>>> #
>>> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
>>> echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - 
>>> cleaning up"
>>> rbd unmap $SNAPDEV
>>> rbd snap rm ${RBDPATH}@${DATESTAMP}
>>> exit 3;
>>> fi
>>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>>
>>> It's impossible without clones to do it without norecovery.
>>
>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>> unmount" hence no need to recover on the next mount (of the snapshot) -
>> Ilya?
> 
> I *thought* it should (well, except for orphan inodes), but now I'm not
> sure.  Have you tried reproducing with loop devices yet?

Here is what the checksum tests showed:

fsfreeze -f /mountpoint
md5sum /dev/rbd0
f33c926373ad604da674bcbfbe6460c5  /dev/rbd0
rbd snap create xx@xxx && rbd snap protect xx@xxx
rbd map xx@xxx
md5sum /dev/rbd1
6f702740281874632c73aeb2c0fcf34a  /dev/rbd1

where rbd1 is a snapshot of the rbd0 device. So the checksum is indeed
different, worrying.

> 
> Thanks,
> 
> Ilya
> 


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Nikolay Borisov


On 09/14/2016 02:55 PM, Ilya Dryomov wrote:
> On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:
>>
>>
>> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>>
>>> I found I could ignore the XFS issues and just mount it with the 
>>> appropriate options (below from my backup scripts):
>>>
>>> #
>>> # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
>>> #
>>> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
>>> echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - 
>>> cleaning up"
>>> rbd unmap $SNAPDEV
>>> rbd snap rm ${RBDPATH}@${DATESTAMP}
>>> exit 3;
>>> fi
>>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>>
>>> It's impossible without clones to do it without norecovery.
>>
>> But shouldn't freezing the fs and doing a snapshot constitute a "clean
>> unmount" hence no need to recover on the next mount (of the snapshot) -
>> Ilya?
> 
> I *thought* it should (well, except for orphan inodes), but now I'm not
> sure.  Have you tried reproducing with loop devices yet?

Unfortunately not yet since this is being tested in our production setup
which is non-trivial to replicate in a test environment. Tonight the
results of the checksumming experiments should be available.

While on the topic, this might very well be caused by a race in the
fsfreeze code, as described here: https://lkml.org/lkml/2016/9/12/337

Also this is observed only on large and busy volumes (e.g. testing with
a 10g volume which is not very busy doesn't exhibit the corruption).
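
If a busy volume really is required, a crude load generator along these lines
could be run against a test mount while the freeze/snapshot sequence executes
(paths and sizes are made up):

while :; do
    dd if=/dev/urandom of=/var/lxc/test/churn bs=1M count=64 conv=fsync 2>/dev/null
    rm -f /var/lxc/test/churn
done &
LOAD_PID=$!
# ...run the fsfreeze -f / rbd snap create / md5sum comparison here...
kill "$LOAD_PID"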



> 
> Thanks,
> 
> Ilya
> 


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Ilya Dryomov
On Wed, Sep 14, 2016 at 9:01 AM, Nikolay Borisov  wrote:
>
>
> On 09/14/2016 09:55 AM, Adrian Saul wrote:
>>
>> I found I could ignore the XFS issues and just mount it with the appropriate 
>> options (below from my backup scripts):
>>
>> #
>> # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
>> #
>> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
>> echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - 
>> cleaning up"
>> rbd unmap $SNAPDEV
>> rbd snap rm ${RBDPATH}@${DATESTAMP}
>> exit 3;
>> fi
>> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
>>
>> It's impossible without clones to do it without norecovery.
>
> But shouldn't freezing the fs and doing a snapshot constitute a "clean
> unmount" hence no need to recover on the next mount (of the snapshot) -
> Ilya?

I *thought* it should (well, except for orphan inodes), but now I'm not
sure.  Have you tried reproducing with loop devices yet?

Thanks,

Ilya


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Adrian Saul
> But shouldn't freezing the fs and doing a snapshot constitute a "clean
> unmount" hence no need to recover on the next mount (of the snapshot) -
> Ilya?

It's what I thought as well, but XFS attempts to replay the log on mount
regardless, and it has to write to the device to do so.  This was the only way
I found to mount it without converting the snapshot to a clone (which I
couldn't do with the image options enabled anyway).

I have this script snapshotting, mounting and backing up multiple file systems 
on my cluster with no issue.
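
For context, the steps around that excerpt probably look roughly like this (a
sketch; the rbd and rsync commands are assumptions, only the mount step is
taken from the script excerpt, and the image/filesystem names are made up):

DATESTAMP=$(date +%Y%m%d%H%M)
RBDPATH=rbd/myimage
FS=/data/myfs
rbd snap create ${RBDPATH}@${DATESTAMP}
SNAPDEV=$(rbd map ${RBDPATH}@${DATESTAMP})
mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}      # the step shown in the excerpt
rsync -a /backup${FS}/ backuphost:/backups${FS}/         # ship the data off the cluster
umount /backup${FS}
rbd unmap $SNAPDEV
rbd snap rm ${RBDPATH}@${DATESTAMP}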


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Nikolay Borisov


On 09/14/2016 09:55 AM, Adrian Saul wrote:
> 
> I found I could ignore the XFS issues and just mount it with the appropriate 
> options (below from my backup scripts):
> 
> #
> # Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
> #
> if ! mount -o ro,nouuid,norecovery  $SNAPDEV /backup${FS}; then
> echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - 
> cleaning up"
> rbd unmap $SNAPDEV
> rbd snap rm ${RBDPATH}@${DATESTAMP}
> exit 3;
> fi
> echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"
> 
> It's impossible without clones to do it without norecovery.

But shouldn't freezing the fs and doing a snapshot constitute a "clean
unmount" hence no need to recover on the next mount (of the snapshot) -
Ilya?



Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-14 Thread Adrian Saul

I found I could ignore the XFS issues and just mount it with the appropriate 
options (below from my backup scripts):

#
# Mount with nouuid (conflicting XFS) and norecovery (ro snapshot)
#
if ! mount -o ro,nouuid,norecovery $SNAPDEV /backup${FS}; then
    echo "FAILED: Unable to mount snapshot $DATESTAMP of $FS - cleaning up"
    rbd unmap $SNAPDEV
    rbd snap rm ${RBDPATH}@${DATESTAMP}
    exit 3;
fi
echo "Backup snapshot of $RBDPATH mounted at: /backup${FS}"

It's impossible without clones to do it without norecovery.
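
Where layering is available, the clone route sketched below gives the mount a
writable device so the journal can be replayed instead of skipped (names are
illustrative; this is not part of the script above):

rbd snap create rbd/myimage@backup
rbd snap protect rbd/myimage@backup
rbd clone rbd/myimage@backup rbd/myimage_backup_clone
CLONEDEV=$(rbd map rbd/myimage_backup_clone)
mount -o ro,nouuid "$CLONEDEV" /backup/myimage           # writable device, so log replay can happen
# ...rsync, then unwind:
umount /backup/myimage
rbd unmap "$CLONEDEV"
rbd rm rbd/myimage_backup_clone
rbd snap unprotect rbd/myimage@backup
rbd snap rm rbd/myimage@backup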



> -Original Message-
> From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
> Ilya Dryomov
> Sent: Wednesday, 14 September 2016 1:51 AM
> To: Nikolay Borisov
> Cc: ceph-users; SiteGround Operations
> Subject: Re: [ceph-users] Consistency problems when taking RBD snapshot
>
> On Tue, Sep 13, 2016 at 4:11 PM, Nikolay Borisov <ker...@kyup.com> wrote:
> >
> >
> > On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
> > [SNIP]
> >>
> >> Hmm, it could be about whether it is able to do journal replay on
> >> mount.  When you mount a snapshot, you get a read-only block device;
> >> when you mount a clone image, you get a read-write block device.
> >>
> >> Let's try this again, suppose image is foo and snapshot is snap:
> >>
> >> # fsfreeze -f /mnt
> >>
> >> # rbd snap create foo@snap
> >> # rbd map foo@snap
> >> /dev/rbd0
> >> # file -s /dev/rbd0
> >> # fsck.ext4 -n /dev/rbd0
> >> # mount /dev/rbd0 /foo
> >> # umount /foo
> >> 
> >> # file -s /dev/rbd0
> >> # fsck.ext4 -n /dev/rbd0
> >>
> >> # rbd clone foo@snap bar
> >> $ rbd map bar
> >> /dev/rbd1
> >> # file -s /dev/rbd1
> >> # fsck.ext4 -n /dev/rbd1
> >> # mount /dev/rbd1 /bar
> >> # umount /bar
> >> 
> >> # file -s /dev/rbd1
> >> # fsck.ext4 -n /dev/rbd1
> >>
> >> Could you please provide the output for the above?
> >
> > Here you go : http://paste.ubuntu.com/23173721/
>
> OK, so that explains it: the frozen filesystem is "needs journal recovery", so
> mounting it off of read-only block device leads to errors.
>
> root@alxc13:~# fsfreeze -f /var/lxc/c11579 root@alxc13:~# rbd snap create
> rbd/c11579@snap_test root@alxc13:~# rbd map c11579@snap_test
> /dev/rbd151
> root@alxc13:~# fsfreeze -u /var/lxc/c11579 root@alxc13:~# file -s
> /dev/rbd151
> /dev/rbd151: Linux rev 1.0 ext4 filesystem data (needs journal
> recovery) (extents) (large files) (huge files)
>
> Now, to isolate the problem, the easiest would probably be to try to
> reproduce it with loop devices.  Can you try dding one of these images to a
> file, make sure that the filesystem is clean, losetup + mount, freeze, make a
> "snapshot" with cp and losetup -r + mount?
>
> Try sticking file -s before unfreeze and also compare md5sums:
>
> root@alxc13:~# fsfreeze -f /var/lxc/c11579
> <md5sum the original device>
> root@alxc13:~# rbd snap create rbd/c11579@snap_test
> root@alxc13:~# rbd map c11579@snap_test
> <md5sum the original and snapshot devices>
> root@alxc13:~# file -s /dev/rbd151
> root@alxc13:~# fsfreeze -u /var/lxc/c11579
> <md5sum the original and snapshot devices>
> root@alxc13:~# file -s /dev/rbd151
>
> Thanks,
>
> Ilya


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 4:11 PM, Nikolay Borisov  wrote:
>
>
> On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
> [SNIP]
>>
>> Hmm, it could be about whether it is able to do journal replay on
>> mount.  When you mount a snapshot, you get a read-only block device;
>> when you mount a clone image, you get a read-write block device.
>>
>> Let's try this again, suppose image is foo and snapshot is snap:
>>
>> # fsfreeze -f /mnt
>>
>> # rbd snap create foo@snap
>> # rbd map foo@snap
>> /dev/rbd0
>> # file -s /dev/rbd0
>> # fsck.ext4 -n /dev/rbd0
>> # mount /dev/rbd0 /foo
>> # umount /foo
>> 
>> # file -s /dev/rbd0
>> # fsck.ext4 -n /dev/rbd0
>>
>> # rbd clone foo@snap bar
>> $ rbd map bar
>> /dev/rbd1
>> # file -s /dev/rbd1
>> # fsck.ext4 -n /dev/rbd1
>> # mount /dev/rbd1 /bar
>> # umount /bar
>> 
>> # file -s /dev/rbd1
>> # fsck.ext4 -n /dev/rbd1
>>
>> Could you please provide the output for the above?
>
> Here you go : http://paste.ubuntu.com/23173721/

OK, so that explains it: the frozen filesystem is in the "needs journal
recovery" state, so mounting it off of a read-only block device leads to
errors.

root@alxc13:~# fsfreeze -f /var/lxc/c11579
root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test
/dev/rbd151
root@alxc13:~# fsfreeze -u /var/lxc/c11579
root@alxc13:~# file -s /dev/rbd151
/dev/rbd151: Linux rev 1.0 ext4 filesystem data (needs journal
recovery) (extents) (large files) (huge files)
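
For ext4 the rough counterpart of the XFS norecovery trick, if the goal is only
a read-only backup mount of the frozen snapshot, would be to skip loading the
journal (an assumption, not something tested in this thread; the mount point is
made up):

rbd map c11579@snap_test                         # -> /dev/rbd151, read-only
mount -o ro,noload /dev/rbd151 /mnt/snap_test    # noload: do not touch the ext4 journal
# note: anything that was only in the journal at freeze time will not be visible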

Now, to isolate the problem, the easiest would probably be to try to
reproduce it with loop devices.  Can you try dding one of these images
to a file, make sure that the filesystem is clean, losetup + mount,
freeze, make a "snapshot" with cp and losetup -r + mount?
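
Written out, that reproduction attempt might look something like this (a sketch
with made-up paths; the export step assumes the image is otherwise idle):

rbd export rbd/c11579 /srv/c11579.img                    # or dd from /dev/rbd47
fsck.ext4 -fn /srv/c11579.img                            # confirm the filesystem starts out clean
LOOP=$(losetup --find --show /srv/c11579.img)
mount "$LOOP" /mnt/loop
fsfreeze -f /mnt/loop
cp --sparse=always /srv/c11579.img /srv/c11579.snap.img  # the "snapshot"
fsfreeze -u /mnt/loop
SNAPLOOP=$(losetup --find --show --read-only /srv/c11579.snap.img)
mount -o ro "$SNAPLOOP" /mnt/loop_snap                   # does this reproduce the ext4 errors?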

Try sticking file -s before unfreeze and also compare md5sums:

root@alxc13:~# fsfreeze -f /var/lxc/c11579

root@alxc13:~# rbd snap create rbd/c11579@snap_test
root@alxc13:~# rbd map c11579@snap_test


root@alxc13:~# file -s /dev/rbd151
root@alxc13:~# fsfreeze -u /var/lxc/c11579


root@alxc13:~# file -s /dev/rbd151

Thanks,

Ilya


Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov


On 09/13/2016 04:30 PM, Ilya Dryomov wrote:
[SNIP]
> 
> Hmm, it could be about whether it is able to do journal replay on
> mount.  When you mount a snapshot, you get a read-only block device;
> when you mount a clone image, you get a read-write block device.
> 
> Let's try this again, suppose image is foo and snapshot is snap:
> 
> # fsfreeze -f /mnt
> 
> # rbd snap create foo@snap
> # rbd map foo@snap
> /dev/rbd0
> # file -s /dev/rbd0
> # fsck.ext4 -n /dev/rbd0
> # mount /dev/rbd0 /foo
> # umount /foo
> 
> # file -s /dev/rbd0
> # fsck.ext4 -n /dev/rbd0
> 
> # rbd clone foo@snap bar
> $ rbd map bar
> /dev/rbd1
> # file -s /dev/rbd1
> # fsck.ext4 -n /dev/rbd1
> # mount /dev/rbd1 /bar
> # umount /bar
> 
> # file -s /dev/rbd1
> # fsck.ext4 -n /dev/rbd1
> 
> Could you please provide the output for the above?

Here you go : http://paste.ubuntu.com/23173721/


[SNIP]



Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 1:59 PM, Nikolay Borisov  wrote:
>
>
> On 09/13/2016 01:33 PM, Ilya Dryomov wrote:
>> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov  wrote:
>>> Hello list,
>>>
>>>
>>> I have the following cluster:
>>>
>>> ceph status
>>> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>>>  health HEALTH_OK
>>>  monmap e2: 5 mons at 
>>> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
>>> election epoch 196, quorum 0,1,2,3,4 
>>> alxc10,alxc5,alxc6,alxc7,alxc11
>>>  mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>>>  osdmap e11243: 50 osds: 50 up, 50 in
>>>   pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>>> 4323 GB used, 85071 GB / 89424 GB avail
>>> 8192 active+clean
>>>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>>>
>>> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 
>>> and kernel 4.4.14
>>>
>>> I have multiple rbd devices which are used as the root for lxc-based 
>>> containers and have ext4. At some point I want
>>> to create a an rbd snapshot, for this the sequence of operations I do is 
>>> thus:
>>>
>>> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted
>>
>> fsfreeze?
>
> Yes, indeed, my bad.
>
>>
>>>
>>> 2. rbd snap create 
>>> "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}
>>>
>>> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>>>
>>> <= At this point normal container operation continues =>
>>>
>>> 4. Mount the newly created snapshot to a 2nd location as read-only and 
>>> rsync the files from it to a remote server.
>>>
>>> However as I start rsyncing stuff to the remote server then certain files 
>>> in the snapshot are reported as corrupted.
>>
>> Can you share some dmesg snippets?  Is there a pattern - the same
>> file/set of files, etc?
>
> [1718059.910038] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718060.044540] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #52269: comm rsync: deleted inode referenced: 46393
> [1718060.044978] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718060.045246] rbd: rbd143: write 1000 at 0 result -30
> [1718060.045249] blk_update_request: I/O error, dev rbd143, sector 0
> [1718060.045487] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.404057] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #385038: comm rsync: deleted inode referenced: 46581
> [1718071.404466] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.404739] rbd: rbd143: write 1000 at 0 result -30
> [1718071.404742] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.404999] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.419172] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #769039: comm rsync: deleted inode referenced: 410848
> [1718071.419575] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.419844] rbd: rbd143: write 1000 at 0 result -30
> [1718071.419847] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.420081] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.420758] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #769039: comm rsync: deleted inode referenced: 410848
> [1718071.421196] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.421441] rbd: rbd143: write 1000 at 0 result -30
> [1718071.421443] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.421671] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718071.543020] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #52269: comm rsync: deleted inode referenced: 46393
> [1718071.543422] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718071.543680] rbd: rbd143: write 1000 at 0 result -30
> [1718071.543682] blk_update_request: I/O error, dev rbd143, sector 0
> [1718071.543945] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718083.388635] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #385038: comm rsync: deleted inode referenced: 46581
> [1718083.389060] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.389324] rbd: rbd143: write 1000 at 0 result -30
> [1718083.389327] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.389561] Buffer I/O error on dev rbd143, logical block 0, lost sync 
> page write
> [1718083.403910] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
> #769039: comm rsync: deleted inode referenced: 410848
> [1718083.404319] EXT4-fs (rbd143): previous I/O error to superblock detected
> [1718083.404581] rbd: rbd143: write 1000 at 0 result -30
> [1718083.404583] blk_update_request: I/O error, dev rbd143, sector 0
> [1718083.404816] Buffer I/O error on dev 

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Nikolay Borisov


On 09/13/2016 01:33 PM, Ilya Dryomov wrote:
> On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov  wrote:
>> Hello list,
>>
>>
>> I have the following cluster:
>>
>> ceph status
>> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>>  health HEALTH_OK
>>  monmap e2: 5 mons at 
>> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
>> election epoch 196, quorum 0,1,2,3,4 
>> alxc10,alxc5,alxc6,alxc7,alxc11
>>  mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>>  osdmap e11243: 50 osds: 50 up, 50 in
>>   pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
>> 4323 GB used, 85071 GB / 89424 GB avail
>> 8192 active+clean
>>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>>
>> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 
>> and kernel 4.4.14
>>
>> I have multiple rbd devices which are used as the root for lxc-based 
>> containers and have ext4. At some point I want
>> to create an rbd snapshot, for this the sequence of operations I do is 
>> thus:
>>
>> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted
> 
> fsfreeze?

Yes, indeed, my bad. 

> 
>>
>> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"
>>
>> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>>
>> <= At this point normal container operation continues =>
>>
>> 4. Mount the newly created snapshot to a 2nd location as read-only and rsync 
>> the files from it to a remote server.
>>
>> However as I start rsyncing stuff to the remote server then certain files in 
>> the snapshot are reported as corrupted.
> 
> Can you share some dmesg snippets?  Is there a pattern - the same
> file/set of files, etc?

[1718059.910038] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718060.044540] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: 
comm rsync: deleted inode referenced: 46393
[1718060.044978] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718060.045246] rbd: rbd143: write 1000 at 0 result -30
[1718060.045249] blk_update_request: I/O error, dev rbd143, sector 0
[1718060.045487] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.404057] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#385038: comm rsync: deleted inode referenced: 46581
[1718071.404466] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.404739] rbd: rbd143: write 1000 at 0 result -30
[1718071.404742] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.404999] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.419172] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718071.419575] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.419844] rbd: rbd143: write 1000 at 0 result -30
[1718071.419847] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.420081] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.420758] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718071.421196] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.421441] rbd: rbd143: write 1000 at 0 result -30
[1718071.421443] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.421671] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718071.543020] EXT4-fs error (device rbd143): ext4_lookup:1584: inode #52269: 
comm rsync: deleted inode referenced: 46393
[1718071.543422] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718071.543680] rbd: rbd143: write 1000 at 0 result -30
[1718071.543682] blk_update_request: I/O error, dev rbd143, sector 0
[1718071.543945] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718083.388635] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#385038: comm rsync: deleted inode referenced: 46581
[1718083.389060] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718083.389324] rbd: rbd143: write 1000 at 0 result -30
[1718083.389327] blk_update_request: I/O error, dev rbd143, sector 0
[1718083.389561] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718083.403910] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718083.404319] EXT4-fs (rbd143): previous I/O error to superblock detected
[1718083.404581] rbd: rbd143: write 1000 at 0 result -30
[1718083.404583] blk_update_request: I/O error, dev rbd143, sector 0
[1718083.404816] Buffer I/O error on dev rbd143, logical block 0, lost sync 
page write
[1718083.405484] EXT4-fs error (device rbd143): ext4_lookup:1584: inode 
#769039: comm rsync: deleted inode referenced: 410848
[1718083.405893] EXT4-fs (rbd143): previous I/O error to 
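
For reference, "result -30" in the rbd write failures above is EROFS: ext4
hits "deleted inode referenced" during the rsync lookups, marks the filesystem
as having errors, and then attempts what looks like a 0x1000-byte superblock
update at offset 0, which fails because the snapshot mapping is read-only. A
couple of quick checks, assuming the snapshot is still mapped as /dev/rbd143
(device and inode numbers taken from the log above, not from the thread):

  # decode the errno from "rbd: rbd143: write 1000 at 0 result -30"
  python -c 'import os; print(os.strerror(30))'
  # prints: Read-only file system

  # confirm the snapshot mapping is read-only
  blockdev --getro /dev/rbd143

  # look at one of the inodes reported as deleted, directly in the snapshot
  debugfs -R 'stat <46393>' /dev/rbd143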

Re: [ceph-users] Consistency problems when taking RBD snapshot

2016-09-13 Thread Ilya Dryomov
On Tue, Sep 13, 2016 at 12:08 PM, Nikolay Borisov  wrote:
> Hello list,
>
>
> I have the following cluster:
>
> ceph status
> cluster a2fba9c1-4ca2-46d8-8717-a8e42db14bb0
>  health HEALTH_OK
>  monmap e2: 5 mons at 
> {alxc10=x:6789/0,alxc11=x:6789/0,alxc5=x:6789/0,alxc6=:6789/0,alxc7=x:6789/0}
> election epoch 196, quorum 0,1,2,3,4 
> alxc10,alxc5,alxc6,alxc7,alxc11
>  mdsmap e797: 1/1/1 up {0=alxc11.=up:active}, 2 up:standby
>  osdmap e11243: 50 osds: 50 up, 50 in
>   pgmap v3563774: 8192 pgs, 3 pools, 1954 GB data, 972 kobjects
> 4323 GB used, 85071 GB / 89424 GB avail
> 8192 active+clean
>   client io 168 MB/s rd, 11629 kB/s wr, 3447 op/s
>
> It's running ceph version 0.94.7 (d56bdf93ced6b80b07397d57e3fa68fe68304432) 
> and kernel 4.4.14
>
> I have multiple rbd devices which are used as the root for lxc-based 
> containers and have ext4. At some point I want
> to create an rbd snapshot, for this the sequence of operations I do is thus:
>
> 1. freezefs -f /path/to/where/ext4-ontop-of-rbd-is-mounted

fsfreeze?

>
> 2. rbd snap create "${CEPH_POOL_NAME}/${name-of-blockdev}@${name-of-snapshot}"
>
> 3. freezefs -u /path/to/where/ext4-ontop-of-rbd-is-mounted
>
> <= At this point normal container operation continues =>
>
> 4. Mount the newly created snapshot to a 2nd location as read-only and rsync 
> the files from it to a remote server.
>
> However as I start rsyncing stuff to the remote server then certain files in 
> the snapshot are reported as corrupted.

Can you share some dmesg snippets?  Is there a pattern - the same
file/set of files, etc?

>
> freezefs implies filesystem syncing. I also tested by manually running
> sync/syncfs on the fs which is being snapshotted, both before and after
> the freezefs, and the corruption is still present. So it's unlikely
> there are dirty buffers in the page cache.
> I'm using the kernel rbd driver for the clients. The current theory is
> that there are some caches which are not being flushed, other than the
> linux page cache. Reading the docs implies that only librbd uses
> separate caching, but I'm not using librbd.

What happens if you run fsck -n on the snapshot (ro mapping)?

What happens if you create a clone from the snapshot and run fsck (rw
mapping)?

What happens if you mount the clone without running fsck and run rsync?

Can you try taking more than one snapshot and then compare them?

Thanks,

Ilya
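
A rough sketch of the last two suggestions, in case it helps to script the
test. The pool (rbd), image (foo), snapshot names, mount points and the
container path are assumptions, not commands from the thread:

  # take two snapshots of the same frozen filesystem and compare them;
  # with the fs frozen across both, the two checksums should be identical
  fsfreeze -f /var/lxc/container
  rbd snap create rbd/foo@snap_a
  rbd snap create rbd/foo@snap_b
  fsfreeze -u /var/lxc/container
  DEV_A=$(rbd map rbd/foo@snap_a)
  DEV_B=$(rbd map rbd/foo@snap_b)
  md5sum <(dd if="$DEV_A" iflag=direct bs=8M)
  md5sum <(dd if="$DEV_B" iflag=direct bs=8M)

  # mount a clone of one snapshot without fsck'ing it first, then rsync from
  # it (a dry run is enough to exercise the directory lookups) and watch dmesg
  rbd snap protect rbd/foo@snap_a
  rbd clone rbd/foo@snap_a rbd/foo-clone
  DEV_C=$(rbd map rbd/foo-clone)
  mkdir -p /mnt/clone /tmp/rsync-test
  mount "$DEV_C" /mnt/clone
  rsync -avn /mnt/clone/ /tmp/rsync-test/
  dmesg | tail -n 50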