Hi Ashish,

On 08/31/2016 07:17 AM, Ashish Samant wrote:
> Hi Eric,
>
> I am able to reproduce this on 4.8.0-rc3 as well. Can you try again and issue
> a sync between fallocate and dd?
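The check requested here can be sketched as a small C helper. The sequence is to punch the hole, `sync`, then read the first cluster of the *source* file with `dd iflag=direct`. This is an illustrative sketch, not part of the patch or of any OCFS2 tool. `first_bytes_are_zero()` is a hypothetical name, and the 4096-byte buffer alignment for `O_DIRECT` is an assumption that may need raising to the device's logical block size. The `sync()` before the direct read presumably matters because the erroneous zeroing goes through the page cache and only reaches the shared on-disk cluster at writeback. That would be consistent with the first run without `sync` showing `cd` on both kernels.

```c
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): returns 1 if the first `len`
 * bytes of `path` are all zero, 0 if any byte is non-zero, -1 on error.
 * With use_direct != 0 it calls sync() and then opens the file with
 * O_DIRECT, mirroring "sync; dd iflag=direct". ASSUMPTION: 4096-byte
 * alignment satisfies O_DIRECT on the target device, and `len` is a
 * multiple of its logical block size.
 */
int first_bytes_are_zero(const char *path, size_t len, int use_direct)
{
    int flags = O_RDONLY;

    if (use_direct) {
#ifdef O_DIRECT
        flags |= O_DIRECT;
        sync(); /* flush dirty page-cache data before bypassing the cache */
#else
        return -1; /* O_DIRECT is not available on this platform */
#endif
    }

    int fd = open(path, flags);
    if (fd < 0)
        return -1;

    void *buf = NULL;
    if (posix_memalign(&buf, 4096, len) != 0) {
        close(fd);
        return -1;
    }

    size_t done = 0;
    while (done < len) {
        ssize_t n = pread(fd, (char *)buf + done, len - done, (off_t)done);
        if (n < 0) {
            free(buf);
            close(fd);
            return -1;
        }
        if (n == 0) /* EOF: a short file is only checked up to its size */
            break;
        done += (size_t)n;
    }

    const unsigned char *p = buf;
    int all_zero = 1;
    for (size_t i = 0; i < done; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    free(buf);
    close(fd);
    return all_zero;
}
```

After the punch and `sync`, a call such as `first_bytes_are_zero("10MBfile", 1048576, 1)` would be expected to return 1 on an unpatched kernel and 0 on a patched one. That corresponds to the `00` versus `cd` hexdumps below.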
It works!

ocfs2dev2 is not patched:
====================
ocfs2dev2:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
ocfs2dev2:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
ocfs2dev2:/mnt/ocfs2 # sync
ocfs2dev2:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0936888 s, 11.2 MB/s
00100000

while ocfs2dev1 is patched:
======================
ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
wrote 10485760/10485760 bytes at offset 0
10 MiB, 2560 ops; 0.0000 sec (183.137 MiB/sec and 46883.0122 ops/sec)
ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
ocfs2dev1:/mnt/ocfs2 # sync
ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
*
1+0 records in
1+0 records out
1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0933082 s, 11.2 MB/s
00100000

>
> On 08/30/2016 12:38 AM, Eric Ren wrote:
>> Hi,
>>
>> I'm on 4.8.0-rc3 kernel. Hope someone else can double-confirm this;-)
>>
>> On 08/30/2016 12:11 PM, Ashish Samant wrote:
>>> Hmm, that's weird.
>>> I see this on 4.7 kernel without the patch:
>>>
>>> # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>> wrote 10485760/10485760 bytes at offset 0
>>> 10 MiB, 2560 ops; 0.0000 sec (683.995 MiB/sec and 175102.5992 ops/sec)
>>> # reflink -f 10MBfile reflnktest
>>> # fallocate -p -o 0 -l 1048615 reflnktest
>>> # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>> 00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
>>> *
>>> 1+0 records in
>>> 1+0 records out
>>> 1048576 bytes (1.0 MB) copied, 0.0321517 s, 32.6 MB/s
>>> 00100000
>>>
>>> and with patch
>>> ----
>>> # dd if=10MBfile iflag=direct bs=1M count=1 | hexdump -C
>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
>>
>> I'm not familiar with this code. So why is the output "cd ...", given that we
>> didn't write anything into "10MBfile"? Is it a magic number when reading from a hole?
> No, "cd" is what xfs_io wrote into the file. Those are the original contents
> of the file, which get overwritten with zeroes in the first cluster because of this bug.

Ah, gotcha, thanks!
Eric

>
> Thanks,
> Ashish
>>
>> Eric
>>
>>> *
>>> 1+0 records in
>>> 1+0 records out
>>> 00100000
>>
>>
>>
>>>
>>> Thanks,
>>> Ashish
>>>
>>>
>>> On 08/29/2016 08:33 PM, Eric Ren wrote:
>>>> Hello,
>>>>
>>>> On 08/30/2016 03:23 AM, Ashish Samant wrote:
>>>>> Hi Eric,
>>>>>
>>>>> The easiest way to reproduce this is:
>>>>>
>>>>> 1. Create a file of, say, 10 MB
>>>>>    xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>>>> 2. Reflink it
>>>>>    reflink -f 10MBfile reflnktest
>>>>> 3. Punch a hole starting at a cluster boundary with a range greater than
>>>>>    1MB. You can also use a range that puts the end offset in another extent.
>>>>>    fallocate -p -o 0 -l 1048615 reflnktest
>>>>> 4. sync
>>>>> 5. Check the first cluster in the source file (it will be zeroed out).
>>>>>    dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
>>>>
>>>> Thanks!
>>>> I had a try myself, but I'm not sure what our expected output is and
>>>> whether the test result meets it:
>>>>
>>>> 1. After applying this patch:
>>>> ocfs2dev1:/mnt/ocfs2 # rm 10MBfile reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
>>>> wrote 10485760/10485760 bytes at offset 0
>>>> 10 MiB, 2560 ops; 0.0000 sec (1.089 GiB/sec and 285427.5839 ops/sec)
>>>> ocfs2dev1:/mnt/ocfs2 # reflink -f 10MBfile reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # fallocate -p -o 0 -l 1048615 reflnktest
>>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
>>>> *
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0952464 s, 11.0 MB/s
>>>> 00100000
>>>>
>>>> 2. Before this patch:
>>>> ....
>>>> ocfs2dev1:/mnt/ocfs2 # dd if=10MBfile iflag=direct bs=1048576 count=1 | hexdump -C
>>>> 00000000 cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd cd |................|
>>>> *
>>>> 1+0 records in
>>>> 1+0 records out
>>>> 1048576 bytes (1.0 MB, 1.0 MiB) copied, 0.0954648 s, 11.0 MB/s
>>>> 00100000
>>>>
>>>> 3. debugfs.ocfs2 -R stats /dev/sdb
>>>> ...
>>>> Block Size Bits: 12 Cluster Size Bits: 20
>>>> ...
>>>>
>>>> Eric
>>>>>
>>>>> Thanks,
>>>>> Ashish
>>>>>
>>>>> On 08/28/2016 10:39 PM, Eric Ren wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Thanks for this fix. I'd like to reproduce this issue locally and test
>>>>>> this patch. Could you elaborate on the detailed steps to reproduce it?
>>>>>>
>>>>>> Thanks,
>>>>>> Eric
>>>>>>
>>>>>> On 08/27/2016 07:04 AM, Ashish Samant wrote:
>>>>>>> If we punch a hole on a reflink such that the following conditions are met:
>>>>>>>
>>>>>>> 1. start offset is on a cluster boundary
>>>>>>> 2. end offset is not on a cluster boundary
>>>>>>> 3.
>>>>>>>    (end offset is somewhere in another extent) or
>>>>>>>    (hole range > MAX_CONTIG_BYTES(1MB)),
>>>>>>>
>>>>>>> we don't COW the first cluster starting at the start offset. But in this
>>>>>>> case, we were wrongly passing this cluster to
>>>>>>> ocfs2_zero_range_for_truncate() to zero out. This will modify the cluster
>>>>>>> in place and zero it in the source too.
>>>>>>>
>>>>>>> Fix this by skipping this cluster in such a scenario.
>>>>>>>
>>>>>>> Reported-by: Saar Maoz <saar.m...@oracle.com>
>>>>>>> Signed-off-by: Ashish Samant <ashish.sam...@oracle.com>
>>>>>>> Reviewed-by: Srinivas Eeda <srinivas.e...@oracle.com>
>>>>>>> ---
>>>>>>> v1->v2:
>>>>>>> -Changed the commit msg to include a better and generic description of
>>>>>>>  the problem, for all cluster sizes.
>>>>>>> -Added Reported-by and Reviewed-by tags.
>>>>>>>  fs/ocfs2/file.c | 34 ++++++++++++++++++++++++----------
>>>>>>>  1 file changed, 24 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/fs/ocfs2/file.c b/fs/ocfs2/file.c
>>>>>>> index 4e7b0dc..0b055bf 100644
>>>>>>> --- a/fs/ocfs2/file.c
>>>>>>> +++ b/fs/ocfs2/file.c
>>>>>>> @@ -1506,7 +1506,8 @@ static int ocfs2_zero_partial_clusters(struct inode *inode,
>>>>>>>  				       u64 start, u64 len)
>>>>>>>  {
>>>>>>>  	int ret = 0;
>>>>>>> -	u64 tmpend, end = start + len;
>>>>>>> +	u64 tmpend = 0;
>>>>>>> +	u64 end = start + len;
>>>>>>>  	struct ocfs2_super *osb = OCFS2_SB(inode->i_sb);
>>>>>>>  	unsigned int csize = osb->s_clustersize;
>>>>>>>  	handle_t *handle;
>>>>>>> @@ -1538,18 +1539,31 @@ static int ocfs2_zero_partial_clusters(struct inode *inode,
>>>>>>>  	}
>>>>>>>  
>>>>>>>  	/*
>>>>>>> -	 * We want to get the byte offset of the end of the 1st cluster.
>>>>>>> +	 * If start is on a cluster boundary and end is somewhere in another
>>>>>>> +	 * cluster, we have not COWed the cluster starting at start, unless
>>>>>>> +	 * end is also within the same cluster. So, in this case, we skip this
>>>>>>> +	 * first call to ocfs2_zero_range_for_truncate() truncate and move on
>>>>>>> +	 * to the next one.
>>>>>>>  	 */
>>>>>>> -	tmpend = (u64)osb->s_clustersize + (start & ~(osb->s_clustersize - 1));
>>>>>>> -	if (tmpend > end)
>>>>>>> -		tmpend = end;
>>>>>>> +	if ((start & (csize - 1)) != 0) {
>>>>>>> +		/*
>>>>>>> +		 * We want to get the byte offset of the end of the 1st
>>>>>>> +		 * cluster.
>>>>>>> +		 */
>>>>>>> +		tmpend = (u64)osb->s_clustersize +
>>>>>>> +			(start & ~(osb->s_clustersize - 1));
>>>>>>> +		if (tmpend > end)
>>>>>>> +			tmpend = end;
>>>>>>>  
>>>>>>> -	trace_ocfs2_zero_partial_clusters_range1((unsigned long long)start,
>>>>>>> -						 (unsigned long long)tmpend);
>>>>>>> +		trace_ocfs2_zero_partial_clusters_range1(
>>>>>>> +			(unsigned long long)start,
>>>>>>> +			(unsigned long long)tmpend);
>>>>>>>  
>>>>>>> -	ret = ocfs2_zero_range_for_truncate(inode, handle, start, tmpend);
>>>>>>> -	if (ret)
>>>>>>> -		mlog_errno(ret);
>>>>>>> +		ret = ocfs2_zero_range_for_truncate(inode, handle, start,
>>>>>>> +						    tmpend);
>>>>>>> +		if (ret)
>>>>>>> +			mlog_errno(ret);
>>>>>>> +	}
>>>>>>>  
>>>>>>>  	if (tmpend < end) {
>>>>>>>  		/*
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
_______________________________________________
Ocfs2-devel mailing list
Ocfs2-devel@oss.oracle.com
https://oss.oracle.com/mailman/listinfo/ocfs2-devel
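The first-cluster arithmetic from the patch can be lifted into a userspace C function and run on the thread's numbers. Those numbers are start 0, length 1048615, and 1 MiB clusters (`Cluster Size Bits: 20`). Running them shows why the source file's first cluster was zeroed. This is an illustrative model, not kernel code. `first_zero_range()` is a hypothetical name, and `patched` toggles between the pre- and post-patch logic. Only the first `ocfs2_zero_range_for_truncate()` call is modelled; the subsequent `if (tmpend < end)` branch is left out.

```c
#include <stdint.h>

/*
 * Userspace model (hypothetical, not kernel code) of the first-cluster step
 * of ocfs2_zero_partial_clusters() as shown in the diff. Returns 1 and sets
 * [*zstart, *zend) when the first ocfs2_zero_range_for_truncate() call would
 * be made, or 0 when it is skipped. `patched` selects the v2 logic.
 * Assumes csize is a power of two, as OCFS2 cluster sizes are.
 */
int first_zero_range(uint64_t start, uint64_t len, uint64_t csize,
                     int patched, uint64_t *zstart, uint64_t *zend)
{
    uint64_t end = start + len;
    uint64_t tmpend;

    /*
     * v2 fix: a cluster-aligned start means the first cluster was not
     * COWed, so zeroing it in place would also zero the reflink source.
     */
    if (patched && (start & (csize - 1)) == 0)
        return 0;

    /* Byte offset of the end of the first cluster, clamped to `end`. */
    tmpend = csize + (start & ~(csize - 1));
    if (tmpend > end)
        tmpend = end;

    *zstart = start;
    *zend = tmpend;
    return 1;
}
```

With the thread's values, the unpatched path zeroes `[0, 1048576)`. That is the whole first 1 MiB cluster, which matches the all-zero `hexdump` of the source file, while the patched path skips the call. For an unaligned start such as 4096 with length 8192, both versions zero the same range `[4096, 12288)`. The fix therefore only changes behaviour for cluster-aligned starts.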