Re: [lkp] [ext4] 5405511e1a: ltp.acl_test01.fail]

2016-07-11 Thread Vegard Nossum

On 07/11/2016 05:15 AM, Theodore Ts'o wrote:

On Mon, Jul 11, 2016 at 09:59:54AM +0800, kernel test robot wrote:


FYI, we noticed the following commit:

https://github.com/0day-ci/linux 
Vegard-Nossum/ext4-validate-number-of-clusters-in-group/20160708-041426
commit 5405511e1a984ab644fa9e29a0d3d958b835ab75 ("ext4: validate number of meta 
clusters in group")

[...]


Vegard, I'm guessing you didn't have a chance to test your patch
before you sent it to the list?


I test all my patches against the failing test-case and a few other images.

This patch in particular was, I think, sent with an [RFC] tag, which I
intended to signal that I'm *not* sure of the fix.

That said, I could do a better job of running more conventional fs tests
on my patches, so I'll incorporate xfstests into my workflow.


	bit_max = ext4_num_clusters_in_group(sb, i);
	if ((bit_max >> 3) >= sb->s_blocksize) {
		ext4_msg(sb, KERN_WARNING, "clusters in "
			 "group %u exceeds block size", i);
		goto failed_mount;
	}


This is the test which is failing, but it will fail by default on
pretty much all ext4 file systems, since by default there will be
32768 blocks (clusters) per group, with a 4k block size (and 32768 >>
3 == 4096).  And in the test that failed, this was a 1k block size
with 8192 blocks per block group (and 8192 >> 3 == 1024).


Ugh, brain-o on my part. It should say > rather than >=, agreed?


Anyway, as I mentioned before, I'd much rather do very specific sanity
checking on superblock fields, instead of sanity checking calculated
values such as ext4_num_clusters_in_group().

Perhaps the easiest thing to do is to run e2fsck -n on those file
systems that are causing problems?


The function (ext4_init_block_bitmap()) has even more problems than the
ones I reported to the list so far; ext4_block_bitmap(),
ext4_inode_bitmap(), and ext4_inode_table() may _also_ point outside the
buffer and cause random memory corruption.

I'll try to come up with a new (and better tested) patch.

Thanks,


Vegard


Re: [lkp] [ext4] 5405511e1a: ltp.acl_test01.fail]

2016-07-10 Thread Ye Xiaolong
Hi, Theodore

On Sun, Jul 10, 2016 at 11:15:53PM -0400, Theodore Ts'o wrote:
>On Mon, Jul 11, 2016 at 09:59:54AM +0800, kernel test robot wrote:
>> 
>> FYI, we noticed the following commit:
>> 
>> https://github.com/0day-ci/linux 
>> Vegard-Nossum/ext4-validate-number-of-clusters-in-group/20160708-041426
>> commit 5405511e1a984ab644fa9e29a0d3d958b835ab75 ("ext4: validate number of 
>> meta clusters in group")
>
>Hi, many thanks for the report.  One question -- is there some way the
>reporting could be sped up?  Vegard's patch was sent late on the
>7th, and it looks like it hit the 0day-ci git repo a few hours later
>on the 8th, and was run early in the morning Chinese time.  Was the
>delay until the 11th due to the weekend and the desire to have a human
>sanity check the result?  Is that because there was a concern that
>there would be too many false positives?

Yes, you're right. Although we have an auto-report feature (with much
less delay) for high-confidence errors/regressions, other cases still
need a human sanity check to avoid too many false positives, and that
introduces some delay, especially over the weekend; sorry about that.
We'll improve the auto-report side to cover more cases.

>
>Also, if there was some way the 0day robot report could include an
>In-Reply-To: header to the patch being sent to the mailing list (in
>this case: <577eb740.10...@oracle.com>), it would make this even
>*more* useful.
>

Good point! We'll add this feature later.

Thanks,
Xiaolong

>
>Vegard, I'm guessing you didn't have a chance to test your patch
>before you sent it to the list?
>
>	bit_max = ext4_num_clusters_in_group(sb, i);
>	if ((bit_max >> 3) >= sb->s_blocksize) {
>		ext4_msg(sb, KERN_WARNING, "clusters in "
>			 "group %u exceeds block size", i);
>		goto failed_mount;
>	}
>
>
>This is the test which is failing, but it will fail by default on
>pretty much all ext4 file systems, since by default there will be
>32768 blocks (clusters) per group, with a 4k block size (and 32768 >>
>3 == 4096).  And in the test that failed, this was a 1k block size
>with 8192 blocks per block group (and 8192 >> 3 == 1024).
>
>Anyway, as I mentioned before, I'd much rather do very specific sanity
>checking on superblock fields, instead of sanity checking calculated
>values such as ext4_num_clusters_in_group().
>
>Perhaps the easiest thing to do is to run e2fsck -n on those file
>systems that are causing problems?
>
>Cheers,
>
>   - Ted


Re: [lkp] [ext4] 5405511e1a: ltp.acl_test01.fail]

2016-07-10 Thread Theodore Ts'o
On Mon, Jul 11, 2016 at 09:59:54AM +0800, kernel test robot wrote:
> 
> FYI, we noticed the following commit:
> 
> https://github.com/0day-ci/linux 
> Vegard-Nossum/ext4-validate-number-of-clusters-in-group/20160708-041426
> commit 5405511e1a984ab644fa9e29a0d3d958b835ab75 ("ext4: validate number of 
> meta clusters in group")

Hi, many thanks for the report.  One question -- is there some way the
reporting could be sped up?  Vegard's patch was sent late on the
7th, and it looks like it hit the 0day-ci git repo a few hours later
on the 8th, and was run early in the morning Chinese time.  Was the
delay until the 11th due to the weekend and the desire to have a human
sanity check the result?  Is that because there was a concern that
there would be too many false positives?

Also, if there was some way the 0day robot report could include an
In-Reply-To: header to the patch being sent to the mailing list (in
this case: <577eb740.10...@oracle.com>), it would make this even
*more* useful.


Vegard, I'm guessing you didn't have a chance to test your patch
before you sent it to the list?

	bit_max = ext4_num_clusters_in_group(sb, i);
	if ((bit_max >> 3) >= sb->s_blocksize) {
		ext4_msg(sb, KERN_WARNING, "clusters in "
			 "group %u exceeds block size", i);
		goto failed_mount;
	}


This is the test which is failing, but it will fail by default on
pretty much all ext4 file systems, since by default there will be
32768 blocks (clusters) per group, with a 4k block size (and 32768 >>
3 == 4096).  And in the test that failed, this was a 1k block size
with 8192 blocks per block group (and 8192 >> 3 == 1024).

Anyway, as I mentioned before, I'd much rather do very specific sanity
checking on superblock fields, instead of sanity checking calculated
values such as ext4_num_clusters_in_group().

Perhaps the easiest thing to do is to run e2fsck -n on those file
systems that are causing problems?

Cheers,

- Ted