On 8/4/23 11:18 PM, Jeff Moyer wrote:
> Hi, Aneesh,
> 
> "Aneesh Kumar K.V" <[email protected]> writes:
> 
>> On architectures that have different page size values used for kernel
>> direct mapping and userspace mappings, the user can end up creating 
>> zero-sized
>> namespaces as shown below
>>
>> :/sys/bus/nd/devices/region1# cat align
>> 0x1000000
>> /sys/bus/nd/devices/region1# echo 0x200000 > align
>> /sys/bus/nd/devices/region1/dax1.0# cat supported_alignments
>> 65536 16777216
>>  $ ndctl create-namespace -r region1 -m devdax -s 18M --align 64K
>> {
>>   "dev":"namespace1.0",
>>   "mode":"devdax",
>>   "map":"dev",
>>   "size":0,
>>   "uuid":"3094329a-0c66-4905-847e-357223e56ab0",
>>   "daxregion":{
>>     "id":1,
>>     "size":0,
>>     "align":65536
>>   },
>>   "align":65536
>> }
>> similarily for fsdax
>>
>>  $ ndctl create-namespace -r region1 -m fsdax  -s 18M --align 64K
>> {
>>   "dev":"namespace1.0",
>>   "mode":"fsdax",
>>   "map":"dev",
>>   "size":0,
>>   "uuid":"45538a6f-dec7-405d-b1da-2a4075e06232",
>>   "sector_size":512,
>>   "align":65536,
>>   "blockdev":"pmem1"
>> }
> 
> Just curious, but have you seen this in practice?  It seems like an odd
> thing to do.
> 

This was identified while writing new test cases for region alignment update.


>> In commit 9ffc1d19fc4a ("mm/memremap_pages: Introduce 
>> memremap_compat_align()")
>> memremap_compat_align was added to make sure the kernel always creates
>> namespaces with 16MB alignment. But the user can still override the
>> region alignment and no input validation is done in the region alignment
>> values to retain the flexibility user had before. However, the kernel
>> ensures that only part of the namespace that can be mapped via kernel
>> direct mapping page size is enabled. This is achieved by tracking the
>> unmapped part of the namespace in pfn_sb->end_trunc. The kernel also
>> ensures that the start address of the namespace is also aligned to the
>> kernel direct mapping page size.
>>
>> Depending on the user request, the kernel implements userspace mapping
>> alignment by updating pfn device alignment attribute and this value is
>> used to adjust the start address for userspace mappings. This is tracked
>> in pfn_sb->dataoff. Hence the available size for userspace mapping is:
>>
>> usermapping_size = size of the namespace - pfn_sb->end_trun - pfn_sb->dataoff
>>
>> If the kernel finds the user mapping size zero then don't allow the
>> creation of namespace.
>>
>> After fix:
>> $ ndctl create-namespace -f  -r region1 -m devdax  -s 18M --align 64K
>> libndctl: ndctl_dax_enable: dax1.1: failed to enable
>>   Error: namespace1.2: failed to enable
>>
>> failed to create namespace: No such device or address
>>
>> And existing zero sized namespace will be marked disabled.
>> root@ltczz75-lp2:/home/kvaneesh# ndctl  list -N -r region1 -i
>> [
>>   {
>>     "dev":"namespace1.0",
>>     "mode":"raw",
>>     "size":18874368,
>>     "uuid":"94a90fb0-8e78-4fb6-a759-ffc62f9fa181",
>>     "sector_size":512,
>>     "state":"disabled"
>>   },
> 
> Thank you for providing examples of the command output before and after
> the change.  I appreciate that.
> 
>>
>> Signed-off-by: Aneesh Kumar K.V <[email protected]>
>> ---
>>  drivers/nvdimm/pfn_devs.c | 10 ++++++++--
>>  1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>> index af7d9301520c..36b904a129b9 100644
>> --- a/drivers/nvdimm/pfn_devs.c
>> +++ b/drivers/nvdimm/pfn_devs.c
>> @@ -453,7 +453,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
>> *sig)
>>      struct resource *res;
>>      enum nd_pfn_mode mode;
>>      struct nd_namespace_io *nsio;
>> -    unsigned long align, start_pad;
>> +    unsigned long align, start_pad, end_trunc;
>>      struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
>>      struct nd_namespace_common *ndns = nd_pfn->ndns;
>>      const uuid_t *parent_uuid = nd_dev_to_uuid(&ndns->dev);
>> @@ -503,6 +503,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
>> *sig)
>>      align = le32_to_cpu(pfn_sb->align);
>>      offset = le64_to_cpu(pfn_sb->dataoff);
>>      start_pad = le32_to_cpu(pfn_sb->start_pad);
>> +    end_trunc = le32_to_cpu(pfn_sb->end_trunc);
>>      if (align == 0)
>>              align = 1UL << ilog2(offset);
>>      mode = le32_to_cpu(pfn_sb->mode);
>> @@ -610,6 +611,10 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char 
>> *sig)
>>              return -EOPNOTSUPP;
>>      }
>>  
>> +    if (offset >= (res->end - res->start + 1 - start_pad - end_trunc)) {
>                        ^^^^^^^^^^^^^^^^^^^^^^^^^ That's what
> resource_size(res) does.  It might be better to create a local variable
> 'size' to hold that, as there are now two instances of that in the
> function.


Will update. 

> 
>> +            dev_err(&nd_pfn->dev, "bad offset with small namespace\n");
>> +            return -EOPNOTSUPP;
>> +    }
>>      return 0;
>>  }
>>  EXPORT_SYMBOL(nd_pfn_validate);
>> @@ -810,7 +815,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>>      else
>>              return -ENXIO;
>>  
>> -    if (offset >= size) {
>> +    if (offset >= (size - end_trunc)) {
>> +            /* This implies we result in zero size devices */
>>              dev_err(&nd_pfn->dev, "%s unable to satisfy requested 
>> alignment\n",
>>                              dev_name(&ndns->dev));
>>              return -ENXIO;
> 
> Functionally, this looks good to me.
> 
> Cheers,
> Jeff
> 

Thanks for reviewing the patch.
-aneesh

Reply via email to