On 8/4/23 11:18 PM, Jeff Moyer wrote:
> Hi, Aneesh,
>
> "Aneesh Kumar K.V" <[email protected]> writes:
>
>> On architectures that have different page size values used for kernel
>> direct mapping and userspace mappings, the user can end up creating
>> zero-sized
>> namespaces as shown below
>>
>> :/sys/bus/nd/devices/region1# cat align
>> 0x1000000
>> /sys/bus/nd/devices/region1# echo 0x200000 > align
>> /sys/bus/nd/devices/region1/dax1.0# cat supported_alignments
>> 65536 16777216
>> $ ndctl create-namespace -r region1 -m devdax -s 18M --align 64K
>> {
>> "dev":"namespace1.0",
>> "mode":"devdax",
>> "map":"dev",
>> "size":0,
>> "uuid":"3094329a-0c66-4905-847e-357223e56ab0",
>> "daxregion":{
>> "id":1,
>> "size":0,
>> "align":65536
>> },
>> "align":65536
>> }
>> similarily for fsdax
>>
>> $ ndctl create-namespace -r region1 -m fsdax -s 18M --align 64K
>> {
>> "dev":"namespace1.0",
>> "mode":"fsdax",
>> "map":"dev",
>> "size":0,
>> "uuid":"45538a6f-dec7-405d-b1da-2a4075e06232",
>> "sector_size":512,
>> "align":65536,
>> "blockdev":"pmem1"
>> }
>
> Just curious, but have you seen this in practice? It seems like an odd
> thing to do.
>
This was identified while writing new test cases for region alignment update.
>> In commit 9ffc1d19fc4a ("mm/memremap_pages: Introduce
>> memremap_compat_align()")
>> memremap_compat_align was added to make sure the kernel always creates
>> namespaces with 16MB alignment. But the user can still override the
>> region alignment and no input validation is done in the region alignment
>> values to retain the flexibility user had before. However, the kernel
>> ensures that only part of the namespace that can be mapped via kernel
>> direct mapping page size is enabled. This is achieved by tracking the
>> unmapped part of the namespace in pfn_sb->end_trunc. The kernel also
>> ensures that the start address of the namespace is also aligned to the
>> kernel direct mapping page size.
>>
>> Depending on the user request, the kernel implements userspace mapping
>> alignment by updating pfn device alignment attribute and this value is
>> used to adjust the start address for userspace mappings. This is tracked
>> in pfn_sb->dataoff. Hence the available size for userspace mapping is:
>>
>> usermapping_size = size of the namespace - pfn_sb->end_trun - pfn_sb->dataoff
>>
>> If the kernel finds the user mapping size zero then don't allow the
>> creation of namespace.
>>
>> After fix:
>> $ ndctl create-namespace -f -r region1 -m devdax -s 18M --align 64K
>> libndctl: ndctl_dax_enable: dax1.1: failed to enable
>> Error: namespace1.2: failed to enable
>>
>> failed to create namespace: No such device or address
>>
>> And existing zero sized namespace will be marked disabled.
>> root@ltczz75-lp2:/home/kvaneesh# ndctl list -N -r region1 -i
>> [
>> {
>> "dev":"namespace1.0",
>> "mode":"raw",
>> "size":18874368,
>> "uuid":"94a90fb0-8e78-4fb6-a759-ffc62f9fa181",
>> "sector_size":512,
>> "state":"disabled"
>> },
>
> Thank you for providing examples of the command output before and after
> the change. I appreciate that.
>
>>
>> Signed-off-by: Aneesh Kumar K.V <[email protected]>
>> ---
>> drivers/nvdimm/pfn_devs.c | 10 ++++++++--
>> 1 file changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/nvdimm/pfn_devs.c b/drivers/nvdimm/pfn_devs.c
>> index af7d9301520c..36b904a129b9 100644
>> --- a/drivers/nvdimm/pfn_devs.c
>> +++ b/drivers/nvdimm/pfn_devs.c
>> @@ -453,7 +453,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char
>> *sig)
>> struct resource *res;
>> enum nd_pfn_mode mode;
>> struct nd_namespace_io *nsio;
>> - unsigned long align, start_pad;
>> + unsigned long align, start_pad, end_trunc;
>> struct nd_pfn_sb *pfn_sb = nd_pfn->pfn_sb;
>> struct nd_namespace_common *ndns = nd_pfn->ndns;
>> const uuid_t *parent_uuid = nd_dev_to_uuid(&ndns->dev);
>> @@ -503,6 +503,7 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char
>> *sig)
>> align = le32_to_cpu(pfn_sb->align);
>> offset = le64_to_cpu(pfn_sb->dataoff);
>> start_pad = le32_to_cpu(pfn_sb->start_pad);
>> + end_trunc = le32_to_cpu(pfn_sb->end_trunc);
>> if (align == 0)
>> align = 1UL << ilog2(offset);
>> mode = le32_to_cpu(pfn_sb->mode);
>> @@ -610,6 +611,10 @@ int nd_pfn_validate(struct nd_pfn *nd_pfn, const char
>> *sig)
>> return -EOPNOTSUPP;
>> }
>>
>> + if (offset >= (res->end - res->start + 1 - start_pad - end_trunc)) {
> ^^^^^^^^^^^^^^^^^^^^^^^^^ That's what
> resource_size(res) does. It might be better to create a local variable
> 'size' to hold that, as there are now two instances of that in the
> function.
Will update.
>
>> + dev_err(&nd_pfn->dev, "bad offset with small namespace\n");
>> + return -EOPNOTSUPP;
>> + }
>> return 0;
>> }
>> EXPORT_SYMBOL(nd_pfn_validate);
>> @@ -810,7 +815,8 @@ static int nd_pfn_init(struct nd_pfn *nd_pfn)
>> else
>> return -ENXIO;
>>
>> - if (offset >= size) {
>> + if (offset >= (size - end_trunc)) {
>> + /* This implies we result in zero size devices */
>> dev_err(&nd_pfn->dev, "%s unable to satisfy requested
>> alignment\n",
>> dev_name(&ndns->dev));
>> return -ENXIO;
>
> Functionally, this looks good to me.
>
> Cheers,
> Jeff
>
Thanks for reviewing the patch.
-aneesh