On 6/7/21 9:47 PM, Joao Martins wrote:
> On 6/7/21 9:17 PM, Dan Williams wrote:
>> On Tue, May 18, 2021 at 10:28 AM Joao Martins <[email protected]>
>> wrote:
>>>
>>> On 5/5/21 11:36 PM, Joao Martins wrote:
>>>> On 5/5/21 11:20 PM, Dan Williams wrote:
>>>>> On Wed, May 5, 2021 at 12:50 PM Joao Martins <[email protected]>
>>>>> wrote:
>>>>>> On 5/5/21 7:44 PM, Dan Williams wrote:
>>>>>>> On Thu, Mar 25, 2021 at 4:10 PM Joao Martins
>>>>>>> <[email protected]> wrote:
>>>>>>>> diff --git a/include/linux/memremap.h b/include/linux/memremap.h
>>>>>>>> index b46f63dcaed3..bb28d82dda5e 100644
>>>>>>>> --- a/include/linux/memremap.h
>>>>>>>> +++ b/include/linux/memremap.h
>>>>>>>> @@ -114,6 +114,7 @@ struct dev_pagemap {
>>>>>>>> struct completion done;
>>>>>>>> enum memory_type type;
>>>>>>>> unsigned int flags;
>>>>>>>> + unsigned long align;
>>>>>>>
>>>>>>> I think this wants some kernel-doc above to indicate that non-zero
>>>>>>> means "use compound pages with tail-page dedup" and zero / PAGE_SIZE
>>>>>>> means "use non-compound base pages".
>>>
>>> [...]
>>>
>>>>>>> The non-zero value must be
>>>>>>> PAGE_SIZE, PMD_PAGE_SIZE or PUD_PAGE_SIZE.
>>>>>>> Hmm, maybe it should be an
>>>>>>> enum:
>>>>>>>
>>>>>>> enum devmap_geometry {
>>>>>>> DEVMAP_PTE,
>>>>>>> DEVMAP_PMD,
>>>>>>> DEVMAP_PUD,
>>>>>>> }
>>>>>>>
>>>>>> I suppose a converter between devmap_geometry and page_size would be
>>>>>> needed too? And maybe
>>>>>> the whole dax/nvdimm align values change meanwhile (as a followup
>>>>>> improvement)?
>>>>>
>>>>> I think it is ok for dax/nvdimm to continue to maintain their align
>>>>> value because it should be ok to have 4MB align if the device really
>>>>> wanted. However, when it goes to map that alignment with
>>>>> memremap_pages() it can pick a mode. For example, it's already the
>>>>> case that dax->align == 1GB is mapped with DEVMAP_PTE today, so
>>>>> they're already separate concepts that can stay separate.
>>>>>
>>>> Gotcha.
>>>
>>> I am reconsidering part of the above. In general, yes, the meaning of
>>> devmap @align
>>> represents a slightly different variation of the device @align i.e. how the
>>> metadata is
>>> laid out **but** regardless of what kind of page table entries we use
>>> vmemmap.
>>>
>>> By using DEVMAP_PTE/PMD/PUD we might end up 1) duplicating what nvdimm/dax
>>> already
>>> validates in terms of allowed device @align values (i.e. PAGE_SIZE,
>>> PMD_SIZE and PUD_SIZE)
>>> 2) the geometry of metadata is very much tied to the value we pick to
>>> @align at namespace
>>> provisioning -- not the "align" we might use at mmap() perhaps that's what
>>> you referred
>>> above? -- and 3) the value of geometry actually derives from dax device
>>> @align because we
>>> will need to create compound pages representing a page size of @align value.
>>>
>>> Using your example above: you're saying that dax->align == 1G is mapped
>>> with DEVMAP_PTEs,
>>> in reality the vmemmap is populated with PMDs/PUDs page tables (depending
>>> on what archs
>>> decide to do at vmemmap_populate()) and uses base pages as its metadata
>>> regardless of what
>>> device @align. In reality what we want to convey in @geometry is not page
>>> table sizes, but
>>> just the page size used for the vmemmap of the dax device.
>>
>> Good point, the names "PTE, PMD, PUD" imply the hardware mapping size,
>> not the software compound page size.
>>
>>> Additionally, limiting its
>>> value might not be desirable... if tomorrow Linux for some arch supports
>>> dax/nvdimm
>>> devices with 4M align or 64K align, the value of @geometry will have to
>>> reflect the 4M to
>>> create compound pages of order 10 for the said vmemmap.
>>>
>>> I am going to wait until you finish reviewing the remaining four patches of
>>> this series,
>>> but maybe this is a simple misnomer (s/align/geometry/) with a comment but
>>> without
>>> DEVMAP_{PTE,PMD,PUD} enum part? Or perhaps its own struct with a value and
>>> enum a
>>> setter/getter to audit its value? Thoughts?
>>
>> I do see what you mean about the confusion DEVMAP_{PTE,PMD,PUD}
>> introduces, but I still think the device-dax align and the
>> organization of the 'struct page' metadata are distinct concepts. So
>> I'm happy with any color of the bikeshed as long as the 2 concepts are
>> distinct. How about calling it "compound_page_order"? Open to other
>> ideas...
>>
> I actually like the name of @geometry. The only thing better would be
> @vmemmap_geometry
> solely because it makes it clear that its the vmemmap that we are talking
> about -- but
> might be unnecssarily verbose. And I still agree that is separate concept
> that should be
> named differently *at least*.
>
> But naming aside, I was trying to get at was to avoid a second geometry value
> validation
> i.e. to be validated the value and set with a value such as DEVMAP_PTE,
> DEVMAP_PMD and
> DEVMAP_PUD.
Sorry my english keeps getting broken, I meant this instead:
But naming aside, what I am trying to get at is to remove the second geometry
value
validation i.e. for @geometry to not be validated a second time to be set to
DEVMAP_PTE,
DEVMAP_PMD or DEVMAP_PUD.
> That to me sounds a little redundant, when the geometry value depends on what
> align is going to be used from. Here my metnion of @align refers to what's
> used to create
> the dax device, not the mmap() align [which can be lower than the device
> one]. The dax
> device align is the one used to decide whether to use PTEs, PMDs or PUDs at
> dax fault handler.
>
> So separate concepts, but still its value dependent on one another. At least
> unless we
> want to allow geometry values different than those set by --align as Jane
> suggested.
>
And I should add:
I can maintain the DEVMAP_* enum values, but then these will need to be changed
in tandem
anytime a new @align value is supported. Or instead we use the name @geometry
albeit with
still as an unsigned long type . Or rather than an unsigned long perhaps making
another
type and its value obtained/changed with getter/setter.