On Tue, Nov 29, 2016 at 05:54:46PM -0500, Tejun Heo wrote:
> Hello,
>
> On Tue, Nov 29, 2016 at 10:14:03AM -0800, Shaohua Li wrote:
> > What the patches do doesn't conflict with what you are talking about. We
> > need a way to detect if cgroups are idle or active. I think the problem
> > is how to
>
Hello,
On Tue, Nov 29, 2016 at 10:14:03AM -0800, Shaohua Li wrote:
> What the patches do doesn't conflict with what you are talking about. We need a way
> to detect if cgroups are idle or active. I think the problem is how to define
> 'active' and 'idle'. We must quantify the state. We could use:
> 1.
Hello,
On Tue, Nov 29, 2016 at 10:30:44AM -0800, Shaohua Li wrote:
> > As discussed separately, it might make more sense to just use the avg
> > of the closest bucket instead of trying to line-fit the buckets, but
> > it's an implementation detail and whatever which works is fine.
>
> that is sti
Changes from v1->v2
1) Removed work queues and call backs. The code now operates
in a normal call chain fashion. Each opal command provides a
series of commands it needs to run. next() iterates through
the functions only calling the subsequent function once the
current has finished a
This patch adds the definitions and structures for the SED
Opal code.
Signed-off-by: Scott Bauer
Signed-off-by: Rafael Antognolli
---
include/linux/sed-opal.h | 57 ++
include/linux/sed.h | 85 +
include/uapi/linux/sed-opal.h
This patch implements the necessary logic to unlock a SED
enabled device coming back from an S3.
The patch also implements the ioctl handling from the block
layer.
Signed-off-by: Scott Bauer
Signed-off-by: Rafael Antognolli
---
drivers/nvme/host/core.c | 76
Signed-off-by: Scott Bauer
Signed-off-by: Rafael Antognolli
---
MAINTAINERS | 10 ++
1 file changed, 10 insertions(+)
diff --git a/MAINTAINERS b/MAINTAINERS
index 8d414840..929eba3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -10846,6 +10846,16 @@ L: linux-...@vger.kernel.org
S:
This patch implements the necessary logic to bring an Opal
enabled drive out of a factory-enabled state into a working
Opal state.
This patch set also enables logic to save a password to
be replayed during a resume from suspend. The key can be
saved in the driver or in the kernel's key management.
Signe
On 11/28/2016 02:38 PM, Matias Bjørling wrote:
Hi Jens,
A bunch of patches for 4.10 have been prepared.
Javier has been busy eliminating abstractions in the LightNVM
interface. Mainly killing generic nvm_block and nvm_lun, which
simplifies the locking mechanism within targets. He also added a c
On Tue, Nov 29, 2016 at 12:24:35PM -0500, Tejun Heo wrote:
> Hello, Shaohua.
>
> On Mon, Nov 14, 2016 at 02:22:20PM -0800, Shaohua Li wrote:
> > To do this, we sample some data, eg, average latency for request size
> > 4k, 8k, 16k, 32k, 64k. We then use an equation f(x) = a * x + b to fit
> > the
On Tue, Nov 29, 2016 at 12:31:08PM -0500, Tejun Heo wrote:
> Hello,
>
> On Mon, Nov 14, 2016 at 02:22:22PM -0800, Shaohua Li wrote:
> > One hard problem adding .high limit is to detect idle cgroup. If one
> > cgroup doesn't dispatch enough IO against its high limit, we must have a
> > mechanism to
Hello,
On Mon, Nov 14, 2016 at 02:22:22PM -0800, Shaohua Li wrote:
> One hard problem adding .high limit is to detect idle cgroup. If one
> cgroup doesn't dispatch enough IO against its high limit, we must have a
> mechanism to determine if other cgroups dispatch more IO. We added the
> think time
Hello, Shaohua.
On Mon, Nov 14, 2016 at 02:22:20PM -0800, Shaohua Li wrote:
> To do this, we sample some data, eg, average latency for request size
> 4k, 8k, 16k, 32k, 64k. We then use an equation f(x) = a * x + b to fit
> the data (x is request size in KB, f(x) is the latency). Then we can use
>
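The fit described above is an ordinary least-squares line through the (request size, latency) samples. A minimal userspace sketch, assuming nothing about the actual patchset code (the function and variable names here are illustrative only):

```c
#include <stddef.h>

/* Fit f(x) = a*x + b to n sampled (request size, latency) pairs by
 * ordinary least squares: x in KB, f(x) the expected latency. */
static void fit_latency_line(const double *x, const double *y, size_t n,
			     double *a, double *b)
{
	double sx = 0, sy = 0, sxx = 0, sxy = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		sx += x[i];
		sy += y[i];
		sxx += x[i] * x[i];
		sxy += x[i] * y[i];
	}
	*a = (n * sxy - sx * sy) / (n * sxx - sx * sx);
	*b = (sy - *a * sx) / n;
}
```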
Hello, Shaohua.
On Mon, Nov 28, 2016 at 03:10:18PM -0800, Shaohua Li wrote:
> > But we can increase sharing by upping the target latency. That should
> > be the main knob - if low, the user wants stricter service guarantee
> > at the cost of lower overall utilization; if high, the workload can
>
On 11/28/2016 10:01 AM, Gabriel Krisman Bertazi wrote:
Sorry for the dup. Missed linux-block address.
After commit 287922eb0b18 ("block: defer timeouts to a workqueue"),
deleting the timeout work after freezing the queue shouldn't be
necessary, since the synchronization is already enforced
This reverts commit 356e1c23292a4f63cfdf1daf0e0ddada51f32de8.
After conversion of huge tmpfs to multi-order entries, we don't need
this anymore.
Signed-off-by: Kirill A. Shutemov
---
include/linux/radix-tree.h | 1 -
lib/radix-tree.c | 74 -
Slab pages can be compound, but we shouldn't treat them as THP for
the purpose of the hpage_* helpers, otherwise it would lead to confusing
results. For instance, ext4 uses slab pages for journal pages and we
shouldn't confuse them with THPs. The easiest way is to exclude them in
the hpage_* helpers.
Signed-
Do not assume the length of a bio segment is never larger than PAGE_SIZE.
With huge pages it's HPAGE_PMD_SIZE (2M on x86-64).
Signed-off-by: Kirill A. Shutemov
---
drivers/block/brd.c | 17 -
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/block/brd.c b/drivers/b
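The change boils down to walking each bio segment in page-sized steps instead of assuming it fits within one page. A minimal userspace sketch of that chunking arithmetic (the constant and function names are illustrative only, not the actual brd code):

```c
#include <stddef.h>

#define PAGE_SIZE_ 4096u  /* stand-in for the kernel's PAGE_SIZE */

/* Count how many page-sized chunks are needed to cover a segment of
 * arbitrary length starting at an arbitrary offset, as a driver must
 * once a segment may span a huge page (up to HPAGE_PMD_SIZE). */
static unsigned int count_page_chunks(size_t off, size_t len)
{
	unsigned int chunks = 0;

	while (len) {
		/* bytes left in the page containing 'off' */
		size_t in_page = PAGE_SIZE_ - (off & (PAGE_SIZE_ - 1));
		size_t step = len < in_page ? len : in_page;

		off += step;
		len -= step;
		chunks++;
	}
	return chunks;
}
```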
Introduce new helpers which return size/mask of the page:
HPAGE_PMD_SIZE/HPAGE_PMD_MASK if the page is PageTransHuge() and
PAGE_SIZE/PAGE_MASK otherwise.
Signed-off-by: Kirill A. Shutemov
---
include/linux/huge_mm.h | 16
1 file changed, 16 insertions(+)
diff --git a/include/li
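The helpers can be pictured with a userspace stand-in; the real versions test PageTransHuge(page), which is mocked here with a plain flag (the struct and names are illustrative, sizes assume x86-64):

```c
#include <stdbool.h>

#define PAGE_SIZE_      4096UL
#define PAGE_MASK_      (~(PAGE_SIZE_ - 1))
#define HPAGE_PMD_SIZE_ (512 * PAGE_SIZE_)      /* 2 MiB on x86-64 */
#define HPAGE_PMD_MASK_ (~(HPAGE_PMD_SIZE_ - 1))

/* Toy stand-in for struct page; the kernel checks PageTransHuge(). */
struct fake_page {
	bool trans_huge;
};

/* Size of the page: PMD-sized for a transparent huge page,
 * PAGE_SIZE otherwise. */
static unsigned long hpage_size(const struct fake_page *page)
{
	return page->trans_huge ? HPAGE_PMD_SIZE_ : PAGE_SIZE_;
}

/* Matching mask, for rounding offsets down to the page boundary. */
static unsigned long hpage_mask(const struct fake_page *page)
{
	return page->trans_huge ? HPAGE_PMD_MASK_ : PAGE_MASK_;
}
```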
Let's add FileHugePages and FilePmdMapped fields to meminfo and smaps.
They indicate how many times we allocate and map file THPs.
Signed-off-by: Kirill A. Shutemov
---
drivers/base/node.c| 6 ++
fs/proc/meminfo.c | 4
fs/proc/task_mmu.c | 5 -
include/linux/mmzone.h
For filesystems that want to be write-notified (have mkwrite), we will
encounter write-protection faults for huge PMDs in shared mappings.
The easiest way to handle them is to clear the PMD and let it refault
as writable.
Signed-off-by: Kirill A. Shutemov
Reviewed-by: Jan Kara
---
mm/memory.c | 1
For huge pages we need to unmap the whole range covered by the huge page.
Signed-off-by: Kirill A. Shutemov
---
mm/truncate.c | 23 ++-
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/mm/truncate.c b/mm/truncate.c
index d2d95f283ec3..6df4b06a190f 100644
--- a/mm/tr
Adjust the check on whether part of the page is beyond the file size, and
apply compound_head() and page_mapping() where appropriate.
Signed-off-by: Kirill A. Shutemov
---
fs/buffer.c | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/buffer.c b/fs/buffer.c
index 7d333621ccfb.
For huge pages 'stop' must be within HPAGE_PMD_SIZE.
Let's use hpage_size() in the BUG_ON().
We also need to change how we calculate lblk for cluster deallocation.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/fs/e
We would need to use multi-order radix-tree entries for ext4 and other
filesystems to have a coherent view of tags (dirty/towrite) in the tree.
This patch converts the huge tmpfs implementation to multi-order entries,
so we will be able to use the same code path for all filesystems.
We also change int
It's more or less straightforward.
Most changes are around getting the offset/len within the page right
and zeroing out the desired part of the page.
Signed-off-by: Kirill A. Shutemov
---
fs/buffer.c | 70 +++--
1 file changed, 40 insertions(+), 30 del
As BIO_MAX_PAGES is smaller (on x86) than HPAGE_PMD_NR, we cannot use
the optimization ext4_mpage_readpages() provides.
So, for huge pages, we fall back directly to block_read_full_page().
This should be re-visited once we get multipage bvec upstream.
Signed-off-by: Kirill A. Shutemov
---
fs/ex
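The arithmetic behind that fallback can be sketched in a few lines (a userspace illustration only; 256 is the historical x86 BIO_MAX_PAGES value assumed here, and the macro names are stand-ins):

```c
#define BIO_MAX_PAGES_ 256  /* historical x86 value, assumed */
#define HPAGE_PMD_NR_  (2 * 1024 * 1024 / 4096)  /* base pages per 2 MiB */

/* A single bio can cover a PMD-sized huge page only if it can carry
 * at least HPAGE_PMD_NR page-sized segments. */
static int huge_page_fits_one_bio(void)
{
	return HPAGE_PMD_NR_ <= BIO_MAX_PAGES_;
}
```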
Trivial: remove the assumption on page size.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 13 +++--
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index fa4467e4b129..387aa857770b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@
As the function handles zeroing a range only within one block, the
required changes are trivial: just remove the assumption on page size.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 7 +--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
in
The approach is straightforward: for compound pages we read out the
whole huge page.
For a huge page we cannot have an array of buffer head pointers on the
stack -- it's 4096 pointers on x86-64 -- so 'arr' is allocated with
kmalloc() for huge pages.
Signed-off-by: Kirill A. Shutemov
---
fs/buffer.c
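The stack-size argument can be checked with a little arithmetic (a sketch under stated assumptions: 512-byte blocks and 8-byte pointers, as on x86-64; the function name is illustrative):

```c
#include <stddef.h>

/* Bytes needed for an array of one buffer-head pointer per block
 * in a page: with 512-byte blocks, a 2 MiB huge page has 4096
 * buffer heads, so the pointer array alone is 32 KiB on x86-64 --
 * far too large for the kernel stack, hence the kmalloc(). */
static size_t bh_array_bytes(size_t page_bytes, size_t block_bytes)
{
	return (page_bytes / block_bytes) * sizeof(void *);
}
```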
With huge pages in page cache we see tail pages in more code paths.
This patch replaces direct access to struct page fields with macros
which can handle tail pages properly.
Signed-off-by: Kirill A. Shutemov
---
fs/buffer.c | 2 +-
fs/ext4/inode.c | 4 ++--
mm/filemap.c| 24
Adjust how we find the relevant block within the page and how we clear
the required part of the page.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/move_extent.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index 6fc14def0
Modify mpage_map_and_submit_buffers() and mpage_release_unused_pages()
to deal with huge pages.
Mostly the result of trial and error. A critical review would be
appreciated.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 61 -
1 file change
These flags are in use for filesystems with backing storage: PG_error,
PG_writeback and PG_readahead.
Signed-off-by: Kirill A. Shutemov
---
include/linux/page-flags.h | 10 +-
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-fl
ext4_find_unwritten_pgoff() needs a few tweaks to work with huge pages.
Mostly trivial page_mapping()/page_to_pgoff() changes and an adjustment
to how we find the relevant block.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/file.c | 18 ++
1 file changed, 14 insertions(+), 4 deletions(-)
diff --
__ext4_block_zero_page_range() is adjusted to calculate the starting
iblock correctly for huge pages.
ext4_{collapse,insert}_range() requires page cache invalidation. We need
the invalidation to be aligned to the huge page boundary if huge pages
are possible in the page cache.
Signed-off-by: Kirill A. Shutemov
We want mmap(NULL) to return a PMD-aligned address if the inode can have
huge pages in the page cache.
Signed-off-by: Kirill A. Shutemov
---
mm/huge_memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a15d566b14f6..9c6ba124ba50 1006
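The alignment asked for above is ordinary power-of-two rounding. A userspace sketch (constants assume x86-64; the function name is hypothetical, not from the patch):

```c
#define HPAGE_PMD_SIZE_ (2UL * 1024 * 1024)
#define HPAGE_PMD_MASK_ (~(HPAGE_PMD_SIZE_ - 1))

/* Round a candidate mapping address up to the next 2 MiB (PMD)
 * boundary, so the mapping can be backed by huge pages. */
static unsigned long pmd_align_up(unsigned long addr)
{
	return (addr + HPAGE_PMD_SIZE_ - 1) & HPAGE_PMD_MASK_;
}
```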
This patch adds basic functionality to put a huge page into the page
cache.
At the moment we only put huge pages into the radix-tree if the range
covered by the huge page is empty.
We ignore shadow entries for now, just removing them from the tree
before inserting the huge page.
Later we can add logic to accumu
From: Naoya Horiguchi
Currently, hugetlb pages are linked to the page cache on the basis of
hugepage offset (derived from vma_hugecache_offset()) for historical
reasons, which doesn't match the generic usage of the page cache and
requires some routines to convert page offset <=> hugepage offset in commo
As with shmem_undo_range(), truncate_inode_pages_range() removes huge
pages if they are fully within the range.
A partial truncate of a huge page zeroes out that part of the THP.
Unlike with shmem, it doesn't prevent us from having holes in the middle
of a huge page; we can still skip writeback of untouched buffers.
With
The same four values as in the tmpfs case.
The encryption code is not yet ready to handle huge pages, so we disable
huge page support if the inode has EXT4_INODE_ENCRYPT.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/ext4.h | 5 +
fs/ext4/inode.c | 30 +++---
fs/ext4/super.
Most page cache allocation happens via readahead (sync or async), so if
we want to have a significant number of huge pages in the page cache we
need to find a way to allocate them from readahead.
Unfortunately, huge pages don't fit into the current readahead design:
128 max readahead window, assumption o
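The size mismatch can be put in numbers (a sketch; the 128 KiB figure assumes the default VM_MAX_READAHEAD of that era, and the macro names are stand-ins):

```c
#define MAX_READAHEAD_BYTES_ (128UL * 1024)       /* default window, assumed */
#define HPAGE_PMD_BYTES_     (2UL * 1024 * 1024)  /* x86-64 PMD page */

/* How many full default readahead windows fit into one huge page:
 * plain readahead would need many passes to cover a single THP. */
static unsigned long readahead_windows_per_huge_page(void)
{
	return HPAGE_PMD_BYTES_ / MAX_READAHEAD_BYTES_;
}
```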
It simply matches changes to __block_write_begin_int().
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 35 +--
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d3143dfe9962..21662bcbbbcb 100644
--- a/
Change ext4_writepage() and the underlying ext4_bio_write_page().
It basically removes the assumption on page size, inferring it from
struct page instead.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 10 +-
fs/ext4/page-io.c | 11 +--
2 files changed, 14 insertions(+), 7 deleti
We write back a whole huge page at a time.
Signed-off-by: Kirill A. Shutemov
---
mm/filemap.c | 5 +
1 file changed, 5 insertions(+)
diff --git a/mm/filemap.c b/mm/filemap.c
index ec976ddcb88a..52be2b457208 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -405,9 +405,14 @@ static int __filemap_fda
We need to account huge pages according to their size to get background
writeback working properly.
Signed-off-by: Kirill A. Shutemov
---
fs/fs-writeback.c | 10 +++---
include/linux/backing-dev.h | 10 ++
include/linux/memcontrol.h | 22 ++---
mm/migrate.c| 1
Call ext4_da_should_update_i_disksize() for the head page with the
offset relative to the head page.
Signed-off-by: Kirill A. Shutemov
---
fs/ext4/inode.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 21662bcbbbcb..e89249c03d2f 100644
---
Most of the work happens on the head page. Only when we need to copy
data to userspace do we find the relevant subpage.
We are still limited by PAGE_SIZE per iteration. Lifting this limitation
would require some more work.
Signed-off-by: Kirill A. Shutemov
---
mm/filemap.c | 5 -
1 file changed, 4 inse
We want the page to be isolated from the rest of the system before
splitting it. We rely on the page count being 2 for file pages to make
sure nobody uses the page: one pin for the caller, one for the
radix-tree.
Filesystems with backing storage can have the page count increased if
the page has buffers.
Let's try to free the
Here's a respin of my huge ext4 patchset on top of Matthew's patchset
with a few changes and fixes (see below).
Please review and consider applying.
I don't see any xfstests regressions with huge pages enabled. Patch with
new configurations for xfstests-bld is below.
The basics are the same as with
We write back a whole huge page at a time. Let's adjust the iteration
this way.
Signed-off-by: Kirill A. Shutemov
---
include/linux/mm.h | 1 +
include/linux/pagemap.h | 1 +
mm/page-writeback.c | 17 -
3 files changed, 14 insertions(+), 5 deletions(-)
diff --git a/include/linu
The write path allocates pages using pagecache_get_page(). We should be
able to allocate huge pages there, if it's allowed. As usual, fall back
to small pages if that fails.
Signed-off-by: Kirill A. Shutemov
---
mm/filemap.c | 17 +++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff