Re: [RFC 0/4] ZRAM: make it just store the high compression rate page

2016-09-04 Thread Hui Zhu
On Mon, Sep 5, 2016 at 10:18 AM, Minchan Kim  wrote:
> On Thu, Aug 25, 2016 at 04:25:30PM +0800, Hui Zhu wrote:
>> On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
>>  wrote:
>> > Hello,
>> >
>> > On (08/22/16 16:25), Hui Zhu wrote:
>> >>
>> >> Current ZRAM just can store all pages even if the compression rate
>> >> of a page is really low.  So the compression rate of ZRAM is out of
>> >> control when it is running.
>> >> In my part, I did some test and record with ZRAM.  The compression rate
>> >> is about 40%.
>> >>
>> >> This series of patches make ZRAM can just store the page that the
>> >> compressed size is smaller than a value.
>> >> With these patches, I set the value to 2048 and did the same test with
>> >> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> >> also decreased.
>> >
>> > I haven't looked at the patches in details yet. can you educate me a bit?
>> > is your test stable? why the number of lowmemorykill-s has decreased?
>> > ... or am reading "The times of lowmemorykiller also decreased" wrong?
>> >
>> > suppose you have X pages that result in bad compression size (from zram
>> > point of view). zram stores such pages uncompressed, IOW we have no memory
>> > savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
>> > don't try to store those pages in zsmalloc, but keep them as unevictable.
>> > so the page still occupies PAGE_SIZE; no memory saving again. why did it
>> > improve LMK?
>>
>> No, zram will not save this page uncompressed with these patches.  It
>> will set it as non-swap and kick back to shrink_page_list.
>> Shrink_page_list will remove this page from swapcache and kick it to
>> unevictable list.
>> Then this page will not be swaped before it get write.
>> That is why most of code are around vmscan.c.
>
> If I understand Sergey's point right, he means there is no gain
> to save memory between before and after.
>
> With your approach, you can prevent unnecessary pageout(i.e.,
> uncompressible page swap out) but it doesn't mean you save the
> memory compared to old so why does your patch decrease the number of
> lowmemory killing?
>
> A thing I can imagine is without this feature, zram could be full of
> uncompressible pages so good-compressible page cannot be swapped out.
> Hui, is this scenario right for your case?
>

That is one reason, but it is not the principal one.

The other reason is that when swap is pushing pages to zram, what the
system really wants is to get memory back.
So the deal is that the system spends CPU time and some memory in order
to get memory.  If zram accepts only the pages that compress well, the
system gets more memory back for the same amount of memory spent, and
it is pulled out of the low-memory state earlier.  (Maybe more CPU
time, because of the compressed-size checks; but maybe less, because
fewer pages need to be processed again and again.  That is the
interesting part. :)
I think that is why the lmk count decreases.

And yes, all of this depends on how many pages actually compress well.
So you cannot just set some non_swap limit on the system and get
everything for free; you need to do a lot of testing to make sure the
non_swap limit is a good fit for your system.
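For illustration, here is a minimal userspace sketch (not part of the
patch set; it assumes the patched kernel with CONFIG_ZRAM_NON_SWAP=y and
a zram0 device) that sets the threshold through the sysfs attribute
added in patch 3/4 and reads it back:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
	/* "non_swap" is the per-device attribute added by patch 3/4 */
	const char *path = "/sys/block/zram0/non_swap";
	unsigned int limit = (argc > 1) ?
			(unsigned int)strtoul(argv[1], NULL, 0) : 2048;
	unsigned int cur;
	FILE *f;

	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return 1;
	}
	/* pages whose compressed size exceeds this many bytes are marked
	 * non-swap instead of being stored by zram */
	fprintf(f, "%u\n", limit);
	fclose(f);

	f = fopen(path, "r");
	if (f) {
		if (fscanf(f, "%u", &cur) == 1)
			printf("non_swap = %u bytes\n", cur);
		fclose(f);
	}
	return 0;
}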

I also think that using AOP_WRITEPAGE_ACTIVATE without kicking the page
to a special list can make the CPU too busy at times.
I did some tests before I started kicking such pages to a special list:
the shrink task kept going around and around, because the poorly
compressing pages were just moved from one list to another over and
over again.
And these poorly compressing pages always tend to stay together.

Thanks,
Hui


> Thanks.


Re: [RFC 0/4] ZRAM: make it just store the high compression rate page

2016-08-25 Thread Hui Zhu
On Thu, Aug 25, 2016 at 2:09 PM, Sergey Senozhatsky
 wrote:
> Hello,
>
> On (08/22/16 16:25), Hui Zhu wrote:
>>
>> Current ZRAM just can store all pages even if the compression rate
>> of a page is really low.  So the compression rate of ZRAM is out of
>> control when it is running.
>> In my part, I did some test and record with ZRAM.  The compression rate
>> is about 40%.
>>
>> This series of patches make ZRAM can just store the page that the
>> compressed size is smaller than a value.
>> With these patches, I set the value to 2048 and did the same test with
>> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> also decreased.
>
> I haven't looked at the patches in details yet. can you educate me a bit?
> is your test stable? why the number of lowmemorykill-s has decreased?
> ... or am reading "The times of lowmemorykiller also decreased" wrong?
>
> suppose you have X pages that result in bad compression size (from zram
> point of view). zram stores such pages uncompressed, IOW we have no memory
> savings - swapped out page lands in zsmalloc PAGE_SIZE class. now you
> don't try to store those pages in zsmalloc, but keep them as unevictable.
> so the page still occupies PAGE_SIZE; no memory saving again. why did it
> improve LMK?

No, with these patches zram will not store such a page uncompressed.
It marks the page as non-swap and kicks it back to shrink_page_list.
shrink_page_list then removes the page from the swap cache and moves it
to the unevictable list.
After that, the page will not be swapped again until it is written to.
That is why most of the code is around vmscan.c.

Thanks,
Hui

>
> -ss


Re: [RFC 0/4] ZRAM: make it just store the high compression rate page

2016-08-23 Thread Hui Zhu
Hi Minchan,

On Wed, Aug 24, 2016 at 9:04 AM, Minchan Kim  wrote:
> Hi Hui,
>
> On Mon, Aug 22, 2016 at 04:25:05PM +0800, Hui Zhu wrote:
>> Current ZRAM just can store all pages even if the compression rate
>> of a page is really low.  So the compression rate of ZRAM is out of
>> control when it is running.
>> In my part, I did some test and record with ZRAM.  The compression rate
>> is about 40%.
>>
>> This series of patches make ZRAM can just store the page that the
>> compressed size is smaller than a value.
>> With these patches, I set the value to 2048 and did the same test with
>> before.  The compression rate is about 20%.  The times of lowmemorykiller
>> also decreased.
>
> I have an interest about the feature for a long time but didn't work on it
> because I didn't have a good idea to implment it with generic approach
> without layer violation. I will look into this after handling urgent works.
>
> Thanks.

That will be great.  Thanks.

Best,
Hui


[RFC 0/4] ZRAM: make it just store the high compression rate page

2016-08-22 Thread Hui Zhu
Currently ZRAM stores every page it is given, even when a page's
compression rate is really low, so the compression rate of ZRAM is out
of control while it is running.
On my side, I did some tests and recorded the results with ZRAM.  The
compression rate was about 40%.

This series of patches makes ZRAM store only the pages whose compressed
size is smaller than a configurable value.
With these patches, I set the value to 2048 and ran the same test as
before.  The compression rate was about 20%, and the number of
lowmemorykiller invocations also decreased.

Hui Zhu (4):
vmscan.c: shrink_page_list: unmap anon pages after pageout
Add non-swap page flag to mark a page will not swap
ZRAM: do not swap the pages that compressed size bigger than non_swap
vmscan.c: zram: add non swap support for shmem file pages

 drivers/block/zram/Kconfig |   11 +++
 drivers/block/zram/zram_drv.c  |   38 +++
 drivers/block/zram/zram_drv.h  |4 +
 fs/proc/meminfo.c  |6 +
 include/linux/mm_inline.h  |   20 +
 include/linux/mmzone.h |3 
 include/linux/page-flags.h |8 ++
 include/linux/rmap.h   |5 +
 include/linux/shmem_fs.h   |6 +
 include/trace/events/mmflags.h |9 ++
 kernel/events/uprobes.c|   16 
 mm/Kconfig |9 ++
 mm/memory.c|   34 ++
 mm/migrate.c   |4 +
 mm/mprotect.c  |8 ++
 mm/page_io.c   |   11 ++-
 mm/rmap.c  |   23 ++
 mm/shmem.c |   77 +-
 mm/vmscan.c|  139 +++--
 19 files changed, 387 insertions(+), 44 deletions(-)


[RFC 3/4] ZRAM: do not swap the page that compressed size bigger than non_swap

2016-08-22 Thread Hui Zhu
The new option ZRAM_NON_SWAP adds a "non_swap" interface to zram,
through which the user can set an unsigned int value.
If a page's compressed size is bigger than this limit, the page is
marked as non-swap and will then be added to the unevictable LRU list.

This patch doesn't handle shmem file pages.

Signed-off-by: Hui Zhu 
---
 drivers/block/zram/Kconfig| 11 +++
 drivers/block/zram/zram_drv.c | 39 +++
 drivers/block/zram/zram_drv.h |  4 
 3 files changed, 54 insertions(+)

diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index b8ecba6..525caaa 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -13,3 +13,14 @@ config ZRAM
  disks and maybe many more.
 
  See zram.txt for more information.
+
+config ZRAM_NON_SWAP
+   bool "Enable zram non-swap support"
+   depends on ZRAM
+   select NON_SWAP
+   default n
+   help
+ This option add a interface "non_swap" to zram.  User can set
+ a unsigned int value to zram.
+ If a page that compressed size is bigger than limit, mark it as
+ non-swap.  Then this page will add to unevictable lru list.
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 04365b1..8f7f1ec 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -714,6 +714,14 @@ compress_again:
goto out;
}
 
+#ifdef CONFIG_ZRAM_NON_SWAP
+   if (!is_partial_io(bvec) && PageAnon(page) &&
+   zram->non_swap && clen > zram->non_swap) {
+   ret = 0;
+   SetPageNonSwap(page);
+   goto out;
+   }
+#endif
src = zstrm->buffer;
if (unlikely(clen > max_zpage_size)) {
clen = PAGE_SIZE;
@@ -1180,6 +1188,31 @@ static const struct block_device_operations zram_devops 
= {
.owner = THIS_MODULE
 };
 
+#ifdef CONFIG_ZRAM_NON_SWAP
+static ssize_t non_swap_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct zram *zram = dev_to_zram(dev);
+
+   return scnprintf(buf, PAGE_SIZE, "%u\n", zram->non_swap);
+}
+
+static ssize_t non_swap_store(struct device *dev,
+ struct device_attribute *attr, const char *buf,
+ size_t len)
+{
+   struct zram *zram = dev_to_zram(dev);
+
+   zram->non_swap = (unsigned int)memparse(buf, NULL);
+
+   if (zram->non_swap > max_zpage_size)
+   pr_warn("Nonswap should small than max_zpage_size %zu\n",
+   max_zpage_size);
+
+   return len;
+}
+#endif
+
 static DEVICE_ATTR_WO(compact);
 static DEVICE_ATTR_RW(disksize);
 static DEVICE_ATTR_RO(initstate);
@@ -1190,6 +1223,9 @@ static DEVICE_ATTR_RW(mem_limit);
 static DEVICE_ATTR_RW(mem_used_max);
 static DEVICE_ATTR_RW(max_comp_streams);
 static DEVICE_ATTR_RW(comp_algorithm);
+#ifdef CONFIG_ZRAM_NON_SWAP
+static DEVICE_ATTR_RW(non_swap);
+#endif
 
 static struct attribute *zram_disk_attrs[] = {
&dev_attr_disksize.attr,
@@ -1210,6 +1246,9 @@ static struct attribute *zram_disk_attrs[] = {
&dev_attr_mem_used_max.attr,
&dev_attr_max_comp_streams.attr,
&dev_attr_comp_algorithm.attr,
+#ifdef CONFIG_ZRAM_NON_SWAP
+   &dev_attr_non_swap.attr,
+#endif
&dev_attr_io_stat.attr,
&dev_attr_mm_stat.attr,
&dev_attr_debug_stat.attr,
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 74fcf10..bd5f38a 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -119,5 +119,9 @@ struct zram {
 * zram is claimed so open request will be failed
 */
bool claim; /* Protected by bdev->bd_mutex */
+
+#ifdef CONFIG_ZRAM_NON_SWAP
+   unsigned int non_swap;
+#endif
 };
 #endif
-- 
1.9.1



[RFC 2/4] Add non-swap page flag to mark a page will not swap

2016-08-22 Thread Hui Zhu
After a page is marked with the non-swap flag by the swap driver, it is
added to the unevictable LRU list.
The page is kept in this state until its data is changed.
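As a quick way to observe the effect, here is a small sketch (not part
of the patch; it assumes CONFIG_NON_SWAP=y) that reads the NonSwap field
this patch adds to /proc/meminfo:

#include <stdio.h>
#include <string.h>

int main(void)
{
	char line[256];
	FILE *f = fopen("/proc/meminfo", "r");

	if (!f)
		return 1;
	while (fgets(line, sizeof(line), f)) {
		/* "NonSwap:" is the field added by this patch; it counts
		 * memory currently parked on the unevictable list because
		 * the swap driver refused to store it */
		if (!strncmp(line, "NonSwap:", 8))
			fputs(line, stdout);
	}
	fclose(f);
	return 0;
}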

Signed-off-by: Hui Zhu 
---
 fs/proc/meminfo.c  |  6 ++
 include/linux/mm_inline.h  | 20 ++--
 include/linux/mmzone.h |  3 +++
 include/linux/page-flags.h |  8 
 include/trace/events/mmflags.h |  9 -
 kernel/events/uprobes.c| 16 +++-
 mm/Kconfig |  5 +
 mm/memory.c| 34 ++
 mm/migrate.c   |  4 
 mm/mprotect.c  |  8 
 mm/vmscan.c| 41 -
 11 files changed, 149 insertions(+), 5 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index b9a8c81..5c79b2e 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -79,6 +79,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
"SwapTotal:  %8lu kB\n"
"SwapFree:   %8lu kB\n"
+#ifdef CONFIG_NON_SWAP
+   "NonSwap:%8lu kB\n"
+#endif
"Dirty:  %8lu kB\n"
"Writeback:  %8lu kB\n"
"AnonPages:  %8lu kB\n"
@@ -138,6 +141,9 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #endif
K(i.totalswap),
K(i.freeswap),
+#ifdef CONFIG_NON_SWAP
+   K(global_page_state(NR_NON_SWAP)),
+#endif
K(global_node_page_state(NR_FILE_DIRTY)),
K(global_node_page_state(NR_WRITEBACK)),
K(global_node_page_state(NR_ANON_MAPPED)),
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 71613e8..92298ce 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -46,15 +46,31 @@ static __always_inline void update_lru_size(struct lruvec 
*lruvec,
 static __always_inline void add_page_to_lru_list(struct page *page,
struct lruvec *lruvec, enum lru_list lru)
 {
-   update_lru_size(lruvec, lru, page_zonenum(page), hpage_nr_pages(page));
+   int nr_pages = hpage_nr_pages(page);
+   enum zone_type zid = page_zonenum(page);
+#ifdef CONFIG_NON_SWAP
+   if (PageNonSwap(page)) {
+   lru = LRU_UNEVICTABLE;
+   update_lru_size(lruvec, NR_NON_SWAP, zid, nr_pages);
+   }
+#endif
+   update_lru_size(lruvec, lru, zid, nr_pages);
list_add(&page->lru, &lruvec->lists[lru]);
 }
 
 static __always_inline void del_page_from_lru_list(struct page *page,
struct lruvec *lruvec, enum lru_list lru)
 {
+   int nr_pages = hpage_nr_pages(page);
+   enum zone_type zid = page_zonenum(page);
+#ifdef CONFIG_NON_SWAP
+   if (PageNonSwap(page)) {
+   lru = LRU_UNEVICTABLE;
+   update_lru_size(lruvec, NR_NON_SWAP, zid, -nr_pages);
+   }
+#endif
list_del(&page->lru);
-   update_lru_size(lruvec, lru, page_zonenum(page), -hpage_nr_pages(page));
+   update_lru_size(lruvec, lru, zid, -nr_pages);
 }
 
 /**
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index d572b78..da08d20 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -138,6 +138,9 @@ enum zone_stat_item {
NUMA_OTHER, /* allocation from other node */
 #endif
NR_FREE_CMA_PAGES,
+#ifdef CONFIG_NON_SWAP
+   NR_NON_SWAP,
+#endif
NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 74e4dda..0cd80db9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -105,6 +105,9 @@ enum pageflags {
PG_young,
PG_idle,
 #endif
+#ifdef CONFIG_NON_SWAP
+   PG_non_swap,
+#endif
__NR_PAGEFLAGS,
 
/* Filesystems */
@@ -303,6 +306,11 @@ PAGEFLAG(Reclaim, reclaim, PF_NO_TAIL)
 PAGEFLAG(Readahead, reclaim, PF_NO_COMPOUND)
TESTCLEARFLAG(Readahead, reclaim, PF_NO_COMPOUND)
 
+#ifdef CONFIG_NON_SWAP
+PAGEFLAG(NonSwap, non_swap, PF_NO_TAIL)
+   TESTSCFLAG(NonSwap, non_swap, PF_NO_TAIL)
+#endif
+
 #ifdef CONFIG_HIGHMEM
 /*
  * Must use a macro here due to header dependency issues. page_zone() is not
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 5a81ab4..1c0ccc9 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -79,6 +79,12 @@
 #define IF_HAVE_PG_IDLE(flag,string)
 #endif
 
+#ifdef CONFIG_NON_SWAP
+#define IF_HAVE_PG_NON_SWAP(flag,string) ,{1UL << flag, string}
+#else
+#define IF_HAVE_PG_NON_SWAP(flag,string)
+#endif
+
 #define __def_pageflag_names   \
{1UL << PG_locked,  "locked"},  \
{1UL <&

[RFC 1/4] vmscan.c: shrink_page_list: unmap anon pages after pageout

2016-08-22 Thread Hui Zhu
Today the page has already been unmapped by the time ZRAM learns its
compressed size, and it has already been added to the swap cache.
Backing out at that point would mean setting every pte back to point at
the page's pfn again, and there is no good way to do that.

This patch instead keeps the page mapped and sets each pte read-only
before pageout.  If the page is written to while its data is being
saved to ZRAM, its pte becomes dirty.  After pageout, shrink_page_list
checks the ptes and re-dirties the page if needed.  Only when pageout
succeeded and the page is still clean is the page finally unmapped.

This patch doesn't handle shmem file pages, which can use swap too.
The reason is that I only found a hacky way to make sure a page is a
shmem file page, so I split the shmem file page handling out into the
last patch of this series.
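Spelled out, the ordering being described is roughly the following (a
pseudocode-level sketch, not one of the actual hunks, which are spread
across shrink_page_list() and try_to_unmap_one(); TTU_READONLY and
TTU_CHECK_DIRTY are the new flags this patch introduces):

	/* 1. Keep the page mapped but write-protect its ptes, so a store
	 *    made while zram is compressing the data dirties the pte. */
	try_to_unmap(page, ttu_flags | TTU_READONLY);

	/* 2. pageout() runs; zram sees the still-mapped page and learns
	 *    its compressed size. */

	/* 3. Transfer any dirty bit set during step 2 back to the page. */
	try_to_unmap(page, ttu_flags | TTU_CHECK_DIRTY);

	/* 4. Only if pageout succeeded and the page stayed clean is it
	 *    finally unmapped and freed; otherwise it is kept. */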

Signed-off-by: Hui Zhu 
---
 include/linux/rmap.h |  5 
 mm/Kconfig   |  4 +++
 mm/page_io.c | 11 ---
 mm/rmap.c| 28 ++
 mm/vmscan.c  | 81 +---
 5 files changed, 108 insertions(+), 21 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index b46bb56..4259c46 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -88,6 +88,11 @@ enum ttu_flags {
TTU_LZFREE = 8, /* lazy free mode */
TTU_SPLIT_HUGE_PMD = 16,/* split huge PMD if any */
 
+#ifdef CONFIG_LATE_UNMAP
+   TTU_CHECK_DIRTY = (1 << 5), /* Check dirty mode */
+   TTU_READONLY = (1 << 6),/* Change readonly mode */
+#endif
+
TTU_IGNORE_MLOCK = (1 << 8),/* ignore mlock */
TTU_IGNORE_ACCESS = (1 << 9),   /* don't age */
TTU_IGNORE_HWPOISON = (1 << 10),/* corrupted page is recoverable */
diff --git a/mm/Kconfig b/mm/Kconfig
index 78a23c5..57ecdb3 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -704,3 +704,7 @@ config ARCH_USES_HIGH_VMA_FLAGS
bool
 config ARCH_HAS_PKEYS
bool
+
+config LATE_UNMAP
+   bool
+   depends on SWAP
diff --git a/mm/page_io.c b/mm/page_io.c
index 16bd82fa..adaf801 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -237,10 +237,13 @@ int swap_writepage(struct page *page, struct 
writeback_control *wbc)
 {
int ret = 0;
 
-   if (try_to_free_swap(page)) {
-   unlock_page(page);
-   goto out;
-   }
+#ifdef CONFIG_LATE_UNMAP
+   if (!(PageAnon(page) && page_mapped(page)))
+#endif
+   if (try_to_free_swap(page)) {
+   unlock_page(page);
+   goto out;
+   }
if (frontswap_store(page) == 0) {
set_page_writeback(page);
unlock_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index 1ef3640..d484f95 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1488,6 +1488,29 @@ static int try_to_unmap_one(struct page *page, struct 
vm_area_struct *vma,
}
}
 
+#ifdef CONFIG_LATE_UNMAP
+   if ((flags & TTU_CHECK_DIRTY) || (flags & TTU_READONLY)) {
+   BUG_ON(!PageAnon(page));
+
+   pteval = *pte;
+
+   BUG_ON(pte_write(pteval) &&
+  page_mapcount(page) + page_swapcount(page) > 1);
+
+   if ((flags & TTU_CHECK_DIRTY) && pte_dirty(pteval)) {
+   set_page_dirty(page);
+   pteval = pte_mkclean(pteval);
+   }
+
+   if (flags & TTU_READONLY)
+   pteval = pte_wrprotect(pteval);
+
+   if (!pte_same(*pte, pteval))
+   set_pte_at(mm, address, pte, pteval);
+   goto out_unmap;
+   }
+#endif
+
/* Nuke the page table entry. */
flush_cache_page(vma, address, page_to_pfn(page));
if (should_defer_flush(mm, flags)) {
@@ -1657,6 +1680,11 @@ int try_to_unmap(struct page *page, enum ttu_flags flags)
else
ret = rmap_walk(page, &rwc);
 
+#ifdef CONFIG_LATE_UNMAP
+   if ((flags & (TTU_READONLY | TTU_CHECK_DIRTY)) &&
+   ret == SWAP_AGAIN)
+   ret = SWAP_SUCCESS;
+#endif
if (ret != SWAP_MLOCK && !page_mapcount(page)) {
ret = SWAP_SUCCESS;
if (rp.lazyfreed && !PageDirty(page))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 374d95d..32fef7d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -494,12 +494,19 @@ void drop_slab(void)
 
 static inline int is_page_cache_freeable(struct page *page)
 {
+   int count = page_count(page) - page_has_private(page);
+
+#ifdef CONFIG_LATE_UNMAP
+   if (PageAnon(page))
+   count -= page_mapcount(page);
+#endif
+
/*
 * A freeable page cache page is referenced only by the caller
 * that isolated the page, the page cache radix tree and
 * optional buffer heads at page->private.
 */
-   return page_count(page) - page_has_private(page) == 2;
+   return count == 2;
 }
 
 static int may_write_to_inode

[RFC 4/4] vmscan.c: zram: add non swap support for shmem file pages

2016-08-22 Thread Hui Zhu
This patch adds the full non-swap support for shmem file pages.
To make sure a page is a shmem file page, it checks
mapping->a_ops == &shmem_aops.  I think that is really a hack.

Not many shmem file pages get swapped out anyway.
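The check boils down to a helper of roughly this shape (shown here only
for illustration; the helper name is made up, and the actual patch may
structure the comparison differently):

static inline bool page_is_shmem_file(struct page *page)
{
	struct address_space *mapping = page_mapping(page);

	/* shmem_aops is exported by this patch via shmem_fs.h */
	return mapping && mapping->a_ops == &shmem_aops;
}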

Signed-off-by: Hui Zhu 
---
 drivers/block/zram/zram_drv.c |  3 +-
 include/linux/shmem_fs.h  |  6 
 mm/page_io.c  |  2 +-
 mm/rmap.c |  5 ---
 mm/shmem.c| 77 ++-
 mm/vmscan.c   | 27 +++
 6 files changed, 89 insertions(+), 31 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 8f7f1ec..914c096 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -715,8 +715,7 @@ compress_again:
}
 
 #ifdef CONFIG_ZRAM_NON_SWAP
-   if (!is_partial_io(bvec) && PageAnon(page) &&
-   zram->non_swap && clen > zram->non_swap) {
+   if (!is_partial_io(bvec) && zram->non_swap && clen > zram->non_swap) {
ret = 0;
SetPageNonSwap(page);
goto out;
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ff078e7..fd44473 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -124,4 +124,10 @@ static inline bool shmem_huge_enabled(struct 
vm_area_struct *vma)
 }
 #endif
 
+extern const struct address_space_operations shmem_aops;
+
+#ifdef CONFIG_LATE_UNMAP
+extern void shmem_page_unmap(struct page *page);
+#endif
+
 #endif
diff --git a/mm/page_io.c b/mm/page_io.c
index adaf801..5fd3069 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -238,7 +238,7 @@ int swap_writepage(struct page *page, struct 
writeback_control *wbc)
int ret = 0;
 
 #ifdef CONFIG_LATE_UNMAP
-   if (!(PageAnon(page) && page_mapped(page)))
+   if (!page_mapped(page))
 #endif
if (try_to_free_swap(page)) {
unlock_page(page);
diff --git a/mm/rmap.c b/mm/rmap.c
index d484f95..418f731 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1490,13 +1490,8 @@ static int try_to_unmap_one(struct page *page, struct 
vm_area_struct *vma,
 
 #ifdef CONFIG_LATE_UNMAP
if ((flags & TTU_CHECK_DIRTY) || (flags & TTU_READONLY)) {
-   BUG_ON(!PageAnon(page));
-
pteval = *pte;
 
-   BUG_ON(pte_write(pteval) &&
-  page_mapcount(page) + page_swapcount(page) > 1);
-
if ((flags & TTU_CHECK_DIRTY) && pte_dirty(pteval)) {
set_page_dirty(page);
pteval = pte_mkclean(pteval);
diff --git a/mm/shmem.c b/mm/shmem.c
index fd8b2b5..556d853 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -182,7 +182,6 @@ static inline void shmem_unacct_blocks(unsigned long flags, 
long pages)
 }
 
 static const struct super_operations shmem_ops;
-static const struct address_space_operations shmem_aops;
 static const struct file_operations shmem_file_operations;
 static const struct inode_operations shmem_inode_operations;
 static const struct inode_operations shmem_dir_inode_operations;
@@ -1178,6 +1177,55 @@ out:
return error;
 }
 
+#define SHMEM_WRITEPAGE_LOCK   \
+   do {\
+   mutex_lock(&shmem_swaplist_mutex);  \
+   if (list_empty(&info->swaplist))\
+   list_add_tail(&info->swaplist,  \
+ &shmem_swaplist); \
+   } while (0)
+
+#define SHMEM_WRITEPAGE_SWAP   \
+   do {\
+   spin_lock(&info->lock); \
+   shmem_recalc_inode(inode);  \
+   info->swapped++;\
+   spin_unlock(&info->lock);   \
+   swap_shmem_alloc(swap); \
+   shmem_delete_from_page_cache(page,  \
+swp_to_radix_entry(swap)); \
+   } while (0)
+
+#define SHMEM_WRITEPAGE_UNLOCK \
+   do {\
+   mutex_unlock(&shmem_swaplist_mutex);\
+   } while (0)
+
+#define SHMEM_WRITEPAGE_BUG_ON \
+   do {\
+   BUG_ON(page_mapped(page));  \
+   } while (0)
+
+#ifdef CONFIG_LATE_UNMAP
+void
+shmem_page_unmap(struct page *page)
+{
+   struct shmem_inode_info *info;
+   struct address_space *mapping;
+   struct inode *inode;
+

RE: [PATCH v6] usb: serial: ftdi_sio Added 0a5c:6422 device ID for WICED USB UART dev board

2016-07-28 Thread Sheng-Hui J. Chu
From: Greg KH [mailto:gre...@linuxfoundation.org] 
Sent: Thursday, July 28, 2016 5:09 PM
To: Sheng-Hui J. Chu 
Cc: linux-...@vger.kernel.org; jo...@kernel.org; linux-kernel@vger.kernel.org
Subject: Re: [PATCH v6] usb: serial: ftdi_sio Added 0a5c:6422 device ID for 
WICED USB UART dev board

On Thu, Jul 28, 2016 at 05:01:45PM -0400, Sheng-Hui J. Chu wrote:
> BCM20706V2_EVAL is a WICED dev board designed with FT2232H USB 2.0 UART/FIFO 
> IC.
> 
> To support BCM920706V2_EVAL dev board for WICED development on Linux.  Add 
> the VID(0a5c) and 
> PID(6422) to ftdi_sio driver to allow loading ftdi_sio for this board.
> 
> Signed-off-by: Sheng-Hui J. Chu 
> ---
>  drivers/usb/serial/ftdi_sio.c   | 1 +
>  drivers/usb/serial/ftdi_sio_ids.h | 6 
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
> index 0082080..ef19af4 100644
> --- a/drivers/usb/serial/ftdi_sio.c
> +++ b/drivers/usb/serial/ftdi_sio.c
> @@ -1008,6 +1008,7 @@ static const struct usb_device_id id_table_combined[] = 
> {
>   { USB_DEVICE(ICPDAS_VID, ICPDAS_I7560U_PID) },
>   { USB_DEVICE(ICPDAS_VID, ICPDAS_I7561U_PID) },
>   { USB_DEVICE(ICPDAS_VID, ICPDAS_I7563U_PID) },
> + { USB_DEVICE(WICED_USB_VID, WICED_USB20706V2_PID) },
>   { } /* Terminating entry */
>  };
>  
> diff --git a/drivers/usb/serial/ftdi_sio_ids.h 
> b/drivers/usb/serial/ftdi_sio_ids.h
> index c5d6c1e..b29f280 100644
> --- a/drivers/usb/serial/ftdi_sio_ids.h
> +++ b/drivers/usb/serial/ftdi_sio_ids.h
> @@ -1485,3 +1485,11 @@
>  #define CHETCO_SEASMART_DISPLAY_PID  0xA5AD /* SeaSmart NMEA2000 Display */
>  #define CHETCO_SEASMART_LITE_PID 0xA5AE /* SeaSmart Lite USB Adapter */
>  #define CHETCO_SEASMART_ANALOG_PID   0xA5AF /* SeaSmart Analog Adapter */
> +
> +/*
> + * WICED USB UART
> + */
> +#define WICED_USB_VID0x0A5C
> +#define WICED_USB20706V2_PID 0x6422
> -- 
> 2.1.4

Yeah!!!

I'll let Johan queue this up, and forward it on to my trees in a few
days.  Thanks so much for sticking with this.

greg k-h

Thank you very much for your patience and guidance.

-Jeffrey



[PATCH v6] usb: serial: ftdi_sio Added 0a5c:6422 device ID for WICED USB UART dev board

2016-07-28 Thread Sheng-Hui J. Chu
BCM20706V2_EVAL is a WICED dev board designed with FT2232H USB 2.0 UART/FIFO IC.

To support the BCM920706V2_EVAL dev board for WICED development on
Linux, add the VID (0a5c) and PID (6422) to the ftdi_sio driver so that
ftdi_sio can be loaded for this board.

Signed-off-by: Sheng-Hui J. Chu 
---
 drivers/usb/serial/ftdi_sio.c | 1 +
 drivers/usb/serial/ftdi_sio_ids.h | 6 
 2 files changed, 7 insertions(+)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 0082080..ef19af4 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1008,6 +1008,7 @@ static const struct usb_device_id id_table_combined[] = {
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7560U_PID) },
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7561U_PID) },
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7563U_PID) },
+   { USB_DEVICE(WICED_USB_VID, WICED_USB20706V2_PID) },
{ } /* Terminating entry */
 };
 
diff --git a/drivers/usb/serial/ftdi_sio_ids.h 
b/drivers/usb/serial/ftdi_sio_ids.h
index c5d6c1e..b29f280 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -1485,3 +1485,11 @@
 #define CHETCO_SEASMART_DISPLAY_PID0xA5AD /* SeaSmart NMEA2000 Display */
 #define CHETCO_SEASMART_LITE_PID   0xA5AE /* SeaSmart Lite USB Adapter */
 #define CHETCO_SEASMART_ANALOG_PID 0xA5AF /* SeaSmart Analog Adapter */
+
+/*
+ * WICED USB UART
+ */
+#define WICED_USB_VID  0x0A5C
+#define WICED_USB20706V2_PID   0x6422
-- 
2.1.4



[PATCH v5] usb: serial: ftdi_sio Added 0a5c:6422 device ID for WICED USB UART dev board

2016-07-28 Thread Sheng-Hui J. Chu
BCM20706V2_EVAL is a WICED dev board designed with FT2232H USB 2.0 UART/FIFO IC.

To support the BCM920706V2_EVAL dev board for WICED development on
Linux, add the VID (0a5c) and PID (6422) to the ftdi_sio driver so that
ftdi_sio can be loaded for this board.

Signed-off-by: Sheng-Hui Chu 
---
 drivers/usb/serial/ftdi_sio.c | 1 +
 drivers/usb/serial/ftdi_sio_ids.h | 6 
 2 files changed, 7 insertions(+)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 0082080..ef19af4 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1008,6 +1008,7 @@ static const struct usb_device_id id_table_combined[] = {
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7560U_PID) },
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7561U_PID) },
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7563U_PID) },
+   { USB_DEVICE(WICED_USB_VID, WICED_USB20706V2_PID) },
{ } /* Terminating entry */
 };
 
diff --git a/drivers/usb/serial/ftdi_sio_ids.h 
b/drivers/usb/serial/ftdi_sio_ids.h
index c5d6c1e..b29f280 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -1485,3 +1485,11 @@
 #define CHETCO_SEASMART_DISPLAY_PID0xA5AD /* SeaSmart NMEA2000 Display */
 #define CHETCO_SEASMART_LITE_PID   0xA5AE /* SeaSmart Lite USB Adapter */
 #define CHETCO_SEASMART_ANALOG_PID 0xA5AF /* SeaSmart Analog Adapter */
+
+/*
+ * WICED USB UART
+ */
+#define WICED_USB_VID  0x0A5C
+#define WICED_USB20706V2_PID   0x6422
-- 
2.1.4



[PATCH v4] usb: serial: ftdi_sio Added 0a5c:6422 device ID for WICED USB UART dev board

2016-07-28 Thread Sheng-Hui J. Chu
BCM20706V2_EVAL is a WICED dev board designed with FT2232H USB 2.0 UART/FIFO
IC.

To support the BCM920706V2_EVAL dev board for WICED development on
Linux, add the VID (0a5c) and PID (6422) to the ftdi_sio driver so that
ftdi_sio can be loaded for this board.

Signed-off-by: Sheng-Hui J. Chu 
---
 drivers/usb/serial/ftdi_sio.c | 1 +
 drivers/usb/serial/ftdi_sio_ids.h | 6 
 2 files changed, 7 insertions(+)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 0082080..ef19af4 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1008,6 +1008,7 @@ static const struct usb_device_id id_table_combined[]
= {
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7560U_PID) },
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7561U_PID) },
{ USB_DEVICE(ICPDAS_VID, ICPDAS_I7563U_PID) },
+   { USB_DEVICE(WICED_USB_VID, WICED_USB20706V2_PID) },
{ } /* Terminating entry */
 };
 
diff --git a/drivers/usb/serial/ftdi_sio_ids.h
b/drivers/usb/serial/ftdi_sio_ids.h
index c5d6c1e..b29f280 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -1485,3 +1485,11 @@
 #define CHETCO_SEASMART_DISPLAY_PID0xA5AD /* SeaSmart NMEA2000 Display
*/
 #define CHETCO_SEASMART_LITE_PID   0xA5AE /* SeaSmart Lite USB Adapter
*/
 #define CHETCO_SEASMART_ANALOG_PID 0xA5AF /* SeaSmart Analog Adapter */
+
+/*
+ * WICED USB UART
+ */
+#define WICED_USB_VID  0x0A5C
+#define WICED_USB20706V2_PID   0x6422
-- 
2.1.4



[PATCH RFC v2 01/12] Kconfig change

2016-04-17 Thread Bill Huey (hui)
Add the selection options for the cyclic scheduler

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/Kconfig | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 3e84315..8da9796 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -73,6 +73,11 @@ config RTC_DEBUG
  Say yes here to enable debugging support in the RTC framework
  and individual RTC drivers.
 
+config RTC_CYCLIC
+   bool "RTC cyclic executive scheduler support"
+   help
+ Frame/Cyclic executive scheduler support through the RTC interface
+
 comment "RTC interfaces"
 
 config RTC_INTF_SYSFS
-- 
2.5.0



[PATCH RFC v2 02/12] Reroute rtc update irqs to the cyclic scheduler handler

2016-04-17 Thread Bill Huey (hui)
Redirect rtc update irqs so that they drive the cyclic scheduler timer
handler instead, and let the handler determine which slot to activate
next.  This is similar to scheduler tick handling, but just for the
cyclic scheduler.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/interface.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 9ef5f6f..6d39d40 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -17,6 +17,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "../kernel/sched/cyclic.h"
+#endif
+
 static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer);
 static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer);
 
@@ -488,6 +492,9 @@ EXPORT_SYMBOL_GPL(rtc_update_irq_enable);
 void rtc_handle_legacy_irq(struct rtc_device *rtc, int num, int mode)
 {
unsigned long flags;
+#ifdef CONFIG_RTC_CYCLIC
+   int handled = 0;
+#endif
 
/* mark one irq of the appropriate mode */
spin_lock_irqsave(&rtc->irq_lock, flags);
@@ -500,7 +507,23 @@ void rtc_handle_legacy_irq(struct rtc_device *rtc, int 
num, int mode)
rtc->irq_task->func(rtc->irq_task->private_data);
spin_unlock_irqrestore(&rtc->irq_task_lock, flags);
 
+#ifdef CONFIG_RTC_CYCLIC
+   /* wake up slot_curr if overrun task */
+   if (RTC_PF) {
+   if (rt_overrun_rq_admitted()) {
+   /* advance the cursor, overrun report */
+   rt_overrun_timer_handler(rtc);
+   handled = 1;
+   }
+   }
+
+   if (!handled) {
+   wake_up_interruptible(&rtc->irq_queue);
+   }
+#else
wake_up_interruptible(&rtc->irq_queue);
+#endif
+
kill_fasync(&rtc->async_queue, SIGIO, POLL_IN);
 }
 
-- 
2.5.0



[PATCH RFC v2 07/12] kernel/userspace additions for addition ioctl() support for rtc

2016-04-17 Thread Bill Huey (hui)
Add additional ioctl() values to rtc so that the calling thread can be
'admitted' into a red-black tree for tracking, its execution slot
pattern can be set, and it can choose whether read() will yield or
block.
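A minimal userspace sketch of how these are meant to be used (it assumes
a 64-bit build against the patched <linux/rtc.h>, passes the arguments
by value in the style of the existing RTC ioctls, and omits error
handling; all of that is an assumption, not taken from the posting):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main(void)
{
	/* example slot pattern: every third slot of the 64-slot frame,
	 * the same mask the posted test program uses for thread 0 */
	unsigned long slots = 0x9249249249249249UL;
	int fd = open("/dev/rtc0", O_RDONLY);

	if (fd < 0)
		return 1;
	/* admit this thread into the cyclic scheduler with that pattern */
	ioctl(fd, RTC_OV_ADMIT, slots);
	/* ask read() to yield the remainder of a slot instead of blocking */
	ioctl(fd, RTC_OV_YIELD, 1UL);
	/* ... slot-driven work, typically a read() loop, goes here ... */
	return 0;
}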

Signed-off-by: Bill Huey (hui) 
---
 include/uapi/linux/rtc.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/rtc.h b/include/uapi/linux/rtc.h
index f8c82e6..76c9254 100644
--- a/include/uapi/linux/rtc.h
+++ b/include/uapi/linux/rtc.h
@@ -94,6 +94,10 @@ struct rtc_pll_info {
 #define RTC_VL_READ_IOR('p', 0x13, int)/* Voltage low detector */
 #define RTC_VL_CLR _IO('p', 0x14)  /* Clear voltage low 
information */
 
+#define RTC_OV_ADMIT   _IOW('p', 0x15, unsigned long)   /* Set test   */
+#define RTC_OV_REPLEN  _IOW('p', 0x16, unsigned long)   /* Set test   */
+#define RTC_OV_YIELD   _IOW('p', 0x17, unsigned long)   /* Set test   */
+
 /* interrupt flags */
 #define RTC_IRQF 0x80  /* Any of the following is active */
 #define RTC_PF 0x40/* Periodic interrupt */
-- 
2.5.0



[PATCH RFC v2 04/12] Anonymous struct initialization

2016-04-17 Thread Bill Huey (hui)
Anonymous struct initialization

Signed-off-by: Bill Huey (hui) 
---
 include/linux/init_task.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f2cb8d4..ac9b0d9 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,23 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_RTC_CYCLIC
+# define INIT_RT_OVERRUN(tsk)  \
+   .rt_overrun = { \
+   .count = 0, \
+   .task_list = 
LIST_HEAD_INIT(tsk.rt.rt_overrun.task_list), \
+   .type = 0,  \
+   .color = 0, \
+   .slots = 0, \
+   .yield = 0, \
+   .machine_state = 0, \
+   .last_machine_state = 0,\
+   .last_task_state = 0,   \
+   }
+#else
+# define INIT_RT_OVERRUN(tsk)
+#endif
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1f (=2MB)
@@ -210,6 +227,7 @@ extern struct task_group root_task_group;
.rt = { \
.run_list   = LIST_HEAD_INIT(tsk.rt.run_list),  \
.time_slice = RR_TIMESLICE, \
+   INIT_RT_OVERRUN(tsk)\
},  \
.tasks  = LIST_HEAD_INIT(tsk.tasks),\
INIT_PUSHABLE_TASKS(tsk)\
-- 
2.5.0



[PATCH RFC v2 03/12] Add cyclic support to rtc-dev.c

2016-04-17 Thread Bill Huey (hui)
Wait-queue changes to rtc_dev_read() so that it can support overrun
count reporting when multiple threads are blocked on a single wait
object.

ioctl() additions so that callers can admit their thread into the
cyclic scheduler.
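For an admitted thread the read() path below then returns the overrun
count (and zeroes it) instead of the usual rtc irq data.  A rough sketch
of the consuming loop, under the same assumptions as the sketch shown
for patch 07/12 (RTC_UIE_ON is used here only as one possible interrupt
source, since the series reroutes rtc update irqs into the cyclic
scheduler):

#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main(void)
{
	unsigned long data;
	int fd = open("/dev/rtc0", O_RDONLY);

	if (fd < 0)
		return 1;
	ioctl(fd, RTC_UIE_ON, 0);
	for (;;) {
		/* blocks (or yields) until this thread's next slot; for an
		 * admitted thread, data carries the overrun count since
		 * the previous read */
		if (read(fd, &data, sizeof(data)) != sizeof(data))
			break;
		if (data)
			fprintf(stderr, "overran %lu slot(s)\n", data);
		/* ... this slot's work goes here ... */
	}
	close(fd);
	return 0;
}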

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/rtc-dev.c | 167 ++
 1 file changed, 167 insertions(+)

diff --git a/drivers/rtc/rtc-dev.c b/drivers/rtc/rtc-dev.c
index a6d9434..82c5cff 100644
--- a/drivers/rtc/rtc-dev.c
+++ b/drivers/rtc/rtc-dev.c
@@ -18,6 +18,15 @@
 #include 
 #include "rtc-core.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include 
+#include 
+
+#include <../kernel/sched/sched.h>
+#include <../kernel/sched/cyclic.h>
+//#include <../kernel/sched/cyclic_rt.h>
+#endif
+
 static dev_t rtc_devt;
 
 #define RTC_DEV_MAX 16 /* 16 RTCs should be enough for everyone... */
@@ -29,6 +38,10 @@ static int rtc_dev_open(struct inode *inode, struct file 
*file)
struct rtc_device, char_dev);
const struct rtc_class_ops *ops = rtc->ops;
 
+#ifdef CONFIG_RTC_CYCLIC
+   reset_rt_overrun();
+#endif
+
if (test_and_set_bit_lock(RTC_DEV_BUSY, &rtc->flags))
return -EBUSY;
 
@@ -153,13 +166,26 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
 {
struct rtc_device *rtc = file->private_data;
 
+#ifdef CONFIG_RTC_CYCLIC
+   DEFINE_WAIT_FUNC(wait, single_default_wake_function);
+#else
DECLARE_WAITQUEUE(wait, current);
+#endif
unsigned long data;
+#ifdef CONFIG_RTC_CYCLIC
+   unsigned long flags;
+   int wake = 0, block = 0;
+#endif
ssize_t ret;
 
if (count != sizeof(unsigned int) && count < sizeof(unsigned long))
return -EINVAL;
 
+#ifdef CONFIG_RTC_CYCLIC
+   if (rt_overrun_task_yield(current))
+   goto yield;
+printk("%s: 0 color = %d \n", __func__, current->rt.rt_overrun.color);
+#endif
add_wait_queue(&rtc->irq_queue, &wait);
do {
__set_current_state(TASK_INTERRUPTIBLE);
@@ -169,23 +195,65 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
rtc->irq_data = 0;
spin_unlock_irq(&rtc->irq_lock);
 
+#ifdef CONFIG_RTC_CYCLIC
+if (block) {
+   block = 0;
+   if (wake) {
+   printk("%s: wake \n", __func__);
+   wake = 0;
+   } else {
+   printk("%s: ~wake \n", __func__);
+   }
+}
+#endif
if (data != 0) {
+#ifdef CONFIG_RTC_CYCLIC
+   /* overrun reporting */
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   if (_on_rt_overrun_admitted(current)) {
+   /* pass back to userspace */
+   data = rt_task_count(current);
+   rt_task_count(current) = 0;
+   }
+   raw_spin_unlock_irqrestore(&rt_overrun_lock, flags);
ret = 0;
+printk("%s: 1 color = %d \n", __func__, current->rt.rt_overrun.color);
break;
}
+#else
+   ret = 0;
+   break;
+   }
+#endif
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
+#ifdef CONFIG_RTC_CYCLIC
+printk("%s: 2 color = %d \n", __func__, current->rt.rt_overrun.color);
+#endif
break;
}
if (signal_pending(current)) {
+#ifdef CONFIG_RTC_CYCLIC
+printk("%s: 3 color = %d \n", __func__, current->rt.rt_overrun.color);
+#endif
ret = -ERESTARTSYS;
break;
}
+#ifdef CONFIG_RTC_CYCLIC
+   block = 1;
+#endif
schedule();
+#ifdef CONFIG_RTC_CYCLIC
+   /* debugging */
+   wake = 1;
+#endif
} while (1);
set_current_state(TASK_RUNNING);
remove_wait_queue(&rtc->irq_queue, &wait);
 
+#ifdef CONFIG_RTC_CYCLIC
+ret:
+#endif
if (ret == 0) {
/* Check for any data updates */
if (rtc->ops->read_callback)
@@ -201,6 +269,29 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
sizeof(unsigned long);
}
return ret;
+
+#ifdef CONFIG_RTC_CYCLIC
+yield:
+
+   spin_lock_irq(&rtc->irq_lock);
+   data = rtc->irq_data;
+   rtc->irq_data = 0;
+   spin_unlock_irq(&rtc->irq_lock);
+
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   if (_on_rt_overrun_admitted(current)) {
+   /* pass back to userspace */
+   data = rt_task_c

[PATCH RFC v2 08/12] Compilation support

2016-04-17 Thread Bill Huey (hui)
Makefile changes to support the menuconfig option

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 414d9c1..1e12a32 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -23,4 +23,5 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_RTC_CYCLIC) += cyclic.o
 obj-$(CONFIG_CPU_FREQ) += cpufreq.o
-- 
2.5.0



[PATCH RFC v2 05/12] Task tracking per file descriptor

2016-04-17 Thread Bill Huey (hui)
Task tracking per file descriptor for thread death clean up.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/class.c | 3 +++
 include/linux/rtc.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 74fd974..ad570b9 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -201,6 +201,9 @@ struct rtc_device *rtc_device_register(const char *name, 
struct device *dev,
rtc->irq_freq = 1;
rtc->max_user_freq = 64;
rtc->dev.parent = dev;
+#ifdef CONFIG_RTC_CYCLIC
+   INIT_LIST_HEAD(&rtc->rt_overrun_tasks); //struct list_head
+#endif
rtc->dev.class = rtc_class;
rtc->dev.groups = rtc_get_dev_attribute_groups();
rtc->dev.release = rtc_device_release;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index b693ada..1424550 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -114,6 +114,9 @@ struct rtc_timer {
 struct rtc_device {
struct device dev;
struct module *owner;
+#ifdef CONFIG_RTC_CYCLIC
+   struct list_head rt_overrun_tasks;
+#endif
 
int id;
char name[RTC_DEVICE_NAME_SIZE];
-- 
2.5.0



[PATCH RFC v2 09/12] Add priority support for the cyclic scheduler

2016-04-17 Thread Bill Huey (hui)
Initial bits to prevent priority changes for cyclic scheduler tasks by
only allowing them to be SCHED_FIFO.  Fairly hacky at this time and will
need revisiting because of the security concerns.

This also affects task death handling, since an additional scheduler
class hook is used for clean up at death.  Admitted tasks must be
SCHED_FIFO.
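The rt_overrun_policy() check referenced below is roughly of the
following shape (a hypothetical sketch only -- the real helper comes
with cyclic.c in patch 11/12 and may well differ):

/* deny a policy change away from SCHED_FIFO for an admitted task */
static int rt_overrun_policy(struct task_struct *p, int policy)
{
	/* rt_overrun.node is on the admittance rbtree only while the
	 * task is admitted to the cyclic scheduler */
	if (!RB_EMPTY_NODE(&p->rt.rt_overrun.node) && policy != SCHED_FIFO)
		return 1;	/* sched_setscheduler() turns this into -EPERM */
	return 0;
}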

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/core.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8b489fc..76634d3 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -87,6 +87,10 @@
 #include "../workqueue_internal.h"
 #include "../smpboot.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+#endif
+
 #define CREATE_TRACE_POINTS
 #include 
 
@@ -2092,6 +2096,10 @@ static void __sched_fork(unsigned long clone_flags, 
struct task_struct *p)
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
 
+#ifdef CONFIG_RTC_CYCLIC
+   RB_CLEAR_NODE(&p->rt.rt_overrun.node);
+#endif
+
RB_CLEAR_NODE(&p->dl.rb_node);
init_dl_task_timer(&p->dl);
__dl_clear_params(p);
@@ -3899,6 +3907,11 @@ recheck:
if (dl_policy(policy))
return -EPERM;
 
+#ifdef CONFIG_RTC_CYCLIC
+   if (rt_overrun_policy(p, policy))
+   return -EPERM;
+#endif
+
/*
 * Treat SCHED_IDLE as nice 20. Only allow a switch to
 * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
-- 
2.5.0



[PATCH RFC v2 10/12] Export SCHED_FIFO/RT requeuing functions

2016-04-17 Thread Bill Huey (hui)
SCHED_FIFO/RT head/tail runqueue insertion support, plus initial thread
death support via a hook in the scheduler class.  Thread death needs
additional semantics to remove/discharge an admitted task properly.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/rt.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c41ea7a..1d77adc 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,6 +8,11 @@
 #include 
 #include 
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+extern int rt_overrun_task_admitted1(struct rq *rq, struct task_struct *p);
+#endif
+
 int sched_rr_timeslice = RR_TIMESLICE;
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
@@ -1321,8 +1326,18 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, 
int flags)
 
if (flags & ENQUEUE_WAKEUP)
rt_se->timeout = 0;
+#ifdef CONFIG_RTC_CYCLIC
+   /* if admitted and the current slot then head, otherwise tail */
+   if (rt_overrun_task_admitted1(rq, p)) {
+   if (rt_overrun_task_active(p)) {
+   flags |= ENQUEUE_HEAD;
+   }
+   }
 
enqueue_rt_entity(rt_se, flags);
+#else
+   enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);
+#endif
 
if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
@@ -1367,6 +1382,18 @@ static void requeue_task_rt(struct rq *rq, struct 
task_struct *p, int head)
}
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+void dequeue_task_rt2(struct rq *rq, struct task_struct *p, int flags)
+{
+   dequeue_task_rt(rq, p, flags);
+}
+
+void requeue_task_rt2(struct rq *rq, struct task_struct *p, int head)
+{
+   requeue_task_rt(rq, p, head);
+}
+#endif
+
 static void yield_task_rt(struct rq *rq)
 {
requeue_task_rt(rq, rq->curr, 0);
@@ -2177,6 +2204,10 @@ void __init init_sched_rt_class(void)
zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
GFP_KERNEL, cpu_to_node(i));
}
+
+#ifdef CONFIG_RTC_CYCLIC
+   init_rt_overrun();
+#endif
 }
 #endif /* CONFIG_SMP */
 
@@ -2322,6 +2353,13 @@ static unsigned int get_rr_interval_rt(struct rq *rq, 
struct task_struct *task)
return 0;
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+static void task_dead_rt(struct task_struct *p)
+{
+   rt_overrun_entry_delete(p);
+}
+#endif
+
 const struct sched_class rt_sched_class = {
.next   = &fair_sched_class,
.enqueue_task   = enqueue_task_rt,
@@ -2344,6 +2382,9 @@ const struct sched_class rt_sched_class = {
 #endif
 
.set_curr_task  = set_curr_task_rt,
+#ifdef CONFIG_RTC_CYCLIC
+   .task_dead  = task_dead_rt,
+#endif
.task_tick  = task_tick_rt,
 
.get_rr_interval= get_rr_interval_rt,
-- 
2.5.0



[PATCH RFC v2 00/12] Cyclic Scheduler Against RTC

2016-04-17 Thread Bill Huey (hui)
64/32-bit architecture related changes, plus a change for the m68k
compilation problems.

Still to come next: mask out the lower 4 bits of the hash that gets
inserted into the rbtree, better statistical reporting of overrun
events, more procfs support for setting the interrupt source, yield
testing, and backwards-compatibility testing.

bill

---

Sample output from the test program.  The delay in the userspace program
is constant, but the time frame interval is halved for each row.
It only tests periodic interrupts for now.  Note that the number of
events reported increases as the frame interval is reduced.

'Slots' shows the frame bit pattern.


billh@machine:~$ (cd bwi && sudo ./rtctest) 2>&1 | tee out

RTC Driver Test Example.


Periodic IRQ rate is 64Hz.
Counting 20 interrupts at:
2Hz:.1 0x0001,.2 0x0001,.3 0x0001,.4 0x0001,.5 0x0001,
4Hz:.1 0x,.2 0x0001,.3 0x,.4 0x0001,.5 0x,
8Hz:.1 0x0001,.2 0x0001,.3 0x,.4 0x0001,.5 0x0001,
16Hz:   .1 0x0001,.2 0x0002,.3 0x0002,.4 0x0001,.5 0x0002,
32Hz:   .1 0x0003,.2 0x0003,.3 0x0003,.4 0x0003,.5 0x0003,
64Hz:   .1 0x0007,.2 0x0006,.3 0x0006,.4 0x0007,.5 0x0006,

 *** Test complete ***

created thread 0
thread id = 0
slots = 0x9249249249249249

created thread 1
thread id = 1
slots = 0x4924924924924924

created thread 2
thread id = 2
slots = 0x2492492492492492
tid 0, 0x0056
thread exited running SCHED_RR = 0
tid 2, 0x0057
thread exited running SCHED_RR = 2
tid 1, 0x0052
thread exited running SCHED_RR = 1
pthread done
billh@machine:~$ 
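For reference, the three 'slots' masks above are every-third-bit
patterns offset by one bit each, so the three threads split the 64-slot
frame between them.  A tiny standalone check of that (not part of the
posted test program):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t t0 = 0x9249249249249249ULL;	/* slots 0, 3, 6, ... */
	uint64_t t1 = 0x4924924924924924ULL;	/* slots 2, 5, 8, ... */
	uint64_t t2 = 0x2492492492492492ULL;	/* slots 1, 4, 7, ... */

	/* no slot is claimed twice, and together the masks cover all 64
	 * slots, so exactly one thread is eligible in any given slot */
	assert(!(t0 & t1) && !(t0 & t2) && !(t1 & t2));
	assert((t0 | t1 | t2) == ~0ULL);
	puts("the three masks partition the 64-slot frame");
	return 0;
}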

---

Bill Huey (hui) (12):
  Kconfig change
  Reroute rtc update irqs to the cyclic scheduler handler
  Add cyclic support to rtc-dev.c
  Anonymous struct initialization
  Task tracking per file descriptor
  Add anonymous struct to sched_rt_entity
  kernel/userspace additions for addition ioctl() support for rtc
  Compilation support
  Add priority support for the cyclic scheduler
  Export SCHED_FIFO/RT requeuing functions
  Cyclic scheduler support
  Cyclic/rtc documentation

 Documentation/scheduler/sched-cyclic-rtc.txt | 468 
 drivers/rtc/Kconfig  |   5 +
 drivers/rtc/class.c  |   3 +
 drivers/rtc/interface.c  |  23 +
 drivers/rtc/rtc-dev.c| 167 
 include/linux/init_task.h|  18 +
 include/linux/rtc.h  |   3 +
 include/linux/sched.h|  15 +
 include/uapi/linux/rtc.h |   4 +
 kernel/sched/Makefile|   1 +
 kernel/sched/core.c  |  13 +
 kernel/sched/cyclic.c| 612 +++
 kernel/sched/cyclic.h|  86 
 kernel/sched/cyclic_rt.h |   7 +
 kernel/sched/rt.c|  41 ++
 15 files changed, 1466 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

-- 
2.5.0



[PATCH RFC v2 11/12] Cyclic scheduler support

2016-04-17 Thread Bill Huey (hui)
Core implementation of the cyclic scheduler, including admittance
handling, thread death support, the cyclic timer tick handler, a
primitive procfs debugging interface, and wait-queue modifications.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/cyclic.c| 612 +++
 kernel/sched/cyclic.h|  86 +++
 kernel/sched/cyclic_rt.h |   7 +
 3 files changed, 705 insertions(+)
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

diff --git a/kernel/sched/cyclic.c b/kernel/sched/cyclic.c
new file mode 100644
index 000..3b4c74d
--- /dev/null
+++ b/kernel/sched/cyclic.c
@@ -0,0 +1,612 @@
+/*
+ * cyclic scheduler for rtc support
+ *
+ * Copyright (C) Bill Huey
+ * Author: Bill Huey 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+*/
+
+#include 
+#include 
+#include 
+#include "sched.h"
+#include "cyclic.h"
+#include "cyclic_rt.h"
+
+#include 
+#include 
+
+DEFINE_RAW_SPINLOCK(rt_overrun_lock);
+struct rb_root rt_overrun_tree = RB_ROOT;
+
+#define MASK2 0xfff0
+
+#ifdef CONFIG_PHYS_ADDR_T_64BIT
+#define PTR_FMT "0x%08llx"
+#else
+#define PTR_FMT "0x%04x"
+#endif
+
+#define dprintk printk
+
+//CONFIG_PHYS_ADDR_T_64BIT phys_addr_t
+static int cmp_ptr2(void *p, void *q)
+{
+   return (((phys_addr_t)p & MASK2) - ((phys_addr_t)q & MASK2));
+}
+
+#define CMP_PTR2(p,q) cmp_ptr2(p, q)
+
+static
+struct task_struct *_rt_overrun_entry_find(struct rb_root *root,
+   struct task_struct *p)
+{
+   struct task_struct *ret = NULL;
+   struct rb_node *node = root->rb_node;
+
+   while (node) { // double_rq_lock(struct rq *, struct rq *) cpu_rq
+   struct task_struct *task = container_of(node,
+   struct task_struct, rt.rt_overrun.node);
+
+   int result = CMP_PTR2(p, task);
+
+   if (result < 0)
+   node = node->rb_left;
+   else if (result > 0)
+   node = node->rb_right;
+   else {
+   ret = task;
+   goto exit;
+   }
+   }
+exit:
+   return ret;
+}
+
+static int rt_overrun_task_runnable(struct task_struct *p)
+{
+   return task_on_rq_queued(p);
+}
+
+/* avoiding excessive debug printing, splitting the entry point */
+static
+struct task_struct *rt_overrun_entry_find(struct rb_root *root,
+   struct task_struct *p)
+{
+dprintk("%s: \n", __func__);
+   return _rt_overrun_entry_find(root, p);
+}
+
+static int _rt_overrun_entry_insert(struct rb_root *root, struct task_struct 
*p)
+{
+   struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+dprintk("%s: \n", __func__);
+   while (*new) {
+   struct task_struct *task = container_of(*new,
+   struct task_struct, rt.rt_overrun.node);
+
+   int result = CMP_PTR2(p, task);
+
+   parent = *new;
+   if (result < 0)
+   new = &((*new)->rb_left);
+   else if (result > 0)
+   new = &((*new)->rb_right);
+   else
+   return 0;
+   }
+
+   /* Add new node and rebalance tree. */
+   rb_link_node(&p->rt.rt_overrun.node, parent, new);
+   rb_insert_color(&p->rt.rt_overrun.node, root);
+
+   return 1;
+}
+
+static void _rt_overrun_entry_delete(struct task_struct *p)
+{
+   struct task_struct *task;
+   int i;
+
+   task = rt_overrun_entry_find(&rt_overrun_tree, p);
+
+   if (task) {
+   dprintk("%s: p color %d - comm %s - slots 0x%016llx\n",
+   __func__, task->rt.rt_overrun.color, task->comm,
+   task->rt.rt_overrun.slots);
+
+   rb_erase(&task->rt.rt_overrun.node, &rt_overrun_tree);
+   list_del(&task->rt.rt_overrun.task_list);
+   for (i = 0; i < SLOTS; ++i) {
+   if (rt_admit_rq.curr[i] == p)
+   rt_admit_rq.curr[i] = NULL;
+   }
+
+   if (rt_admit_curr == p)
+   rt_admit_curr = NULL;
+   }
+}
+
+void rt_overrun_entry_delete(struct task_struct *p)
+{
+   unsigned long flags;
+
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   _rt_overrun_entry_delete(p);
+   raw_spin_unlock_irqrestore(&rt_overrun_lock, flags);
+}
+
+/* forward */
+int rt_overrun_task_active(struct task_struct *p);
+
+#define PROCFS_MAX_SIZE  

[PATCH RFC v2 06/12] Add anonymous struct to sched_rt_entity

2016-04-17 Thread Bill Huey (hui)
Add an anonymous struct to support admittance via a red-black tree,
overrun tracking, state for whether to yield or block, debugging
support, and the execution slot pattern for the scheduler.

Signed-off-by: Bill Huey (hui) 
---
 include/linux/sched.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 52c4847..23c3173 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1307,6 +1307,21 @@ struct sched_rt_entity {
/* rq "owned" by this entity/group: */
struct rt_rq*my_q;
 #endif
+#ifdef CONFIG_RTC_CYCLIC
+   struct {
+   struct rb_node node; /* admittance structure */
+   struct list_head task_list;
+   unsigned long count; /* overrun count per slot */
+   int type, color, yield;
+   u64 slots;
+
+   /* debug */
+   unsigned long last_task_state;
+
+   /* instrumentation  */
+   unsigned int machine_state, last_machine_state;
+   } rt_overrun;
+#endif
 };
 
 struct sched_dl_entity {
-- 
2.5.0



[PATCH RFC v2 12/12] Cyclic/rtc documentation

2016-04-17 Thread Bill Huey (hui)
Initial attempt at documentation with a test program

Signed-off-by: Bill Huey (hui) 
---
 Documentation/scheduler/sched-cyclic-rtc.txt | 468 +++
 1 file changed, 468 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt

diff --git a/Documentation/scheduler/sched-cyclic-rtc.txt 
b/Documentation/scheduler/sched-cyclic-rtc.txt
new file mode 100644
index 000..4d22381
--- /dev/null
+++ b/Documentation/scheduler/sched-cyclic-rtc.txt
@@ -0,0 +1,468 @@
+[in progress]
+
+"Work Conserving"
+
+When a task is active and calls read(), it will block/yield depending on
+is requested from the cyclic scheduler. A RT_OV_YIELD call to ioctl()
+specifies the behavior for the calling thread.
+
+In the case where read() is called before the time slice is over, it will
+allow other tasks to run with the leftover time.
+
+"Overrun Reporting/Apps"
+
+Calls to read() will return the overrun count and zero the counter. This
+can be used to adjust the execution time of the thread so that it can run
+within that slot so that thread can meet some deadline constraint.
+
+[no decision has been made to return a more meaningful set of numbers as
+you can just get time stamps and do the math in userspace but it could
+be changed to do so]
+
+The behavior of the read() depends on whether it has been admitted or not
+via an ioctl() using RTC_OV_ADMIT. If it is then it will return the overrun
+count. If this is not admitted then it returns value corresponding to the
+default read() behavior for rtc.
+
+See the sample test sources for details.
+
+Using a video game as an example, having a rendering engine overrunning its
+slot driving by a vertical retrace interrupt can cause visual skipping and
+hurt interactivity. Adapting the computation from the read() result can
+allow for the frame buffer swap at the frame interrupt. If read() reports
+and it can simplify calculations and adapt to fit within that slot.
+It would then allow the program to respond to events (touches, buttons)
+minimizing the possibility of perceived pauses.
+
+The slot allocation scheme for the video game must have some inherit
+definition of interactivity. That determines appropriate slot allocation
+amognst a mixture of soft/hard real-time. A general policy must be created
+for the system, and all programs, to meet a real-time criteria.
+
+"Admittance"
+
+Admittance of a task is done through a ioctl() call using RTC_OV_ADMIT.
+This passes 64 bit wide bitmap that maps onto a entries in the slot map.
+
+(slot map of two threads)
+execution direction ->
+
+1000 1000 1000 1000...
+0100 0100 0100 0100...
+
+(bit pattern of two threads)
+0001 0001 0001 0001...
+0010 0010 0010 0010...
+
+(hex)
+0x
+0x
+
+The slot map is an array of 64 entries of threads. An index is increment
+through determine what the next active thread-slot will be. The end of the
+index set in /proc/rt_overrun_proc
+
+"Slot/slice activation"
+
+Move the task to the front of the SCHED_FIFO list when active, the tail when
+inactive.
+
+"RTC Infrastructure and Interrupt Routing"
+
+The cyclic scheduler is driven by the update interrupt in the RTC
+infrastructure but can be rerouted to any periodic interrupt source.
+
+One of those applications could be when interrupts from a display refresh
+happen or some interval where an external controller such as a drum pad,
+touch event or whatever.
+
+"Embedded Environments"
+
+This is single run queue only and targeting embedded scenarios where not all
+cores are guaranteed to be available. Older Qualcomm MSM kernels have a very
+aggressive cpu hotplug as a means of fully powering off cores. The only
+guaranteed CPU to run is CPU 0.
+
+"Project History"
+
+This was originally created when I was at HP/Palm to solve issues related
+to touch event handling and lag working with the real-time media subsystem.
+The typical workaround used to prevent skipping is to use large buffers
+to prevent data underruns. The programs then run at SCHED_FIFO, which can
+starve the system from handling external events such as buttons or touch
+events in a timely manner. The lack of a globally defined policy for how
+to use real-time resources can cause long pauses between handling touch
+events and other kinds of implicit deadline misses.
+
+By choosing some kind of slot execution pattern, it was hoped that
+execution can be controlled globally across the system so that some basic
+interactive guarantees can be met. Whether the tasks are some combination
+of soft or hard real-time, a mechanism like this can help guide how
+SCHED_FIFO tasks are run versus letting SCHED_FIFO tasks run wildly.
+
+"Future work"
+
+Possible integration with the deadline scheduler. Power management
+awareness, CPU clock governor. Turning off the scheduler tick when there
+are no runnable tasks, other things...
+
+"Power management"
+
+Governor awareness...
+
+[m

[PATCH RFC v2 00/12] Cyclic Scheduler Against RTC

2016-04-17 Thread Bill Huey (hui)
64/32 bit architecture related changes, plus a change for m68k
architecture compilation problems.

Still need to mask out the lower 4 bits of the hash before it is inserted
into the rbtree. Better statistical reporting of overrun events, more
procfs support for setting the interrupt source, yield testing and
backwards compatible testing are next.

bill

---

Sample output from the test program. The delay in the userspace program
is constant but the time frame interval is halved for each row.
It only tests periodic interrupts for now. Note that the number of events
reported increases as the frame interval is reduced.

'Slots' shows the frame bit pattern.


billh@machine:~$ (cd bwi && sudo ./rtctest) 2>&1 | tee out

RTC Driver Test Example.


Periodic IRQ rate is 64Hz.
Counting 20 interrupts at:
2Hz:.1 0x0001,.2 0x0001,.3 0x0001,.4 0x0001,.5 0x0001,
4Hz:.1 0x,.2 0x0001,.3 0x,.4 0x0001,.5 0x,
8Hz:.1 0x0001,.2 0x0001,.3 0x,.4 0x0001,.5 0x0001,
16Hz:   .1 0x0001,.2 0x0002,.3 0x0002,.4 0x0001,.5 0x0002,
32Hz:   .1 0x0003,.2 0x0003,.3 0x0003,.4 0x0003,.5 0x0003,
64Hz:   .1 0x0007,.2 0x0006,.3 0x0006,.4 0x0007,.5 0x0006,

 *** Test complete ***

created thread 0
thread id = 0
slots = 0x9249249249249249

created thread 1
thread id = 1
slots = 0x4924924924924924

created thread 2
thread id = 2
slots = 0x2492492492492492
tid 0, 0x0056
thread exited running SCHED_RR = 0
tid 2, 0x0057
thread exited running SCHED_RR = 2
tid 1, 0x0052
thread exited running SCHED_RR = 1
pthread done
billh@machine:~$ 
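
For reference, the periodic-interrupt portion of the table above can be
produced by a loop roughly like the sketch below. It follows the stock
rtctest idiom (RTC_IRQP_SET/RTC_PIE_ON plus a blocking read()); the real
test program additionally admits its threads with RTC_OV_ADMIT, so the
values printed here are only an approximation of the output format.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/rtc.h>

int main(void)
{
        unsigned long freq, data;
        int i, fd = open("/dev/rtc0", O_RDONLY);

        if (fd < 0) {
                perror("/dev/rtc0");
                return 1;
        }

        for (freq = 2; freq <= 64; freq *= 2) {
                /* set the periodic interrupt rate, then enable it */
                if (ioctl(fd, RTC_IRQP_SET, freq) < 0 ||
                    ioctl(fd, RTC_PIE_ON, 0) < 0)
                        break;

                printf("%luHz:", freq);
                for (i = 1; i <= 5; i++) {
                        /* blocks until the next irq; an admitted thread
                         * would get the overrun count back instead */
                        if (read(fd, &data, sizeof(data)) < 0)
                                break;
                        printf("\t.%d 0x%04lx,", i, data);
                }
                printf("\n");
                ioctl(fd, RTC_PIE_OFF, 0);
        }

        close(fd);
        return 0;
}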

---

Bill Huey (hui) (12):
  Kconfig change
  Reroute rtc update irqs to the cyclic scheduler handler
  Add cyclic support to rtc-dev.c
  Anonymous struct initialization
  Task tracking per file descriptor
  Add anonymous struct to sched_rt_entity
  kernel/userspace additions for additional ioctl() support for rtc
  Compilation support
  Add priority support for the cyclic scheduler
  Export SCHED_FIFO/RT requeuing functions
  Cyclic scheduler support
  Cyclic/rtc documentation

 Documentation/scheduler/sched-cyclic-rtc.txt | 468 
 drivers/rtc/Kconfig  |   5 +
 drivers/rtc/class.c  |   3 +
 drivers/rtc/interface.c  |  23 +
 drivers/rtc/rtc-dev.c| 167 
 include/linux/init_task.h|  18 +
 include/linux/rtc.h  |   3 +
 include/linux/sched.h|  15 +
 include/uapi/linux/rtc.h |   4 +
 kernel/sched/Makefile|   1 +
 kernel/sched/core.c  |  13 +
 kernel/sched/cyclic.c| 612 +++
 kernel/sched/cyclic.h|  86 
 kernel/sched/cyclic_rt.h |   7 +
 kernel/sched/rt.c|  41 ++
 15 files changed, 1466 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

-- 
2.5.0



[PATCH RFC v1 01/12] Kconfig change

2016-04-13 Thread Bill Huey (hui)
Add the selection options for the cyclic scheduler

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/Kconfig | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 544bd34..8a1b704 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -73,6 +73,11 @@ config RTC_DEBUG
  Say yes here to enable debugging support in the RTC framework
  and individual RTC drivers.
 
+config RTC_CYCLIC
+   bool "RTC cyclic executive scheduler support"
+   help
+ Frame/Cyclic executive scheduler support through the RTC interface
+
 comment "RTC interfaces"
 
 config RTC_INTF_SYSFS
-- 
2.5.0



[PATCH RFC v1 10/12] Export SCHED_FIFO/RT requeuing functions

2016-04-13 Thread Bill Huey (hui)
SCHED_FIFO/RT tail/head runqueue insertion support, initial thread death
support via a hook to the scheduler class. Thread death must include
additional semantics to remove/discharge an admitted task properly.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/rt.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c41ea7a..1d77adc 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,6 +8,11 @@
 #include 
 #include 
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+extern int rt_overrun_task_admitted1(struct rq *rq, struct task_struct *p);
+#endif
+
 int sched_rr_timeslice = RR_TIMESLICE;
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
@@ -1321,8 +1326,18 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, 
int flags)
 
if (flags & ENQUEUE_WAKEUP)
rt_se->timeout = 0;
+#ifdef CONFIG_RTC_CYCLIC
+   /* if admitted and the current slot then head, otherwise tail */
+   if (rt_overrun_task_admitted1(rq, p)) {
+   if (rt_overrun_task_active(p)) {
+   flags |= ENQUEUE_HEAD;
+   }
+   }
 
enqueue_rt_entity(rt_se, flags);
+#else
+   enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);
+#endif
 
if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
@@ -1367,6 +1382,18 @@ static void requeue_task_rt(struct rq *rq, struct 
task_struct *p, int head)
}
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+void dequeue_task_rt2(struct rq *rq, struct task_struct *p, int flags)
+{
+   dequeue_task_rt(rq, p, flags);
+}
+
+void requeue_task_rt2(struct rq *rq, struct task_struct *p, int head)
+{
+   requeue_task_rt(rq, p, head);
+}
+#endif
+
 static void yield_task_rt(struct rq *rq)
 {
requeue_task_rt(rq, rq->curr, 0);
@@ -2177,6 +2204,10 @@ void __init init_sched_rt_class(void)
zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
GFP_KERNEL, cpu_to_node(i));
}
+
+#ifdef CONFIG_RTC_CYCLIC
+   init_rt_overrun();
+#endif
 }
 #endif /* CONFIG_SMP */
 
@@ -2322,6 +2353,13 @@ static unsigned int get_rr_interval_rt(struct rq *rq, 
struct task_struct *task)
return 0;
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+static void task_dead_rt(struct task_struct *p)
+{
+   rt_overrun_entry_delete(p);
+}
+#endif
+
 const struct sched_class rt_sched_class = {
.next   = &fair_sched_class,
.enqueue_task   = enqueue_task_rt,
@@ -2344,6 +2382,9 @@ const struct sched_class rt_sched_class = {
 #endif
 
.set_curr_task  = set_curr_task_rt,
+#ifdef CONFIG_RTC_CYCLIC
+   .task_dead  = task_dead_rt,
+#endif
.task_tick  = task_tick_rt,
 
.get_rr_interval= get_rr_interval_rt,
-- 
2.5.0



[PATCH RFC v1 03/12] Add cyclic support to rtc-dev.c

2016-04-13 Thread Bill Huey (hui)
wait-queue changes to rtc_dev_read so that it can support overrun count
reporting when multiple threads are blocked against a single wait object.

ioctl() additions to allow for those calling it to admit the thread to the
cyclic scheduler.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/rtc-dev.c | 167 ++
 1 file changed, 167 insertions(+)

diff --git a/drivers/rtc/rtc-dev.c b/drivers/rtc/rtc-dev.c
index a6d9434..82c5cff 100644
--- a/drivers/rtc/rtc-dev.c
+++ b/drivers/rtc/rtc-dev.c
@@ -18,6 +18,15 @@
 #include 
 #include "rtc-core.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include 
+#include 
+
+#include <../kernel/sched/sched.h>
+#include <../kernel/sched/cyclic.h>
+//#include <../kernel/sched/cyclic_rt.h>
+#endif
+
 static dev_t rtc_devt;
 
 #define RTC_DEV_MAX 16 /* 16 RTCs should be enough for everyone... */
@@ -29,6 +38,10 @@ static int rtc_dev_open(struct inode *inode, struct file 
*file)
struct rtc_device, char_dev);
const struct rtc_class_ops *ops = rtc->ops;
 
+#ifdef CONFIG_RTC_CYCLIC
+   reset_rt_overrun();
+#endif
+
if (test_and_set_bit_lock(RTC_DEV_BUSY, &rtc->flags))
return -EBUSY;
 
@@ -153,13 +166,26 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
 {
struct rtc_device *rtc = file->private_data;
 
+#ifdef CONFIG_RTC_CYCLIC
+   DEFINE_WAIT_FUNC(wait, single_default_wake_function);
+#else
DECLARE_WAITQUEUE(wait, current);
+#endif
unsigned long data;
+#ifdef CONFIG_RTC_CYCLIC
+   unsigned long flags;
+   int wake = 0, block = 0;
+#endif
ssize_t ret;
 
if (count != sizeof(unsigned int) && count < sizeof(unsigned long))
return -EINVAL;
 
+#ifdef CONFIG_RTC_CYCLIC
+   if (rt_overrun_task_yield(current))
+   goto yield;
+printk("%s: 0 color = %d \n", __func__, current->rt.rt_overrun.color);
+#endif
add_wait_queue(&rtc->irq_queue, &wait);
do {
__set_current_state(TASK_INTERRUPTIBLE);
@@ -169,23 +195,65 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
rtc->irq_data = 0;
spin_unlock_irq(&rtc->irq_lock);
 
+#ifdef CONFIG_RTC_CYCLIC
+if (block) {
+   block = 0;
+   if (wake) {
+   printk("%s: wake \n", __func__);
+   wake = 0;
+   } else {
+   printk("%s: ~wake \n", __func__);
+   }
+}
+#endif
if (data != 0) {
+#ifdef CONFIG_RTC_CYCLIC
+   /* overrun reporting */
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   if (_on_rt_overrun_admitted(current)) {
+   /* pass back to userspace */
+   data = rt_task_count(current);
+   rt_task_count(current) = 0;
+   }
+   raw_spin_unlock_irqrestore(&rt_overrun_lock, flags);
ret = 0;
+printk("%s: 1 color = %d \n", __func__, current->rt.rt_overrun.color);
break;
}
+#else
+   ret = 0;
+   break;
+   }
+#endif
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
+#ifdef CONFIG_RTC_CYCLIC
+printk("%s: 2 color = %d \n", __func__, current->rt.rt_overrun.color);
+#endif
break;
}
if (signal_pending(current)) {
+#ifdef CONFIG_RTC_CYCLIC
+printk("%s: 3 color = %d \n", __func__, current->rt.rt_overrun.color);
+#endif
ret = -ERESTARTSYS;
break;
}
+#ifdef CONFIG_RTC_CYCLIC
+   block = 1;
+#endif
schedule();
+#ifdef CONFIG_RTC_CYCLIC
+   /* debugging */
+   wake = 1;
+#endif
} while (1);
set_current_state(TASK_RUNNING);
remove_wait_queue(&rtc->irq_queue, &wait);
 
+#ifdef CONFIG_RTC_CYCLIC
+ret:
+#endif
if (ret == 0) {
/* Check for any data updates */
if (rtc->ops->read_callback)
@@ -201,6 +269,29 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
sizeof(unsigned long);
}
return ret;
+
+#ifdef CONFIG_RTC_CYCLIC
+yield:
+
+   spin_lock_irq(&rtc->irq_lock);
+   data = rtc->irq_data;
+   rtc->irq_data = 0;
+   spin_unlock_irq(&rtc->irq_lock);
+
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   if (_on_rt_overrun_admitted(current)) {
+   /* pass back to userspace */
+   data = rt_task_c

[PATCH RFC v1 07/12] kernel/userspace additions for additional ioctl() support for rtc

2016-04-13 Thread Bill Huey (hui)
Add additional ioctl() values to rtc so that it can 'admit' the calling
thread into a red-black tree for tracking, set the execution slot pattern,
support for setting whether read() will yield or block.

Signed-off-by: Bill Huey (hui) 
---
 include/uapi/linux/rtc.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/rtc.h b/include/uapi/linux/rtc.h
index f8c82e6..76c9254 100644
--- a/include/uapi/linux/rtc.h
+++ b/include/uapi/linux/rtc.h
@@ -94,6 +94,10 @@ struct rtc_pll_info {
 #define RTC_VL_READ_IOR('p', 0x13, int)/* Voltage low detector */
 #define RTC_VL_CLR _IO('p', 0x14)  /* Clear voltage low 
information */
 
+#define RTC_OV_ADMIT   _IOW('p', 0x15, unsigned long)   /* Set test   */
+#define RTC_OV_REPLEN  _IOW('p', 0x16, unsigned long)   /* Set test   */
+#define RTC_OV_YIELD   _IOW('p', 0x17, unsigned long)   /* Set test   */
+
 /* interrupt flags */
 #define RTC_IRQF 0x80  /* Any of the following is active */
 #define RTC_PF 0x40/* Periodic interrupt */
-- 
2.5.0



[PATCH RFC v1 05/12] Task tracking per file descriptor

2016-04-13 Thread Bill Huey (hui)
Task tracking per file descriptor for thread death clean up.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/class.c | 3 +++
 include/linux/rtc.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 74fd974..ad570b9 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -201,6 +201,9 @@ struct rtc_device *rtc_device_register(const char *name, 
struct device *dev,
rtc->irq_freq = 1;
rtc->max_user_freq = 64;
rtc->dev.parent = dev;
+#ifdef CONFIG_RTC_CYCLIC
+   INIT_LIST_HEAD(&rtc->rt_overrun_tasks); //struct list_head
+#endif
rtc->dev.class = rtc_class;
rtc->dev.groups = rtc_get_dev_attribute_groups();
rtc->dev.release = rtc_device_release;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index b693ada..1424550 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -114,6 +114,9 @@ struct rtc_timer {
 struct rtc_device {
struct device dev;
struct module *owner;
+#ifdef CONFIG_RTC_CYCLIC
+   struct list_head rt_overrun_tasks;
+#endif
 
int id;
char name[RTC_DEVICE_NAME_SIZE];
-- 
2.5.0



[PATCH RFC v1 00/12] Cyclic Scheduler Against RTC

2016-04-13 Thread Bill Huey (hui)
Hi,

Simple compilation updates here along with an admittance logic clean up.
The test program wasn't working properly without it. Slipped up in the rush
to get it out.

64 bit portability fixes coming next. I made a bogus assumption about
needing an RB tree for admittance. That'll go next.

I'd like to also credit Marco Ballesio from Palm as well for the
multimedia insights.  I omitted that in my first message. Hope the
build bots like these changes :)

bill

---

Bill Huey (hui) (12):
  Kconfig change
  Reroute rtc update irqs to the cyclic scheduler handler
  Add cyclic support to rtc-dev.c
  Anonymous struct initialization
  Task tracking per file descriptor
  Add anonymous struct to sched_rt_entity
  kernel/userspace additions for additional ioctl() support for rtc
  Compilation support
  Add priority support for the cyclic scheduler
  Export SCHED_FIFO/RT requeuing functions
  Cyclic scheduler support
  Cyclic/rtc documentation

 Documentation/scheduler/sched-cyclic-rtc.txt | 468 
 drivers/rtc/Kconfig  |   5 +
 drivers/rtc/class.c  |   3 +
 drivers/rtc/interface.c  |  23 +
 drivers/rtc/rtc-dev.c| 167 
 include/linux/init_task.h|  18 +
 include/linux/rtc.h  |   3 +
 include/linux/sched.h|  15 +
 include/uapi/linux/rtc.h |   4 +
 kernel/sched/Makefile|   1 +
 kernel/sched/core.c  |  13 +
 kernel/sched/cyclic.c| 620 +++
 kernel/sched/cyclic.h|  86 
 kernel/sched/cyclic_rt.h |   7 +
 kernel/sched/rt.c|  41 ++
 15 files changed, 1474 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

-- 
2.5.0



[PATCH RFC v1 08/12] Compilation support

2016-04-13 Thread Bill Huey (hui)
Makefile changes to support the menuconfig option

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 302d6eb..df8e131 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_RTC_CYCLIC) += cyclic.o
 obj-$(CONFIG_CPU_FREQ) += cpufreq.o
-- 
2.5.0



[PATCH RFC v1 11/12] Cyclic scheduler support

2016-04-13 Thread Bill Huey (hui)
Core implementation of the cyclic scheduler that includes admittance
handling, thread death support, the cyclic timer tick handler, a primitive
procfs debugging interface, and wait-queue modifications.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/cyclic.c| 620 +++
 kernel/sched/cyclic.h|  86 +++
 kernel/sched/cyclic_rt.h |   7 +
 3 files changed, 713 insertions(+)
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

diff --git a/kernel/sched/cyclic.c b/kernel/sched/cyclic.c
new file mode 100644
index 000..bf4c982
--- /dev/null
+++ b/kernel/sched/cyclic.c
@@ -0,0 +1,620 @@
+/*
+ * cyclic scheduler for rtc support
+ *
+ * Copyright (C) Bill Huey
+ * Author: Bill Huey 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+*/
+
+#include 
+#include 
+#include 
+#include "sched.h"
+#include "cyclic.h"
+#include "cyclic_rt.h"
+
+#include 
+#include 
+
+DEFINE_RAW_SPINLOCK(rt_overrun_lock);
+struct rb_root rt_overrun_tree = RB_ROOT;
+
+#define MASK2 0xfff0
+
+/* must revisit again when I get more time to fix the possbility of
+ * overflow here and 32 bit portability */
+static int cmp_ptr_unsigned_long(long *p, long *q)
+{
+   int result = ((unsigned long)p & MASK2) - ((unsigned long)q & MASK2);
+
+   WARN_ON(sizeof(long *) != 8);
+
+   if (!result)
+   return 0;
+   else if (result > 0)
+   return 1;
+   else
+   return -1;
+}
+
+static int eq_ptr_unsigned_long(long *p, long *q)
+{
+   return (((long)p & MASK2) == ((long)q & MASK2));
+}
+
+#define CMP_PTR_LONG(p,q) cmp_ptr_unsigned_long((long *)p, (long *)q)
+
+static
+struct task_struct *_rt_overrun_entry_find(struct rb_root *root,
+   struct task_struct *p)
+{
+   struct task_struct *ret = NULL;
+   struct rb_node *node = root->rb_node;
+
+   while (node) { // double_rq_lock(struct rq *, struct rq *) cpu_rq
+   struct task_struct *task = container_of(node,
+   struct task_struct, rt.rt_overrun.node);
+
+   int result = CMP_PTR_LONG(p, task);
+
+   if (result < 0)
+   node = node->rb_left;
+   else if (result > 0)
+   node = node->rb_right;
+   else {
+   ret = task;
+   goto exit;
+   }
+   }
+exit:
+   return ret;
+}
+
+static int rt_overrun_task_runnable(struct task_struct *p)
+{
+   return task_on_rq_queued(p);
+}
+
+/* avoiding excessive debug printing, splitting the entry point */
+static
+struct task_struct *rt_overrun_entry_find(struct rb_root *root,
+   struct task_struct *p)
+{
+printk("%s: \n", __func__);
+   return _rt_overrun_entry_find(root, p);
+}
+
+static int _rt_overrun_entry_insert(struct rb_root *root, struct task_struct 
*p)
+{
+   struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+printk("%s: \n", __func__);
+   while (*new) {
+   struct task_struct *task = container_of(*new,
+   struct task_struct, rt.rt_overrun.node);
+
+   int result = CMP_PTR_LONG(p, task);
+
+   parent = *new;
+   if (result < 0)
+   new = &((*new)->rb_left);
+   else if (result > 0)
+   new = &((*new)->rb_right);
+   else
+   return 0;
+   }
+
+   /* Add new node and rebalance tree. */
+   rb_link_node(&p->rt.rt_overrun.node, parent, new);
+   rb_insert_color(&p->rt.rt_overrun.node, root);
+
+   return 1;
+}
+
+static void _rt_overrun_entry_delete(struct task_struct *p)
+{
+   struct task_struct *task;
+   int i;
+
+   task = rt_overrun_entry_find(&rt_overrun_tree, p);
+
+   if (task) {
+   printk("%s: p color %d - comm %s - slots 0x%016llx\n",
+   __func__, task->rt.rt_overrun.color, task->comm,
+   task->rt.rt_overrun.slots);
+
+   rb_erase(&task->rt.rt_overrun.node, &rt_overrun_tree);
+   list_del(&task->rt.rt_overrun.task_list);
+   for (i = 0; i < SLOTS; ++i) {
+   if (rt_admit_rq.curr[i] == p)
+   rt_admit_rq.curr[i] = NULL;
+   }
+
+   if (rt_admit_curr == p)
+   rt_admit_curr = NULL;
+   }
+}
+
+void rt_overrun_entry_delete(struct task_str

[PATCH RFC v1 09/12] Add priority support for the cyclic scheduler

2016-04-13 Thread Bill Huey (hui)
Initial bits to prevent priority changing of cyclic scheduler tasks by
only allowing them to be SCHED_FIFO. Fairly hacky at this time and will
need revisiting because of the security concerns.

Affects task death handling since it uses an additional scheduler class
hook for clean up at death. Must be SCHED_FIFO.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/core.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44db0ff..cf6cf57 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -87,6 +87,10 @@
 #include "../workqueue_internal.h"
 #include "../smpboot.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+#endif
+
 #define CREATE_TRACE_POINTS
 #include 
 
@@ -2074,6 +2078,10 @@ static void __sched_fork(unsigned long clone_flags, 
struct task_struct *p)
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
 
+#ifdef CONFIG_RTC_CYCLIC
+   RB_CLEAR_NODE(&p->rt.rt_overrun.node);
+#endif
+
RB_CLEAR_NODE(&p->dl.rb_node);
init_dl_task_timer(&p->dl);
__dl_clear_params(p);
@@ -3881,6 +3889,11 @@ recheck:
if (dl_policy(policy))
return -EPERM;
 
+#ifdef CONFIG_RTC_CYCLIC
+   if (rt_overrun_policy(p, policy))
+   return -EPERM;
+#endif
+
/*
 * Treat SCHED_IDLE as nice 20. Only allow a switch to
 * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
-- 
2.5.0



[PATCH RFC v1 04/12] Anonymous struct initialization

2016-04-13 Thread Bill Huey (hui)
Anonymous struct initialization

Signed-off-by: Bill Huey (hui) 
---
 include/linux/init_task.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f2cb8d4..ac9b0d9 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,23 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_RTC_CYCLIC
+# define INIT_RT_OVERRUN(tsk)  \
+   .rt_overrun = { \
+   .count = 0, \
+   .task_list = 
LIST_HEAD_INIT(tsk.rt.rt_overrun.task_list), \
+   .type = 0,  \
+   .color = 0, \
+   .slots = 0, \
+   .yield = 0, \
+   .machine_state = 0, \
+   .last_machine_state = 0,\
+   .last_task_state = 0,   \
+   }
+#else
+# define INIT_RT_OVERRUN(tsk)
+#endif
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1f (=2MB)
@@ -210,6 +227,7 @@ extern struct task_group root_task_group;
.rt = { \
.run_list   = LIST_HEAD_INIT(tsk.rt.run_list),  \
.time_slice = RR_TIMESLICE, \
+   INIT_RT_OVERRUN(tsk)\
},  \
.tasks  = LIST_HEAD_INIT(tsk.tasks),\
INIT_PUSHABLE_TASKS(tsk)\
-- 
2.5.0



[PATCH RFC v1 12/12] Cyclic/rtc documentation

2016-04-13 Thread Bill Huey (hui)
Initial attempt at documentation with a test program

Signed-off-by: Bill Huey (hui) 
---
 Documentation/scheduler/sched-cyclic-rtc.txt | 468 +++
 1 file changed, 468 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt

diff --git a/Documentation/scheduler/sched-cyclic-rtc.txt 
b/Documentation/scheduler/sched-cyclic-rtc.txt
new file mode 100644
index 000..4d22381
--- /dev/null
+++ b/Documentation/scheduler/sched-cyclic-rtc.txt
@@ -0,0 +1,468 @@
+[in progress]
+
+"Work Conserving"
+
+When a task is active and calls read(), it will block or yield depending
+on what has been requested from the cyclic scheduler. An RTC_OV_YIELD call
+to ioctl() specifies the behavior for the calling thread.
+
+In the case where read() is called before the time slice is over, it will
+allow other tasks to run with the leftover time.
+
+"Overrun Reporting/Apps"
+
+Calls to read() will return the overrun count and zero the counter. This
+can be used to adjust the execution time of the thread so that it fits
+within its slot and the thread can meet some deadline constraint.
+
+[no decision has been made to return a more meaningful set of numbers as
+you can just get time stamps and do the math in userspace but it could
+be changed to do so]
+
+The behavior of read() depends on whether the calling thread has been
+admitted via an ioctl() using RTC_OV_ADMIT. If it has, read() returns the
+overrun count. If it has not been admitted, read() returns the value
+corresponding to the default read() behavior for rtc.
+
+See the sample test sources for details.
+
+Using a video game as an example, a rendering engine driven by a vertical
+retrace interrupt that overruns its slot can cause visual skipping and
+hurt interactivity. Adapting the computation based on the read() result
+can allow the frame buffer swap to happen at the frame interrupt. The
+overrun count that read() reports can simplify those calculations and let
+the engine adapt to fit within its slot. That in turn allows the program
+to respond to events (touches, buttons), minimizing the possibility of
+perceived pauses.
+
+The slot allocation scheme for the video game must have some inherent
+definition of interactivity. That determines the appropriate slot
+allocation amongst a mixture of soft/hard real-time tasks. A general
+policy must be created for the system, and all programs, to meet the
+real-time criteria.
+
+"Admittance"
+
+Admittance of a task is done through an ioctl() call using RTC_OV_ADMIT.
+This passes a 64 bit wide bitmap that maps onto entries in the slot map.
+
+(slot map of two threads)
+execution direction ->
+
+1000 1000 1000 1000...
+0100 0100 0100 0100...
+
+(bit pattern of two threads)
+0001 0001 0001 0001...
+0010 0010 0010 0010...
+
+(hex)
+0x1111111111111111
+0x2222222222222222
+
+The slot map is an array of 64 thread entries. An index is incremented
+through it to determine what the next active thread-slot will be. The end
+of the index is set in /proc/rt_overrun_proc
+
+"Slot/slice activation"
+
+Move the task to the front of the SCHED_FIFO list when active, the tail when
+inactive.
+
+"RTC Infrastructure and Interrupt Routing"
+
+The cyclic scheduler is driven by the update interrupt in the RTC
+infrastructure but can be rerouted to any periodic interrupt source.
+
+One application of this could be interrupts from a display refresh, or
+some interval driven by an external controller such as a drum pad, a
+touch event source, and so on.
+
+"Embedded Environments"
+
+This is single run queue only and targets embedded scenarios where not
+all cores are guaranteed to be available. Older Qualcomm MSM kernels have
+very aggressive cpu hotplug as a means of fully powering off cores. The
+only CPU guaranteed to be running is CPU 0.
+
+"Project History"
+
+This was originally created when I was at HP/Palm to solve issues related
+to touch event handling and lag working with the real-time media subsystem.
+The typical workaround used to prevent skipping is to use large buffers
+to prevent data underruns. The programs then run at SCHED_FIFO, which can
+starve the system from handling external events such as buttons or touch
+events in a timely manner. The lack of a globally defined policy for how
+to use real-time resources can cause long pauses between handling touch
+events and other kinds of implicit deadline misses.
+
+By choosing some kind of slot execution pattern, it was hoped that
+execution can be controlled globally across the system so that some basic
+interactive guarantees can be met. Whether the tasks are some combination
+of soft or hard real-time, a mechanism like this can help guide how
+SCHED_FIFO tasks are run versus letting SCHED_FIFO tasks run wildly.
+
+"Future work"
+
+Possible integration with the deadline scheduler. Power management
+awareness, CPU clock governor. Turning off the scheduler tick when there
+are no runnable tasks, other things...
+
+"Power management"
+
+Governor awareness...
+
+[m

[PATCH RFC v1 06/12] Add anonymous struct to sched_rt_entity

2016-04-13 Thread Bill Huey (hui)
Add an anonymous struct to support admittance using a red-black tree,
overrun tracking, state for whether or not to yield or block, debugging
support, execution slot pattern for the scheduler.

Signed-off-by: Bill Huey (hui) 
---
 include/linux/sched.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 084ed9f..cff56c6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1305,6 +1305,21 @@ struct sched_rt_entity {
/* rq "owned" by this entity/group: */
struct rt_rq*my_q;
 #endif
+#ifdef CONFIG_RTC_CYCLIC
+   struct {
+   struct rb_node node; /* admittance structure */
+   struct list_head task_list;
+   unsigned long count; /* overrun count per slot */
+   int type, color, yield;
+   u64 slots;
+
+   /* debug */
+   unsigned long last_task_state;
+
+   /* instrumentation  */
+   unsigned int machine_state, last_machine_state;
+   } rt_overrun;
+#endif
 };
 
 struct sched_dl_entity {
-- 
2.5.0



[PATCH RFC v1 02/12] Reroute rtc update irqs to the cyclic scheduler handler

2016-04-13 Thread Bill Huey (hui)
Redirect rtc update irqs so that it drives the cyclic scheduler timer
handler instead. Let the handler determine which slot to activate next.
Similar to scheduler tick handling but just for the cyclic scheduler.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/interface.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 9ef5f6f..6d39d40 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -17,6 +17,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "../kernel/sched/cyclic.h"
+#endif
+
 static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer);
 static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer);
 
@@ -488,6 +492,9 @@ EXPORT_SYMBOL_GPL(rtc_update_irq_enable);
 void rtc_handle_legacy_irq(struct rtc_device *rtc, int num, int mode)
 {
unsigned long flags;
+#ifdef CONFIG_RTC_CYCLIC
+   int handled = 0;
+#endif
 
/* mark one irq of the appropriate mode */
spin_lock_irqsave(&rtc->irq_lock, flags);
@@ -500,7 +507,23 @@ void rtc_handle_legacy_irq(struct rtc_device *rtc, int 
num, int mode)
rtc->irq_task->func(rtc->irq_task->private_data);
spin_unlock_irqrestore(&rtc->irq_task_lock, flags);
 
+#ifdef CONFIG_RTC_CYCLIC
+   /* wake up slot_curr if overrun task */
+   if (RTC_PF) {
+   if (rt_overrun_rq_admitted()) {
+   /* advance the cursor, overrun report */
+   rt_overrun_timer_handler(rtc);
+   handled = 1;
+   }
+   }
+
+   if (!handled) {
+   wake_up_interruptible(&rtc->irq_queue);
+   }
+#else
wake_up_interruptible(&rtc->irq_queue);
+#endif
+
kill_fasync(&rtc->async_queue, SIGIO, POLL_IN);
 }
 
-- 
2.5.0



Re: [PATCH RFC v0 00/12] Cyclic Scheduler Against RTC

2016-04-13 Thread Bill Huey (hui)
Hi,

On Wed, Apr 13, 2016 at 3:08 AM, Juri Lelli  wrote:
> I'm not sure what you mean by "localized", but I believe DEADLINE should
> be used more widely to service the same kind of applications you are
> referring to. It's still a quite new addition to the scheduler, so it is
> understandable that we still have some legacy to fight. But we can get
> better in the future.

Yeah, I've known about it for a while but it's just so hard for me to imagine
using that for the kinds of cases that I'm thinking about for mixed tasks.
I just don't have an example in my head how that would work since I don't
have a view of how something like EDF would solve some of the basic
cases. That's mostly my ignorance.

The original inspiration for this was problems with how FIFO tasks that
run for long periods of time stall touch event handlers. The solution in
multi-media circles seemed to be using larger buffers to avoid dropouts,
which only caused starvation and other problems for other important
threads.

There has to be some kind of global view of how a system should run.
It's hard for me (self ignorance) to see how something like deadline would
run for continuously running tasks like that under those scenarios and
have that define some kind of global running policy in the system.

That's why I created this for a brain dead view of how to hack this stuff,
with some kind of crude execution pattern, to somehow get some level of
acceptable interactivity yet meet basic hard requirements with audio etc.

Might be a scenario where one would use sched_switch data to help with
deciding that. We ran into a lot of problems with the Qualcomm MSM
architecture and their power management code. Some of the hacks were
pretty brutal and wasted processor time polling the second core aggressively.

I wanted to solve all of these problems more completely and outside of the
current work being done for better or worse.

>> 2) The need for a scheduler to be driven by an external interrupt from a
>> number sources directly.
>
> If you use DEADLINE to service the activity an interrupt source might
> trigger, I think you can already do this.

I'll have to think about this. A simple example might help here.
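
As a strawman, a minimal sketch of a DEADLINE thread paced to a ~60Hz
frame might look like the following. It mirrors the sched_setattr()
example style in Documentation/scheduler/sched-deadline.txt; the
2ms/16ms runtime/deadline/period values are made-up illustration numbers,
not anything measured.

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>
#include <sched.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/types.h>

#ifndef SCHED_DEADLINE
#define SCHED_DEADLINE 6
#endif

struct sched_attr {
        __u32 size;
        __u32 sched_policy;
        __u64 sched_flags;
        __s32 sched_nice;
        __u32 sched_priority;
        __u64 sched_runtime;
        __u64 sched_deadline;
        __u64 sched_period;
};

static int sched_setattr(pid_t pid, const struct sched_attr *attr,
                         unsigned int flags)
{
        return syscall(__NR_sched_setattr, pid, attr, flags);
}

int main(void)
{
        struct sched_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.sched_policy   = SCHED_DEADLINE;
        attr.sched_runtime  =  2 * 1000 * 1000; /* 2ms budget per period */
        attr.sched_deadline = 16 * 1000 * 1000; /* due before the next frame */
        attr.sched_period   = 16 * 1000 * 1000; /* ~60Hz frame interval */

        if (sched_setattr(0, &attr, 0)) {
                perror("sched_setattr");
                return 1;
        }

        for (;;) {
                /* one frame of rendering/audio work goes here ... */
                sched_yield();  /* then hand the leftover budget back */
        }
}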

>> 3) The need for a global view of the system so that power management
>> decisions can be made sensibly made in multicore systems. It's not a
>> scheduler alone but ideal would have more influence over power management
>> decision on battery powered devices, etc...
>
> That's true. But it is also already something we currently are working on.
> I don't know if you are following the schedfreq/schedutil threads [1], for
> example, but there we are discussing how to integrate scheduler and
> cpufreq more closely. And you might also be interested in the EAS effort
> [2].

Not yet but I'll look for them.

> OK. Feel free to ask if you also decide to experiment with DEADLINE and
> find any problem with it.

> [1] https://lkml.org/lkml/2016/3/17/420
> https://lkml.org/lkml/2016/2/22/1037
> [2] https://lkml.org/lkml/2015/7/7/754

Thanks, reading them now but they're quite complicated and the threads
are quite long. It'll take time to digest it all

Thanks

bill


Re: [PATCH RFC v0 00/12] Cyclic Scheduler Against RTC

2016-04-13 Thread Bill Huey (hui)
[Trying to resend this so that linux-kernel mailer doesn't reject it.
ok just found plain text mode. Will cull the CC list in future
responses]

Hi Juri,

It's not for replacing deadline first of all. I'm not fully aware of the
kind of things being done with deadline and I would like links so that I
have some kind of reference

The original motivation for doing this was for a number of reasons:

1) Current FIFO/RR policies aren't exact enough for a lot of the mixed
modern multimedia scenarios I saw working a real-world load on an
Android-like system. There is insufficient feedback to interactive UX
tasks that include things like jackd and pulse audio for low latency
applications (music, keyboard controllers, touch events...) across a span
of tasks across the system.

Deadline seems to be more localized to a specific application's need and
seems to be hard to use but I'm inexperienced with it. The problems would
benefit from a simpler solution.

2) The need for a scheduler to be driven by an external interrupt from a
number sources directly.

3) The need for a global view of the system so that power management
decisions can be made sensibly made in multicore systems. It's not a
scheduler alone but ideal would have more influence over power management
decision on battery powered devices, etc...

4) other reasons that should be in the docs but I got sick of writing
exhaustive documentation on the matter...

That's the best I can do for now. I need to post new version with
compilations fixes. There's a lot of problems with code regarding
portability and other issues with the initial revision.

bill


[PATCH RFC v0 03/12] Add cyclic support to rtc-dev.c

2016-04-11 Thread Bill Huey (hui)
wait-queue changes to rtc_dev_read so that it can support overrun count
reporting when multiple threads are blocked against a single wait object.

ioctl() additions to allow for those calling it to admit the thread to the
cyclic scheduler.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/rtc-dev.c | 161 ++
 1 file changed, 161 insertions(+)

diff --git a/drivers/rtc/rtc-dev.c b/drivers/rtc/rtc-dev.c
index a6d9434..0fc9a8c 100644
--- a/drivers/rtc/rtc-dev.c
+++ b/drivers/rtc/rtc-dev.c
@@ -18,6 +18,15 @@
 #include 
 #include "rtc-core.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include 
+#include 
+
+#include <../kernel/sched/sched.h>
+#include <../kernel/sched/cyclic.h>
+//#include <../kernel/sched/cyclic_rt.h>
+#endif
+
 static dev_t rtc_devt;
 
 #define RTC_DEV_MAX 16 /* 16 RTCs should be enough for everyone... */
@@ -29,6 +38,10 @@ static int rtc_dev_open(struct inode *inode, struct file 
*file)
struct rtc_device, char_dev);
const struct rtc_class_ops *ops = rtc->ops;
 
+#ifdef CONFIG_RTC_CYCLIC
+   reset_rt_overrun();
+#endif
+
if (test_and_set_bit_lock(RTC_DEV_BUSY, &rtc->flags))
return -EBUSY;
 
@@ -153,13 +166,26 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
 {
struct rtc_device *rtc = file->private_data;
 
+#ifdef CONFIG_RTC_CYCLIC
+   DEFINE_WAIT_FUNC(wait, single_default_wake_function);
+#else
DECLARE_WAITQUEUE(wait, current);
+#endif
unsigned long data;
+   unsigned long flags;
+#ifdef CONFIG_RTC_CYCLIC
+   int wake = 0, block = 0;
+#endif
ssize_t ret;
 
if (count != sizeof(unsigned int) && count < sizeof(unsigned long))
return -EINVAL;
 
+#ifdef CONFIG_RTC_CYCLIC
+   if (rt_overrun_task_yield(current))
+   goto yield;
+#endif
+printk("%s: 0 color = %d \n", __func__, current->rt.rt_overrun.color);
add_wait_queue(&rtc->irq_queue, &wait);
do {
__set_current_state(TASK_INTERRUPTIBLE);
@@ -169,23 +195,59 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
rtc->irq_data = 0;
spin_unlock_irq(&rtc->irq_lock);
 
+if (block) {
+   block = 0;
+   if (wake) {
+   printk("%s: wake \n", __func__);
+   wake = 0;
+   } else {
+   printk("%s: ~wake \n", __func__);
+   }
+}
if (data != 0) {
+#ifdef CONFIG_RTC_CYCLIC
+   /* overrun reporting */
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   if (_on_rt_overrun_admitted(current)) {
+   /* pass back to userspace */
+   data = rt_task_count(current);
+   rt_task_count(current) = 0;
+   }
+   raw_spin_unlock_irqrestore(&rt_overrun_lock, flags);
+   ret = 0;
+printk("%s: 1 color = %d \n", __func__, current->rt.rt_overrun.color);
+   break;
+   }
+#else
ret = 0;
break;
}
+#endif
if (file->f_flags & O_NONBLOCK) {
ret = -EAGAIN;
+printk("%s: 2 color = %d \n", __func__, current->rt.rt_overrun.color);
break;
}
if (signal_pending(current)) {
+printk("%s: 3 color = %d \n", __func__, current->rt.rt_overrun.color);
ret = -ERESTARTSYS;
break;
}
+#ifdef CONFIG_RTC_CYCLIC
+   block = 1;
+#endif
schedule();
+#ifdef CONFIG_RTC_CYCLIC
+   /* debugging */
+   wake = 1;
+#endif
} while (1);
set_current_state(TASK_RUNNING);
remove_wait_queue(&rtc->irq_queue, &wait);
 
+#ifdef CONFIG_RTC_CYCLIC
+ret:
+#endif
if (ret == 0) {
/* Check for any data updates */
if (rtc->ops->read_callback)
@@ -201,6 +263,29 @@ rtc_dev_read(struct file *file, char __user *buf, size_t 
count, loff_t *ppos)
sizeof(unsigned long);
}
return ret;
+
+#ifdef CONFIG_RTC_CYCLIC
+yield:
+
+   spin_lock_irq(&rtc->irq_lock);
+   data = rtc->irq_data;
+   rtc->irq_data = 0;
+   spin_unlock_irq(&rtc->irq_lock);
+
+   raw_spin_lock_irqsave(&rt_overrun_lock, flags);
+   if (_on_rt_overrun_admitted(current)) {
+   /* pass back to userspace */
+   data = rt_task_count(current);
+   rt_task_count(current) = 0;
+   }
+   else {
+   }
+
+   raw_spin_

[PATCH RFC v0 05/12] Task tracking per file descriptor

2016-04-11 Thread Bill Huey (hui)
Task tracking per file descriptor for thread death clean up.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/class.c | 3 +++
 include/linux/rtc.h | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 74fd974..ad570b9 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -201,6 +201,9 @@ struct rtc_device *rtc_device_register(const char *name, 
struct device *dev,
rtc->irq_freq = 1;
rtc->max_user_freq = 64;
rtc->dev.parent = dev;
+#ifdef CONFIG_RTC_CYCLIC
+   INIT_LIST_HEAD(&rtc->rt_overrun_tasks); //struct list_head
+#endif
rtc->dev.class = rtc_class;
rtc->dev.groups = rtc_get_dev_attribute_groups();
rtc->dev.release = rtc_device_release;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index b693ada..1424550 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -114,6 +114,9 @@ struct rtc_timer {
 struct rtc_device {
struct device dev;
struct module *owner;
+#ifdef CONFIG_RTC_CYCLIC
+   struct list_head rt_overrun_tasks;
+#endif
 
int id;
char name[RTC_DEVICE_NAME_SIZE];
-- 
2.5.0



[PATCH RFC v0 10/12] Export SCHED_FIFO/RT requeuing functions

2016-04-11 Thread Bill Huey (hui)
SCHED_FIFO/RT tail/head runqueue insertion support, initial thread death
support via a hook to the scheduler class. Thread death must include
additional semantics to remove/discharge an admitted task properly.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/rt.c | 41 +
 1 file changed, 41 insertions(+)

diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index c41ea7a..1d77adc 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -8,6 +8,11 @@
 #include 
 #include 
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+extern int rt_overrun_task_admitted1(struct rq *rq, struct task_struct *p);
+#endif
+
 int sched_rr_timeslice = RR_TIMESLICE;
 
 static int do_sched_rt_period_timer(struct rt_bandwidth *rt_b, int overrun);
@@ -1321,8 +1326,18 @@ enqueue_task_rt(struct rq *rq, struct task_struct *p, 
int flags)
 
if (flags & ENQUEUE_WAKEUP)
rt_se->timeout = 0;
+#ifdef CONFIG_RTC_CYCLIC
+   /* if admitted and the current slot then head, otherwise tail */
+   if (rt_overrun_task_admitted1(rq, p)) {
+   if (rt_overrun_task_active(p)) {
+   flags |= ENQUEUE_HEAD;
+   }
+   }
 
enqueue_rt_entity(rt_se, flags);
+#else
+   enqueue_rt_entity(rt_se, flags & ENQUEUE_HEAD);
+#endif
 
if (!task_current(rq, p) && p->nr_cpus_allowed > 1)
enqueue_pushable_task(rq, p);
@@ -1367,6 +1382,18 @@ static void requeue_task_rt(struct rq *rq, struct 
task_struct *p, int head)
}
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+void dequeue_task_rt2(struct rq *rq, struct task_struct *p, int flags)
+{
+   dequeue_task_rt(rq, p, flags);
+}
+
+void requeue_task_rt2(struct rq *rq, struct task_struct *p, int head)
+{
+   requeue_task_rt(rq, p, head);
+}
+#endif
+
 static void yield_task_rt(struct rq *rq)
 {
requeue_task_rt(rq, rq->curr, 0);
@@ -2177,6 +2204,10 @@ void __init init_sched_rt_class(void)
zalloc_cpumask_var_node(&per_cpu(local_cpu_mask, i),
GFP_KERNEL, cpu_to_node(i));
}
+
+#ifdef CONFIG_RTC_CYCLIC
+   init_rt_overrun();
+#endif
 }
 #endif /* CONFIG_SMP */
 
@@ -2322,6 +2353,13 @@ static unsigned int get_rr_interval_rt(struct rq *rq, 
struct task_struct *task)
return 0;
 }
 
+#ifdef CONFIG_RTC_CYCLIC
+static void task_dead_rt(struct task_struct *p)
+{
+   rt_overrun_entry_delete(p);
+}
+#endif
+
 const struct sched_class rt_sched_class = {
.next   = &fair_sched_class,
.enqueue_task   = enqueue_task_rt,
@@ -2344,6 +2382,9 @@ const struct sched_class rt_sched_class = {
 #endif
 
.set_curr_task  = set_curr_task_rt,
+#ifdef CONFIG_RTC_CYCLIC
+   .task_dead  = task_dead_rt,
+#endif
.task_tick  = task_tick_rt,
 
.get_rr_interval= get_rr_interval_rt,
-- 
2.5.0



[PATCH RFC v0 06/12] Add anonymous struct to sched_rt_entity

2016-04-11 Thread Bill Huey (hui)
Add an anonymous struct to support admittance using a red-black tree,
overrun tracking, state for whether or not to yield or block, debugging
support, execution slot pattern for the scheduler.

Signed-off-by: Bill Huey (hui) 
---
 include/linux/sched.h | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 084ed9f..cff56c6 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1305,6 +1305,21 @@ struct sched_rt_entity {
/* rq "owned" by this entity/group: */
struct rt_rq*my_q;
 #endif
+#ifdef CONFIG_RTC_CYCLIC
+   struct {
+   struct rb_node node; /* admittance structure */
+   struct list_head task_list;
+   unsigned long count; /* overrun count per slot */
+   int type, color, yield;
+   u64 slots;
+
+   /* debug */
+   unsigned long last_task_state;
+
+   /* instrumentation  */
+   unsigned int machine_state, last_machine_state;
+   } rt_overrun;
+#endif
 };
 
 struct sched_dl_entity {
-- 
2.5.0



[PATCH RFC v0 02/12] Reroute rtc update irqs to the cyclic scheduler handler

2016-04-11 Thread Bill Huey (hui)
Redirect rtc update irqs so that it drives the cyclic scheduler timer
handler instead. Let the handler determine which slot to activate next.
Similar to scheduler tick handling but just for the cyclic scheduler.

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/interface.c | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index 9ef5f6f..6d39d40 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -17,6 +17,10 @@
 #include 
 #include 
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "../kernel/sched/cyclic.h"
+#endif
+
 static int rtc_timer_enqueue(struct rtc_device *rtc, struct rtc_timer *timer);
 static void rtc_timer_remove(struct rtc_device *rtc, struct rtc_timer *timer);
 
@@ -488,6 +492,9 @@ EXPORT_SYMBOL_GPL(rtc_update_irq_enable);
 void rtc_handle_legacy_irq(struct rtc_device *rtc, int num, int mode)
 {
unsigned long flags;
+#ifdef CONFIG_RTC_CYCLIC
+   int handled = 0;
+#endif
 
/* mark one irq of the appropriate mode */
spin_lock_irqsave(&rtc->irq_lock, flags);
@@ -500,7 +507,23 @@ void rtc_handle_legacy_irq(struct rtc_device *rtc, int 
num, int mode)
rtc->irq_task->func(rtc->irq_task->private_data);
spin_unlock_irqrestore(&rtc->irq_task_lock, flags);
 
+#ifdef CONFIG_RTC_CYCLIC
+   /* wake up slot_curr if overrun task */
+   if (RTC_PF) {
+   if (rt_overrun_rq_admitted()) {
+   /* advance the cursor, overrun report */
+   rt_overrun_timer_handler(rtc);
+   handled = 1;
+   }
+   }
+
+   if (!handled) {
+   wake_up_interruptible(&rtc->irq_queue);
+   }
+#else
wake_up_interruptible(&rtc->irq_queue);
+#endif
+
kill_fasync(&rtc->async_queue, SIGIO, POLL_IN);
 }
 
-- 
2.5.0



[PATCH RFC v0 01/12] Kconfig change

2016-04-11 Thread Bill Huey (hui)
Add the selection options for the cyclic scheduler

Signed-off-by: Bill Huey (hui) 
---
 drivers/rtc/Kconfig | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 544bd34..8a1b704 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -73,6 +73,11 @@ config RTC_DEBUG
  Say yes here to enable debugging support in the RTC framework
  and individual RTC drivers.
 
+config RTC_CYCLIC
+   bool "RTC cyclic executive scheduler support"
+   help
+ Frame/Cyclic executive scheduler support through the RTC interface
+
 comment "RTC interfaces"
 
 config RTC_INTF_SYSFS
-- 
2.5.0



[PATCH RFC v0 08/12] Compilation support

2016-04-11 Thread Bill Huey (hui)
Makefile changes to support the menuconfig option

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/Makefile | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/sched/Makefile b/kernel/sched/Makefile
index 302d6eb..df8e131 100644
--- a/kernel/sched/Makefile
+++ b/kernel/sched/Makefile
@@ -19,4 +19,5 @@ obj-$(CONFIG_SCHED_AUTOGROUP) += auto_group.o
 obj-$(CONFIG_SCHEDSTATS) += stats.o
 obj-$(CONFIG_SCHED_DEBUG) += debug.o
 obj-$(CONFIG_CGROUP_CPUACCT) += cpuacct.o
+obj-$(CONFIG_RTC_CYCLIC) += cyclic.o
 obj-$(CONFIG_CPU_FREQ) += cpufreq.o
-- 
2.5.0



[PATCH RFC v0 07/12] kernel/userspace additions for additional ioctl() support for rtc

2016-04-11 Thread Bill Huey (hui)
Add additional ioctl() values to rtc so that it can 'admit' the calling
thread into a red-black tree for tracking, set the execution slot pattern,
support for setting whether read() will yield or block.

Signed-off-by: Bill Huey (hui) 
---
 include/uapi/linux/rtc.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/include/uapi/linux/rtc.h b/include/uapi/linux/rtc.h
index f8c82e6..76c9254 100644
--- a/include/uapi/linux/rtc.h
+++ b/include/uapi/linux/rtc.h
@@ -94,6 +94,10 @@ struct rtc_pll_info {
 #define RTC_VL_READ_IOR('p', 0x13, int)/* Voltage low detector */
 #define RTC_VL_CLR _IO('p', 0x14)  /* Clear voltage low 
information */
 
+#define RTC_OV_ADMIT   _IOW('p', 0x15, unsigned long)   /* Set test   */
+#define RTC_OV_REPLEN  _IOW('p', 0x16, unsigned long)   /* Set test   */
+#define RTC_OV_YIELD   _IOW('p', 0x17, unsigned long)   /* Set test   */
+
 /* interrupt flags */
 #define RTC_IRQF 0x80  /* Any of the following is active */
 #define RTC_PF 0x40/* Periodic interrupt */
-- 
2.5.0



[PATCH RFC v0 12/12] Cyclic/rtc documentation

2016-04-11 Thread Bill Huey (hui)
Initial attempt at documentation with a test program

Signed-off-by: Bill Huey (hui) 
---
 Documentation/scheduler/sched-cyclic-rtc.txt | 468 +++
 1 file changed, 468 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt

diff --git a/Documentation/scheduler/sched-cyclic-rtc.txt 
b/Documentation/scheduler/sched-cyclic-rtc.txt
new file mode 100644
index 000..4d22381
--- /dev/null
+++ b/Documentation/scheduler/sched-cyclic-rtc.txt
@@ -0,0 +1,468 @@
+[in progress]
+
+"Work Conserving"
+
+When a task is active and calls read(), it will block or yield depending
+on what has been requested from the cyclic scheduler. An RTC_OV_YIELD call
+to ioctl() specifies the behavior for the calling thread.
+
+In the case where read() is called before the time slice is over, it will
+allow other tasks to run with the leftover time.
+
+"Overrun Reporting/Apps"
+
+Calls to read() will return the overrun count and zero the counter. This
+can be used to adjust the execution time of the thread so that it fits
+within its slot and the thread can meet some deadline constraint.
+
+[no decision has been made to return a more meaningful set of numbers as
+you can just get time stamps and do the math in userspace but it could
+be changed to do so]
+
+The behavior of read() depends on whether the calling thread has been
+admitted via an ioctl() using RTC_OV_ADMIT. If it has, read() returns the
+overrun count. If it has not been admitted, read() returns the value
+corresponding to the default read() behavior for rtc.
+
+See the sample test sources for details.
+
+Using a video game as an example, a rendering engine driven by a vertical
+retrace interrupt that overruns its slot can cause visual skipping and
+hurt interactivity. Adapting the computation based on the read() result
+can allow the frame buffer swap to happen at the frame interrupt. The
+overrun count that read() reports can simplify those calculations and let
+the engine adapt to fit within its slot. That in turn allows the program
+to respond to events (touches, buttons), minimizing the possibility of
+perceived pauses.
+
+The slot allocation scheme for the video game must have some inherent
+definition of interactivity. That determines the appropriate slot
+allocation amongst a mixture of soft/hard real-time tasks. A general
+policy must be created for the system, and all programs, to meet the
+real-time criteria.
+
+"Admittance"
+
+Admittance of a task is done through an ioctl() call using RTC_OV_ADMIT.
+This passes a 64 bit wide bitmap that maps onto entries in the slot map.
+
+(slot map of two threads)
+execution direction ->
+
+1000 1000 1000 1000...
+0100 0100 0100 0100...
+
+(bit pattern of two threads)
+0001 0001 0001 0001...
+0010 0010 0010 0010...
+
+(hex)
+0x1111111111111111
+0x2222222222222222
+
+The slot map is an array of 64 thread entries. An index is incremented
+through it to determine what the next active thread-slot will be. The end
+of the index is set in /proc/rt_overrun_proc
+
+"Slot/slice activation"
+
+Move the task to the front of the SCHED_FIFO list when active, the tail when
+inactive.
+
+"RTC Infrastructure and Interrupt Routing"
+
+The cyclic scheduler is driven by the update interrupt in the RTC
+infrastructure but can be rerouted to any periodic interrupt source.
+
+One application of this could be interrupts from a display refresh, or
+some interval driven by an external controller such as a drum pad, a
+touch event source, and so on.
+
+"Embedded Environments"
+
+This is single run queue only and targets embedded scenarios where not
+all cores are guaranteed to be available. Older Qualcomm MSM kernels have
+very aggressive cpu hotplug as a means of fully powering off cores. The
+only CPU guaranteed to be running is CPU 0.
+
+"Project History"
+
+This was originally created when I was at HP/Palm to solve issues related
+to touch event handling and lag working with the real-time media subsystem.
+The typical workaround used to prevent skipping is to use large buffers
+to prevent data underruns. The programs then run at SCHED_FIFO, which can
+starve the system from handling external events such as buttons or touch
+events in a timely manner. The lack of a globally defined policy for how
+to use real-time resources can cause long pauses between handling touch
+events and other kinds of implicit deadline misses.
+
+By choosing some kind of slot execution pattern, it was hoped that
+execution can be controlled globally across the system so that some basic
+interactive guarantees can be met. Whether the tasks are some combination
+of soft or hard real-time, a mechanism like this can help guide how
+SCHED_FIFO tasks are run versus letting SCHED_FIFO tasks run wildly.
+
+"Future work"
+
+Possible integration with the deadline scheduler. Power management
+awareness, CPU clock governor. Turning off the scheduler tick when there
+are no runnable tasks, other things...
+
+"Power management"
+
+Governor awareness...
+
+[m

[PATCH RFC v0 09/12] Add priority support for the cyclic scheduler

2016-04-11 Thread Bill Huey (hui)
Initial bits to prevent priority changing of cyclic scheduler tasks by
only allowing them to be SCHED_FIFO. Fairly hacky at this time and will
need revisiting because of the security concerns.

Affects task death handling since it uses an additional scheduler class
hook for clean up at death. Must be SCHED_FIFO.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/core.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44db0ff..cf6cf57 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -87,6 +87,10 @@
 #include "../workqueue_internal.h"
 #include "../smpboot.h"
 
+#ifdef CONFIG_RTC_CYCLIC
+#include "cyclic.h"
+#endif
+
 #define CREATE_TRACE_POINTS
 #include 
 
@@ -2074,6 +2078,10 @@ static void __sched_fork(unsigned long clone_flags, 
struct task_struct *p)
memset(&p->se.statistics, 0, sizeof(p->se.statistics));
 #endif
 
+#ifdef CONFIG_RTC_CYCLIC
+   RB_CLEAR_NODE(&p->rt.rt_overrun.node);
+#endif
+
RB_CLEAR_NODE(&p->dl.rb_node);
init_dl_task_timer(&p->dl);
__dl_clear_params(p);
@@ -3881,6 +3889,11 @@ recheck:
if (dl_policy(policy))
return -EPERM;
 
+#ifdef CONFIG_RTC_CYCLIC
+   if (rt_overrun_policy(p, policy))
+   return -EPERM;
+#endif
+
/*
 * Treat SCHED_IDLE as nice 20. Only allow a switch to
 * SCHED_NORMAL if the RLIMIT_NICE would normally permit it.
-- 
2.5.0



[PATCH RFC v0 11/12] Cyclic scheduler support

2016-04-11 Thread Bill Huey (hui)
Core implementation of the cyclic scheduler that includes admittance
handling, thread death support, the cyclic timer tick handler, a primitive
procfs debugging interface, and wait-queue modifications.

Signed-off-by: Bill Huey (hui) 
---
 kernel/sched/cyclic.c| 620 +++
 kernel/sched/cyclic.h|  86 +++
 kernel/sched/cyclic_rt.h |   7 +
 3 files changed, 713 insertions(+)
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

diff --git a/kernel/sched/cyclic.c b/kernel/sched/cyclic.c
new file mode 100644
index 000..8ce34bd
--- /dev/null
+++ b/kernel/sched/cyclic.c
@@ -0,0 +1,620 @@
+/*
+ * cyclic scheduler for rtc support
+ *
+ * Copyright (C) Bill Huey
+ * Author: Bill Huey 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+*/
+
+#include 
+#include 
+#include 
+#include "sched.h"
+#include "cyclic.h"
+#include "cyclic_rt.h"
+
+#include 
+#include 
+
+DEFINE_RAW_SPINLOCK(rt_overrun_lock);
+struct rb_root rt_overrun_tree = RB_ROOT;
+
+#define MASK2 0xfff0
+
+/* must revisit again when I get more time to fix the possbility of
+ * overflow here and 32 bit portability */
+static int cmp_ptr_unsigned_long(long *p, long *q)
+{
+   int result = ((unsigned long)p & MASK2) - ((unsigned long)q & MASK2);
+
+   WARN_ON(sizeof(long *) != 8);
+
+   if (!result)
+   return 0;
+   else if (result > 0)
+   return 1;
+   else
+   return -1;
+}
+
+static int eq_ptr_unsigned_long(long *p, long *q)
+{
+   return (((long)p & MASK2) == ((long)q & MASK2));
+}
+
+#define CMP_PTR_LONG(p,q) cmp_ptr_unsigned_long((long *)p, (long *)q)
+
+static
+struct task_struct *_rt_overrun_entry_find(struct rb_root *root,
+   struct task_struct *p)
+{
+   struct task_struct *ret = NULL;
+   struct rb_node *node = root->rb_node;
+
+   while (node) { // double_rq_lock(struct rq *, struct rq *) cpu_rq
+   struct task_struct *task = container_of(node,
+   struct task_struct, rt.rt_overrun.node);
+
+   int result = CMP_PTR_LONG(p, task);
+
+   if (result < 0)
+   node = node->rb_left;
+   else if (result > 0)
+   node = node->rb_right;
+   else {
+   ret = task;
+   goto exit;
+   }
+   }
+exit:
+   return ret;
+}
+
+static int rt_overrun_task_runnable(struct task_struct *p)
+{
+   return task_on_rq_queued(p);
+}
+
+/* avoiding excessive debug printing, splitting the entry point */
+static
+struct task_struct *rt_overrun_entry_find(struct rb_root *root,
+   struct task_struct *p)
+{
+printk("%s: \n", __func__);
+   return _rt_overrun_entry_find(root, p);
+}
+
+static int _rt_overrun_entry_insert(struct rb_root *root, struct task_struct 
*p)
+{
+   struct rb_node **new = &(root->rb_node), *parent = NULL;
+
+printk("%s: \n", __func__);
+   while (*new) {
+   struct task_struct *task = container_of(*new,
+   struct task_struct, rt.rt_overrun.node);
+
+   int result = CMP_PTR_LONG(p, task);
+
+   parent = *new;
+   if (result < 0)
+   new = &((*new)->rb_left);
+   else if (result > 0)
+   new = &((*new)->rb_right);
+   else
+   return 0;
+   }
+
+   /* Add new node and rebalance tree. */
+   rb_link_node(&p->rt.rt_overrun.node, parent, new);
+   rb_insert_color(&p->rt.rt_overrun.node, root);
+
+   return 1;
+}
+
+static void _rt_overrun_entry_delete(struct task_struct *p)
+{
+   struct task_struct *task;
+   int i;
+
+   task = rt_overrun_entry_find(&rt_overrun_tree, p);
+
+   if (task) {
+   printk("%s: p color %d - comm %s - slots 0x%016llx\n",
+   __func__, task->rt.rt_overrun.color, task->comm,
+   task->rt.rt_overrun.slots);
+
+   rb_erase(&task->rt.rt_overrun.node, &rt_overrun_tree);
+   list_del(&task->rt.rt_overrun.task_list);
+   for (i = 0; i < SLOTS; ++i) {
+   if (rt_admit_rq.curr[i] == p)
+   rt_admit_rq.curr[i] = NULL;
+   }
+
+   if (rt_admit_curr == p)
+   rt_admit_curr = NULL;
+   }
+}
+
+void rt_overrun_entry_delete(struct task_str

[PATCH RFC v0 00/12] Cyclic Scheduler Against RTC

2016-04-11 Thread Bill Huey (hui)
Hi,

This is a crude cyclic scheduler implementation. It uses SCHED_FIFO tasks
and runs them according to a map pattern specified by a 64-bit mask. Each
bit corresponds to an entry in a 64-entry array of
'struct task_struct'. For now this works on a single core, CPU 0 only.
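
In other words, on each timer slot the scheduler consults the admitted
entries and runs the one whose 64-bit pattern has the current slot bit set.
A toy sketch of that selection logic (the in-kernel code in
kernel/sched/cyclic.c is considerably more involved):

  /* Toy sketch of per-slot selection; not the in-kernel implementation. */
  #define SLOTS 64

  struct slot_map {
          struct task_struct *task[SLOTS];
          u64 pattern[SLOTS];             /* per-entry 64-bit run pattern */
  };

  static struct task_struct *pick_task(struct slot_map *map, u64 tick)
  {
          unsigned int slot = tick % SLOTS;
          int i;

          for (i = 0; i < SLOTS; i++) {
                  /* Run entry i in this slot if its pattern has the bit set. */
                  if (map->task[i] && (map->pattern[i] & (1ULL << slot)))
                          return map->task[i];
          }
          return NULL;    /* no cyclic task this slot; normal classes run */
  }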

Threads are 'admitted' to this map by an extension to the ioctl() of the
(rtc) real-time clock interface. The bit pattern then determines when
the task will run or activate next.

The /dev/rtc interface is chosen for this purpose because of its
accessibility to userspace. For example, the mplayer program already uses
it as a timer source and could possibly benefit from being synced to a
vertical retrace interrupt during decoding. It could also be an OpenGL program
needing precise scheduler support for handling those same vertical
retrace interrupts, low-latency audio, and timely handling of touch
events, amongst other uses.

There is also a need for some kind of blocking/yielding interface that can
return an overrun count for when the thread utilizes more time than
allocated for that frame. The read() function in rtc is overloaded for this
purpose and reports overrun events. Yield functionality has yet to be fully
tested.
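
For illustration, a userspace consumer could look roughly like the sketch
below. RTC_OV_ADMIT is only a placeholder name and number for the ioctl()
command this series adds to include/uapi/linux/rtc.h; the real command may be
named differently.

  /* Hedged sketch of the intended userspace flow. */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/rtc.h>

  #ifndef RTC_OV_ADMIT                    /* placeholder, made-up number */
  #define RTC_OV_ADMIT _IOW('p', 0x20, uint64_t)
  #endif

  int main(void)
  {
          uint64_t slots = 0x5555555555555555ULL; /* run on every other slot */
          unsigned long overruns;
          int fd = open("/dev/rtc0", O_RDONLY);

          if (fd < 0)
                  return 1;
          /* Admit this (SCHED_FIFO) thread with its 64-bit slot pattern. */
          if (ioctl(fd, RTC_OV_ADMIT, &slots) < 0)
                  return 1;
          for (;;) {
                  /* Block until the next admitted slot; the overloaded read()
                   * reports how many slots were overrun since the last call. */
                  if (read(fd, &overruns, sizeof(overruns)) < 0)
                          break;
                  if (overruns)
                          fprintf(stderr, "overran %lu slot(s)\n", overruns);
                  /* ... per-slot frame work goes here ... */
          }
          close(fd);
          return 0;
  }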

I apologize for any informal or misused terminology as I haven't fully
reviewed all of the academic literature regarding these kinds of schedulers.
I welcome suggestions and corrections, etc.

Special thanks go to...

Peter Zijlstra (Intel), Steve Rostedt (Red Hat), and Rik van Riel (Red Hat) for
encouraging me to continue working in the Linux kernel community and being
generally positive and supportive.

KY Srinivasan (formerly Novell now Microsoft) for discussion of real-time
schedulers and pointers to specifics on that topic. It was just a single
discussion but was basically the inspiration for this kind of work.

Amir Frenkel (Palm), Kenneth Albanowski (Palm), and Bdale Garbee (HP) for the
amazing place that was Palm, and Kenneth for being a co-conspirator with this
scheduler. This scheduler was inspired by performance work that I did
at Palm's kernel group, along with discussions with the multimedia team,
before HP killed webOS off. A sad and infuriating moment.

Maybe, in a short while, the community will understand the value of these
patches for -rt and start solving the general problem of high-performance
multimedia and user interactivity more properly, with both a scheduler like
this and -rt shipped as default in the near future.

[Also, I'd love some kind of sponsorship to continue what I think is
critical work versus heading back into the valley]

---

Bill Huey (hui) (12):
  Kconfig change
  Reroute rtc update irqs to the cyclic scheduler handler
  Add cyclic support to rtc-dev.c
  Anonymous struct initialization
  Task tracking per file descriptor
  Add anonymous struct to sched_rt_entity
  kernel/userspace additions for addition ioctl() support for rtc
  Compilation support
  Add priority support for the cyclic scheduler
  Export SCHED_FIFO/RT requeuing functions
  Cyclic scheduler support
  Cyclic/rtc documentation

 Documentation/scheduler/sched-cyclic-rtc.txt | 468 
 drivers/rtc/Kconfig  |   5 +
 drivers/rtc/class.c  |   3 +
 drivers/rtc/interface.c  |  23 +
 drivers/rtc/rtc-dev.c| 161 +++
 include/linux/init_task.h|  18 +
 include/linux/rtc.h  |   3 +
 include/linux/sched.h|  15 +
 include/uapi/linux/rtc.h |   4 +
 kernel/sched/Makefile|   1 +
 kernel/sched/core.c  |  13 +
 kernel/sched/cyclic.c| 620 +++
 kernel/sched/cyclic.h|  86 
 kernel/sched/cyclic_rt.h |   7 +
 kernel/sched/rt.c|  41 ++
 15 files changed, 1468 insertions(+)
 create mode 100644 Documentation/scheduler/sched-cyclic-rtc.txt
 create mode 100644 kernel/sched/cyclic.c
 create mode 100644 kernel/sched/cyclic.h
 create mode 100644 kernel/sched/cyclic_rt.h

-- 
2.5.0



[PATCH RFC v0 04/12] Anonymous struct initialization

2016-04-11 Thread Bill Huey (hui)
Anonymous struct initialization

Signed-off-by: Bill Huey (hui) 
---
 include/linux/init_task.h | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index f2cb8d4..308caf6 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -183,6 +183,23 @@ extern struct task_group root_task_group;
 # define INIT_KASAN(tsk)
 #endif
 
+#ifdef CONFIG_RTC_CYCLIC
+# define INIT_RT_OVERRUN(tsk)  \
+   .rt_overrun = { \
+   .count = 0, \
+   .task_list = 
LIST_HEAD_INIT(tsk.rt.rt_overrun.task_list), \
+   .type = 0,  \
+   .color = 0, \
+   .slots = 0, \
+   .yield = 0, \
+   .machine_state = 0, \
+   .last_machine_state = 0,\
+   .last_task_state = 0,   \
+   },
+#else
+# define INIT_RT_OVERRUN
+#endif
+
 /*
  *  INIT_TASK is used to set up the first task table, touch at
  * your own risk!. Base=0, limit=0x1f (=2MB)
@@ -210,6 +227,7 @@ extern struct task_group root_task_group;
.rt = { \
.run_list   = LIST_HEAD_INIT(tsk.rt.run_list),  \
.time_slice = RR_TIMESLICE, \
+   INIT_RT_OVERRUN(tsk)\
},  \
.tasks  = LIST_HEAD_INIT(tsk.tasks),\
INIT_PUSHABLE_TASKS(tsk)\
-- 
2.5.0



[PATCH 2/3] zsmalloc: make its page "PageMobile"

2015-11-27 Thread Hui Zhu
The idea of this patch is the same as the previous version [1], but it uses
the migration framework from [2].

[1] http://comments.gmane.org/gmane.linux.kernel.mm/140014
[2] https://lkml.org/lkml/2015/7/7/21

Signed-off-by: Hui Zhu 
---
 mm/zsmalloc.c | 214 --
 1 file changed, 209 insertions(+), 5 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 57c91a5..5034aac 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -53,10 +53,13 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
 #include 
+#include 
+#include 
 
 /*
  * This must be power of 2 and greater than of equal to sizeof(link_free).
@@ -217,6 +220,8 @@ struct size_class {
 
/* huge object: pages_per_zspage == 1 && maxobj_per_zspage == 1 */
bool huge;
+
+   atomic_t count;
 };
 
 /*
@@ -281,6 +286,10 @@ struct zs_migration {
 #define ZS_MIGRATION(p) ((struct zs_migration *)((p)->freelist))
 #define ZS_META(p) ((struct zs_meta *)&(ZS_MIGRATION(p)->index))
 
+static struct inode *zs_inode;
+static DEFINE_SPINLOCK(zs_migration_lock);
+static DEFINE_RWLOCK(zs_tag_rwlock);
+
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
struct vm_struct *vm; /* vm area for mapping object that span pages */
@@ -307,7 +316,7 @@ static void destroy_handle_cache(struct zs_pool *pool)
 static unsigned long alloc_handle(struct zs_pool *pool)
 {
return (unsigned long)kmem_cache_alloc(pool->handle_cachep,
-   pool->flags & ~__GFP_HIGHMEM);
+   pool->flags & ~(__GFP_HIGHMEM | __GFP_MOVABLE));
 }
 
 static void free_handle(struct zs_pool *pool, unsigned long handle)
@@ -914,9 +923,12 @@ static void reset_page(struct page *page)
clear_bit(PG_private, &page->flags);
clear_bit(PG_private_2, &page->flags);
set_page_private(page, 0);
-   free_migration(page->freelist);
-   page->freelist = NULL;
+   if (page->freelist) {
+   free_migration(page->freelist);
+   page->freelist = NULL;
+   }
page_mapcount_reset(page);
+   page->mapping = NULL;
 }
 
 static void free_zspage(struct page *first_page)
@@ -927,6 +939,8 @@ static void free_zspage(struct page *first_page)
BUG_ON(!is_first_page(first_page));
BUG_ON(get_inuse_obj(first_page));
 
+   spin_lock(&zs_migration_lock);
+
head_extra = (struct page *)page_private(first_page);
 
reset_page(first_page);
@@ -934,7 +948,7 @@ static void free_zspage(struct page *first_page)
 
/* zspage with only 1 system page */
if (!head_extra)
-   return;
+   goto out;
 
list_for_each_entry_safe(nextm, tmp, &ZS_MIGRATION(head_extra)->lru,
 lru) {
@@ -945,6 +959,9 @@ static void free_zspage(struct page *first_page)
}
reset_page(head_extra);
__free_page(head_extra);
+
+out:
+   spin_unlock(&zs_migration_lock);
 }
 
 /* Initialize a newly allocated zspage */
@@ -1018,6 +1035,7 @@ static struct page *alloc_zspage(struct size_class 
*class, gfp_t flags)
page = alloc_page(flags);
if (!page)
goto cleanup;
+   page->mapping = zs_inode->i_mapping;
page->freelist = alloc_migration(flags);
if (!page->freelist) {
__free_page(page);
@@ -1327,6 +1345,7 @@ void *zs_map_object(struct zs_pool *pool, unsigned long 
handle,
BUG_ON(in_interrupt());
 
/* From now on, migration cannot move the object */
+   read_lock(&zs_tag_rwlock);
pin_tag(handle);
 
obj = handle_to_obj(handle);
@@ -1395,6 +1414,7 @@ void zs_unmap_object(struct zs_pool *pool, unsigned long 
handle)
}
put_cpu_var(zs_map_area);
unpin_tag(handle);
+   read_unlock(&zs_tag_rwlock);
 }
 EXPORT_SYMBOL_GPL(zs_unmap_object);
 
@@ -1431,6 +1451,16 @@ static unsigned long obj_malloc(struct page *first_page,
 }
 
 
+static void set_zspage_mobile(struct size_class *class, struct page *page)
+{
+   BUG_ON(!is_first_page(page));
+
+   while (page) {
+   __SetPageMobile(page);
+   page = get_next_page(page);
+   }
+}
+
 /**
  * zs_malloc - Allocate block of given size from pool.
  * @pool: pool to allocate from
@@ -1474,6 +1504,7 @@ unsigned long zs_malloc(struct zs_pool *pool, size_t size)
&pool->pages_allocated);
 
spin_lock(&class->lock);
+   set_zspage_mobile(class, first_page);
zs_stat_inc(class, OBJ_ALLOCATED, get_maxobj_per_zspage(
class->size, class->pages_per_zspage));
}
@@ -1526,6 +1557,7 @@ void zs_free(struct zs_pool *pool, unsigned long handle)
if (unlikely(!handle))
 

[PATCH v3 0/3] zsmalloc: make its pages can be migrated

2015-11-27 Thread Hui Zhu
These patches are updated according to the review of the previous version [1].
So they are based on "[RFCv3 0/5] enable migration of driver pages" [2]
and "[RFC zsmalloc 0/4] meta diet" [3].

Hui Zhu (3):
zsmalloc: make struct can move
zsmalloc: mark its page "PageMobile"
zram: make create "__GFP_MOVABLE" pool

[1] http://comments.gmane.org/gmane.linux.kernel.mm/140014
[2] https://lkml.org/lkml/2015/7/7/21
[3] https://lkml.org/lkml/2015/8/10/90

 drivers/block/zram/zram_drv.c |4 
 mm/zsmalloc.c |  392 +-
 2 files changed, 316 insertions(+), 80 deletions(-)


[PATCH 1/3] zsmalloc: make struct can be migrated

2015-11-27 Thread Hui Zhu
After "[RFC zsmalloc 0/4] meta diet" [1], the struct it close to
be migrated.
But the LRU is still used.  And to use the migration frame in [2], need
a way to get class through page struct.
So this patch add a new struct zs_migration and store it in struct page.

[1] https://lkml.org/lkml/2015/8/10/90
[2] https://lkml.org/lkml/2015/7/7/21
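
One practical consequence is that migration code, which is handed only a
struct page, can recover the class directly from the new structure instead of
decoding page->mapping. A minimal sketch using the macros this patch
introduces:

  /* Sketch only: recover the size_class from a component page once
   * zs_migration hangs off page->freelist. */
  static struct size_class *page_to_class(struct page *page)
  {
          struct zs_migration *m = ZS_MIGRATION(page);

          return m ? m->class : NULL;
  }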

Signed-off-by: Hui Zhu 
---
 mm/zsmalloc.c | 178 ++
 1 file changed, 104 insertions(+), 74 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 1b18144..57c91a5 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -17,10 +17,10 @@
  *
  * Usage of struct page fields:
  * page->first_page: points to the first component (0-order) page
- * page->index (union with page->freelist): offset of the first object
- * starting in this page.
- * page->lru: links together all component pages (except the first page)
- * of a zspage
+ * ZS_MIGRATION(page)->index: offset of the first object starting in
+ * this page
+ * ZS_MIGRATION(page)->lru: links together all component pages (except
+ * the first page) of a zspage
  *
  * For _first_ page only:
  *
@@ -28,9 +28,9 @@
  * component page after the first page
  * If the page is first_page for huge object, it stores handle.
  * Look at size_class->huge.
- * page->lru: links together first pages of various zspages.
+ * ZS_MIGRATION(page)->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
- * page->freelist: override by struct zs_meta
+ * ZS_MIGRATION(page)->index: override by struct zs_meta
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
@@ -136,7 +136,7 @@
 #define INUSE_BITS 11
 #define INUSE_MASK ((1 << INUSE_BITS) - 1)
 #define ETC_BITS   ((sizeof(unsigned long) * 8) - FREE_OBJ_IDX_BITS - \
-   CLASS_IDX_BITS - FULLNESS_BITS - INUSE_BITS)
+   FULLNESS_BITS - INUSE_BITS)
 /*
  * On systems with 4K page size, this gives 255 size classes! There is a
  * trader-off here:
@@ -266,12 +266,21 @@ struct zs_pool {
  */
 struct zs_meta {
unsigned long free_idx:FREE_OBJ_IDX_BITS;
-   unsigned long class_idx:CLASS_IDX_BITS;
unsigned long fullness:FULLNESS_BITS;
unsigned long inuse:INUSE_BITS;
unsigned long etc:ETC_BITS;
 };
 
+struct zs_migration {
+   unsigned long index;
+   struct size_class *class;
+   struct list_head lru;
+   struct page *page;
+};
+
+#define ZS_MIGRATION(p) ((struct zs_migration *)((p)->freelist))
+#define ZS_META(p) ((struct zs_meta *)&(ZS_MIGRATION(p)->index))
+
 struct mapping_area {
 #ifdef CONFIG_PGTABLE_MAPPING
struct vm_struct *vm; /* vm area for mapping object that span pages */
@@ -311,6 +320,19 @@ static void record_obj(unsigned long handle, unsigned long 
obj)
*(unsigned long *)handle = obj;
 }
 
+struct kmem_cache *zs_migration_cachep;
+
+static struct migration *alloc_migration(gfp_t flags)
+{
+   return (struct migration *)kmem_cache_alloc(zs_migration_cachep,
+   flags & ~__GFP_HIGHMEM);
+}
+
+static void free_migration(struct migration *migration)
+{
+   kmem_cache_free(zs_migration_cachep, (void *)migration);
+}
+
 /* zpool driver */
 
 #ifdef CONFIG_ZPOOL
@@ -414,7 +436,7 @@ static int get_inuse_obj(struct page *page)
 
BUG_ON(!is_first_page(page));
 
-   m = (struct zs_meta *)&page->freelist;
+   m = ZS_META(page);
 
return m->inuse;
 }
@@ -425,48 +447,22 @@ static void set_inuse_obj(struct page *page, int inc)
 
BUG_ON(!is_first_page(page));
 
-   m = (struct zs_meta *)&page->freelist;
+   m = ZS_META(page);
m->inuse += inc;
 }
 
 static void set_free_obj_idx(struct page *first_page, int idx)
 {
-   struct zs_meta *m = (struct zs_meta *)&first_page->freelist;
+   struct zs_meta *m = ZS_META(first_page);
m->free_idx = idx;
 }
 
 static unsigned long get_free_obj_idx(struct page *first_page)
 {
-   struct zs_meta *m = (struct zs_meta *)&first_page->freelist;
+   struct zs_meta *m = ZS_META(first_page);
return m->free_idx;
 }
 
-static void get_zspage_mapping(struct page *page, unsigned int *class_idx,
-   enum fullness_group *fullness)
-{
-   struct zs_meta *m;
-   BUG_ON(!is_first_page(page));
-
-   m = (struct zs_meta *)&page->freelist;
-   *fullness = m->fullness;
-   *class_idx = m->class_idx;
-}
-
-static void set_zspage_mapping(struct page *page, unsigned int class_idx,
-   enum fullness_group fullness)
-{
-   struct zs_meta *m;
-
-   BUG_ON(!is_first_page(page));
-
-   BUG_ON(

[PATCH 3/3] zram: make create "__GFP_MOVABLE" pool

2015-11-27 Thread Hui Zhu
Change the flags passed when calling zs_create_pool so that zram allocates
movable zsmalloc pages.

Signed-off-by: Hui Zhu 
---
 drivers/block/zram/zram_drv.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9fa15bb..8f3f524 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -514,7 +514,9 @@ static struct zram_meta *zram_meta_alloc(char *pool_name, 
u64 disksize)
goto out_error;
}
 
-   meta->mem_pool = zs_create_pool(pool_name, GFP_NOIO | __GFP_HIGHMEM);
+   meta->mem_pool
+   = zs_create_pool(pool_name,
+GFP_NOIO | __GFP_HIGHMEM | __GFP_MOVABLE);
if (!meta->mem_pool) {
pr_err("Error creating memory pool\n");
goto out_error;
-- 
1.9.1



Re: [RFC v2 1/3] migrate: new struct migration and add it to struct page

2015-10-19 Thread Hui Zhu
On Thu, Oct 15, 2015 at 5:53 PM, Minchan Kim  wrote:
> On Thu, Oct 15, 2015 at 11:27:15AM +0200, Vlastimil Babka wrote:
>> On 10/15/2015 11:09 AM, Hui Zhu wrote:
>> >I got that add function interfaces is really not a good idea.
>> >So I add a new struct migration to put all migration interfaces and add
>> >this struct to struct page as union of "mapping".
>>
>> That's better, but not as flexible as the previously proposed
>> approaches that Sergey pointed you at:
>>
>>  http://lkml.iu.edu/hypermail/linux/kernel/1507.0/03233.html
>>  http://lkml.iu.edu/hypermail/linux/kernel/1508.1/00696.html
>>
>> There the operations are reachable via mapping, so we can support
>> the special operations migration also when mapping is otherwise
>> needed; your patch excludes mapping.
>>
>
> Hello Hui,
>
> FYI, I take over the work from Gioh and have a plan to improve the work.
> So, Could you wait a bit? Of course, if you have better idea, feel free
> to post it.
>
> Thanks.

Hi Minchan and Vlastimil,

If you don't mind, I want to wait for those patches and focus on the
page-movable part of zsmalloc.
What do you think about it?

Best,
Hui


Re: [PATCH] zsmalloc: remove unless line in obj_free

2015-10-13 Thread Hui Zhu
Thanks.  I will post a new version later.

Best,
Hui

On Tue, Oct 13, 2015 at 4:00 PM, Sergey Senozhatsky
 wrote:
> On (10/13/15 14:31), Hui Zhu wrote:
>> Signed-off-by: Hui Zhu 
>
> s/unless/useless/
>
> other than that
>
> Reviewed-by: Sergey Senozhatsky 
>
> -ss
>
>> ---
>>  mm/zsmalloc.c | 3 ---
>>  1 file changed, 3 deletions(-)
>>
>> diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
>> index f135b1b..c7338f0 100644
>> --- a/mm/zsmalloc.c
>> +++ b/mm/zsmalloc.c
>> @@ -1428,8 +1428,6 @@ static void obj_free(struct zs_pool *pool, struct 
>> size_class *class,
>>   struct page *first_page, *f_page;
>>   unsigned long f_objidx, f_offset;
>>   void *vaddr;
>> - int class_idx;
>> - enum fullness_group fullness;
>>
>>   BUG_ON(!obj);
>>
>> @@ -1437,7 +1435,6 @@ static void obj_free(struct zs_pool *pool, struct 
>> size_class *class,
>>   obj_to_location(obj, &f_page, &f_objidx);
>>   first_page = get_first_page(f_page);
>>
>> - get_zspage_mapping(first_page, &class_idx, &fullness);
>>   f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
>>
>>   vaddr = kmap_atomic(f_page);
>> --
>> 1.9.1
>>


[PATCH v2] zsmalloc: remove useless line in obj_free

2015-10-13 Thread Hui Zhu
Signed-off-by: Hui Zhu 
Reviewed-by: Sergey Senozhatsky 
---
 mm/zsmalloc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f135b1b..c7338f0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1428,8 +1428,6 @@ static void obj_free(struct zs_pool *pool, struct 
size_class *class,
struct page *first_page, *f_page;
unsigned long f_objidx, f_offset;
void *vaddr;
-   int class_idx;
-   enum fullness_group fullness;
 
BUG_ON(!obj);
 
@@ -1437,7 +1435,6 @@ static void obj_free(struct zs_pool *pool, struct 
size_class *class,
obj_to_location(obj, &f_page, &f_objidx);
first_page = get_first_page(f_page);
 
-   get_zspage_mapping(first_page, &class_idx, &fullness);
f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
 
vaddr = kmap_atomic(f_page);
-- 
1.9.1



[PATCH] zsmalloc: remove unless line in obj_free

2015-10-12 Thread Hui Zhu
Signed-off-by: Hui Zhu 
---
 mm/zsmalloc.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f135b1b..c7338f0 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1428,8 +1428,6 @@ static void obj_free(struct zs_pool *pool, struct 
size_class *class,
struct page *first_page, *f_page;
unsigned long f_objidx, f_offset;
void *vaddr;
-   int class_idx;
-   enum fullness_group fullness;
 
BUG_ON(!obj);
 
@@ -1437,7 +1435,6 @@ static void obj_free(struct zs_pool *pool, struct 
size_class *class,
obj_to_location(obj, &f_page, &f_objidx);
first_page = get_first_page(f_page);
 
-   get_zspage_mapping(first_page, &class_idx, &fullness);
f_offset = obj_idx_to_offset(f_page, f_objidx, class->size);
 
vaddr = kmap_atomic(f_page);
-- 
1.9.1



[PATCH v2] zsmalloc: fix obj_to_head use page_private(page) as value but not pointer

2015-10-06 Thread Hui Zhu
In function obj_malloc:
if (!class->huge)
/* record handle in the header of allocated chunk */
link->handle = handle;
else
/* record handle in first_page->private */
set_page_private(first_page, handle);
A huge-class page saves the handle to private directly.

But in obj_to_head:
if (class->huge) {
VM_BUG_ON(!is_first_page(page));
return *(unsigned long *)page_private(page);
} else
return *(unsigned long *)obj;
It is used as a pointer.

The reason why there has been no problem until now is that a huge-class page
is born with ZS_FULL so it couldn't be migrated.
Therefore, it shouldn't be a real bug in practice.
However, we need this patch for the future work "VM-aware zsmalloced
page migration" to reduce external fragmentation.
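
The fix is easiest to see as a matched store/load pair: for a huge class the
handle value itself is written into page->private, so the reader must return
it as a value rather than dereference it. A condensed sketch of that pair (not
the full mm/zsmalloc.c functions):

  /* Condensed sketch of the store/load pair described above. */
  static void store_handle(struct size_class *class, struct page *first_page,
                           struct link_free *link, unsigned long handle)
  {
          if (!class->huge)
                  link->handle = handle;                  /* in-chunk header */
          else
                  set_page_private(first_page, handle);   /* stored as a value */
  }

  static unsigned long load_handle(struct size_class *class, struct page *page,
                                   void *obj)
  {
          if (class->huge)
                  return page_private(page);      /* read back as a value */
          return *(unsigned long *)obj;           /* header holds the value */
  }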

Signed-off-by: Hui Zhu 
Acked-by: Minchan Kim 
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f135b1b..e881d4f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -824,7 +824,7 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
 {
if (class->huge) {
VM_BUG_ON(!is_first_page(page));
-   return *(unsigned long *)page_private(page);
+   return page_private(page);
} else
return *(unsigned long *)obj;
 }
-- 
1.9.1



Re: [PATCH] zsmalloc: fix obj_to_head use page_private(page) as value but not pointer

2015-10-06 Thread Hui Zhu
On Tue, Oct 6, 2015 at 9:54 PM, Minchan Kim  wrote:
> Hello,
>
> On Mon, Oct 05, 2015 at 04:23:01PM +0800, Hui Zhu wrote:
>> In function obj_malloc:
>>   if (!class->huge)
>>   /* record handle in the header of allocated chunk */
>>   link->handle = handle;
>>   else
>>   /* record handle in first_page->private */
>>   set_page_private(first_page, handle);
>> The huge's page save handle to private directly.
>>
>> But in obj_to_head:
>>   if (class->huge) {
>>   VM_BUG_ON(!is_first_page(page));
>>   return page_private(page);
>
> Typo.
> return *(unsigned long*)page_private(page);
>
> Please fix the description.
>
>>   } else
>>   return *(unsigned long *)obj;
>> It is used as a pointer.
>>
>> So change obj_to_head use page_private(page) as value but not pointer
>> in obj_to_head.
>
> The reason why there is no problem until now is huge-class page is
> born with ZS_FULL so it couldn't be migrated.
> Therefore, it shouldn't be real bug in practice.
> However, we need this patch for future-work "VM-aware zsmalloced
> page migration" to reduce external fragmentation.
>
>>
>> Signed-off-by: Hui Zhu 
>
> With fixing the comment,
>
> Acked-by: Minchan Kim 
>
> Thanks for the fix, Hui.
>

Thanks!  I will post a new version.

Best,
Hui

> --
> Kind regards,
> Minchan Kim


[PATCH] zsmalloc: fix obj_to_head use page_private(page) as value but not pointer

2015-10-05 Thread Hui Zhu
In function obj_malloc:
if (!class->huge)
/* record handle in the header of allocated chunk */
link->handle = handle;
else
/* record handle in first_page->private */
set_page_private(first_page, handle);
A huge-class page saves the handle to private directly.

But in obj_to_head:
if (class->huge) {
VM_BUG_ON(!is_first_page(page));
return page_private(page);
} else
return *(unsigned long *)obj;
It is used as a pointer.

So change obj_to_head to use page_private(page) as a value, not as a
pointer.

Signed-off-by: Hui Zhu 
---
 mm/zsmalloc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f135b1b..e881d4f 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -824,7 +824,7 @@ static unsigned long obj_to_head(struct size_class *class, 
struct page *page,
 {
if (class->huge) {
VM_BUG_ON(!is_first_page(page));
-   return *(unsigned long *)page_private(page);
+   return page_private(page);
} else
return *(unsigned long *)obj;
 }
-- 
1.9.1



[PATCH v2] zsmalloc: add comments for ->inuse to zspage

2015-09-23 Thread Hui Zhu
Signed-off-by: Hui Zhu 
---
 mm/zsmalloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f135b1b..f62f2fb 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -38,6 +38,7 @@
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: class index and fullness group of the zspage
+ * page->inuse: the objects number that is used in this zspage
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
-- 
1.9.1



[PATCH] zsmalloc: add comments for ->inuse to zspage

2015-09-21 Thread Hui Zhu
Signed-off-by: Hui Zhu 
---
 mm/zsmalloc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index f135b1b..1f66d5b 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -38,6 +38,7 @@
  * page->lru: links together first pages of various zspages.
  * Basically forming list of zspages in a fullness group.
  * page->mapping: class index and fullness group of the zspage
+ * page->inuse: the pages number that is used in this zspage
  *
  * Usage of struct page flags:
  * PG_private: identifies the first component page
-- 
1.9.1



Re: [alsa-devel] [V2 PATCH] ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec

2015-07-01 Thread Hui Wang

On 06/29/2015 08:49 AM, Hui Wang wrote:

On 06/27/2015 11:03 AM, Raymond Yau wrote:

Most Thinkpad Edge series laptops use conexant codec, so far
although



Is there anything I can debug or any information I can collect
from my
box to examine this?
What is the linux distribution on your machine? And use 
showkey to

catch
the keycode of that button.

I'm running OpenSUSE 13.1. The reported keycode of the power
button is
116.
It seems the keycode is correct, it is power keycode rather the 
mute

keycode.

Could you please do some debug, let us find which line of code is

the

root cause for this problem. for example:

after running the line, the problem shows up:

1. if (ACPI_SUCCESS(acpi_get_devices("LEN0068", acpi_check_cb,
&found, NULL)) && found) // in the 
sound/pci/hda/thinkpad_helper.c,

is_thinkpad()

This evaluates to true


2. return ACPI_SUCCESS(acpi_get_devices("IBM0068",

acpi_check_cb,

&found, NULL)) && found; // same as above

3. if (led_set_func(TPACPI_LED_MUTE, false) >= 0) { 
//in the

sound/pci/hda/thinkpad_helper.c, hda_fixup_thinkpad_acpi()

...and this


4. if (led_set_func(TPACPI_LED_MICMUTE, false) >= 0) { // same as
above


...and this as well. spec->num_adc_nids is 1.
If we change the code like below, does the power button can work 
well?


in the thinkpad_helper.c, hda_fixup_thinkpad_acpi()


  if (led_set_func(TPACPI_LED_MUTE, false) >= 0) {
  /*
  old_vmaster_hook = spec->vmaster_mute.hook;
  spec->vmaster_mute.hook = update_tpacpi_mute_led;
  removefunc = false;
 */

Disabling only this block resolves the issue.

So Below two lines make the power button change to the reset button.

drivers/platform/x86/thinkpad_acpi.c  mute_led_on_off():

acpi_get_handle(hkey_handle, "SSMS", &temp);
acpi_evalf(hkey_handle, &output, "SSMS", "dd", 1);


http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/platform/x86/thinkpad_acpi.c?id=9a417ec0c9d1f7af5394333411fc4d98adb8761b 



It seem that software mute also depend on HAUM and SAUM ACPI interface

Seem regression of the above patch is SSMS is not supported

https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1450947

Which models of thinkpad are tested by the author ?
A lot; we tested this patch on all Lenovo machines with a mute LED that we
have. I need to check the model names and provide them at a later time.


What I can remember now are x230, x240, x250, L560 and x1





Re: [alsa-devel] [V2 PATCH] ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec

2015-06-28 Thread Hui Wang

On 06/27/2015 11:03 AM, Raymond Yau wrote:

Most Thinkpad Edge series laptops use conexant codec, so far
although



Is there anything I can debug or any information I can collect
from my
box to examine this?

What is the linux distribution on your machine? And use showkey to
catch
the keycode of that button.

I'm running OpenSUSE 13.1. The reported keycode of the power
button is
116.

It seems the keycode is correct, it is power keycode rather the mute
keycode.

Could you please do some debug, let us find which line of code is

the

root cause for this problem. for example:

after running the line, the problem shows up:

1. if (ACPI_SUCCESS(acpi_get_devices("LEN0068", acpi_check_cb,
&found, NULL)) && found) // in the sound/pci/hda/thinkpad_helper.c,
is_thinkpad()

This evaluates to true


2. return ACPI_SUCCESS(acpi_get_devices("IBM0068",

acpi_check_cb,

&found, NULL)) && found; // same as above

3. if (led_set_func(TPACPI_LED_MUTE, false) >= 0) { //in the
sound/pci/hda/thinkpad_helper.c, hda_fixup_thinkpad_acpi()

...and this


4. if (led_set_func(TPACPI_LED_MICMUTE, false) >= 0) { // same as
above


...and this as well. spec->num_adc_nids is 1.

If we change the code like below, does the power button can work well?

in the thinkpad_helper.c, hda_fixup_thinkpad_acpi()


  if (led_set_func(TPACPI_LED_MUTE, false) >= 0) {
  /*
  old_vmaster_hook = spec->vmaster_mute.hook;
  spec->vmaster_mute.hook = update_tpacpi_mute_led;
  removefunc = false;
 */

Disabling only this block resolves the issue.

So Below two lines make the power button change to the reset button.

drivers/platform/x86/thinkpad_acpi.c  mute_led_on_off():

acpi_get_handle(hkey_handle, "SSMS", &temp);
acpi_evalf(hkey_handle, &output, "SSMS", "dd", 1);



http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/platform/x86/thinkpad_acpi.c?id=9a417ec0c9d1f7af5394333411fc4d98adb8761b

It seem that software mute also depend on HAUM and SAUM ACPI interface

Seem regression of the above patch is SSMS is not supported

https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1450947

Which models of thinkpad are tested by the author ?
A lot; we tested this patch on all Lenovo machines with a mute LED that we
have. I need to check the model names and provide them at a later time.





Call for Topics and Sponsors

2015-06-25 Thread Hui Zhu
*
Call for Topics and Sponsors

Workshop on Open Source Development Tools 2015
Beijing, China
Sep. 12, 2015 (TBD)
HelloGCC Work Group (www.hellogcc.org)
*
Open Source Development Tools Workshop is a meeting for open
source software developers. You can share your work, studies, and
learning experiences of open source software development here.
Our main topic is open source development tools.

The content of topics can be:
* GNU toolchain (gcc, binutils, gdb, etc)
* Clang/LLVM toolchain
* Other tools of open source development, debug and simulation

The form of topics can be:
* the introduction of your own work
* the introduction of work you did in the past
* tutorial, experience and etc
* other forms of presentation, such as lightning talk

If you have some topics, please contact us:
* send email to hello...@freelists.org (need to subscribe
http://www.freelists.org/list/hellogcc first)
* login into freenode IRC #hellogcc room

Important Date:
* the deadline of topics and sponsors solicitation: Aug 1st, 2015

Previous Meetings:
* OSDT 2014: http://www.hellogcc.org/?p=33910
* HelloGCC 2013: http://www.hellogcc.org/?p=33518
* HelloGCC 2012: http://linux.chinaunix.net/hellogcc2012
* HelloGCC 2011: http://linux.chinaunix.net/hellogcc2011
* HelloGCC 2010: http://linux.chinaunix.net/hellogcc2010
* HelloGCC 2009: http://www.aka-kernel.org/news/hellogcc/index.html

If you want to sponsor us, we would very much appreciate it; please contact us via
hellogcc.workgr...@gmail.com


Re: [alsa-devel] [V2 PATCH] ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec

2015-06-25 Thread Hui Wang

On 06/25/2015 07:02 PM, Jan Kiszka wrote:

On 2015-06-24 10:46, Hui Wang wrote:

On 06/24/2015 01:37 PM, Jan Kiszka wrote:

On 2015-05-23 18:22, Jan Kiszka wrote:

On 2015-05-23 18:06, Raymond Yau wrote:

2015-5-23 下午4:50 於 "Jan Kiszka"  寫道:

Most Thinkpad Edge series laptops use conexant codec, so
far
although



Is there anything I can debug or any information I can collect
from my
box to examine this?

What is the linux distribution on your machine? And use
showkey to
catch
the keycode of that button.

I'm running OpenSUSE 13.1. The reported keycode of the power
button is
116.

It seems the keycode is correct, it is power keycode rather
the mute
keycode.

Could you please do some debug, let us find which line of code is

the

root cause for this problem. for example:

after running the line, the problem shows up:

1. if (ACPI_SUCCESS(acpi_get_devices("LEN0068",
acpi_check_cb,
&found, NULL)) && found) // in the
sound/pci/hda/thinkpad_helper.c,
is_thinkpad()

This evaluates to true


2. return ACPI_SUCCESS(acpi_get_devices("IBM0068",

acpi_check_cb,

&found, NULL)) && found; // same as above

3. if (led_set_func(TPACPI_LED_MUTE, false) >= 0) {
//in the
sound/pci/hda/thinkpad_helper.c, hda_fixup_thinkpad_acpi()

...and this


4. if (led_set_func(TPACPI_LED_MICMUTE, false) >= 0) { // same as
above


...and this as well. spec->num_adc_nids is 1.

If we change the code like below, does the power button can work
well?

in the thinkpad_helper.c, hda_fixup_thinkpad_acpi()


   if (led_set_func(TPACPI_LED_MUTE, false) >= 0) {
   /*
   old_vmaster_hook = spec->vmaster_mute.hook;
   spec->vmaster_mute.hook = update_tpacpi_mute_led;
   removefunc = false;
  */

Disabling only this block resolves the issue.

So Below two lines make the power button change to the reset button.

drivers/platform/x86/thinkpad_acpi.c  mute_led_on_off():

acpi_get_handle(hkey_handle, "SSMS", &temp);
acpi_evalf(hkey_handle, &output, "SSMS", "dd", 1);


@alexhung,
Do you have any idea why this can affect the power button behavior?


I think we all lost track of this issue, but it unfortunately still
exists in the latest kernel, requiring custom builds here. How can we
proceed?

http://mailman.alsa-project.org/pipermail/alsa-devel/2015-May/091561.html


If you cannot find "SSMS" in  your T520 ACPI dump, this mean mute LED
cannot be turn on/off similar to T510

There is an entry (see attached disassembly), but this device has at
least no physical LED to drive.

Some hotkey leds are embedded on button.  Through the pictures I found
on the internet (thinkpad t520), it looks like there is a led at the
center of the mute button.

Again, I'm on a X121e, and that has only a single physical LED for
signaling the power state. The mute button is behind key combination of
the keyboard.

Jan
There is no reason for a power button to change into a reset button after
accessing the ACPI device "SSMS"; "SSMS" is for the mute LED, not for
power management.


I think it is better that you log in to the Lenovo website and look for the
latest BIOS image, then upgrade the BIOS on your machine to see whether it
solves the problem.






Re: [alsa-devel] [V2 PATCH] ALSA: hda - Enable mute/mic-mute LEDs for more Thinkpads with Conexant codec

2015-06-24 Thread Hui Wang

On 06/24/2015 01:37 PM, Jan Kiszka wrote:

On 2015-05-23 18:22, Jan Kiszka wrote:

On 2015-05-23 18:06, Raymond Yau wrote:

2015-5-23 下午4:50 於 "Jan Kiszka"  寫道:

Most Thinkpad Edge series laptops use conexant codec, so far
although



Is there anything I can debug or any information I can collect
from my
box to examine this?

What is the linux distribution on your machine? And use showkey to
catch
the keycode of that button.

I'm running OpenSUSE 13.1. The reported keycode of the power
button is
116.

It seems the keycode is correct, it is power keycode rather the mute
keycode.

Could you please do some debug, let us find which line of code is

the

root cause for this problem. for example:

after running the line, the problem shows up:

1. if (ACPI_SUCCESS(acpi_get_devices("LEN0068", acpi_check_cb,
&found, NULL)) && found) // in the sound/pci/hda/thinkpad_helper.c,
is_thinkpad()

This evaluates to true


2. return ACPI_SUCCESS(acpi_get_devices("IBM0068",

acpi_check_cb,

&found, NULL)) && found; // same as above

3. if (led_set_func(TPACPI_LED_MUTE, false) >= 0) { //in the
sound/pci/hda/thinkpad_helper.c, hda_fixup_thinkpad_acpi()

...and this


4. if (led_set_func(TPACPI_LED_MICMUTE, false) >= 0) { // same as
above


...and this as well. spec->num_adc_nids is 1.

If we change the code like below, does the power button can work well?

in the thinkpad_helper.c, hda_fixup_thinkpad_acpi()


  if (led_set_func(TPACPI_LED_MUTE, false) >= 0) {
  /*
  old_vmaster_hook = spec->vmaster_mute.hook;
  spec->vmaster_mute.hook = update_tpacpi_mute_led;
  removefunc = false;
 */

Disabling only this block resolves the issue.

So Below two lines make the power button change to the reset button.

drivers/platform/x86/thinkpad_acpi.c  mute_led_on_off():

acpi_get_handle(hkey_handle, "SSMS", &temp);
acpi_evalf(hkey_handle, &output, "SSMS", "dd", 1);


@alexhung,
Do you have any idea why this can affect the power button behavior?


I think we all lost track of this issue, but it unfortunately still
exists in the latest kernel, requiring custom builds here. How can we
proceed?

http://mailman.alsa-project.org/pipermail/alsa-devel/2015-May/091561.html

If you cannot find "SSMS" in  your T520 ACPI dump, this mean mute LED
cannot be turn on/off similar to T510

There is an entry (see attached disassembly), but this device has at
least no physical LED to drive.
Some hotkey LEDs are embedded in the button.  From the pictures I found
on the internet (ThinkPad T520), it looks like there is an LED at the
center of the mute button.



Jan


Ping...

Jan


Re: [PATCH v2] CMA: page_isolation: check buddy before access it

2015-05-06 Thread Hui Zhu
On Wed, May 6, 2015 at 2:28 PM, Joonsoo Kim  wrote:
> On Tue, May 05, 2015 at 11:22:59AM +0800, Hui Zhu wrote:
>> Change pfn_present to pfn_valid_within according to the review of Laura.
>>
>> I got a issue:
>> [  214.294917] Unable to handle kernel NULL pointer dereference at virtual 
>> address 082a
>> [  214.303013] pgd = cc97
>> [  214.305721] [082a] *pgd=
>> [  214.309316] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>> [  214.335704] PC is at get_pageblock_flags_group+0x5c/0xb0
>> [  214.341030] LR is at unset_migratetype_isolate+0x148/0x1b0
>> [  214.346523] pc : []lr : []psr: 8093
>> [  214.346523] sp : c7029d00  ip : 0105  fp : c7029d1c
>> [  214.358005] r10: 0001  r9 : 000a  r8 : 0004
>> [  214.363231] r7 : 6013  r6 : 00a4  r5 : c0a357e4  r4 : 
>> [  214.369761] r3 : 0826  r2 : 0002  r1 :   r0 : 003f
>> [  214.376291] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
>> user
>> [  214.383516] Control: 10c5387d  Table: 2cb7006a  DAC: 0015
>> [  214.949720] Backtrace:
>> [  214.952192] [] (get_pageblock_flags_group+0x0/0xb0) from 
>> [] (unset_migratetype_isolate+0x148/0x1b0)
>> [  214.962978]  r7:6013 r6:c0a357c0 r5:c0a357e4 r4:c1555000
>> [  214.968693] [] (unset_migratetype_isolate+0x0/0x1b0) from 
>> [] (undo_isolate_page_range+0xd0/0xdc)
>> [  214.979222] [] (undo_isolate_page_range+0x0/0xdc) from 
>> [] (__alloc_contig_range+0x254/0x34c)
>> [  214.989398]  r9:000abc00 r8:c7028000 r7:000b1f53 r6:000b3e00 r5:0005
>> r4:c7029db4
>> [  214.997308] [] (__alloc_contig_range+0x0/0x34c) from 
>> [] (alloc_contig_range+0x14/0x18)
>> [  215.006973] [] (alloc_contig_range+0x0/0x18) from [] 
>> (dma_alloc_from_contiguous_addr+0x1ac/0x304)
>>
>> This issue is because when call unset_migratetype_isolate to unset a part
>> of CMA memory, it try to access the buddy page to get its status:
>>   if (order >= pageblock_order) {
>>   page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
>>   buddy_idx = __find_buddy_index(page_idx, order);
>>   buddy = page + (buddy_idx - page_idx);
>>
>>   if (!is_migrate_isolate_page(buddy)) {
>> But the begin addr of this part of CMA memory is very close to a part of
>> memory that is reserved in the boot time (not in buddy system).
>> So add a check before access it.
>>
>> Signed-off-by: Hui Zhu 
>> ---
>>  mm/page_isolation.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 755a42c..eb22d1f 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -101,7 +101,8 @@ void unset_migratetype_isolate(struct page *page, 
>> unsigned migratetype)
>>   buddy_idx = __find_buddy_index(page_idx, order);
>>   buddy = page + (buddy_idx - page_idx);
>>
>> - if (!is_migrate_isolate_page(buddy)) {
>> + if (!pfn_valid_within(page_to_pfn(buddy))
>> + || !is_migrate_isolate_page(buddy)) {
>>   __isolate_free_page(page, order);
>>   kernel_map_pages(page, (1 << order), 1);
>>   set_page_refcounted(page);
>
> Hello,
>
> This isolation is for merging buddy pages. If buddy is not valid, we
> don't need to isolate page, because we can't merge them.
> I think that correct code would be:
>
> pfn_valid_within(page_to_pfn(buddy)) &&
> !is_migrate_isolate_page(buddy)
>
> But, isolation and free here is safe operation so your code will work
> fine.
>

Oops!  I posted a new version for the patch.

Thanks,
Hui

> Thanks.


[PATCH v3] CMA: page_isolation: check buddy before access it

2015-05-06 Thread Hui Zhu
Changelog:
v3, Change the behavior according to the review of Joonsoo.
v2, Change pfn_present to pfn_valid_within according to the review of Laura.

I got an issue:
[  214.294917] Unable to handle kernel NULL pointer dereference at virtual 
address 082a
[  214.303013] pgd = cc97
[  214.305721] [082a] *pgd=
[  214.309316] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  214.335704] PC is at get_pageblock_flags_group+0x5c/0xb0
[  214.341030] LR is at unset_migratetype_isolate+0x148/0x1b0
[  214.346523] pc : []lr : []psr: 8093
[  214.346523] sp : c7029d00  ip : 0105  fp : c7029d1c
[  214.358005] r10: 0001  r9 : 000a  r8 : 0004
[  214.363231] r7 : 6013  r6 : 00a4  r5 : c0a357e4  r4 : 
[  214.369761] r3 : 0826  r2 : 0002  r1 :   r0 : 003f
[  214.376291] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
user
[  214.383516] Control: 10c5387d  Table: 2cb7006a  DAC: 0015
[  214.949720] Backtrace:
[  214.952192] [] (get_pageblock_flags_group+0x0/0xb0) from 
[] (unset_migratetype_isolate+0x148/0x1b0)
[  214.962978]  r7:6013 r6:c0a357c0 r5:c0a357e4 r4:c1555000
[  214.968693] [] (unset_migratetype_isolate+0x0/0x1b0) from 
[] (undo_isolate_page_range+0xd0/0xdc)
[  214.979222] [] (undo_isolate_page_range+0x0/0xdc) from 
[] (__alloc_contig_range+0x254/0x34c)
[  214.989398]  r9:000abc00 r8:c7028000 r7:000b1f53 r6:000b3e00 r5:0005
r4:c7029db4
[  214.997308] [] (__alloc_contig_range+0x0/0x34c) from [] 
(alloc_contig_range+0x14/0x18)
[  215.006973] [] (alloc_contig_range+0x0/0x18) from [] 
(dma_alloc_from_contiguous_addr+0x1ac/0x304)

This issue happens because when unset_migratetype_isolate is called to unset a
part of CMA memory, it tries to access the buddy page to get its status:
if (order >= pageblock_order) {
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
buddy_idx = __find_buddy_index(page_idx, order);
buddy = page + (buddy_idx - page_idx);

if (!is_migrate_isolate_page(buddy)) {
But the beginning address of this part of CMA memory is very close to a part
of memory that is reserved at boot time (not in the buddy system).
So add a check before accessing it.
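
The guard is cheap: in kernels of this era pfn_valid_within() compiles to 1
unless CONFIG_HOLES_IN_ZONE is set, in which case it falls back to
pfn_valid(). A minimal sketch of the guarded buddy check that the hunk below
implements:

  /* Sketch of the guarded buddy access; mirrors the hunk below. */
  static bool buddy_merge_ok(struct page *page, unsigned long page_idx,
                             unsigned int order)
  {
          unsigned long buddy_idx = __find_buddy_index(page_idx, order);
          struct page *buddy = page + (buddy_idx - page_idx);

          /* The buddy pfn may fall in a boot-time reserved range with no
           * valid struct page, so validate it before reading its state. */
          return pfn_valid_within(page_to_pfn(buddy)) &&
                 !is_migrate_isolate_page(buddy);
  }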

Suggested-by: Laura Abbott 
Suggested-by: Joonsoo Kim 
Signed-off-by: Hui Zhu 
---
 mm/page_isolation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 755a42c..4a5624c 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -101,7 +101,8 @@ void unset_migratetype_isolate(struct page *page, unsigned 
migratetype)
buddy_idx = __find_buddy_index(page_idx, order);
buddy = page + (buddy_idx - page_idx);
 
-   if (!is_migrate_isolate_page(buddy)) {
+   if (pfn_valid_within(page_to_pfn(buddy))
+   && !is_migrate_isolate_page(buddy)) {
__isolate_free_page(page, order);
kernel_map_pages(page, (1 << order), 1);
set_page_refcounted(page);
-- 
1.9.1



Re: [PATCH v2] CMA: page_isolation: check buddy before access it

2015-05-05 Thread Hui Zhu
On Wed, May 6, 2015 at 5:29 AM, Andrew Morton  wrote:
> On Tue, 5 May 2015 11:22:59 +0800 Hui Zhu  wrote:
>
>> Change pfn_present to pfn_valid_within according to the review of Laura.
>>
>> I got a issue:
>> [  214.294917] Unable to handle kernel NULL pointer dereference at virtual 
>> address 082a
>> [  214.303013] pgd = cc97
>> [  214.305721] [082a] *pgd=
>> [  214.309316] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>> [  214.335704] PC is at get_pageblock_flags_group+0x5c/0xb0
>> [  214.341030] LR is at unset_migratetype_isolate+0x148/0x1b0
>> [  214.346523] pc : []lr : []psr: 8093
>> [  214.346523] sp : c7029d00  ip : 0105  fp : c7029d1c
>> [  214.358005] r10: 0001  r9 : 000a  r8 : 0004
>> [  214.363231] r7 : 6013  r6 : 00a4  r5 : c0a357e4  r4 : 
>> [  214.369761] r3 : 0826  r2 : 0002  r1 :   r0 : 003f
>> [  214.376291] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
>> user
>> [  214.383516] Control: 10c5387d  Table: 2cb7006a  DAC: 0015
>> [  214.949720] Backtrace:
>> [  214.952192] [] (get_pageblock_flags_group+0x0/0xb0) from 
>> [] (unset_migratetype_isolate+0x148/0x1b0)
>> [  214.962978]  r7:6013 r6:c0a357c0 r5:c0a357e4 r4:c1555000
>> [  214.968693] [] (unset_migratetype_isolate+0x0/0x1b0) from 
>> [] (undo_isolate_page_range+0xd0/0xdc)
>> [  214.979222] [] (undo_isolate_page_range+0x0/0xdc) from 
>> [] (__alloc_contig_range+0x254/0x34c)
>> [  214.989398]  r9:000abc00 r8:c7028000 r7:000b1f53 r6:000b3e00 r5:0005
>> r4:c7029db4
>> [  214.997308] [] (__alloc_contig_range+0x0/0x34c) from 
>> [] (alloc_contig_range+0x14/0x18)
>> [  215.006973] [] (alloc_contig_range+0x0/0x18) from [] 
>> (dma_alloc_from_contiguous_addr+0x1ac/0x304)
>>
>> This issue is because when call unset_migratetype_isolate to unset a part
>> of CMA memory, it try to access the buddy page to get its status:
>>   if (order >= pageblock_order) {
>>   page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
>>   buddy_idx = __find_buddy_index(page_idx, order);
>>   buddy = page + (buddy_idx - page_idx);
>>
>>   if (!is_migrate_isolate_page(buddy)) {
>> But the begin addr of this part of CMA memory is very close to a part of
>> memory that is reserved in the boot time (not in buddy system).
>> So add a check before access it.
>>
>> ...
>>
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -101,7 +101,8 @@ void unset_migratetype_isolate(struct page *page, 
>> unsigned migratetype)
>>   buddy_idx = __find_buddy_index(page_idx, order);
>>   buddy = page + (buddy_idx - page_idx);
>>
>> - if (!is_migrate_isolate_page(buddy)) {
>> + if (!pfn_valid_within(page_to_pfn(buddy))
>> + || !is_migrate_isolate_page(buddy)) {
>>   __isolate_free_page(page, order);
>>   kernel_map_pages(page, (1 << order), 1);
>>   set_page_refcounted(page);
>
> This fix is needed in kernel versions 4.0.x isn't it?

I think it needs it.

Thanks,
Hui


[PATCH v2] CMA: page_isolation: check buddy before access it

2015-05-04 Thread Hui Zhu
Change pfn_present to pfn_valid_within according to the review of Laura.

I got an issue:
[  214.294917] Unable to handle kernel NULL pointer dereference at virtual 
address 082a
[  214.303013] pgd = cc97
[  214.305721] [082a] *pgd=
[  214.309316] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  214.335704] PC is at get_pageblock_flags_group+0x5c/0xb0
[  214.341030] LR is at unset_migratetype_isolate+0x148/0x1b0
[  214.346523] pc : []lr : []psr: 8093
[  214.346523] sp : c7029d00  ip : 0105  fp : c7029d1c
[  214.358005] r10: 0001  r9 : 000a  r8 : 0004
[  214.363231] r7 : 6013  r6 : 00a4  r5 : c0a357e4  r4 : 
[  214.369761] r3 : 0826  r2 : 0002  r1 :   r0 : 003f
[  214.376291] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
user
[  214.383516] Control: 10c5387d  Table: 2cb7006a  DAC: 0015
[  214.949720] Backtrace:
[  214.952192] [] (get_pageblock_flags_group+0x0/0xb0) from 
[] (unset_migratetype_isolate+0x148/0x1b0)
[  214.962978]  r7:6013 r6:c0a357c0 r5:c0a357e4 r4:c1555000
[  214.968693] [] (unset_migratetype_isolate+0x0/0x1b0) from 
[] (undo_isolate_page_range+0xd0/0xdc)
[  214.979222] [] (undo_isolate_page_range+0x0/0xdc) from 
[] (__alloc_contig_range+0x254/0x34c)
[  214.989398]  r9:000abc00 r8:c7028000 r7:000b1f53 r6:000b3e00 r5:0005
r4:c7029db4
[  214.997308] [] (__alloc_contig_range+0x0/0x34c) from [] 
(alloc_contig_range+0x14/0x18)
[  215.006973] [] (alloc_contig_range+0x0/0x18) from [] 
(dma_alloc_from_contiguous_addr+0x1ac/0x304)

This issue happens because when unset_migratetype_isolate is called to unset a
part of CMA memory, it tries to access the buddy page to get its status:
if (order >= pageblock_order) {
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
buddy_idx = __find_buddy_index(page_idx, order);
buddy = page + (buddy_idx - page_idx);

if (!is_migrate_isolate_page(buddy)) {
But the beginning address of this part of CMA memory is very close to a part
of memory that is reserved at boot time (not in the buddy system).
So add a check before accessing it.

Signed-off-by: Hui Zhu 
---
 mm/page_isolation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 755a42c..eb22d1f 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -101,7 +101,8 @@ void unset_migratetype_isolate(struct page *page, unsigned 
migratetype)
buddy_idx = __find_buddy_index(page_idx, order);
buddy = page + (buddy_idx - page_idx);
 
-   if (!is_migrate_isolate_page(buddy)) {
+   if (!pfn_valid_within(page_to_pfn(buddy))
+   || !is_migrate_isolate_page(buddy)) {
__isolate_free_page(page, order);
kernel_map_pages(page, (1 << order), 1);
set_page_refcounted(page);
-- 
1.9.1



Re: [PATCH] CMA: page_isolation: check buddy before access it

2015-05-04 Thread Hui Zhu
On Tue, May 5, 2015 at 2:34 AM, Laura Abbott  wrote:
> On 05/04/2015 02:41 AM, Hui Zhu wrote:
>>
>> I got a issue:
>> [  214.294917] Unable to handle kernel NULL pointer dereference at virtual
>> address 082a
>> [  214.303013] pgd = cc97
>> [  214.305721] [082a] *pgd=
>> [  214.309316] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
>> [  214.335704] PC is at get_pageblock_flags_group+0x5c/0xb0
>> [  214.341030] LR is at unset_migratetype_isolate+0x148/0x1b0
>> [  214.346523] pc : []lr : []psr: 8093
>> [  214.346523] sp : c7029d00  ip : 0105  fp : c7029d1c
>> [  214.358005] r10: 0001  r9 : 000a  r8 : 0004
>> [  214.363231] r7 : 6013  r6 : 00a4  r5 : c0a357e4  r4 : 
>> [  214.369761] r3 : 0826  r2 : 0002  r1 :   r0 : 003f
>> [  214.376291] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM
>> Segment user
>> [  214.383516] Control: 10c5387d  Table: 2cb7006a  DAC: 0015
>> [  214.949720] Backtrace:
>> [  214.952192] [] (get_pageblock_flags_group+0x0/0xb0) from
>> [] (unset_migratetype_isolate+0x148/0x1b0)
>> [  214.962978]  r7:6013 r6:c0a357c0 r5:c0a357e4 r4:c1555000
>> [  214.968693] [] (unset_migratetype_isolate+0x0/0x1b0) from
>> [] (undo_isolate_page_range+0xd0/0xdc)
>> [  214.979222] [] (undo_isolate_page_range+0x0/0xdc) from
>> [] (__alloc_contig_range+0x254/0x34c)
>> [  214.989398]  r9:000abc00 r8:c7028000 r7:000b1f53 r6:000b3e00
>> r5:0005
>> r4:c7029db4
>> [  214.997308] [] (__alloc_contig_range+0x0/0x34c) from
>> [] (alloc_contig_range+0x14/0x18)
>> [  215.006973] [] (alloc_contig_range+0x0/0x18) from
>> [] (dma_alloc_from_contiguous_addr+0x1ac/0x304)
>>
>> This issue is because when call unset_migratetype_isolate to unset a part
>> of CMA memory, it try to access the buddy page to get its status:
>> if (order >= pageblock_order) {
>> page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) -
>> 1);
>> buddy_idx = __find_buddy_index(page_idx, order);
>> buddy = page + (buddy_idx - page_idx);
>>
>> if (!is_migrate_isolate_page(buddy)) {
>> But the begin addr of this part of CMA memory is very close to a part of
>> memory that is reserved in the boot time (not in buddy system).
>> So add a check before access it.
>>
>> Signed-off-by: Hui Zhu 
>> ---
>>   mm/page_isolation.c | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 755a42c..434730b 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -101,7 +101,8 @@ void unset_migratetype_isolate(struct page *page,
>> unsigned migratetype)
>> buddy_idx = __find_buddy_index(page_idx, order);
>> buddy = page + (buddy_idx - page_idx);
>>
>> -   if (!is_migrate_isolate_page(buddy)) {
>> +   if (!pfn_present(page_to_pfn(buddy))
>> +       || !is_migrate_isolate_page(buddy)) {
>> __isolate_free_page(page, order);
>> kernel_map_pages(page, (1 << order), 1);
>> set_page_refcounted(page);
>>
>
> I think you want to use pfn_valid_within instead of pfn_present.

Thanks.  I will post a new version for it.

Best,
Hui

>
> Thanks,
> Laura
>


[PATCH] CMA: page_isolation: check buddy before access it

2015-05-04 Thread Hui Zhu
I hit an issue:
[  214.294917] Unable to handle kernel NULL pointer dereference at virtual 
address 082a
[  214.303013] pgd = cc97
[  214.305721] [082a] *pgd=
[  214.309316] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[  214.335704] PC is at get_pageblock_flags_group+0x5c/0xb0
[  214.341030] LR is at unset_migratetype_isolate+0x148/0x1b0
[  214.346523] pc : []lr : []psr: 8093
[  214.346523] sp : c7029d00  ip : 0105  fp : c7029d1c
[  214.358005] r10: 0001  r9 : 000a  r8 : 0004
[  214.363231] r7 : 6013  r6 : 00a4  r5 : c0a357e4  r4 : 
[  214.369761] r3 : 0826  r2 : 0002  r1 :   r0 : 003f
[  214.376291] Flags: Nzcv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment 
user
[  214.383516] Control: 10c5387d  Table: 2cb7006a  DAC: 0015
[  214.949720] Backtrace:
[  214.952192] [] (get_pageblock_flags_group+0x0/0xb0) from 
[] (unset_migratetype_isolate+0x148/0x1b0)
[  214.962978]  r7:6013 r6:c0a357c0 r5:c0a357e4 r4:c1555000
[  214.968693] [] (unset_migratetype_isolate+0x0/0x1b0) from 
[] (undo_isolate_page_range+0xd0/0xdc)
[  214.979222] [] (undo_isolate_page_range+0x0/0xdc) from 
[] (__alloc_contig_range+0x254/0x34c)
[  214.989398]  r9:000abc00 r8:c7028000 r7:000b1f53 r6:000b3e00 r5:0005
r4:c7029db4
[  214.997308] [] (__alloc_contig_range+0x0/0x34c) from [] 
(alloc_contig_range+0x14/0x18)
[  215.006973] [] (alloc_contig_range+0x0/0x18) from [] 
(dma_alloc_from_contiguous_addr+0x1ac/0x304)

This issue happens because, when unset_migratetype_isolate is called to unset
a part of CMA memory, it tries to access the buddy page to get its status:
	if (order >= pageblock_order) {
		page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
		buddy_idx = __find_buddy_index(page_idx, order);
		buddy = page + (buddy_idx - page_idx);

		if (!is_migrate_isolate_page(buddy)) {
But the start address of this part of CMA memory is very close to a region of
memory that is reserved at boot time (not in the buddy system), so the computed
buddy may not be a valid page.
So add a check before accessing it.

Signed-off-by: Hui Zhu 
---
 mm/page_isolation.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/page_isolation.c b/mm/page_isolation.c
index 755a42c..434730b 100644
--- a/mm/page_isolation.c
+++ b/mm/page_isolation.c
@@ -101,7 +101,8 @@ void unset_migratetype_isolate(struct page *page, unsigned migratetype)
buddy_idx = __find_buddy_index(page_idx, order);
buddy = page + (buddy_idx - page_idx);
 
-   if (!is_migrate_isolate_page(buddy)) {
+   if (!pfn_present(page_to_pfn(buddy))
+   || !is_migrate_isolate_page(buddy)) {
__isolate_free_page(page, order);
kernel_map_pages(page, (1 << order), 1);
set_page_refcounted(page);
-- 
1.9.1



Re: [PATCH] CMA: treat free cma pages as non-free if not ALLOC_CMA on watermark checking

2015-01-18 Thread Hui Zhu
On Mon, Jan 19, 2015 at 2:55 PM, Minchan Kim  wrote:
> Hello,
>
> On Sun, Jan 18, 2015 at 04:32:59PM +0800, Hui Zhu wrote:
>> From: Hui Zhu 
>>
>> The original of this patch [1] is part of Joonsoo's CMA patch series.
>> I made a patch [2] to fix the issue of this patch.  Joonsoo reminded me
>> that this issue affect current kernel too.  So made a new one for upstream.
>
> Recently, we found many problems of CMA and Joonsoo tried to add more
> hooks into MM like agressive allocation but I suggested adding new zone
> would be more desirable than more hooks in mm fast path in various aspect.
> (ie, remove lots of hooks in hot path of MM, don't need reclaim hooks
>  for special CMA pages, don't need custom fair allocation for CMA).
>
> Joonsoo is investigating the direction so please wait.
> If it turns out we have lots of hurdle to go that way,
> this direction(ie, putting more hooks) should be second plan.

OK.  Thanks.

Best,
Hui

>
> Thanks.
>
>>
>> Current code treat free cma pages as non-free if not ALLOC_CMA in the first
>> check:
>> if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
>>   return false;
>> But in the loop after that, it treat free cma pages as free memory even
>> if not ALLOC_CMA.
>> So this one substruct free_cma from free_pages before the loop if not
>> ALLOC_CMA to treat free cma pages as non-free in the loop.
>>
>> But there still have a issue is that CMA memory in each order is part
>> of z->free_area[o].nr_free, then the CMA page number of this order is
>> substructed twice.  This bug will make __zone_watermark_ok return more false.
>> This patch add cma_nr_free to struct free_area that just record the number
>> of CMA pages.  And add it back in the order loop to handle the substruct
>> twice issue.
>>
>> The last issue of this patch should handle is pointed by Joonsoo in [3].
>> If pageblock for CMA is isolated, cma_nr_free would be miscalculated.
>> This patch add two functions nr_free_inc and nr_free_dec to change the
>> values of nr_free and cma_nr_free.  If the migratetype is MIGRATE_ISOLATE,
>> they will not change the value of nr_free.
>> Change __mod_zone_freepage_state to doesn't record isolated page to
>> NR_FREE_PAGES.
>> And add code to move_freepages to record the page number that isolated:
>>   if (is_migrate_isolate(migratetype))
>>   nr_free_dec(&zone->free_area[order],
>>   get_freepage_migratetype(page));
>>   else
>>   nr_free_inc(&zone->free_area[order], migratetype);
>> Then the isolate issue is handled.
>>
>> This patchset is based on fc7f0dd381720ea5ee5818645f7d0e9dece41cb0.
>>
>> [1] https://lkml.org/lkml/2014/5/28/110
>> [2] https://lkml.org/lkml/2014/12/25/43
>> [3] https://lkml.org/lkml/2015/1/4/220
>>
>> Signed-off-by: Joonsoo Kim 
>> Signed-off-by: Hui Zhu 
>> Signed-off-by: Weixing Liu 
>> ---
>>  include/linux/mmzone.h |  3 +++
>>  include/linux/vmstat.h |  4 +++-
>>  mm/page_alloc.c| 59 
>> +-
>>  3 files changed, 55 insertions(+), 11 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2f0856d..094476b 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -92,6 +92,9 @@ static inline int get_pfnblock_migratetype(struct page 
>> *page, unsigned long pfn)
>>  struct free_area {
>>   struct list_headfree_list[MIGRATE_TYPES];
>>   unsigned long   nr_free;
>> +#ifdef CONFIG_CMA
>> + unsigned long   cma_nr_free;
>> +#endif
>>  };
>>
>>  struct pglist_data;
>> diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
>> index 82e7db7..f18ef00 100644
>> --- a/include/linux/vmstat.h
>> +++ b/include/linux/vmstat.h
>> @@ -6,6 +6,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  #include 
>>
>>  extern int sysctl_stat_interval;
>> @@ -280,7 +281,8 @@ static inline void drain_zonestat(struct zone *zone,
>>  static inline void __mod_zone_freepage_state(struct zone *zone, int 
>> nr_pages,
>>int migratetype)
>>  {
>> - __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
>> + if (!is_migrate_isolate(migratetype))
>> + __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
>>   if (is_migrate_

Re: [PATCH] mm/page_alloc: Fix race conditions on getting migratetype in buffered_rmqueue

2015-01-18 Thread Hui Zhu
On Sun, Jan 18, 2015 at 6:19 PM, Vlastimil Babka  wrote:
> On 18.1.2015 10:17, Hui Zhu wrote:
>>
>> From: Hui Zhu 
>>
>> To test the patch [1], I use KGTP and a script [2] to show
>> NR_FREE_CMA_PAGES
>> and gross of cma_nr_free.  The values are always not same.
>> I check the code of pages alloc and free and found that race conditions
>> on getting migratetype in buffered_rmqueue.
>
>
> Can you elaborate? What does this races with, are you dynamically changing
> the size of CMA area, or what? The migratetype here is based on which free
> list the page was found on. Was it misplaced then? Wasn't Joonsoo's recent
> series supposed to eliminate this?

My bad.
I thought move_freepages had a race condition with this part, but I
missed that it checks PageBuddy before set_freepage_migratetype.
Sorry for that.

I will do more work around this one and [1].

Thanks for your review.

Best,
Hui

>
>> Then I add move the code of getting migratetype inside the zone->lock
>> protection part.
>
>
> Not just that, you are also reading migratetype from pageblock bitmap
> instead of the one embedded in the free page. Which is more expensive
> and we already do that more often than we would like to because of CMA.
> And it appears to be a wrong fix for a possible misplacement bug. If there's
> such misplacement, the wrong stats are not the only problem.
>
>>
>> Because this issue will affect system even if the Linux kernel does't
>> have [1].  So I post this patch separately.
>
>
> But we can't test that without [1], right? Maybe the issue is introduced by
> [1]?
>
>
>>
>> This patchset is based on fc7f0dd381720ea5ee5818645f7d0e9dece41cb0.
>>
>> [1] https://lkml.org/lkml/2015/1/18/28
>> [2] https://github.com/teawater/kgtp/blob/dev/add-ons/cma_free.py
>>
>> Signed-off-by: Hui Zhu 
>> ---
>>   mm/page_alloc.c | 11 +++
>>   1 file changed, 7 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7633c50..f3d6922 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -1694,11 +1694,12 @@ again:
>> }
>> spin_lock_irqsave(&zone->lock, flags);
>> page = __rmqueue(zone, order, migratetype);
>> +   if (page)
>> +   migratetype = get_pageblock_migratetype(page);
>> +   else
>> +   goto failed_unlock;
>> spin_unlock(&zone->lock);
>> -   if (!page)
>> -   goto failed;
>> -   __mod_zone_freepage_state(zone, -(1 << order),
>> - get_freepage_migratetype(page));
>> +   __mod_zone_freepage_state(zone, -(1 << order),
>> migratetype);
>> }
>> __mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
>> @@ -1715,6 +1716,8 @@ again:
>> goto again;
>> return page;
>>   +failed_unlock:
>> +   spin_unlock(&zone->lock);
>>   failed:
>> local_irq_restore(flags);
>> return NULL;
>
>


[PATCH] mm/page_alloc: Fix race conditions on getting migratetype in buffered_rmqueue

2015-01-18 Thread Hui Zhu
From: Hui Zhu 

To test the patch [1], I used KGTP and a script [2] to show NR_FREE_CMA_PAGES
and the sum of cma_nr_free.  The values were never the same.
I checked the page allocation and free code and found race conditions on
getting the migratetype in buffered_rmqueue.
So I moved the code that gets the migratetype inside the zone->lock
protected section.

Because this issue affects the system even if the Linux kernel doesn't
have [1], I post this patch separately.

This patchset is based on fc7f0dd381720ea5ee5818645f7d0e9dece41cb0.

[1] https://lkml.org/lkml/2015/1/18/28
[2] https://github.com/teawater/kgtp/blob/dev/add-ons/cma_free.py

Signed-off-by: Hui Zhu 
---
 mm/page_alloc.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..f3d6922 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1694,11 +1694,12 @@ again:
}
spin_lock_irqsave(&zone->lock, flags);
page = __rmqueue(zone, order, migratetype);
+   if (page)
+   migratetype = get_pageblock_migratetype(page);
+   else
+   goto failed_unlock;
spin_unlock(&zone->lock);
-   if (!page)
-   goto failed;
-   __mod_zone_freepage_state(zone, -(1 << order),
- get_freepage_migratetype(page));
+   __mod_zone_freepage_state(zone, -(1 << order), migratetype);
}
 
__mod_zone_page_state(zone, NR_ALLOC_BATCH, -(1 << order));
@@ -1715,6 +1716,8 @@ again:
goto again;
return page;
 
+failed_unlock:
+   spin_unlock(&zone->lock);
 failed:
local_irq_restore(flags);
return NULL;
-- 
1.9.3



[PATCH] CMA: treat free cma pages as non-free if not ALLOC_CMA on watermark checking

2015-01-18 Thread Hui Zhu
From: Hui Zhu 

The original of this patch [1] is part of Joonsoo's CMA patch series.
I made a patch [2] to fix an issue in that patch.  Joonsoo reminded me
that this issue affects the current kernel too, so I made a new one for
upstream.

The current code treats free CMA pages as non-free if not ALLOC_CMA in the
first check:
	if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
		return false;
But in the loop after that, it treats free CMA pages as free memory even
if not ALLOC_CMA.
So this patch subtracts free_cma from free_pages before the loop if not
ALLOC_CMA, to treat free CMA pages as non-free inside the loop as well.

But there is still an issue: CMA memory in each order is part of
z->free_area[o].nr_free, so the CMA page count of that order is subtracted
twice.  This bug makes __zone_watermark_ok return false more often than it
should.
This patch adds cma_nr_free to struct free_area, which just records the
number of CMA pages, and adds it back in the order loop to handle the
double-subtraction issue.

The last issue this patch has to handle was pointed out by Joonsoo in [3]:
if a pageblock for CMA is isolated, cma_nr_free would be miscalculated.
This patch adds two functions, nr_free_inc and nr_free_dec, to change the
values of nr_free and cma_nr_free.  If the migratetype is MIGRATE_ISOLATE,
they do not change the value of nr_free.
It also changes __mod_zone_freepage_state so that isolated pages are not
recorded in NR_FREE_PAGES, and adds code to move_freepages to account for
pages that are isolated:
	if (is_migrate_isolate(migratetype))
		nr_free_dec(&zone->free_area[order],
			    get_freepage_migratetype(page));
	else
		nr_free_inc(&zone->free_area[order], migratetype);
Then the isolation issue is handled.

This patchset is based on fc7f0dd381720ea5ee5818645f7d0e9dece41cb0.

[1] https://lkml.org/lkml/2014/5/28/110
[2] https://lkml.org/lkml/2014/12/25/43
[3] https://lkml.org/lkml/2015/1/4/220

Signed-off-by: Joonsoo Kim 
Signed-off-by: Hui Zhu 
Signed-off-by: Weixing Liu 
---
 include/linux/mmzone.h |  3 +++
 include/linux/vmstat.h |  4 +++-
 mm/page_alloc.c| 59 +-
 3 files changed, 55 insertions(+), 11 deletions(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2f0856d..094476b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -92,6 +92,9 @@ static inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
 struct free_area {
struct list_headfree_list[MIGRATE_TYPES];
unsigned long   nr_free;
+#ifdef CONFIG_CMA
+   unsigned long   cma_nr_free;
+#endif
 };
 
 struct pglist_data;
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 82e7db7..f18ef00 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -6,6 +6,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 extern int sysctl_stat_interval;
@@ -280,7 +281,8 @@ static inline void drain_zonestat(struct zone *zone,
 static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
 int migratetype)
 {
-   __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
+   if (!is_migrate_isolate(migratetype))
+   __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
if (is_migrate_cma(migratetype))
__mod_zone_page_state(zone, NR_FREE_CMA_PAGES, nr_pages);
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..9a2b6da 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -576,6 +576,28 @@ static inline int page_is_buddy(struct page *page, struct page *buddy,
return 0;
 }
 
+static inline void nr_free_inc(struct free_area *area, int migratetype)
+{
+   if (!is_migrate_isolate(migratetype))
+   area->nr_free++;
+
+#ifdef CONFIG_CMA
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free++;
+#endif
+}
+
+static inline void nr_free_dec(struct free_area *area, int migratetype)
+{
+   if (!is_migrate_isolate(migratetype))
+   area->nr_free--;
+
+#ifdef CONFIG_CMA
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free--;
+#endif
+}
+
 /*
  * Freeing function for a buddy system allocator.
  *
@@ -649,7 +671,7 @@ static inline void __free_one_page(struct page *page,
clear_page_guard(zone, buddy, order, migratetype);
} else {
list_del(&buddy->lru);
-   zone->free_area[order].nr_free--;
+   nr_free_dec(&zone->free_area[order], migratetype);
rmv_page_order(buddy);
}
combined_idx = buddy_idx & page_idx;
@@ -682,7 +704,7 @@ static inline void __free_one_page(struct page *page,
 
list_add(&page->

Re: [PATCH] CMA: Fix CMA's page number is substructed twice in __zone_watermark_ok

2015-01-07 Thread Hui Zhu
On Wed, Jan 7, 2015 at 4:45 PM, Vlastimil Babka  wrote:
> On 12/30/2014 11:17 AM, Hui Zhu wrote:
>> The original of this patch [1] is used to fix the issue in Joonsoo's CMA 
>> patch
>> "CMA: always treat free cma pages as non-free on watermark checking" [2].
>>
>> Joonsoo reminded me that this issue affect current kernel too.  So made a new
>> one for upstream.
>>
>> Function __zone_watermark_ok substruct CMA pages number from free_pages
>> if system allocation can't use CMA areas:
>>   /* If allocation can't use CMA areas don't use free CMA pages */
>>   if (!(alloc_flags & ALLOC_CMA))
>>   free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);
>>
>> But after this part of code
>>   for (o = 0; o < order; o++) {
>>   /* At the next order, this order's pages become unavailable */
>>   free_pages -= z->free_area[o].nr_free << o;
>> CMA memory in each order is part of z->free_area[o].nr_free, then the CMA
>> page number of this order is substructed twice.  This bug will make
>> __zone_watermark_ok return more false.
>>
>> This patch add cma_free_area to struct free_area that just record the number
>> of CMA pages.  And add it back in the order loop to handle the substruct
>> twice issue.
>
> Le sigh.
>
> I now dub CMA "Contagious Memory Allocator".
> One can't even take a Christmas vacation without this blight to spread :(
>
> Seriously, with so much special casing everywhere in fast paths, Minchan's
> (IIRC) proposal of a special CMA zone has some appeal.
>
> But it seems to me that the bug you are fixing doesn't exist as you describe 
> it?
> free_cma is only used here:
>
> if (free_pages - free_cma <= min + z->lowmem_reserve[classzone_idx])
> return false;
>
> So it's subtracted from free_pages just temporarily for the basic order-0 
> check.
> In the higher-order magic loop, it's not used at all?
>

I am so sorry that I made a mistake when I split this patch out of the
patch series.

The original of this patch was meant to fix the issue around Joonsoo's
update of __zone_watermark_ok:
	if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

	if (free_pages <= min + z->lowmem_reserve[classzone_idx])
		return false;

Joonsoo, what about submitting this change to upstream first?

Thanks,
Hui


> Vlastimil
>
>
>> [1] https://lkml.org/lkml/2014/12/25/43
>> [2] https://lkml.org/lkml/2014/5/28/110
>>
>> Signed-off-by: Hui Zhu 
>> Signed-off-by: Weixing Liu 
>> ---
>>  include/linux/mmzone.h |  3 +++
>>  mm/page_alloc.c| 22 ++
>>  2 files changed, 25 insertions(+)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 2f0856d..094476b 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -92,6 +92,9 @@ static inline int get_pfnblock_migratetype(struct page 
>> *page, unsigned long pfn)
>>  struct free_area {
>>   struct list_headfree_list[MIGRATE_TYPES];
>>   unsigned long   nr_free;
>> +#ifdef CONFIG_CMA
>> + unsigned long   cma_nr_free;
>> +#endif
>>  };
>>
>>  struct pglist_data;
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 7633c50..026cf27 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -650,6 +650,8 @@ static inline void __free_one_page(struct page *page,
>>   } else {
>>   list_del(&buddy->lru);
>>   zone->free_area[order].nr_free--;
>> + if (is_migrate_cma(migratetype))
>> + zone->free_area[order].cma_nr_free--;
>>   rmv_page_order(buddy);
>>   }
>>   combined_idx = buddy_idx & page_idx;
>> @@ -683,6 +685,8 @@ static inline void __free_one_page(struct page *page,
>>   list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
>>  out:
>>   zone->free_area[order].nr_free++;
>> + if (is_migrate_cma(migratetype))
>> + zone->free_area[order].cma_nr_free++;
>>  }
>>
>>  static inline int free_pages_check(struct page *page)
>> @@ -937,6 +941,8 @@ static inline void expand(struct zone *zone, struct page 
>> *page,
>>   }
>>   list_add(&page[size].lru, &area->free_list[migratetype]);
>>   

[PATCH] samples: hw_breakpoint: check the return value of kallsyms_lookup_name

2015-01-03 Thread Hui Zhu
data_breakpoint.ko can be inserted successfully but cannot catch any change of
the data on my side, because kallsyms_lookup_name returns 0 each time.
So add code to check the return value of kallsyms_lookup_name.

Signed-off-by: Hui Zhu 
---
 samples/hw_breakpoint/data_breakpoint.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
index ef7f322..4fbf93b 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -52,27 +52,30 @@ static void sample_hbp_handler(struct perf_event *bp,
 
 static int __init hw_break_module_init(void)
 {
-   int ret;
+   int ret = 0;
struct perf_event_attr attr;
 
hw_breakpoint_init(&attr);
attr.bp_addr = kallsyms_lookup_name(ksym_name);
+   if (!attr.bp_addr) {
+   ret = -ENXIO;
+   printk(KERN_INFO "Get address for %s failed\n", ksym_name);
+   goto out;
+   }
+
attr.bp_len = HW_BREAKPOINT_LEN_4;
attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
 
sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, 
NULL);
if (IS_ERR((void __force *)sample_hbp)) {
ret = PTR_ERR((void __force *)sample_hbp);
-   goto fail;
+   printk(KERN_INFO "Breakpoint registration failed\n");
+   goto out;
}
 
printk(KERN_INFO "HW Breakpoint for %s write installed\n", ksym_name);
 
-   return 0;
-
-fail:
-   printk(KERN_INFO "Breakpoint registration failed\n");
-
+out:
return ret;
 }
 
-- 
1.9.1



[PATCH] CMA: Fix CMA's page number is substructed twice in __zone_watermark_ok

2014-12-30 Thread Hui Zhu
The original of this patch [1] was made to fix an issue in Joonsoo's CMA patch
"CMA: always treat free cma pages as non-free on watermark checking" [2].

Joonsoo reminded me that this issue affects the current kernel too, so I made
a new one for upstream.

__zone_watermark_ok subtracts the CMA page count from free_pages
if the allocation can't use CMA areas:
	/* If allocation can't use CMA areas don't use free CMA pages */
	if (!(alloc_flags & ALLOC_CMA))
		free_cma = zone_page_state(z, NR_FREE_CMA_PAGES);

But after this part of code
	for (o = 0; o < order; o++) {
		/* At the next order, this order's pages become unavailable */
		free_pages -= z->free_area[o].nr_free << o;
CMA memory in each order is part of z->free_area[o].nr_free, so the CMA
page count of that order is subtracted twice.  This bug makes
__zone_watermark_ok return false more often than it should.

This patch adds cma_nr_free to struct free_area, which just records the
number of CMA pages, and adds it back in the order loop to handle the
double-subtraction issue.

[1] https://lkml.org/lkml/2014/12/25/43
[2] https://lkml.org/lkml/2014/5/28/110

Signed-off-by: Hui Zhu 
Signed-off-by: Weixing Liu 
---
 include/linux/mmzone.h |  3 +++
 mm/page_alloc.c| 22 ++
 2 files changed, 25 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 2f0856d..094476b 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -92,6 +92,9 @@ static inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
 struct free_area {
struct list_headfree_list[MIGRATE_TYPES];
unsigned long   nr_free;
+#ifdef CONFIG_CMA
+   unsigned long   cma_nr_free;
+#endif
 };
 
 struct pglist_data;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 7633c50..026cf27 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -650,6 +650,8 @@ static inline void __free_one_page(struct page *page,
} else {
list_del(&buddy->lru);
zone->free_area[order].nr_free--;
+   if (is_migrate_cma(migratetype))
+   zone->free_area[order].cma_nr_free--;
rmv_page_order(buddy);
}
combined_idx = buddy_idx & page_idx;
@@ -683,6 +685,8 @@ static inline void __free_one_page(struct page *page,
list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
 out:
zone->free_area[order].nr_free++;
+   if (is_migrate_cma(migratetype))
+   zone->free_area[order].cma_nr_free++;
 }
 
 static inline int free_pages_check(struct page *page)
@@ -937,6 +941,8 @@ static inline void expand(struct zone *zone, struct page *page,
}
list_add(&page[size].lru, &area->free_list[migratetype]);
area->nr_free++;
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free++;
set_page_order(&page[size], high);
}
 }
@@ -1020,6 +1026,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
list_del(&page->lru);
rmv_page_order(page);
area->nr_free--;
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free--;
expand(zone, page, order, current_order, area, migratetype);
set_freepage_migratetype(page, migratetype);
return page;
@@ -1208,6 +1216,8 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
page = list_entry(area->free_list[migratetype].next,
struct page, lru);
area->nr_free--;
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free--;
 
new_type = try_to_steal_freepages(zone, page,
  start_migratetype,
@@ -1597,6 +1607,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
/* Remove page from free list */
list_del(&page->lru);
zone->free_area[order].nr_free--;
+   if (is_migrate_cma(mt))
+   zone->free_area[order].cma_nr_free--;
rmv_page_order(page);
 
/* Set the pageblock if the isolated page is at least a pageblock */
@@ -1827,6 +1839,13 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
/* At the next order, this order's pages become unavailable */
free_pages -= z->free_area[o].nr_free << o;
 
+   /* If CMA's page number of this order was substructed as part
+  

Re: [PATCH 1/3] CMA: Fix the bug that CMA's page number is substructed twice

2014-12-30 Thread Hui Zhu
On Tue, Dec 30, 2014 at 12:48 PM, Joonsoo Kim  wrote:
> On Thu, Dec 25, 2014 at 05:43:26PM +0800, Hui Zhu wrote:
>> In Joonsoo's CMA patch "CMA: always treat free cma pages as non-free on
>> watermark checking" [1], it changes __zone_watermark_ok to substruct CMA
>> pages number from free_pages if system use CMA:
>>   if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
>>   free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);
>
> Hello,
>
> In fact, without that patch, watermark checking has a problem in current 
> kernel.
> If there is reserved CMA region, watermark check for high order
> allocation is done loosly. See following thread.
>
> https://lkml.org/lkml/2014/5/30/320
>
> Your patch can fix this situation, so, how about submitting this patch
> separately?
>
> Thanks.
>

Hi Joonsoo,

Thanks for the reminder.  I will post a separate patch for the current kernel.

Thanks,
Hui


[PATCH 0/3] CMA: Handle the issues of aggressively allocate the

2014-12-25 Thread Hui Zhu
I tried Joonsoo's CMA patches [1] on my side and found that they work
better than mine [2] at handling LRU and other issues, even though they
don't shrink the memory before cma_alloc.  So I began to test them on my
side.
But my colleague Weixing found some issues with them, so we made 2 patches
to handle those issues.
And I merged cma_alloc_counter from [2] to make cma_alloc work better.

This patchset is based on aa39477b5692611b91ac9455ae588738852b3f60 and [1].

[1] https://lkml.org/lkml/2014/5/28/64
[2] https://lkml.org/lkml/2014/10/15/623

Hui Zhu (3):
CMA: Fix the bug that CMA's page number is substructed twice
CMA: Fix the issue that nr_try_movable just count MIGRATE_MOVABLE memory
CMA: Add cma_alloc_counter to make cma_alloc work better if it meet busy range

 include/linux/cma.h|2 +
 include/linux/mmzone.h |3 +
 mm/cma.c   |6 +++
 mm/page_alloc.c|   76 ++---
 4 files changed, 65 insertions(+), 22 deletions(-)



[PATCH 3/3] CMA: Add cma_alloc_counter to make cma_alloc work better if it meet busy range

2014-12-25 Thread Hui Zhu
In [1], Joonsoo said that cma_alloc_counter is useless because the pageblock
is isolated.
But if alloc_contig_range meets a busy range, it calls undo_isolate_page_range
before going on to try the next range. At that point, __rmqueue_cma can start
allocating CMA memory from that range again.

So I add cma_alloc_counter so that __rmqueue doesn't call __rmqueue_cma while
cma_alloc is running.

[1] https://lkml.org/lkml/2014/10/24/26

Signed-off-by: Hui Zhu 
---
 include/linux/cma.h | 2 ++
 mm/cma.c| 6 ++
 mm/page_alloc.c | 8 +++-
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 9384ba6..155158f 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -26,6 +26,8 @@ extern int __init cma_declare_contiguous(phys_addr_t base,
 extern int cma_init_reserved_mem(phys_addr_t base,
phys_addr_t size, int order_per_bit,
struct cma **res_cma);
+
+extern atomic_t cma_alloc_counter;
 extern struct page *cma_alloc(struct cma *cma, int count, unsigned int align);
 extern bool cma_release(struct cma *cma, struct page *pages, int count);
 #endif
diff --git a/mm/cma.c b/mm/cma.c
index 6707b5d..b63f6be 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -348,6 +348,8 @@ err:
return ret;
 }
 
+atomic_t cma_alloc_counter = ATOMIC_INIT(0);
+
 /**
  * cma_alloc() - allocate pages from contiguous area
  * @cma:   Contiguous memory region for which the allocation is performed.
@@ -378,6 +380,8 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
bitmap_maxno = cma_bitmap_maxno(cma);
bitmap_count = cma_bitmap_pages_to_bits(cma, count);
 
+   atomic_inc(&cma_alloc_counter);
+
for (;;) {
mutex_lock(&cma->lock);
bitmap_no = bitmap_find_next_zero_area_off(cma->bitmap,
@@ -415,6 +419,8 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
start = bitmap_no + mask + 1;
}
 
+   atomic_dec(&cma_alloc_counter);
+
pr_debug("%s(): returned %p\n", __func__, page);
return page;
 }
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a5bbc38..0622c4c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -66,6 +66,10 @@
 #include 
 #include "internal.h"
 
+#ifdef CONFIG_CMA
+#include 
+#endif
+
 /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
 static DEFINE_MUTEX(pcp_batch_high_lock);
 #define MIN_PERCPU_PAGELIST_FRACTION   (8)
@@ -1330,7 +1334,9 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
struct page *page = NULL;
 
-   if (IS_ENABLED(CONFIG_CMA) && zone->managed_cma_pages) {
+   if (IS_ENABLED(CONFIG_CMA)
+   && zone->managed_cma_pages
+   && atomic_read(&cma_alloc_counter) == 0) {
if (migratetype == MIGRATE_MOVABLE
&& zone->nr_try_movable <= 0)
page = __rmqueue_cma(zone, order);
-- 
1.9.1



[PATCH 1/3] CMA: Fix the bug that CMA's page number is substructed twice

2014-12-25 Thread Hui Zhu
Joonsoo's CMA patch "CMA: always treat free cma pages as non-free on
watermark checking" [1] changes __zone_watermark_ok to subtract the CMA
page count from free_pages if the system uses CMA:
	if (IS_ENABLED(CONFIG_CMA) && z->managed_cma_pages)
		free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

But after this part of code
	for (o = 0; o < order; o++) {
		/* At the next order, this order's pages become unavailable */
		free_pages -= z->free_area[o].nr_free << o;
CMA memory in each order is part of z->free_area[o].nr_free, so the CMA
page count of that order is subtracted twice.  This bug makes
__zone_watermark_ok return false more often than it should.

This patch adds cma_nr_free to struct free_area, which just records the
number of CMA pages, and adds it back in the order loop to handle the
double-subtraction issue.

[1] https://lkml.org/lkml/2014/5/28/110

Signed-off-by: Hui Zhu 
Signed-off-by: Weixing Liu 
---
 include/linux/mmzone.h |  3 +++
 mm/page_alloc.c| 29 -
 2 files changed, 31 insertions(+), 1 deletion(-)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index ee1ce1f..7ccad93 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -92,6 +92,9 @@ static inline int get_pfnblock_migratetype(struct page *page, unsigned long pfn)
 struct free_area {
struct list_headfree_list[MIGRATE_TYPES];
unsigned long   nr_free;
+#ifdef CONFIG_CMA
+   unsigned long   cma_nr_free;
+#endif
 };
 
 struct pglist_data;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 1b6c82c..a8d9f03 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -650,6 +650,8 @@ static inline void __free_one_page(struct page *page,
} else {
list_del(&buddy->lru);
zone->free_area[order].nr_free--;
+   if (is_migrate_cma(migratetype))
+   zone->free_area[order].cma_nr_free--;
rmv_page_order(buddy);
}
combined_idx = buddy_idx & page_idx;
@@ -683,6 +685,8 @@ static inline void __free_one_page(struct page *page,
list_add(&page->lru, &zone->free_area[order].free_list[migratetype]);
 out:
zone->free_area[order].nr_free++;
+   if (is_migrate_cma(migratetype))
+   zone->free_area[order].cma_nr_free++;
 }
 
 static inline int free_pages_check(struct page *page)
@@ -987,6 +991,8 @@ static inline void expand(struct zone *zone, struct page *page,
}
list_add(&page[size].lru, &area->free_list[migratetype]);
area->nr_free++;
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free++;
set_page_order(&page[size], high);
}
 }
@@ -1070,6 +1076,8 @@ struct page *__rmqueue_smallest(struct zone *zone, unsigned int order,
list_del(&page->lru);
rmv_page_order(page);
area->nr_free--;
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free--;
expand(zone, page, order, current_order, area, migratetype);
set_freepage_migratetype(page, migratetype);
return page;
@@ -1258,6 +1266,8 @@ __rmqueue_fallback(struct zone *zone, unsigned int order, int start_migratetype)
page = list_entry(area->free_list[migratetype].next,
struct page, lru);
area->nr_free--;
+   if (is_migrate_cma(migratetype))
+   area->cma_nr_free--;
 
new_type = try_to_steal_freepages(zone, page,
  start_migratetype,
@@ -1682,6 +1692,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
/* Remove page from free list */
list_del(&page->lru);
zone->free_area[order].nr_free--;
+   if (is_migrate_cma(mt))
+   zone->free_area[order].cma_nr_free--;
rmv_page_order(page);
 
/* Set the pageblock if the isolated page is at least a pageblock */
@@ -1893,6 +1905,9 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
/* free_pages may go negative - that's OK */
long min = mark;
int o;
+#ifdef CONFIG_CMA
+   bool cma_is_subbed = false;
+#endif
 
free_pages -= (1 << order) - 1;
if (alloc_flags & ALLOC_HIGH)
@@ -1905,8 +1920,10 @@ static bool __zone_watermark_ok(struct zone *z, unsigned int order,
 * unmovable/reclaimable allocation and they can suddenly
 * vanish through CMA allocation
 */
-   if (IS_ENABLED(CONFIG_CMA)

[PATCH 2/3] CMA: Fix the issue that nr_try_movable just count MIGRATE_MOVABLE memory

2014-12-25 Thread Hui Zhu
One of my platforms that uses Joonsoo's CMA patch [1] has a device that
allocates a lot of MIGRATE_UNMOVABLE memory while it works in a zone.
When this device is working, the memory status of this zone is not OK: most
of the CMA memory is not allocated, but most of the normal memory is.
This issue is because in __rmqueue:
	if (IS_ENABLED(CONFIG_CMA) &&
	    migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
		page = __rmqueue_cma(zone, order);
only MIGRATE_MOVABLE allocations are recorded in nr_try_movable in
__rmqueue_cma, but not the others.  The device allocating a lot of
MIGRATE_UNMOVABLE memory therefore affects this zone's memory allocation
behavior.

This patch changes __rmqueue to let nr_try_movable record all normal
memory allocations.

[1] https://lkml.org/lkml/2014/5/28/64

Signed-off-by: Hui Zhu 
Signed-off-by: Weixing Liu 
---
 mm/page_alloc.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index a8d9f03..a5bbc38 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1301,28 +1301,23 @@ static struct page *__rmqueue_cma(struct zone *zone, unsigned int order)
 {
struct page *page;
 
-   if (zone->nr_try_movable > 0)
-   goto alloc_movable;
+   if (zone->nr_try_cma <= 0) {
+   /* Reset counter */
+   zone->nr_try_movable = zone->max_try_movable;
+   zone->nr_try_cma = zone->max_try_cma;
 
-   if (zone->nr_try_cma > 0) {
-   /* Okay. Now, we can try to allocate the page from cma region */
-   zone->nr_try_cma -= 1 << order;
-   page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
-
-   /* CMA pages can vanish through CMA allocation */
-   if (unlikely(!page && order == 0))
-   zone->nr_try_cma = 0;
-
-   return page;
+   return NULL;
}
 
-   /* Reset counter */
-   zone->nr_try_movable = zone->max_try_movable;
-   zone->nr_try_cma = zone->max_try_cma;
+   /* Okay. Now, we can try to allocate the page from cma region */
+   zone->nr_try_cma -= 1 << order;
+   page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
 
-alloc_movable:
-   zone->nr_try_movable -= 1 << order;
-   return NULL;
+   /* CMA pages can vanish through CMA allocation */
+   if (unlikely(!page && order == 0))
+   zone->nr_try_cma = 0;
+
+   return page;
 }
 #endif
 
@@ -1335,9 +1330,13 @@ static struct page *__rmqueue(struct zone *zone, unsigned int order,
 {
struct page *page = NULL;
 
-   if (IS_ENABLED(CONFIG_CMA) &&
-   migratetype == MIGRATE_MOVABLE && zone->managed_cma_pages)
-   page = __rmqueue_cma(zone, order);
+   if (IS_ENABLED(CONFIG_CMA) && zone->managed_cma_pages) {
+   if (migratetype == MIGRATE_MOVABLE
+   && zone->nr_try_movable <= 0)
+   page = __rmqueue_cma(zone, order);
+   else
+   zone->nr_try_movable -= 1 << order;
+   }
 
 retry_reserve:
if (!page)
-- 
1.9.1



Re: [PATCH 4/4] (CMA_AGGRESSIVE) Update page alloc function

2014-11-27 Thread Hui Zhu
On Fri, Oct 24, 2014 at 1:28 PM, Joonsoo Kim  wrote:
> On Thu, Oct 16, 2014 at 11:35:51AM +0800, Hui Zhu wrote:
>> If page alloc function __rmqueue try to get pages from MIGRATE_MOVABLE and
>> conditions (cma_alloc_counter, cma_aggressive_free_min, cma_alloc_counter)
>> allow, MIGRATE_CMA will be allocated as MIGRATE_MOVABLE first.
>>
>> Signed-off-by: Hui Zhu 
>> ---
>>  mm/page_alloc.c | 42 +++---
>>  1 file changed, 31 insertions(+), 11 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 736d8e1..87bc326 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -65,6 +65,10 @@
>>  #include 
>>  #include "internal.h"
>>
>> +#ifdef CONFIG_CMA_AGGRESSIVE
>> +#include 
>> +#endif
>> +
>>  /* prevent >1 _updater_ of zone percpu pageset ->high and ->batch fields */
>>  static DEFINE_MUTEX(pcp_batch_high_lock);
>>  #define MIN_PERCPU_PAGELIST_FRACTION (8)
>> @@ -1189,20 +1193,36 @@ static struct page *__rmqueue(struct zone *zone, 
>> unsigned int order,
>>  {
>>   struct page *page;
>>
>> -retry_reserve:
>> +#ifdef CONFIG_CMA_AGGRESSIVE
>> + if (cma_aggressive_switch
>> + && migratetype == MIGRATE_MOVABLE
>> + && atomic_read(&cma_alloc_counter) == 0
>> + && global_page_state(NR_FREE_CMA_PAGES) > cma_aggressive_free_min
>> + + (1 << order))
>> + migratetype = MIGRATE_CMA;
>> +#endif
>> +retry:
>
> I don't get it why cma_alloc_counter should be tested.
> When cma alloc is progress, pageblock is isolated so that pages on that
> pageblock cannot be allocated. Why should we prevent aggressive
> allocation in this case?
>

Hi Joonsoo,

Even if the pageblock is isolated at the beginning of alloc_contig_range,
it is un-isolated again if alloc_contig_range hits an error, for example
"PFNs busy".  And cma_alloc will keep calling alloc_contig_range with
another address if needed.

So cma_alloc_counter reduces the contention between the CMA allocation in
cma_alloc and __rmqueue.
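
To make that window concrete, here is a rough sketch of the retry flow
described above (illustration only, not the posted patch; the
find_next_free_range() helper and its "returns 0 when no range is left"
convention are made up here to stand in for the bitmap scan that the real
cma_alloc does under cma->lock):

/*
 * Illustration only: simplified retry flow of cma_alloc().
 */
struct page *cma_alloc_sketch(struct cma *cma, int count, unsigned int align)
{
	unsigned long start = 0, pfn;
	struct page *page = NULL;

	/* Tell __rmqueue() to stay away from MIGRATE_CMA for a while. */
	atomic_inc(&cma_alloc_counter);

	for (;;) {
		/* Hypothetical helper: next candidate range, 0 if none left. */
		pfn = find_next_free_range(cma, start, count, align);
		if (!pfn)
			break;

		/*
		 * alloc_contig_range() isolates the pageblocks, migrates
		 * pages away, and on failure (e.g. "PFNs busy") undoes the
		 * isolation before we loop around to the next range.
		 */
		if (alloc_contig_range(pfn, pfn + count, MIGRATE_CMA) == 0) {
			page = pfn_to_page(pfn);
			break;
		}

		start = pfn + 1;	/* retry with the next range */
	}

	atomic_dec(&cma_alloc_counter);
	return page;
}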

Thanks,
Hui

> Thanks.
>


Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-11-04 Thread Hui Zhu
On Tue, Nov 4, 2014 at 3:53 PM, Minchan Kim  wrote:
> Hello,
>
> On Wed, Oct 29, 2014 at 03:43:33PM +0100, Vlastimil Babka wrote:
>> On 10/16/2014 10:55 AM, Laura Abbott wrote:
>> >On 10/15/2014 8:35 PM, Hui Zhu wrote:
>> >
>> >It's good to see another proposal to fix CMA utilization. Do you have
>> >any data about the success rate of CMA contiguous allocation after
>> >this patch series? I played around with a similar approach of using
>> >CMA for MIGRATE_MOVABLE allocations and found that although utilization
>> >did increase, contiguous allocations failed at a higher rate and were
>> >much slower. I see what this series is trying to do with avoiding
>> >allocation from CMA pages when a contiguous allocation is progress.
>> >My concern is that there would still be problems with contiguous
>> >allocation after all the MIGRATE_MOVABLE fallback has happened.
>>
>> Hi,
>>
>> did anyone try/suggest the following idea?
>>
>> - keep CMA as fallback to MOVABLE as is is now, i.e. non-agressive
>> - when UNMOVABLE (RECLAIMABLE also?) allocation fails and CMA
>> pageblocks have space, don't OOM immediately, but first try to
>> migrate some MOVABLE pages to CMA pageblocks, to make space for the
>> UNMOVABLE allocation in non-CMA pageblocks
>> - this should keep CMA pageblocks free as long as possible and
>> useful for CMA allocations, but without restricting the non-MOVABLE
>> allocations even though there is free memory (but in CMA pageblocks)
>> - the fact that a MOVABLE page could be successfully migrated to CMA
>> pageblock, means it was not pinned or otherwise non-migratable, so
>> there's a good chance it can be migrated back again if CMA
>> pageblocks need to be used by CMA allocation
>
> I suggested exactly same idea long time ago.
>
>> - it's more complex, but I guess we have most of the necessary
>> infrastructure in compaction already :)
>
> I agree but still, it doesn't solve reclaim problem(ie, VM doesn't
> need to reclaim CMA pages when memory pressure of unmovable pages
> happens). Of course, we could make VM be aware of that via introducing
> new flag of __isolate_lru_page.
>
> However, I'd like to think CMA design from the beginning.
> It made page allocation logic complicated, even very fragile as we
> had recently and now we need to add new logics to migrate like you said.
> As well, we need to fix reclaim path, too.
>
> It makes mm complicated day by day even though it doesn't do the role
> enough well(ie, big latency and frequent allocation failure) so I really
> want to stop making the mess bloated.
>
> Long time ago, when I saw Joonsoo's CMA agressive allocation patchset
> (ie, roundrobin allocation between CMA and normal movable pages)
> it was good to me at a first glance but it needs tweak of allocation
> path and doesn't solve reclaim path, either. Yes, reclaim path could
> be solved by another patch but I want to solve it altogether.
>
> At that time, I suggested big surgery to Joonsoo in offline that
> let's move CMA allocation with movable zone allocation. With it,
> we could make allocation/reclaim path simple but thing is we should
> make VM be aware of overlapping MOVABLE zone which means some of pages
> in the zone could be part of another zones but I think we already have
> logics to handle it when I read comment in isolate_freepages so I think
> the design should work.

Thanks.

>
> A thing you guys might worry is bigger CMA latency because it makes
> CMA memory usage ratio higher than the approach you mentioned but
> anyone couldn't guarantee it once memory is fully utilized.
> In addition, we have used fair zone allocator policy so it makes
> round robin allocation automatically so I believe it should be way
> to go.

Even if the kernel uses it to allocate CMA memory, CMA allocation latency
will still happen if most of memory is already allocated and a driver tries
to get CMA memory.
https://lkml.org/lkml/2014/10/17/129
https://lkml.org/lkml/2014/10/17/130
These patches let cma_alloc do a shrink with the function
shrink_all_memory_for_cma if needed.  It handles a lot of the latency issues
on my side.
And I think it can be made more configurable, for example so that some
devices use it and others do not.
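
For illustration, a minimal sketch of the idea (not the actual code from the
links above; the exact signature of shrink_all_memory_for_cma() is assumed
here to mirror shrink_all_memory(), taking a number of pages to reclaim):

/*
 * Conceptual sketch only: reclaim ahead of time when free memory looks
 * too low for the request, then fall through to the normal cma_alloc().
 */
static struct page *cma_alloc_with_shrink(struct cma *cma, int count,
					  unsigned int align)
{
	if (global_page_state(NR_FREE_PAGES) < count)
		shrink_all_memory_for_cma(count);	/* assumed helper */

	return cma_alloc(cma, count, align);
}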

Thanks,
Hui



>
>>
>> Thoughts?
>> Vlastimil
>>
>> >Thanks,
>> >Laura
>> >
>>
>
> --
> Kind regards,
> Minchan Kim
>


Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-11-03 Thread Hui Zhu
On Wed, Oct 29, 2014 at 10:43 PM, Vlastimil Babka  wrote:
> On 10/16/2014 10:55 AM, Laura Abbott wrote:
>>
>> On 10/15/2014 8:35 PM, Hui Zhu wrote:
>>
>> It's good to see another proposal to fix CMA utilization. Do you have
>> any data about the success rate of CMA contiguous allocation after
>> this patch series? I played around with a similar approach of using
>> CMA for MIGRATE_MOVABLE allocations and found that although utilization
>> did increase, contiguous allocations failed at a higher rate and were
>> much slower. I see what this series is trying to do with avoiding
>> allocation from CMA pages when a contiguous allocation is progress.
>> My concern is that there would still be problems with contiguous
>> allocation after all the MIGRATE_MOVABLE fallback has happened.
>
>
> Hi,
>
> did anyone try/suggest the following idea?
>
> - keep CMA as fallback to MOVABLE as is is now, i.e. non-agressive
> - when UNMOVABLE (RECLAIMABLE also?) allocation fails and CMA pageblocks
> have space, don't OOM immediately, but first try to migrate some MOVABLE
> pages to CMA pageblocks, to make space for the UNMOVABLE allocation in
> non-CMA pageblocks
> - this should keep CMA pageblocks free as long as possible and useful for
> CMA allocations, but without restricting the non-MOVABLE allocations even
> though there is free memory (but in CMA pageblocks)
> - the fact that a MOVABLE page could be successfully migrated to CMA
> pageblock, means it was not pinned or otherwise non-migratable, so there's a
> good chance it can be migrated back again if CMA pageblocks need to be used
> by CMA allocation
> - it's more complex, but I guess we have most of the necessary
> infrastructure in compaction already :)

I think this idea makes the CMA allocation part more complex but makes the
balance and shrink code easier, because it makes CMA behave like real
memory.
I just worry about the speed of migrating memory with this idea.  :)

Thanks,
Hui


>
> Thoughts?
> Vlastimil
>
>> Thanks,
>> Laura
>>
>


Re: [PATCH v5 2/4] mm/page_alloc: add freepage on isolate pageblock to correct buddy list

2014-11-03 Thread Hui Zhu
On Mon, Nov 3, 2014 at 4:22 PM, Heesub Shin  wrote:
> Hello,
>
>
> On 10/31/2014 04:25 PM, Joonsoo Kim wrote:
>>
>> In free_pcppages_bulk(), we use cached migratetype of freepage
>> to determine type of buddy list where freepage will be added.
>> This information is stored when freepage is added to pcp list, so
>> if isolation of pageblock of this freepage begins after storing,
>> this cached information could be stale. In other words, it has
>> original migratetype rather than MIGRATE_ISOLATE.
>>
>> There are two problems caused by this stale information. One is that
>> we can't keep these freepages from being allocated. Although this
>> pageblock is isolated, freepage will be added to normal buddy list
>> so that it could be allocated without any restriction. And the other
>> problem is incorrect freepage accounting. Freepages on isolate pageblock
>> should not be counted for number of freepage.
>>
>> Following is the code snippet in free_pcppages_bulk().
>>
>> /* MIGRATE_MOVABLE list may include MIGRATE_RESERVEs */
>> __free_one_page(page, page_to_pfn(page), zone, 0, mt);
>> trace_mm_page_pcpu_drain(page, 0, mt);
>> if (likely(!is_migrate_isolate_page(page))) {
>> __mod_zone_page_state(zone, NR_FREE_PAGES, 1);
>> if (is_migrate_cma(mt))
>> __mod_zone_page_state(zone, NR_FREE_CMA_PAGES, 1);
>> }
>>
>> As you can see above snippet, current code already handle second problem,
>> incorrect freepage accounting, by re-fetching pageblock migratetype
>> through is_migrate_isolate_page(page). But, because this re-fetched
>> information isn't used for __free_one_page(), first problem would not be
>> solved. This patch try to solve this situation to re-fetch pageblock
>> migratetype before __free_one_page() and to use it for __free_one_page().
>>
>> In addition to move up position of this re-fetch, this patch use
>> optimization technique, re-fetching migratetype only if there is
>> isolate pageblock. Pageblock isolation is rare event, so we can
>> avoid re-fetching in common case with this optimization.
>>
>> This patch also correct migratetype of the tracepoint output.
>>
>> Cc: 
>> Acked-by: Minchan Kim 
>> Acked-by: Michal Nazarewicz 
>> Acked-by: Vlastimil Babka 
>> Signed-off-by: Joonsoo Kim 
>> ---
>>   mm/page_alloc.c |   13 -
>>   1 file changed, 8 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index f7a867e..6df23fe 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -725,14 +725,17 @@ static void free_pcppages_bulk(struct zone *zone,
>> int count,
>> /* must delete as __free_one_page list manipulates
>> */
>> list_del(&page->lru);
>> mt = get_freepage_migratetype(page);
>> +   if (unlikely(has_isolate_pageblock(zone))) {
>
>
> How about adding an additional check for 'mt == MIGRATE_MOVABLE' here? Then,
> most of get_pageblock_migratetype() calls could be avoided while the
> isolation is in progress. I am not sure this is the case on memory
> offlining. How do you think?

I think the reason is that this "mt" may not be the right value for this page;
it is set without holding zone->lock.

Thanks,
Hui

>
>> +   mt = get_pageblock_migratetype(page);
>> +   if (is_migrate_isolate(mt))
>> +   goto skip_counting;
>> +   }
>> +   __mod_zone_freepage_state(zone, 1, mt);
>> +
>> +skip_counting:
>> /* MIGRATE_MOVABLE list may include
>> MIGRATE_RESERVEs */
>> __free_one_page(page, page_to_pfn(page), zone, 0,
>> mt);
>> trace_mm_page_pcpu_drain(page, 0, mt);
>> -   if (likely(!is_migrate_isolate_page(page))) {
>> -   __mod_zone_page_state(zone, NR_FREE_PAGES,
>> 1);
>> -   if (is_migrate_cma(mt))
>> -   __mod_zone_page_state(zone,
>> NR_FREE_CMA_PAGES, 1);
>> -   }
>> } while (--to_free && --batch_free && !list_empty(list));
>> }
>> spin_unlock(&zone->lock);
>>
>


Re: [PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-11-02 Thread Hui Zhu
On Fri, Oct 24, 2014 at 1:25 PM, Joonsoo Kim  wrote:
> On Thu, Oct 16, 2014 at 11:35:47AM +0800, Hui Zhu wrote:
>> In the fallbacks table of page_alloc.c, MIGRATE_CMA is the fallback of
>> MIGRATE_MOVABLE.
>> MIGRATE_MOVABLE will use MIGRATE_CMA when it doesn't have a page of the
>> order that the Linux kernel wants.
>>
>> On a system that runs a lot of user space programs, for instance an
>> Android board, most memory is in MIGRATE_MOVABLE and already allocated.
>> Before __rmqueue_fallback gets memory from MIGRATE_CMA, the oom_killer
>> will kill a task to release memory when the kernel wants MIGRATE_UNMOVABLE
>> memory, because the fallbacks of MIGRATE_UNMOVABLE are MIGRATE_RECLAIMABLE
>> and MIGRATE_MOVABLE.
>> This status is odd: MIGRATE_CMA has a lot of free memory, but the Linux
>> kernel kills some tasks to release memory.
>>
>> This patch series adds a new feature, CMA_AGGRESSIVE, to make CMA memory
>> allocation more aggressive.
>> If CMA_AGGRESSIVE is enabled, when the Linux kernel calls __rmqueue to get
>> pages from MIGRATE_MOVABLE and conditions allow, MIGRATE_CMA will be
>> allocated as MIGRATE_MOVABLE first.  If MIGRATE_CMA doesn't have enough
>> pages for the allocation, fall back to allocating memory from
>> MIGRATE_MOVABLE.
>> Then the memory of MIGRATE_MOVABLE can be kept for MIGRATE_UNMOVABLE and
>> MIGRATE_RECLAIMABLE, which don't have MIGRATE_CMA as a fallback.
>
> Hello,
>
> I did some work similar to this.
> Please refer to the following links.
>
> https://lkml.org/lkml/2014/5/28/64
> https://lkml.org/lkml/2014/5/28/57

> I tested the #1 approach and found a problem. Although free memory on
> meminfo can move around the low watermark, there is a large fluctuation in
> free memory, because too many pages are reclaimed when kswapd is invoked.
> The reason for this behaviour is that successively allocated CMA pages are
> on the LRU list in that order, and kswapd reclaims them in the same order.
> This memory doesn't help the watermark checking from kswapd, so too many
> pages are reclaimed, I guess.

This issue can be handled with some changes around the shrink code.  I am
trying to integrate a patch for that.
But I am not sure we hit the same issue.  Would you mind giving me more
info about this part?

>
> And, aggressive allocation should be postponed until the freepage counting
> bug is fixed, because aggressive allocation enlarges the possibility of the
> problem occurring. I tried to fix that bug, too. See the following link.
>
> https://lkml.org/lkml/2014/10/23/90

I am following these patches.  They are great!  Thanks for your work.

Best,
Hui

>
> Thanks.
>


Re: [PATCH v2 2/3] CMA: aggressively allocate the pages on cma reserved memory when not used

2014-10-30 Thread Hui Zhu
On Wed, May 28, 2014 at 3:04 PM, Joonsoo Kim  wrote:
> CMA is introduced to provide physically contiguous pages at runtime.
> For this purpose, it reserves memory at boot time. Although it reserves
> memory, this reserved memory can be used for movable memory allocation
> requests. This usecase is beneficial to systems that need the CMA reserved
> memory only infrequently, and it is one of the main purposes of introducing
> CMA.
>
> But there is a problem in the current implementation: it works just like a
> plain reserved-memory approach. The pages in cma reserved memory are hardly
> ever used for movable memory allocation. This is caused by the combination
> of allocation and reclaim policy.
>
> The pages in cma reserved memory are allocated only if there is no movable
> memory left, that is, as fallback allocation. So by the time this fallback
> allocation starts, the system is under heavy memory pressure. Even under
> that pressure, movable allocations easily succeed, since there would be
> many pages in cma reserved memory. But this is not the case for unmovable
> and reclaimable allocations, because they can't use the pages in cma
> reserved memory. For watermark checking, these allocations regard the
> system's free memory as (free pages - free cma pages), that is, free
> unmovable pages + free reclaimable pages + free movable pages. Because we
> have already exhausted movable pages, the only free pages we have are of
> the unmovable and reclaimable types, which is a really small amount, so the
> watermark check fails. That wakes up kswapd to make enough free memory for
> unmovable and reclaimable allocations, and kswapd does so.
> So before we fully utilize the pages in cma reserved memory, kswapd starts
> to reclaim memory and tries to push free memory over the high watermark.
> The watermark checking done by kswapd doesn't take free cma pages into
> account, so many movable pages get reclaimed. After that, we have a lot of
> movable pages again, so fallback allocation doesn't happen again. To
> conclude, the amount of free memory on meminfo, which includes free CMA
> pages, moves around 512 MB if I reserve 512 MB of memory for CMA.
>
> I found this problem on following experiment.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
>
> CMA reserve:        0 MB         512 MB
> Elapsed-time:       225.2        472.5
> Average-MemFree:    322490 KB    630839 KB
>
> To solve this problem, I can think following 2 possible solutions.
> 1. allocate the pages on cma reserved memory first, and if they are
>exhausted, allocate movable pages.
> 2. interleaved allocation: try to allocate specific amounts of memory
>from cma reserved memory and then allocate from free movable memory.
>
> I tested the #1 approach and found a problem. Although free memory on
> meminfo can move around the low watermark, there is a large fluctuation in
> free memory, because too many pages are reclaimed when kswapd is invoked.
> The reason for this behaviour is that successively allocated CMA pages are
> on the LRU list in that order, and kswapd reclaims them in the same order.
> This memory doesn't help the watermark checking from kswapd, so too many
> pages are reclaimed, I guess.

Could you send more information about this part?  I want to do some
tests around it.
I use this approach in my patch.

Thanks,
Hui

>
> So, I implemented the #2 approach.
> One thing I should note is that we should not change the allocation target
> (movable list or cma) on every allocation attempt, since this prevents the
> allocated pages from being physically contiguous, which can hurt the
> performance of some I/O devices. To solve this, I keep the allocation
> target for at least pageblock_nr_pages attempts, and make this number
> reflect the ratio of free pages excluding free cma pages to free cma pages.
> With this approach, the system works very smoothly and fully utilizes the
> pages in cma reserved memory.
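A minimal sketch of that interleaving decision (illustrative only; apart from
the vmstat counters and pageblock_nr_pages, the names here are made up and
this is not the actual patch):

	/*
	 * Stay on one allocation target for at least pageblock_nr_pages
	 * requests; how long we stay reflects the ratio of free CMA pages
	 * to the other free pages.
	 */
	static bool pick_cma_target(struct zone *zone)
	{
		static unsigned long left;	/* requests left on current target */
		static bool from_cma;

		if (left == 0) {
			unsigned long cma   = zone_page_state(zone, NR_FREE_CMA_PAGES);
			unsigned long other = zone_page_state(zone, NR_FREE_PAGES) - cma;

			from_cma = !from_cma;
			/* run length reflects the free-page ratio of the two pools */
			left = pageblock_nr_pages *
			       (from_cma ? max(cma / (other + 1), 1UL)
					 : max(other / (cma + 1), 1UL));
		}
		left--;
		return from_cma;
	}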
>
> Following is the experimental result of this patch.
>
> 4 CPUs, 1024 MB, VIRTUAL MACHINE
> make -j16
>
> 
> CMA reserve:        0 MB         512 MB
> Elapsed-time:       225.2        472.5
> Average-MemFree:    322490 KB    630839 KB
> nr_free_cma:        0            131068
> pswpin:             0            261666
> pswpout:            75           1241363
>
> 
> CMA reserve:        0 MB         512 MB
> Elapsed-time:       222.7        224
> Average-MemFree:    325595 KB    393033 KB
> nr_free_cma:        0            61001
> pswpin:             0            6
> pswpout:            44           502
>
> There is no difference if we don't have cma reserved memory (0 MB case).
> But, with cma reserved memory (512 MB case), we fully utiliz

[PATCH v2 3/4] (CMA_AGGRESSIVE) Update reserve custom contiguous area code

2014-10-17 Thread Hui Zhu
Update this patch according to the comments from Rafael.

Add cma_alloc_counter, cma_aggressive_switch, cma_aggressive_free_min and
cma_aggressive_shrink_switch.

cma_aggressive_switch is the switch for the whole CMA_AGGRESSIVE feature.  It
can be controlled by sysctl "vm.cma-aggressive-switch".

cma_aggressive_free_min can be controlled by sysctl
"vm.cma-aggressive-free-min".  If the number of free CMA pages is smaller than
this sysctl value, CMA_AGGRESSIVE will not work in the page alloc code.

cma_aggressive_shrink_switch can be controlled by sysctl
"vm.cma-aggressive-shrink-switch".  If this sysctl is true and the amount of
free normal memory is smaller than the size to be allocated, do a memory
shrink with shrink_all_memory_for_cma() before the driver allocates pages
from CMA.
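A sketch of what that shrink hook boils down to (shrink_all_memory_for_cma()
is the function added in patch 2/4; the surrounding code and the `count'
variable are illustrative assumptions, not the actual hunk):

	/* before carving `count' pages out of the CMA area */
	if (cma_aggressive_shrink_switch) {
		unsigned long free = global_page_state(NR_FREE_PAGES) -
				     global_page_state(NR_FREE_CMA_PAGES);

		if (free < count)
			shrink_all_memory_for_cma(count - free);
	}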

When the Linux kernel tries to reserve a custom contiguous area, the value of
cma_alloc_counter is increased, and CMA_AGGRESSIVE will not work in the page
alloc code while the counter is raised.  After the reserve function returns,
the value of cma_alloc_counter is decreased.
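Sketched, the counter usage described above looks like this (the cma_alloc()
hunk is cut off at the end of this mail, so this is an assumed shape rather
than the verbatim change):

	/* in the reservation path, e.g. cma_alloc() */
	atomic_inc(&cma_alloc_counter);
	/* ... existing bitmap scan and alloc_contig_range() call ... */
	atomic_dec(&cma_alloc_counter);

	/* in the page allocator, CMA_AGGRESSIVE only applies while no
	 * reservation is in flight */
	bool aggressive = cma_aggressive_switch &&
			  atomic_read(&cma_alloc_counter) == 0;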

Signed-off-by: Hui Zhu 
---
 include/linux/cma.h |  7 +++
 kernel/sysctl.c | 27 +++
 mm/cma.c| 54 +
 3 files changed, 88 insertions(+)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index 0430ed0..df96abf 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -15,6 +15,13 @@
 
 struct cma;
 
+#ifdef CONFIG_CMA_AGGRESSIVE
+extern atomic_t cma_alloc_counter;
+extern int cma_aggressive_switch;
+extern unsigned long cma_aggressive_free_min;
+extern int cma_aggressive_shrink_switch;
+#endif
+
 extern phys_addr_t cma_get_base(struct cma *cma);
 extern unsigned long cma_get_size(struct cma *cma);
 
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 4aada6d..646929e2 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -92,6 +92,10 @@
 #include 
 #endif
 
+#ifdef CONFIG_CMA_AGGRESSIVE
+#include 
+#endif
+
 
 #if defined(CONFIG_SYSCTL)
 
@@ -1485,6 +1489,29 @@ static struct ctl_table vm_table[] = {
.mode   = 0644,
.proc_handler   = proc_doulongvec_minmax,
},
+#ifdef CONFIG_CMA_AGGRESSIVE
+   {
+   .procname   = "cma-aggressive-switch",
+   .data   = &cma_aggressive_switch,
+   .maxlen = sizeof(int),
+   .mode   = 0600,
+   .proc_handler   = proc_dointvec,
+   },
+   {
+   .procname   = "cma-aggressive-free-min",
+   .data   = &cma_aggressive_free_min,
+   .maxlen = sizeof(unsigned long),
+   .mode   = 0600,
+   .proc_handler   = proc_doulongvec_minmax,
+   },
+   {
+   .procname   = "cma-aggressive-shrink-switch",
+   .data   = &cma_aggressive_shrink_switch,
+   .maxlen = sizeof(int),
+   .mode   = 0600,
+   .proc_handler   = proc_dointvec,
+   },
+#endif
{ }
 };
 
diff --git a/mm/cma.c b/mm/cma.c
index 963bc4a..1cf341c 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -33,6 +33,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct cma {
unsigned long   base_pfn;
@@ -127,6 +128,27 @@ err:
return -EINVAL;
 }
 
+#ifdef CONFIG_CMA_AGGRESSIVE
+/* The counter for the dma_alloc_from_contiguous and
+   dma_release_from_contiguous.  */
+atomic_t cma_alloc_counter = ATOMIC_INIT(0);
+
+/* Switch for CMA_AGGRESSIVE.  */
+int cma_aggressive_switch __read_mostly;
+
+/* If the number of CMA free pages is smaller than this value, CMA_AGGRESSIVE
+   will not work. */
+#ifdef CONFIG_CMA_AGGRESSIVE_FREE_MIN
+unsigned long cma_aggressive_free_min __read_mostly =
+   CONFIG_CMA_AGGRESSIVE_FREE_MIN;
+#else
+unsigned long cma_aggressive_free_min __read_mostly = 500;
+#endif
+
+/* Switch for CMA_AGGRESSIVE shrink.  */
+int cma_aggressive_shrink_switch __read_mostly;
+#endif
+
 static int __init cma_init_reserved_areas(void)
 {
int i;
@@ -138,6 +160,22 @@ static int __init cma_init_reserved_areas(void)
return ret;
}
 
+#ifdef CONFIG_CMA_AGGRESSIVE
+   cma_aggressive_switch = 0;
+#ifdef CONFIG_CMA_AGGRESSIVE_PHY_MAX
+   if (memblock_phys_mem_size() <= CONFIG_CMA_AGGRESSIVE_PHY_MAX)
+#else
+   if (memblock_phys_mem_size() <= 0x4000)
+#endif
+   cma_aggressive_switch = 1;
+
+   cma_aggressive_shrink_switch = 0;
+#ifdef CONFIG_CMA_AGGRESSIVE_SHRINK
+   if (cma_aggressive_switch)
+   cma_aggressive_shrink_switch = 1;
+#endif
+#endif
+
return 0;
 }
 core_initcall(cma_init_reserved_areas);
@@ -312,6 +350,11 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
unsigned long bitmap_maxno, bitmap_no, bitmap_coun

[PATCH v2 2/4] (CMA_AGGRESSIVE) Add new function shrink_all_memory_for_cma

2014-10-17 Thread Hui Zhu
Update this patch according to the comments from Rafael.

Function shrink_all_memory_for_cma tries to free `nr_to_reclaim' of memory.
The CMA aggressive shrink function will call this function to free
`nr_to_reclaim' of memory.

Signed-off-by: Hui Zhu 
---
 mm/vmscan.c | 58 +++---
 1 file changed, 43 insertions(+), 15 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb4707..658dc8d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3404,6 +3404,28 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
+#if defined CONFIG_HIBERNATION || defined CONFIG_CMA_AGGRESSIVE
+static unsigned long __shrink_all_memory(struct scan_control *sc)
+{
+   struct reclaim_state reclaim_state;
+   struct zonelist *zonelist = node_zonelist(numa_node_id(), sc->gfp_mask);
+   struct task_struct *p = current;
+   unsigned long nr_reclaimed;
+
+   p->flags |= PF_MEMALLOC;
+   lockdep_set_current_reclaim_state(sc->gfp_mask);
+   reclaim_state.reclaimed_slab = 0;
+   p->reclaim_state = &reclaim_state;
+
+   nr_reclaimed = do_try_to_free_pages(zonelist, sc);
+
+   p->reclaim_state = NULL;
+   lockdep_clear_current_reclaim_state();
+   p->flags &= ~PF_MEMALLOC;
+
+   return nr_reclaimed;
+}
+
 #ifdef CONFIG_HIBERNATION
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
@@ -3415,7 +3437,6 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
  */
 unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 {
-   struct reclaim_state reclaim_state;
struct scan_control sc = {
.nr_to_reclaim = nr_to_reclaim,
.gfp_mask = GFP_HIGHUSER_MOVABLE,
@@ -3425,24 +3446,31 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
.may_swap = 1,
.hibernation_mode = 1,
};
-   struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
-   struct task_struct *p = current;
-   unsigned long nr_reclaimed;
-
-   p->flags |= PF_MEMALLOC;
-   lockdep_set_current_reclaim_state(sc.gfp_mask);
-   reclaim_state.reclaimed_slab = 0;
-   p->reclaim_state = &reclaim_state;
 
-   nr_reclaimed = do_try_to_free_pages(zonelist, &sc);
+   return __shrink_all_memory(&sc);
+}
+#endif /* CONFIG_HIBERNATION */
 
-   p->reclaim_state = NULL;
-   lockdep_clear_current_reclaim_state();
-   p->flags &= ~PF_MEMALLOC;
+#ifdef CONFIG_CMA_AGGRESSIVE
+/*
+ * Try to free `nr_to_reclaim' of memory, system-wide, for CMA aggressive
+ * shrink function.
+ */
+void shrink_all_memory_for_cma(unsigned long nr_to_reclaim)
+{
+   struct scan_control sc = {
+   .nr_to_reclaim = nr_to_reclaim,
+   .gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_HIGHMEM,
+   .priority = DEF_PRIORITY,
+   .may_writepage = !laptop_mode,
+   .may_unmap = 1,
+   .may_swap = 1,
+   };
 
-   return nr_reclaimed;
+   __shrink_all_memory(&sc);
 }
-#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_CMA_AGGRESSIVE */
+#endif /* CONFIG_HIBERNATION || CONFIG_CMA_AGGRESSIVE */
 
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
not required for correctness.  So if the last cpu in a node goes
-- 
1.9.1



[PATCH 0/4] (CMA_AGGRESSIVE) Make CMA memory be more aggressive about allocation

2014-10-15 Thread Hui Zhu
In the fallbacks table of page_alloc.c, MIGRATE_CMA is the fallback of
MIGRATE_MOVABLE.
MIGRATE_MOVABLE will use MIGRATE_CMA when it doesn't have a page of the
order that the Linux kernel wants.

On a system that runs a lot of user space programs, for instance an
Android board, most memory is in MIGRATE_MOVABLE and already allocated.
Before __rmqueue_fallback gets memory from MIGRATE_CMA, the oom_killer
will kill a task to release memory when the kernel wants MIGRATE_UNMOVABLE
memory, because the fallbacks of MIGRATE_UNMOVABLE are MIGRATE_RECLAIMABLE
and MIGRATE_MOVABLE.
This status is odd: MIGRATE_CMA has a lot of free memory, but the Linux
kernel kills some tasks to release memory.

This patch series adds a new feature, CMA_AGGRESSIVE, to make CMA memory
allocation more aggressive.
If CMA_AGGRESSIVE is enabled, when the Linux kernel calls __rmqueue to get
pages from MIGRATE_MOVABLE and conditions allow, MIGRATE_CMA will be
allocated as MIGRATE_MOVABLE first.  If MIGRATE_CMA doesn't have enough
pages for the allocation, fall back to allocating memory from
MIGRATE_MOVABLE.
Then the memory of MIGRATE_MOVABLE can be kept for MIGRATE_UNMOVABLE and
MIGRATE_RECLAIMABLE, which don't have MIGRATE_CMA as a fallback.
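A minimal sketch of the allocation-order change described above (assumed
shape only: __rmqueue_smallest(), __rmqueue_fallback() and the MIGRATE_*
types exist in mm/page_alloc.c, while cma_aggressive_allowed() stands in for
whatever gating the series actually uses):

	static struct page *__rmqueue(struct zone *zone, unsigned int order,
				      int migratetype)
	{
		struct page *page = NULL;

	#ifdef CONFIG_CMA_AGGRESSIVE
		/* for movable requests, try the CMA pool before the movable lists */
		if (migratetype == MIGRATE_MOVABLE && cma_aggressive_allowed(zone))
			page = __rmqueue_smallest(zone, order, MIGRATE_CMA);
	#endif
		/* not enough CMA pages: fall back to the normal paths */
		if (!page)
			page = __rmqueue_smallest(zone, order, migratetype);
		if (!page)
			page = __rmqueue_fallback(zone, order, migratetype);

		return page;
	}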



[PATCH 2/4] (CMA_AGGRESSIVE) Add argument hibernation to function shrink_all_memory

2014-10-15 Thread Hui Zhu
Function shrink_all_memory tries to free `nr_to_reclaim' of memory.
The CMA_AGGRESSIVE_SHRINK function will call this function to free
`nr_to_reclaim' of memory.  It needs a different scan_control than the
current caller, hibernate_preallocate_memory.

If hibernation is true, the caller is hibernate_preallocate_memory;
if not, the caller is the CMA alloc function.

Signed-off-by: Hui Zhu 
---
 include/linux/swap.h|  3 ++-
 kernel/power/snapshot.c |  2 +-
 mm/vmscan.c | 19 +--
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 37a585b..9f2cb43 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -335,7 +335,8 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
gfp_t gfp_mask, bool noswap,
struct zone *zone,
unsigned long *nr_scanned);
-extern unsigned long shrink_all_memory(unsigned long nr_pages);
+extern unsigned long shrink_all_memory(unsigned long nr_pages,
+  bool hibernation);
 extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern unsigned long vm_total_pages;
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index 791a618..a00fc35 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1657,7 +1657,7 @@ int hibernate_preallocate_memory(void)
 * NOTE: If this is not done, performance will be hurt badly in some
 * test cases.
 */
-   shrink_all_memory(saveable - size);
+   shrink_all_memory(saveable - size, true);
 
/*
 * The number of saveable pages in memory was too high, so apply some
diff --git a/mm/vmscan.c b/mm/vmscan.c
index dcb4707..fdcfa30 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -3404,7 +3404,7 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
wake_up_interruptible(&pgdat->kswapd_wait);
 }
 
-#ifdef CONFIG_HIBERNATION
+#if defined CONFIG_HIBERNATION || defined CONFIG_CMA_AGGRESSIVE
 /*
  * Try to free `nr_to_reclaim' of memory, system-wide, and return the number of
  * freed pages.
@@ -3413,22 +3413,29 @@ void wakeup_kswapd(struct zone *zone, int order, enum zone_type classzone_idx)
  * LRU order by reclaiming preferentially
  * inactive > active > active referenced > active mapped
  */
-unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
+unsigned long shrink_all_memory(unsigned long nr_to_reclaim, bool hibernation)
 {
struct reclaim_state reclaim_state;
struct scan_control sc = {
.nr_to_reclaim = nr_to_reclaim,
-   .gfp_mask = GFP_HIGHUSER_MOVABLE,
.priority = DEF_PRIORITY,
-   .may_writepage = 1,
.may_unmap = 1,
.may_swap = 1,
-   .hibernation_mode = 1,
};
struct zonelist *zonelist = node_zonelist(numa_node_id(), sc.gfp_mask);
struct task_struct *p = current;
unsigned long nr_reclaimed;
 
+   if (hibernation) {
+   sc.hibernation_mode = 1;
+   sc.may_writepage = 1;
+   sc.gfp_mask = GFP_HIGHUSER_MOVABLE;
+   } else {
+   sc.hibernation_mode = 0;
+   sc.may_writepage = !laptop_mode;
+   sc.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_HIGHMEM;
+   }
+
p->flags |= PF_MEMALLOC;
lockdep_set_current_reclaim_state(sc.gfp_mask);
reclaim_state.reclaimed_slab = 0;
@@ -3442,7 +3449,7 @@ unsigned long shrink_all_memory(unsigned long nr_to_reclaim)
 
return nr_reclaimed;
 }
-#endif /* CONFIG_HIBERNATION */
+#endif /* CONFIG_HIBERNATION || CONFIG_CMA_AGGRESSIVE */
 
 /* It's optimal to keep kswapds on the same CPUs as their memory, but
not required for correctness.  So if the last cpu in a node goes
-- 
1.9.1


