Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2021-05-02 Thread Ben Hutchings
On Sun, 2021-05-02 at 07:30 +1000, Paul Szabo wrote:
> I no longer use 32-bit kernels (but use the 64-bit amd64 kernel, even on
> my few last remaining 32-bit machines): that seems a suitable workaround
> or upgrade path. Should I try to test whether the issue with PAE
> remains?

I don't think there's much point in investigating issues with 32-bit
and 16GB RAM - they will be "wontfix" upstream.

It's possible that this particular problem has been fixed by an mm
change in 4.2, but at the cost of a regression in disk throughput:
.

Ben.

-- 
Ben Hutchings
For every complex problem
there is a solution that is simple, neat, and wrong.


signature.asc
Description: This is a digitally signed message part


Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2021-05-01 Thread Paul Szabo
I no longer use 32-bit kernels (but use the 64-bit amd64 kernel, even on
my few last remaining 32-bit machines): that seems a suitable workaround
or upgrade path. Should I try to test whether the issue with PAE
remains?

Cheers, Paul
-- 
Paul Szabo   p...@maths.usyd.edu.au   www.maths.usyd.edu.au/u/psz
School of Mathematics and Statistics   University of SydneyAustralia

I support NTEU members taking a stand for workplace rights in the face of
poorly-run change management. Visit www.nteu.org.au/sydney to learn more.



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2021-05-01 Thread Salvatore Bonaccorso
Control: tags -1 + moreinfo

I guess this issue can be considered resolved?

Regards,
Salvatore



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-06 Thread Paul Szabo
Dear Ben,

> Please read Documentation/SubmittingPatches, use scripts/checkpatch.pl
> and try to provide a patch that is suitable for upstream inclusion.
> Also, your name belongs in the patch header, not in the code.

I changed the proposed patch accordingly; scripts/checkpatch.pl now produces
just a few warnings. I have had my patch in use for a while now, so I believe
it is suitably tested.

Please let me know if I need to do anything else.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia
Avoid OOM when filesystem caches fill lowmem and are not reclaimed,
doing drop_caches at that point. The issue is easily reproducible on
machines with over 32GB RAM. The patch correctly protects against OOM.
The added call to drop_caches has been observed to trigger needlessly
but on quite rare occasions only.

Also included are several minor fixes:
 - Comment about highmem_is_dirtyable, which seems to be used only to
   calculate limits and thresholds, not in any decisions.
 - In determine_dirtyable_memory() subtract min_free_kbytes from
   returned value. I believe this is right, that min_free_kbytes is
   not intended for dirty pages.
 - In bdi_position_ratio() get difference (setpoint-dirty) right even
   when it is negative, which happens often. Normally these numbers are
   small and even with left-shift I never observed a 32-bit overflow.
   I believe it should be possible to re-write the whole function in
   32-bit ints; maybe it is not worth the effort to make it efficient;
   seeing how this function was always wrong and we survived, it should
   simply be removed.
 - Comment in bdi_max_pause() that it seems to always return a too-small
   value, maybe it should simply return a fixed value.
 - Comment in balance_dirty_pages() about a test marked unlikely() but
   which I observe to be quite common.
 - Comment in __alloc_pages_slowpath() about did_some_progress being
   set twice, but only checked after the second setting, so the first
   setting is lost and wasted.
 - Comment in zone_reclaimable() that maybe should return true with
   non-zero NR_SLAB_RECLAIMABLE.
 - Comment about all_unreclaimable which may be set wrongly.
 - Comments in global_reclaimable_pages() and zone_reclaimable_pages()
   about maybe adding or including NR_SLAB_RECLAIMABLE.
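
The determine_dirtyable_memory() item above converts min_free_kbytes to
pages and subtracts it without letting the unsigned result wrap. A minimal
user-space sketch of that arithmetic follows; it assumes 4 KiB pages
(PAGE_SHIFT of 12), and the helper name subtract_min_free() is hypothetical,
not a kernel function.

```c
/* Assumption: 4 KiB pages, as on i386. */
#define PAGE_SHIFT 12

/*
 * Hypothetical helper mirroring the patch: convert min_free_kbytes (kB)
 * to pages and subtract it from x, saturating at zero instead of wrapping.
 */
static unsigned long subtract_min_free(unsigned long x, int min_free_kbytes)
{
	unsigned long y = 0;

	if (min_free_kbytes > 0)
		y = (unsigned long)min_free_kbytes >> (PAGE_SHIFT - 10);
	return x > y ? x - y : 0;
}
```

For example, 65536 kB corresponds to 16384 pages, so 20000 dirtyable pages
shrink to 3616, while 100 dirtyable pages shrink to 0 rather than wrapping.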

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia

Reported-by: Paul Szabo p...@maths.usyd.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Paul Szabo p...@maths.usyd.edu.au

--- fs/drop_caches.c.old	2012-10-17 13:50:15.0 +1100
+++ fs/drop_caches.c	2013-01-04 21:52:47.0 +1100
@@ -65,3 +65,10 @@ int drop_caches_sysctl_handler(ctl_table
 	}
 	return 0;
 }
+
+/* Easy call: do echo 3 > /proc/sys/vm/drop_caches */
+void easy_drop_caches(void)
+{
+	iterate_supers(drop_pagecache_sb, NULL);
+	drop_slab();
+}
--- mm/page-writeback.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/page-writeback.c	2013-01-06 21:54:59.0 +1100
@@ -39,7 +39,8 @@
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE		max(HZ/5, 1)
+/* Might as well be max(HZ/5,4) to ensure max_pause/40 always */
+#define MAX_PAUSE		max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
@@ -343,11 +344,26 @@ static unsigned long highmem_dirtyable_m
 unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
+	int y = 0;
+	extern int min_free_kbytes;
 
 	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
+	/*
+	 * Seems that highmem_is_dirtyable is only used here, in the
+	 * calculation of limits and threshholds of dirtiness, not in deciding
+	 * where to put dirty things. Is that so? Is that as should be?
+	 * What is the recommended setting of highmem_is_dirtyable?
+	 */
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
+	/* Subtract min_free_kbytes */
+	if (min_free_kbytes > 0)
+		y = min_free_kbytes >> (PAGE_SHIFT - 10);
+	if (x > y)
+		x -= y;
+	else
+		x = 0;
 
 	return x + 1;	/* Ensure that we never return 0 */
 }
@@ -541,6 +557,9 @@ static unsigned long bdi_position_ratio(
 
 	if (unlikely(dirty >= limit))
 		return 0;
+	/* Never seen this happen, just sanity-check paranoia */
+	if (unlikely(freerun >= limit))
+		return 16 << RATELIMIT_CALC_SHIFT;
 
 	/*
 	 * global setpoint
@@ -559,7 +578,7 @@ static unsigned long bdi_position_ratio(
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
@@ -995,6 +1014,13 @@ static unsigned long bdi_max_pause(struc
 	 * The pause time will be settled within range (max_pause/4, 

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-01 Thread Paul Szabo
tags 695182 - moreinfo
thanks

Dear Ben,

I suggest the following patch, which seems to solve the problem.
Two attachments: minimal.patch just to show the simplicity, and
complete.patch with comments and enhancements.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia
--- fs/drop_caches.c.old	2012-10-17 13:50:15.0 +1100
+++ fs/drop_caches.c	2013-01-01 09:23:57.0 +1100
@@ -58,10 +58,16 @@
 	if (ret)
 		return ret;
 	if (write) {
 		if (sysctl_drop_caches & 1)
 			iterate_supers(drop_pagecache_sb, NULL);
 		if (sysctl_drop_caches & 2)
 			drop_slab();
 	}
 	return 0;
 }
+
+void PSz_drop_caches(void)
+{
+	iterate_supers(drop_pagecache_sb, NULL);
+	drop_slab();
+}
--- mm/vmscan.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/vmscan.c	2013-01-01 22:58:51.0 +1100
@@ -2719,20 +2719,25 @@
 KSWAPD_ZONE_BALANCE_GAP_RATIO);
 			if (!zone_watermark_ok_safe(zone, order,
 	high_wmark_pages(zone) + balance_gap,
 	end_zone, 0)) {
 shrink_zone(priority, zone, &sc);
 
 reclaim_state->reclaimed_slab = 0;
 nr_slab = shrink_slab(&shrink, sc.nr_scanned, lru_pages);
 sc.nr_reclaimed += reclaim_state->reclaimed_slab;
 total_scanned += sc.nr_scanned;
+if (i==1 && nr_slab<10 && (reclaim_state->reclaimed_slab)<10 && zone_page_state(zone,NR_SLAB_RECLAIMABLE)>10)
+{
+extern void PSz_drop_caches(void);
+  PSz_drop_caches();
+}
 
 if (nr_slab == 0 && !zone_reclaimable(zone))
 	zone->all_unreclaimable = 1;
 			}
 
 			/*
 			 * If we've done a decent amount of scanning and
 			 * the reclaim ratio is low, start doing writepage
 			 * even in laptop mode
 			 */
--- fs/drop_caches.c.old	2012-10-17 13:50:15.0 +1100
+++ fs/drop_caches.c	2013-01-01 09:23:57.0 +1100
@@ -58,10 +58,16 @@
 	if (ret)
 		return ret;
 	if (write) {
 		if (sysctl_drop_caches & 1)
 			iterate_supers(drop_pagecache_sb, NULL);
 		if (sysctl_drop_caches & 2)
 			drop_slab();
 	}
 	return 0;
 }
+
+void PSz_drop_caches(void)
+{
+	iterate_supers(drop_pagecache_sb, NULL);
+	drop_slab();
+}
--- mm/page-writeback.c.old	2012-10-17 13:50:15.0 +1100
+++ mm/page-writeback.c	2013-01-01 23:01:52.0 +1100
@@ -32,21 +32,22 @@
 #include <linux/sysctl.h>
 #include <linux/cpu.h>
 #include <linux/syscalls.h>
 #include <linux/buffer_head.h>
 #include <linux/pagevec.h>
 #include <trace/events/writeback.h>
 
 /*
  * Sleep at most 200ms at a time in balance_dirty_pages().
  */
-#define MAX_PAUSE		max(HZ/5, 1)
+/* PSz: Might as well be max(HZ/5,4) to ensure max_pause/40 always */
+#define MAX_PAUSE		max(HZ/5, 4)
 
 /*
  * Estimate write bandwidth at 200ms intervals.
  */
 #define BANDWIDTH_INTERVAL	max(HZ/5, 1)
 
 #define RATELIMIT_CALC_SHIFT	10
 
 /*
  * After a CPU has dirtied this many pages, balance_dirty_pages_ratelimited
@@ -339,22 +340,40 @@
  *
  * Returns the numebr of pages that can currently be freed and used
  * by the kernel for direct mappings.
  */
 unsigned long determine_dirtyable_memory(void)
 {
 	unsigned long x;
 
 	x = global_page_state(NR_FREE_PAGES) + global_reclaimable_pages();
 
+/*
+ * PSz: Seems that highmem_is_dirtyable is only used here, in the
+ * calculation of limits and threshholds of dirtiness, not in deciding
+ * where to put dirty things. Is that so? Is that as should be?
+ * What is the recommended setting of highmem_is_dirtyable?
+ */
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
+/* PSz: Should not we subtract min_free_kbytes? */
+{
+extern int min_free_kbytes;
+int y = 0;
+/* printk("PSz: determine_dirtyable_memory was %ld pages, now subtract min_free_kbytes=%d\n",x,min_free_kbytes); */
+if (min_free_kbytes > 0)
+  y = min_free_kbytes >> (PAGE_SHIFT - 10);
+if (x > y)
+  x -= y;
+else
+  x = 0;
+}
 
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
 static unsigned long dirty_freerun_ceiling(unsigned long thresh,
 	   unsigned long bg_thresh)
 {
 	return (thresh + bg_thresh) / 2;
 }
 
@@ -534,39 +553,43 @@
 	unsigned long limit = hard_dirty_limit(thresh);
 	unsigned long x_intercept;
 	unsigned long setpoint;		/* dirty pages' target balance point */
 	unsigned long bdi_setpoint;
 	unsigned long span;
 	long long pos_ratio;		/* for scaling up/down the rate limit */
 	long x;
 
 	if (unlikely(dirty >= limit))
 		return 0;
+	if (unlikely(freerun >= limit))
+/* PSz: Never seen this happen, just sanity-check paranoia */
+		return (16 << RATELIMIT_CALC_SHIFT);
 
 	/*
 	 * global setpoint
 	 *
 	 *                           setpoint - dirty 3
 	 *        f(dirty) := 1.0 + (----------------)
 	 *                           limit - setpoint
 	 *
 	 * it's a 3rd order polynomial that subjects to
 	 *
 	 * (1) f(freerun)  = 2.0 => rampup dirty_ratelimit reasonably fast
 	 * (2) f(setpoint) = 1.0 => the balance point
 	 * (3) f(limit)    = 0   => the hard limit
 	 * (4) df/dx      <= 0	 => negative feedback control
 	 * (5) the closer to setpoint, the smaller |df/dx| (and the 

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-01 Thread paul . szabo
Dear Ben,

I tried to send
  tags 695182 - moreinfo
to cont...@bugs.debian.org but it came back with:
  You have been specifically excluded from using the control interface.
I guess that has something to do with bug#299007. Would you please be
able to have those settings corrected?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


-- 
To UNSUBSCRIBE, email to debian-bugs-dist-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2013-01-01 Thread Ben Hutchings
On Wed, 2013-01-02 at 08:33 +1100, Paul Szabo wrote:
> tags 695182 - moreinfo
> thanks
>
> Dear Ben,
>
> I suggest the following patch, which seems to solve the problem.
> Two attachments: minimal.patch just to show the simplicity, and
> complete.patch with comments and enhancements.

Please read Documentation/SubmittingPatches, use scripts/checkpatch.pl
and try to provide a patch that is suitable for upstream inclusion.
Also, your name belongs in the patch header, not in the code.

Ben.

-- 
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.




Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-27 Thread paul . szabo
Dear Ben,

In the OOM message in my initial bug report, I see
  Normal ... slab_reclaimable:261528kB ... all_unreclaimable? yes
Is that a contradiction? Should not that slab have been reclaimed?
Original line:
[  744.754369] Normal free:43788kB min:44112kB low:55140kB high:66168kB 
active_anon:0kB inactive_anon:0kB active_file:912kB inactive_file:0kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:887976kB 
mlocked:0kB dirty:0kB writeback:0kB mapped:4kB shmem:0kB 
slab_reclaimable:261528kB slab_unreclaimable:28812kB kernel_stack:3096kB 
pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:16060 
all_unreclaimable? yes
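
The apparent contradiction can be worked through with the numbers in that
line. The sketch below assumes that the 3.2-era zone_reclaimable() test has
the shape "pages_scanned < reclaimable_pages * 6" and that the reclaimable
count covers only LRU pages, not reclaimable slab; both points are
assumptions about the kernel source, and the helper names are hypothetical.
It also assumes 4 KiB pages.

```c
#include <stdbool.h>

#define KB_PER_PAGE 4	/* assumption: 4 KiB pages */

/* Convert a kB figure from the OOM report into a page count. */
static unsigned long kb_to_pages(unsigned long kb)
{
	return kb / KB_PER_PAGE;
}

/*
 * Assumed shape of the check: the zone counts as reclaimable while
 * pages_scanned stays below six times its reclaimable page count.
 */
static bool zone_looks_reclaimable(unsigned long pages_scanned,
				   unsigned long reclaimable_pages)
{
	return pages_scanned < reclaimable_pages * 6;
}
```

With active_file:912kB and inactive_file:0kB the LRU count is 228 pages, so
pages_scanned:16060 is far above 228 * 6 and the zone is judged
unreclaimable. Counting the 65382 pages of slab_reclaimable as well would
flip the result under these assumptions.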

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-16 Thread paul . szabo
Dear Ben,

In response to your comments: x seems to be in the range [-1,1]. The
returned pos_ratio would be within [0,2] if not for the final *8.
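
The [0,2] range follows from the cubic f = 1 + x^3 evaluated in fixed point.
A minimal sketch, assuming the scale factor is 2^RATELIMIT_CALC_SHIFT with a
shift of 10 as in the quoted source; the function name global_pos_ratio() is
illustrative, and division replaces the kernel's right shift because
shifting a negative value is implementation-defined in standard C.

```c
#include <stdint.h>

#define RATELIMIT_CALC_SHIFT 10
#define FIXED_ONE ((int64_t)1 << RATELIMIT_CALC_SHIFT)	/* 1.0 in fixed point */

/*
 * f(x) = 1 + x^3, with x scaled by 2^RATELIMIT_CALC_SHIFT.
 * x in [-1.0, 1.0] maps to a result in [0, 2.0].
 */
static int64_t global_pos_ratio(int64_t x)
{
	int64_t pos_ratio = x;

	pos_ratio = pos_ratio * x / FIXED_ONE;	/* x^2 */
	pos_ratio = pos_ratio * x / FIXED_ONE;	/* x^3 */
	pos_ratio += FIXED_ONE;			/* 1 + x^3 */
	return pos_ratio;
}
```

Here x = 1024 (1.0) gives 2048 (2.0), x = 0 gives 1024 (1.0), and
x = -1024 (-1.0) gives 0.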

---

[Funny: taking difference of unsigned ints and expect the result to be
negative in some sense. Seems the problem was not with large memory but
with negative numbers. Curious the bug was not noticed before.]

Need to cast and sign-extend before taking difference of unsigned
numbers, as the following demonstrates:

$ cat silly.c
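#include <stdio.h>
int main(void)
{
  unsigned long i,j;
  long long x;
  i=1; j=2;
  x = j-i; printf("j-i = %lld\n",x);
  /* unsigned subtraction wraps before widening to long long */
  x = i-j; printf("i-j = %lld\n",x);
  /* cast first, so the subtraction is done in signed 64-bit */
  x = (long long)i-j; printf("OK  = %lld\n",x);
  return 0;
}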
}
$ cc silly.c; a.out
j-i = 1
i-j = 4294967295
OK  = -1
$ 

and in fact things go bad, e.g. freerun=2172 limit=2896 dirty=2589
should get x=-155, whereas original formula gets x=11831710 and Ben's
formula gets x=-769071435.
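
The three values quoted above can be reproduced in user space. This is a
minimal sketch, assuming a 4-byte unsigned long (emulated here with
uint32_t) and a compiler that wraps when narrowing to int32_t, as GCC and
Clang do. The function names are illustrative and do not come from the
kernel.

```c
#include <stdint.h>

#define RATELIMIT_CALC_SHIFT 10

/* Original code: subtract and shift in 32 bits, then widen without sign. */
static int32_t x_original(uint32_t freerun, uint32_t limit, uint32_t dirty)
{
	uint32_t setpoint = (freerun + limit) / 2;
	uint32_t shifted = (setpoint - dirty) << RATELIMIT_CALC_SHIFT;
	int64_t q = (int64_t)shifted / (int32_t)(limit - setpoint + 1);

	return (int32_t)(uint32_t)q;	/* result is stored in a 32-bit long */
}

/* Cast applied after the unsigned subtraction: the difference already wrapped. */
static int32_t x_cast_after(uint32_t freerun, uint32_t limit, uint32_t dirty)
{
	uint32_t setpoint = (freerun + limit) / 2;
	int64_t diff = (int64_t)(uint32_t)(setpoint - dirty);
	int64_t q = (diff << RATELIMIT_CALC_SHIFT) /
		    (int32_t)(limit - setpoint + 1);

	return (int32_t)(uint32_t)q;
}

/* Cast each operand first: the difference is a true signed value. */
static int32_t x_cast_before(uint32_t freerun, uint32_t limit, uint32_t dirty)
{
	uint32_t setpoint = (freerun + limit) / 2;
	int64_t diff = (int64_t)setpoint - (int64_t)dirty;
	/* multiply instead of shifting: left-shifting a negative value is undefined in C */
	int64_t q = (diff * (1 << RATELIMIT_CALC_SHIFT)) /
		    (int32_t)(limit - setpoint + 1);

	return (int32_t)(uint32_t)q;
}
```

With freerun=2172, limit=2896 and dirty=2589 the setpoint is 2534 and the
divisor is 363; the three functions return 11831710, -769071435 and -155
respectively.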

Seems a correct patch would be:

--- old/mm/page-writeback.c 2012-10-17 13:50:15.0 +1100
+++ new/mm/page-writeback.c 2012-12-17 12:25:14.0 +1100
@@ -559,7 +559,7 @@
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
 		limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

However, with that patch in place I still got an OOM crash (log below).
More bugs remain...

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia


---

Dec 17 12:43:59 zeno kernel: xterm invoked oom-killer: gfp_mask=0xd0, order=0, 
oom_adj=0, oom_score_adj=0
Dec 17 12:43:59 zeno kernel: Pid: 2704, comm: xterm Not tainted 
3.2.32-pk06.08-i386t07 #7
Dec 17 12:43:59 zeno kernel: Call Trace:
Dec 17 12:43:59 zeno kernel:  [c1607533] ? printk+0x18/0x1a
Dec 17 12:43:59 zeno kernel:  [c10776b8] dump_header.isra.10+0x68/0x180
Dec 17 12:43:59 zeno kernel:  [c1069807] ? delayacct_end+0x97/0xb0
Dec 17 12:43:59 zeno kernel:  [c11d664e] ? ___ratelimit+0x7e/0xf0
Dec 17 12:43:59 zeno kernel:  [c1077929] 
oom_kill_process.constprop.15+0x49/0x230
Dec 17 12:43:59 zeno kernel:  [c1039d34] ? has_capability_noaudit+0x24/0x30
Dec 17 12:43:59 zeno kernel:  [c1077880] ? oom_badness+0xb0/0x110
Dec 17 12:43:59 zeno kernel:  [c1077e70] out_of_memory+0x240/0x2c0
Dec 17 12:43:59 zeno kernel:  [c107a8a8] __alloc_pages_nodemask+0x558/0x570
Dec 17 12:43:59 zeno kernel:  [c1569b91] tcp_sendmsg+0x711/0xab0
Dec 17 12:43:59 zeno kernel:  [c11db4fc] ? copy_to_user+0x2c/0x40
Dec 17 12:43:59 zeno kernel:  [c1587f22] inet_sendmsg+0x42/0xa0
Dec 17 12:43:59 zeno kernel:  [c152fe2b] sock_aio_write+0xdb/0x100
Dec 17 12:43:59 zeno kernel:  [c15874f5] ? inet_recvmsg+0x55/0xa0
Dec 17 12:43:59 zeno kernel:  [c152fd50] ? sock_aio_read+0x130/0x130
Dec 17 12:43:59 zeno kernel:  [c10a3fd4] do_sync_readv_writev+0xa4/0xe0
Dec 17 12:43:59 zeno kernel:  [c11db640] ? _copy_from_user+0x30/0x50
Dec 17 12:43:59 zeno kernel:  [c10a40d3] ? rw_copy_check_uvector+0x43/0x130
Dec 17 12:43:59 zeno kernel:  [c10a4262] do_readv_writev+0xa2/0x1b0
Dec 17 12:43:59 zeno kernel:  [c152fd50] ? sock_aio_read+0x130/0x130
Dec 17 12:43:59 zeno kernel:  [c10a3ced] ? vfs_read+0x14d/0x170
Dec 17 12:43:59 zeno kernel:  [c10a43a2] vfs_writev+0x32/0x50
Dec 17 12:43:59 zeno kernel:  [c10a44e8] sys_writev+0x38/0xa0
Dec 17 12:43:59 zeno kernel:  [c160fd14] sysenter_do_call+0x12/0x26
Dec 17 12:43:59 zeno kernel: Mem-Info:
Dec 17 12:43:59 zeno kernel: DMA per-cpu:
Dec 17 12:43:59 zeno kernel: CPU0: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU1: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU2: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU3: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU4: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU5: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU6: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU7: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU8: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU9: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   10: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   11: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   12: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   13: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   14: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   15: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   16: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   17: hi:0, btch:   1 usd:   0
Dec 17 12:43:59 zeno kernel: CPU   18: hi:0, btch:   1 usd:   0
Dec 17 

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-16 Thread Ben Hutchings
On Mon, 2012-12-17 at 13:09 +1100, paul.sz...@sydney.edu.au wrote:
> Dear Ben,
>
> In response to your comments: x seems to be in the range [-1,1]. The
> returned pos_ratio would be within [0,2] if not for the final *8.
>
> ---
>
> [Funny: taking difference of unsigned ints and expect the result to be
> negative in some sense. Seems the problem was not with large memory but
> with negative numbers. Curious the bug was not noticed before.]

WTF.

> Need to cast and sign-extend before taking difference of unsigned
> numbers, as the following demonstrates:

Sorry, I was thinking they were long and not unsigned long.  Again, this
happens to work on a 64-bit machine.

[...]
> Seems a correct patch would be:
>
> --- old/mm/page-writeback.c   2012-10-17 13:50:15.0 +1100
> +++ new/mm/page-writeback.c   2012-12-17 12:25:14.0 +1100
> @@ -559,7 +559,7 @@
>  	 * => fast response on large errors; small oscillation near setpoint
>  	 */
>  	setpoint = (freerun + limit) / 2;
> -	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> +	x = div_s64(((s64)setpoint - (s64)dirty) << RATELIMIT_CALC_SHIFT,
>  		limit - setpoint + 1);
>  	pos_ratio = x;
>  	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
>
> However, with that patch in place I still got an OOM crash (log below).
> More bugs remain...
[...]

It's not a crash, though that's kind of an academic distinction.

Perhaps you could add some printk() statements to log the results of
these various calculations, so you can sanity-check them.  You would
probably want to make them conditional on the initial value of x being
negative (it's reused for something entirely different later so you
would need to assign this condition to a separate variable).

Ben.

-- 
Ben Hutchings
Life is like a sewer:
what you get out of it depends on what you put into it.




Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-16 Thread paul . szabo
Dear Ben,

> It's not a crash, though that's kind of an academic distinction.

What would you like me to call it instead? The machine seemed to
hang... I do not know whether it rebooted spontaneously or in response to
a "shutdown -r now" that I had typed into an unresponsive xterm.

> Perhaps you could add some printk() statements to log the results of
> these various calculations, so you can sanity-check them.  You would
> probably want to make them conditional on the initial value of x being
> negative (it's reused for something entirely different later so you
> would need to assign this condition to a separate variable).

I did, and am convinced that bdi_position_ratio() now does the right
thing: returns something within [0,2] mostly, and internally seems OK.
I think I had my corrected bdi_position_ratio() in use during this
latest OOM episode.

---

Thanks again for your prompt fix of bdi_position_ratio(). I will now
look for that elusive next bug.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-15 Thread paul . szabo
Seems to me that the bug is in function
  bdi_position_ratio()
within file
  mm/page-writeback.c
The internal variable declaration is
  long long pos_ratio;
and the calculation of it overflows. Maybe changing the declaration to
u64 would help. Also, pos_ratio is used as the return value without any
bounds check, although the return type is declared as unsigned long.
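
The kind of bounds check that seems to be missing could look like a
saturating conversion. This is only a sketch under stated assumptions:
clamp_pos_ratio() is a hypothetical helper, not kernel code, and uint32_t
stands in for unsigned long on a 32-bit kernel.

```c
#include <stdint.h>

/*
 * Hypothetical helper: narrow a signed 64-bit ratio to an unsigned
 * 32-bit return value, saturating instead of silently truncating.
 */
static uint32_t clamp_pos_ratio(int64_t pos_ratio)
{
	if (pos_ratio < 0)
		return 0;
	if (pos_ratio > (int64_t)UINT32_MAX)
		return UINT32_MAX;
	return (uint32_t)pos_ratio;
}
```

A negative intermediate value then becomes 0 and an oversized one becomes
the largest representable value, rather than an arbitrary bit pattern.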

I do not yet understand what bdi_position_ratio() is meant to do, so
cannot yet offer patches.

---

What I did:

I added many lines like
  BUG_ON(pos_ratio<0);
into kernel sources. Running that kernel and creating my files with
  n=0; while [ $n -lt 99 ]; do dd bs=1M count=1024 if=/dev/zero of=x$n; (( n = $n + 1 )); done
I got after about 15 files created:
/bin/bash: line 1:  2755 Segmentation fault  dd bs=1M count=1024 if=/dev/zero of=x$n
Message from syslogd@zeno at Sat Dec 15 19:46:37 2012 ...
zeno kernel: [ cut here ]
zeno kernel: invalid opcode:  [#1] SMP 
...
and in the logs:

Dec 15 19:46:37 zeno kernel: [ cut here ]
Dec 15 19:46:37 zeno kernel: kernel BUG at mm/page-writeback.c:569!
Dec 15 19:46:37 zeno kernel: invalid opcode:  [#1] SMP 
Dec 15 19:46:37 zeno kernel: Modules linked in: nfsd exportfs quota_v2 
quota_tree fuse joydev usb_storage coretemp crc32c_intel aesni_intel sg cryptd 
sr_mod aes_i586 aes_generic 8250_pnp evdev i2c_i801 8250 serial_core processor 
thermal_sys button
Dec 15 19:46:37 zeno kernel: 
Dec 15 19:46:37 zeno kernel: Pid: 2755, comm: dd Not tainted 
3.2.32-pk06.08-i386t02 #1 Supermicro X9DR3-F/X9DR3-F
Dec 15 19:46:37 zeno kernel: EIP: 0060:[c107bf30] EFLAGS: 00010282 CPU: 0
Dec 15 19:46:37 zeno kernel: EIP is at bdi_position_ratio.isra.16+0x220/0x230
Dec 15 19:46:37 zeno kernel: EAX: fffaadbc EBX: 0524 ECX: fffaadbc EDX: 
760dae6b
Dec 15 19:46:37 zeno kernel: ESI: 0524 EDI: ea673c18 EBP: d6235d2c ESP: 
d6235d00
Dec 15 19:46:37 zeno kernel:  DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
Dec 15 19:46:37 zeno kernel: Process dd (pid: 2755, ti=d6234000 task=d607bb10 
task.ti=d6234000)
Dec 15 19:46:37 zeno kernel: Stack:
Dec 15 19:46:37 zeno kernel:  d6235d1c 000280cd 33b036ce 0098 047f 
760db26b fffaadbc 007a
Dec 15 19:46:37 zeno kernel:  0004 0546 d5de809c d6235db0 c107c963 
04f1 0523 0546
Dec 15 19:46:37 zeno kernel:   00140669 0007  d5de80bc 
00032afc  d5e83800
Dec 15 19:46:37 zeno kernel: Call Trace:
Dec 15 19:46:37 zeno kernel:  [c107c963] 
balance_dirty_pages_ratelimited_nr+0x253/0x520
Dec 15 19:46:37 zeno kernel:  [c10747cf] 
generic_file_buffered_write+0x16f/0x210
Dec 15 19:46:37 zeno kernel:  [c1075f7d] __generic_file_aio_write+0x24d/0x4b0
Dec 15 19:46:37 zeno kernel:  [c1076240] generic_file_aio_write+0x60/0xc0
Dec 15 19:46:37 zeno kernel:  [c10a2fa7] do_sync_write+0xb7/0xf0
Dec 15 19:46:37 zeno kernel:  [c1036455] ? irq_exit+0x55/0x60
Dec 15 19:46:37 zeno kernel:  [c10a2ef0] ? wait_on_retry_sync_kiocb+0x50/0x50
Dec 15 19:46:37 zeno kernel:  [c10a3aa7] vfs_write+0x87/0x170
Dec 15 19:46:37 zeno kernel:  [c10a2ef0] ? wait_on_retry_sync_kiocb+0x50/0x50
Dec 15 19:46:37 zeno kernel:  [c10a3da8] sys_write+0x38/0x70
Dec 15 19:46:37 zeno kernel:  [c160fd14] sysenter_do_call+0x12/0x26
Dec 15 19:46:37 zeno kernel: Code: 55 ff ff ff 0f 0b 90 8d 74 26 00 0f a4 cb 03 
c1 e1 03 e9 74 ff ff ff 8d 74 26 00 89 d0 31 d2 f7 75 10 89 c6 e9 59 ff ff ff 
0f 0b 0f 0b 0f 0b 0f 0b 31 c0 e9 5d ff ff ff 8d 76 00 55 89 e5 83 ec 
Dec 15 19:46:37 zeno kernel: EIP: [c107bf30] 
bdi_position_ratio.isra.16+0x220/0x230 SS:ESP 0068:d6235d00
Dec 15 19:46:37 zeno kernel: ---[ end trace c9c79e2ba8a36130 ]---

Relevant part of file  mm/page-writeback.c :

   525  static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
   526                                          unsigned long thresh,
   527                                          unsigned long bg_thresh,
   528                                          unsigned long dirty,
   529                                          unsigned long bdi_thresh,
   530                                          unsigned long bdi_dirty)
   531  {
   532          unsigned long write_bw = bdi->avg_write_bandwidth;
   533          unsigned long freerun = dirty_freerun_ceiling(thresh, bg_thresh);
   534          unsigned long limit = hard_dirty_limit(thresh);
   535          unsigned long x_intercept;
   536          unsigned long setpoint;         /* dirty pages' target balance point */
   537          unsigned long bdi_setpoint;
   538          unsigned long span;
   539          long long pos_ratio;            /* for scaling up/down the rate limit */
   540          long x;
   541  
   542          if (unlikely(dirty >= limit))
   543                  return 0;
   544  
   545          /*
   546           * global setpoint
   547           *
   548           *                           setpoint - dirty 3
   549           *        f(dirty) := 1.0 + (----------------)
   550   

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-15 Thread Ben Hutchings
Control: tag -1 patch

On Sun, 2012-12-16 at 07:21 +1100, paul.sz...@sydney.edu.au wrote:
> Seems to me that the bug is in function
>   bdi_position_ratio()
> within file
>   mm/page-writeback.c
> The internal variable declaration is
>   long long pos_ratio;
> and calculation of it overflows.

You seem to be on the right track, but I think the initial overflow
occurs when calculating x:

[...]
   562          x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
   563                      limit - setpoint + 1);
[...]

setpoint and dirty are numbers of pages and are declared as long, so on
a system with enough memory they can presumably differ by 2^21 or more
(2^21 pages = 8 GB).  Shifting left by RATELIMIT_CALC_SHIFT = 10 can
then change the sign bit.
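
The arithmetic behind those figures can be checked directly. A minimal
sketch, assuming 4 KiB pages (PAGE_SHIFT of 12) and a 32-bit signed long;
the helper names are illustrative only.

```c
#include <stdint.h>

#define PAGE_SHIFT 12			/* assumption: 4 KiB pages */
#define RATELIMIT_CALC_SHIFT 10

/* Number of bytes covered by a page count. */
static uint64_t pages_to_bytes(uint64_t pages)
{
	return pages << PAGE_SHIFT;
}

/* Does (diff << RATELIMIT_CALC_SHIFT) still fit in a signed 32-bit long? */
static int shift_fits_s32(uint32_t diff_pages)
{
	uint64_t shifted = (uint64_t)diff_pages << RATELIMIT_CALC_SHIFT;

	return shifted <= (uint64_t)INT32_MAX;
}
```

2^21 pages come to exactly 8 GiB, and 2^21 shifted left by 10 is 2^31, the
first value that no longer fits below the sign bit; 2^21 - 1 still fits.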

Does the attached patch fix this?

Ben.

-- 
Ben Hutchings
Theory and practice are closer in theory than in practice.
- John Levine, moderator of comp.compilers
From 8f4ae99695a37c7294ce39e008878e8b304cb4b0 Mon Sep 17 00:00:00 2001
From: Ben Hutchings b...@decadent.org.uk
Date: Sat, 15 Dec 2012 23:03:40 +0000
Subject: [PATCH] writeback: Fix overflow in bdi_position_ratio() on PAE
 systems

Most variables in bdi_position_ratio() are declared long, which is
enough for a page count.  However, when converting (setpoint - dirty)
to a fixed-point number we left-shift by 10, and on a 32-bit system
with PAE it is possible to have enough dirty pages that the shift
overflows into the sign bit.  We need to cast to s64 before the
left-shift.

Reported-by: Paul Szabo paul.sz...@sydney.edu.au
Reference: http://bugs.debian.org/695182
Signed-off-by: Ben Hutchings b...@decadent.org.uk
---
 mm/page-writeback.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 50f0824..8b5600e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
 	 * => fast response on large errors; small oscillation near setpoint
 	 */
 	setpoint = (freerun + limit) / 2;
-	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
+	x = div_s64((s64)(setpoint - dirty) << RATELIMIT_CALC_SHIFT,
 		limit - setpoint + 1);
 	pos_ratio = x;
 	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;




Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-15 Thread paul . szabo
Dear Ben,

> ... I think the initial overflow occurs when calculating x
> ...
> setpoint and dirty are numbers of pages and are declared as long, so on
> a system with enough memory they can presumably differ by 2^21 or more
> (2^21 pages = 8 GB).  Shifting left by RATELIMIT_CALC_SHIFT = 10 can
> then change the sign bit.
>
> Does the attached patch fix this?
>
> ...
>
> Most variables in bdi_position_ratio() are declared long, which is
> enough for a page count.  However, when converting (setpoint - dirty)
> to a fixed-point number we left-shift by 10, and on a 32-bit system
> with PAE it is possible to have enough dirty pages that the shift
> overflows into the sign bit.  We need to cast to s64 before the
> left-shift.
>
> Reported-by: Paul Szabo paul.sz...@sydney.edu.au
> Reference: http://bugs.debian.org/695182
> Signed-off-by: Ben Hutchings b...@decadent.org.uk
> ---
>  mm/page-writeback.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/page-writeback.c b/mm/page-writeback.c
> index 50f0824..8b5600e 100644
> --- a/mm/page-writeback.c
> +++ b/mm/page-writeback.c
> @@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
>  	 * => fast response on large errors; small oscillation near setpoint
>  	 */
>  	setpoint = (freerun + limit) / 2;
> -	x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
> +	x = div_s64((s64)(setpoint - dirty) << RATELIMIT_CALC_SHIFT,
>  		limit - setpoint + 1);
>  	pos_ratio = x;
>  	pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;

Thanks for the quick patch. I am about to test it (in a day or so).
Initial (blind, off-the-cuff, uneducated) comments:
 - I had BUG_ON(x<0) in my code, so unlikely x changed sign.
 - Why not use float instead of infinite-precision integer arithmetic?
 - Do we need a smooth function, or would an easy-to-calculate
   step function suffice?
 - Is there a check that the returned s64 pos_ratio fits into u32?

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of SydneyAustralia





Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-15 Thread Ben Hutchings
On Sun, 2012-12-16 at 11:14 +1100, paul.sz...@sydney.edu.au wrote:
 Dear Ben,
 
  ... I think the initial overflow occurs when calculating x
  ...
  setpoint and dirty are numbers of pages and are declared as long, so on
  a system with enough memory they can presumably differ by 2^21 or more
  (2^21 pages = 8 GB).  Shifting left by RATELIMIT_CALC_SHIFT = 10 can
  then change the sign bit.
  
  Does the attached patch fix this?
  
  ...
  
  Most variables in bdi_position_ratio() are declared long, which is
  enough for a page count.  However, when converting (setpoint - dirty)
  to a fixed-point number we left-shift by 10, and on a 32-bit system
  with PAE it is possible to have enough dirty pages that the shift
  overflows into the sign bit.  We need to cast to s64 before the
  left-shift.
  
  Reported-by: Paul Szabo paul.sz...@sydney.edu.au
  Reference: http://bugs.debian.org/695182
  Signed-off-by: Ben Hutchings b...@decadent.org.uk
  ---
   mm/page-writeback.c |2 +-
   1 file changed, 1 insertion(+), 1 deletion(-)
  
  diff --git a/mm/page-writeback.c b/mm/page-writeback.c
  index 50f0824..8b5600e 100644
  --- a/mm/page-writeback.c
  +++ b/mm/page-writeback.c
  @@ -559,7 +559,7 @@ static unsigned long bdi_position_ratio(struct backing_dev_info *bdi,
     * => fast response on large errors; small oscillation near setpoint
     */
      setpoint = (freerun + limit) / 2;
  -   x = div_s64((setpoint - dirty) << RATELIMIT_CALC_SHIFT,
  +   x = div_s64((s64)(setpoint - dirty) << RATELIMIT_CALC_SHIFT,
      limit - setpoint + 1);
      pos_ratio = x;
      pos_ratio = pos_ratio * x >> RATELIMIT_CALC_SHIFT;
 
 Thanks for the quick patch. I am about to test it (in a day or so).
 Initial (blind, off-the-cuff, uneducated) comments:
  - I had BUG_ON(x<0) in my code, so unlikely x changed sign.

If I understand correctly, x is a ratio in the range [-1, 1].  I would
expect it to become negative before wrapping around to become positive,
but it's also conceivable that it went all the way round between two
calls to this function.  (I don't know how often it's likely to be
called.)

  - Why not use float instead of infinite-precision integer arithmetic?

We cannot assume the presence of an FPU.  (The kernel handles FPU
emulation for userland, but not for itself.)

  - Do we need a smooth function, or would an easy-to-calculate
step function suffice?

I don't know.

  - Is there a check that the returned s64 pos_ratio fits into u32?

It seems to be limited to the range [0, 2] (with 10 fractional bits) but
I didn't check all the other calculations.

Ben.

-- 
Ben Hutchings
Always try to do things in chronological order;
it's less confusing that way.




Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-09 Thread paul . szabo
Dear Ben,

 Although PAE supports up to 64 GB RAM ... The use of such a large
 amount of high memory is problematic ...
 Or you can test ... by restricting what the kernel uses with the
 'mem' parameter, e.g. mem=16G.

Trying various mem=XX values, no OOM was observed with mem=32G or less,
but a crash is obtained with any memory over 32GB e.g. with mem=34G.
This suggests a signed/unsigned bug more than an issue with highmem
size; you said PAE supports 64GB, not just 32GB.

 A 64-bit kernel doesn't have a split between normal and high memory.

... and it may have larger integers, less affected by signedness bugs.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia


*** mem=34G - Tail end of /var/log/kern.log
[0.00] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-3.2.0-4-686-pae mem=34G root=UUID=469c2730-1786-46f7-9d80-5d651ee581d7 ro quiet
...
[  388.560098] dd invoked oom-killer: gfp_mask=0x800d0, order=0, oom_adj=0, oom_score_adj=0
[  388.560106] dd cpuset=/ mems_allowed=0
[  388.560113] Pid: 4244, comm: dd Not tainted 3.2.0-4-686-pae #1 Debian 3.2.32-1
[  388.560117] Call Trace:
[  388.560135]  [<c1097c1c>] ? dump_header.isra.6+0x5c/0x167
[  388.560144]  [<c1120ff8>] ? security_real_capable_noaudit+0x2c/0x35
[  388.560149]  [<c1097e93>] ? oom_kill_process+0x30/0x201
[  388.560155]  [<c109810f>] ? select_bad_process.constprop.12+0xab/0xff
[  388.560160]  [<c10983e0>] ? out_of_memory+0xf8/0x135
[  388.560167]  [<c109aecd>] ? __alloc_pages_nodemask+0x509/0x63e
[  388.560176]  [<c10c0ca3>] ? cache_alloc+0x253/0x407
[  388.560183]  [<c10c1425>] ? kmem_cache_alloc+0x29/0x89
[  388.560193]  [<c11605ee>] ? radix_tree_preload+0x24/0x61
[  388.560203]  [<c10961cd>] ? add_to_page_cache_locked+0x3e/0xb3
[  388.560210]  [<c1096253>] ? add_to_page_cache_lru+0x11/0x2f
[  388.560217]  [<c10962cb>] ? grab_cache_page_write_begin+0x5a/0x94
[  388.560244]  [<f88bb348>] ? ext3_write_begin+0xa0/0x1d2 [ext3]
[  388.560251]  [<c109d562>] ? put_page+0x16/0x24
[  388.560258]  [<c1095e61>] ? generic_file_buffered_write+0xd8/0x1dd
[  388.560265]  [<c1096afb>] ? __generic_file_aio_write+0x25e/0x282
[  388.560273]  [<c100f2fb>] ? read_tsc+0xa/0x28
[  388.560282]  [<c1053548>] ? timekeeping_get_ns+0x11/0x55
[  388.560287]  [<c1096b7c>] ? generic_file_aio_write+0x5d/0xb3
[  388.560298]  [<c10cbc31>] ? wait_on_retry_sync_kiocb+0x3c/0x3c
[  388.560304]  [<c10cbcd9>] ? do_sync_write+0xa8/0xdc
[  388.560311]  [<c10cc1f3>] ? rw_verify_area+0xc6/0xe7
[  388.560317]  [<c10cc493>] ? vfs_write+0x83/0xd4
[  388.560323]  [<c10cc653>] ? sys_write+0x3d/0x61
[  388.560331]  [<c12c5d1f>] ? sysenter_do_call+0x12/0x28
[  388.560334] Mem-Info:
[  388.560337] DMA per-cpu:
[  388.560341] CPU0: hi:0, btch:   1 usd:   0
[  388.560345] CPU1: hi:0, btch:   1 usd:   0
[  388.560348] CPU2: hi:0, btch:   1 usd:   0
[  388.560351] CPU3: hi:0, btch:   1 usd:   0
[  388.560355] CPU4: hi:0, btch:   1 usd:   0
[  388.560358] CPU5: hi:0, btch:   1 usd:   0
[  388.560361] CPU6: hi:0, btch:   1 usd:   0
[  388.560365] CPU7: hi:0, btch:   1 usd:   0
[  388.560368] CPU8: hi:0, btch:   1 usd:   0
[  388.560371] CPU9: hi:0, btch:   1 usd:   0
[  388.560374] CPU   10: hi:0, btch:   1 usd:   0
[  388.560378] CPU   11: hi:0, btch:   1 usd:   0
[  388.560381] CPU   12: hi:0, btch:   1 usd:   0
[  388.560384] CPU   13: hi:0, btch:   1 usd:   0
[  388.560387] CPU   14: hi:0, btch:   1 usd:   0
[  388.560390] CPU   15: hi:0, btch:   1 usd:   0
[  388.560394] CPU   16: hi:0, btch:   1 usd:   0
[  388.560397] CPU   17: hi:0, btch:   1 usd:   0
[  388.560400] CPU   18: hi:0, btch:   1 usd:   0
[  388.560404] CPU   19: hi:0, btch:   1 usd:   0
[  388.560407] CPU   20: hi:0, btch:   1 usd:   0
[  388.560410] CPU   21: hi:0, btch:   1 usd:   0
[  388.560413] CPU   22: hi:0, btch:   1 usd:   0
[  388.560417] CPU   23: hi:0, btch:   1 usd:   0
[  388.560420] CPU   24: hi:0, btch:   1 usd:   0
[  388.560423] CPU   25: hi:0, btch:   1 usd:   0
[  388.560427] CPU   26: hi:0, btch:   1 usd:   0
[  388.560430] CPU   27: hi:0, btch:   1 usd:   0
[  388.560433] CPU   28: hi:0, btch:   1 usd:   0
[  388.560436] CPU   29: hi:0, btch:   1 usd:   0
[  388.560440] CPU   30: hi:0, btch:   1 usd:   0
[  388.560443] CPU   31: hi:0, btch:   1 usd:   0
[  388.560446] Normal per-cpu:
[  388.560449] CPU0: hi:  186, btch:  31 usd: 174
[  388.560452] CPU1: hi:  186, btch:  31 usd: 164
[  388.560456] CPU2: hi:  186, btch:  31 usd:  53
[  388.560459] CPU3: hi:  186, btch:  31 usd:  51
[  388.560462] CPU4: hi:  186, btch:  31 usd: 155
[  388.560465] CPU5: hi:  186, btch:  31 usd:  72
[  388.560469] CPU6: hi:  186, btch:  31 usd: 143
[  388.560472] CPU7: hi:  186, btch:  31 usd:  95
[  388.560475] CPU8: hi:  186, btch:  31 usd: 178
[  

Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-05 Thread Ben Hutchings
On Wed, 2012-12-05 at 12:19 +1100, Paul Szabo wrote:
 Subject: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash
 Package: src:linux
 Version: 3.2.32-1
 Severity: normal
 
 
 Writing a few large files causes an OOM crash.
 This happens on a fresh install from the
   debian-wheezy-DI-b4-i386-netinst.iso
 disk. I observed the problem on several dual-CPU Xeon servers:
   Name    CPU type   RAM    Comment
   bivona  2*E5-2690  128GB  normally runs amd64 kernel
   como    2*E5-2690   64GB
   briona  2*X5690     48GB
   gemona  2*X5680     48GB
 I have not noticed the problem with desktop machines with single
 CPU chips and 4GB memory.
[...]

Although PAE supports up to 64 GB RAM, everything the kernel accesses
must be mapped into 1 GB of virtual address space (about 880 MB of
persistently mapped 'normal memory', plus temporary mappings of the
remaining 'high memory').  The use of such a large amount of high memory
is problematic, though I don't know whether it entirely explains this
behaviour.  (The memory stats don't seem to account for much of the
normal memory, as there is ~40 MB free but the various classes of
allocations seem to add up to only ~300 MB.)

These machines should all be installed with the amd64 kernel.  Is there
any reason you would prefer not to do that?  Perhaps the kernel flavour
selection in the installer should be changed to favour that based on the
RAM size, though I'm not sure what the critical value should be.

Ben.

-- 
Ben Hutchings
Computers are not intelligent.  They only think they are.




Bug#695182: Re: Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-05 Thread paul . szabo
An observation that may help in solving this issue. Using
  while :; do free -lm; sleep 5; done
while writing the files, I see the buffers and cached values
increasing; then buffers start decreasing, eventually down to zero;
then soon after, OOM starts. The free or low or high values do not
seem to show anything unusual.

Cheers, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia


Extract from output:
      total   used   free shared buffers cached
Mem: 62941   1586  61354  0 61   1359
Mem: 62941   2143  60797  0 61   1907
Mem: 62941   2652  60288  0 62   2407
Mem: 62941   3205  59735  0 63   2951
Mem: 62941   3743  59197  0 63   3483
Mem: 62941   4275  58665  0 64   4007
Mem: 62941   4791  58149  0 64   4511
Mem: 62941   5338  57602  0 65   5049
Mem: 62941   5835  57105  0 65   5538
Mem: 62941   6332  56608  0 66   6027
Mem: 62941   6837  56103  0 66   6524
Mem: 62941   7332  55608  0 67   7007
Mem: 62941   7815  55125  0 67   7482
Mem: 62941   8310  54630  0 68   7970
Mem: 62941   8820  54120  0 68   8471
Mem: 62941   9280  53660  0 69   8922
Mem: 62941   9779  53161  0 69   9413
Mem: 62941  10231  52709  0 59   9868
Mem: 62941  10736  52204  0 59  10366
Mem: 62941  11105  51835  0 48  10741
Mem: 62941  11585  51355  0 41  11223
Mem: 62941  12074  50866  0 36  11709
Mem: 62941  12544  50396  0 23  12183
Mem: 62941  13021  49919  0 24  12653
Mem: 62941  13515  49425  0  8  13152
Mem: 62941  13978  48962  0  9  13609
Mem: 62941  14459  48481  0  1  14091
Mem: 62941  14941  47999  0  0  14566
Mem: 62941  15409  47531  0  0  15028
Mem: 62941  15858  47082  0  0  15487
Mem: 62941  16251  46689  0  0  15873
Mem: 62941  16392  46548  0  0  16017
Mem: 62941  16593  46347  0  0  16215
Mem: 62941  16730  46210  0  0  16350
Mem: 62941  16808  46132  0  0  16429
Mem: 62941  16839  46101  0  0  16460
Mem: 62941  16855  46085  0  0  16476
Mem: 62941  16843  46097  0  0  16487
Mem: 62941  17121  45819  0  0  16779
Mem: 62941  17342  45598  0  0  16998
Mem: 62941  17491  45449  0  0  17146





Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-05 Thread paul . szabo
Dear Ben,

 Although PAE supports up to 64 GB RAM, everything the kernel accesses
 must be mapped into 1 GB of virtual address space (about 880 MB of
 persistently mapped 'normal memory', plus temporary mappings of the
 remaining 'high memory').  The use of such a large amount of high memory
 is problematic, though I don't know whether it entirely explains this
 behaviour.  (The memory stats don't seem to account for much of the
 normal memory, as there is ~40 MB free but the various classes of
 allocations seem to add up to only ~300 MB.)

 These machines should all be installed with the amd64 kernel.  Is there
 any reason you would prefer not to do that?  Perhaps the kernel flavour
 selection in the installer should be changed to favour that based on the
 RAM size, though I'm not sure what the critical value should be.

Are you suggesting that the kernel lies, that 32-bit cannot handle 64GB?
Would it help to test the issue on a 16GB machine (I have one with
2*X5460 CPUs and one with single i5-3570), or with 24GB (have several
with 2*E5335 to 2*X5460)?

I have seen recommendations to use 64-bit amd64. I am somewhat reluctant
to jump ship: I want continuity (when I upgrade by installing a
little more memory), want similarity between my various machines; and
have observed 32-bit being faster in some situations.

But really: this is a bug in the 32-bit build. Do I know that the same
or similar or worse bugs are not present also in the 64-bit build off
the same sources?

Thanks, Paul

Paul Szabo   p...@maths.usyd.edu.au   http://www.maths.usyd.edu.au/u/psz/
School of Mathematics and Statistics   University of Sydney   Australia





Bug#695182: linux-image-3.2.0-4-686-pae: Write couple of 1GB files for OOM crash

2012-12-05 Thread Ben Hutchings
On Wed, 2012-12-05 at 22:36 +1100, paul.sz...@sydney.edu.au wrote:
 Dear Ben,
 
  Although PAE supports up to 64 GB RAM, everything the kernel accesses
  must be mapped into 1 GB of virtual address space (about 880 MB of
  persistently mapped 'normal memory', plus temporary mappings of the
  remaining 'high memory').  The use of such a large amount of high memory
  is problematic, though I don't know whether it entirely explains this
  behaviour.  (The memory stats don't seem to account for much of the
  normal memory, as there is ~40 MB free but the various classes of
  allocations seem to add up to only ~300 MB.)
 
  These machines should all be installed with the amd64 kernel.  Is there
  any reason you would prefer not to do that?  Perhaps the kernel flavour
  selection in the installer should be changed to favour that based on the
  RAM size, though I'm not sure what the critical value should be.
 
 Are you suggesting that the kernel lies, that 32-bit cannot handle 64GB?
 Would it help to test the issue on a 16GB machine (I have one with
 2*X5460 CPUs and one with single i5-3570), or with 24GB (have several
 with 2*E5335 to 2*X5460)?

Or you can test on the larger machines by restricting what the
kernel uses with the 'mem' parameter, e.g. mem=16G.

 I have seen recommendations to use 64-bit amd64. I am somewhat reluctant
 to jump ship: I want continuity (when I upgrade by installing a
 little more memory), want similarity between my various machines; and
 have observed 32-bit being faster in some situations.
 
 But really: this is a bug in the 32-bit build. Do I know that the same
 or similar or worse bugs are not present also in the 64-bit build off
 the same sources?

A 64-bit kernel doesn't have a split between normal and high memory.

Ben.

-- 
Ben Hutchings
Computers are not intelligent.  They only think they are.

