[Devel] Re: [RFC][PATCH 2/4] checkpoint/restart: x86 support

2008-08-10 Thread Jeremy Fitzhardinge
Dave Hansen wrote:
> On Fri, 2008-08-08 at 19:04 -0400, Oren Laadan wrote:
>
>>> struct pt_regs is part of the kernel ABI, it will not change.
>>
>> I'm in favor of keeping the format identical between the variations of
>> each architecture. Note, however, that struct pt_regs won't do because it
>> may change with these variations.
>

> "Part of the kernel ABI" makes it sound to me like it won't change.
> Who's right here? :)

struct pt_regs is not ABI, and it can change (and has changed) on x86. It's
not suitable for a checkpoint structure because it only contains the
registers that the kernel trashes, not all usermode registers (on i386,
it leaves out %gs, for example). asm-x86/ptrace-abi.h does define stuff
that's set in stone; it expresses it in terms of a register array,
with constants defining which element is which register.
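
A minimal sketch of what such an array-based layout could look like for
i386. The CKPT_* names, struct ckpt_regs, and the two accessors are
hypothetical and appear in no posted patch; only the index values are meant
to mirror the i386 constants in asm-x86/ptrace-abi.h (EBX = 0 through
SS = 16, FRAME_SIZE = 17).

```c
#include <stdint.h>

/*
 * Hypothetical index constants for a checkpoint register image.
 * The values follow the i386 entries of asm-x86/ptrace-abi.h;
 * the CKPT_ prefix is an assumption made for this sketch.
 */
enum ckpt_i386_reg {
	CKPT_EBX = 0,
	CKPT_ECX,
	CKPT_EDX,
	CKPT_ESI,
	CKPT_EDI,
	CKPT_EBP,
	CKPT_EAX,
	CKPT_DS,
	CKPT_ES,
	CKPT_FS,
	CKPT_GS,	/* present here, unlike in i386 struct pt_regs */
	CKPT_ORIG_EAX,
	CKPT_EIP,
	CKPT_CS,
	CKPT_EFL,
	CKPT_UESP,
	CKPT_SS,
	CKPT_NR_REGS	/* 17, the same count as FRAME_SIZE */
};

/* Fixed-width slots so the image does not depend on struct pt_regs. */
struct ckpt_regs {
	uint32_t r[CKPT_NR_REGS];
};

/* Store one register value in the image. */
static inline void ckpt_set_reg(struct ckpt_regs *c,
				enum ckpt_i386_reg idx, uint32_t val)
{
	c->r[idx] = val;
}

/* Read one register value back from the image. */
static inline uint32_t ckpt_get_reg(const struct ckpt_regs *c,
				    enum ckpt_i386_reg idx)
{
	return c->r[idx];
}
```

Because every register is addressed through a constant index, the image
format stays the same even if the in-kernel struct pt_regs is reordered.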

J
___
Containers mailing list
[EMAIL PROTECTED]
https://lists.linux-foundation.org/mailman/listinfo/containers

___
Devel mailing list
Devel@openvz.org
https://openvz.org/mailman/listinfo/devel


[Devel] Re: memrlimit controller merge to mainline

2008-08-10 Thread Balbir Singh
Hugh Dickins wrote:
>> but I do have an initial hypothesis
>>
>> CPU0                                  CPU1
>>                                       try_to_unuse
>> task 1 starts exiting                 look at mm = task1->mm
>> ..                                    increment mm_users
>> task 1 exits
>> mm->owner needs to be updated, but
>> no new owner is found
>> (mm_users > 1, but no other task
>> has task->mm = task1->mm)
>> mm_update_next_owner() leaves
>>
>> grace period
>>                                       user count drops, call mmput(mm)
>> task 1 freed
>>                                       dereferencing mm->owner fails
>
> Yes, that looks right to me: seems obvious now.  I don't think your
> careful alternation of CPU0/1 events at the end matters: the swapoff
> CPU simply dereferences mm->owner after that task has gone.
>
> (That's a shame, I'd always hoped that mm->owner->comm was going to
> be good for use in mm messages, even when tearing down the mm.)
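
The interleaving in the hypothesis can be replayed as a small
single-threaded model. This is only a sketch: every model_* name is
hypothetical, the structures are stand-ins rather than the kernel's
task_struct and mm_struct, and a "freed" flag takes the place of actually
freeing the task so that the stale pointer can be checked safely.

```c
#include <stdbool.h>
#include <stddef.h>

struct model_mm;

/* Stand-in for a task: only the fields the hypothesis needs. */
struct model_task {
	struct model_mm *mm;
	bool freed;		/* set instead of really freeing the task */
};

/* Stand-in for an mm with a user count and an owner pointer. */
struct model_mm {
	int mm_users;
	struct model_task *owner;
};

/* CPU1 (swapoff side): pin the mm by taking a user reference. */
static void model_get_mm(struct model_mm *mm)
{
	mm->mm_users++;
}

/*
 * CPU0 (exit side): look for another task using the same mm.
 * If none is found, the owner pointer is left untouched, which is
 * the "no new owner is found" step of the hypothesis.
 */
static void model_exit_task(struct model_task *t,
			    struct model_task *others, size_t n)
{
	struct model_mm *mm = t->mm;
	size_t i;

	for (i = 0; i < n; i++) {
		if (others[i].mm == mm) {
			mm->owner = &others[i];
			break;
		}
	}
	t->mm = NULL;
	mm->mm_users--;
}

/* CPU0: the grace period has passed and the task is gone. */
static void model_free_task(struct model_task *t)
{
	t->freed = true;
}

/* CPU1: would dereferencing mm->owner touch a freed task? */
static bool model_owner_is_stale(const struct model_mm *mm)
{
	return mm->owner != NULL && mm->owner->freed;
}

/* Replay the steps in the order given in the hypothesis. */
static bool model_run_hypothesis(void)
{
	struct model_mm mm = { .mm_users = 1, .owner = NULL };
	struct model_task task1 = { .mm = &mm, .freed = false };

	mm.owner = &task1;
	model_get_mm(&mm);			/* try_to_unuse pins the mm */
	model_exit_task(&task1, NULL, 0);	/* no other task shares the mm */
	model_free_task(&task1);		/* task 1 freed */
	return model_owner_is_stale(&mm);	/* mm is alive, owner is not */
}
```

In this model the mm outlives its owner because mm_users is still 1 after
the exit, so model_run_hypothesis() reports a stale owner.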
 

Hi, Hugh,

I do have fixes for the problem above, but I've run into something strange.
When I create a new cgroup, set 500M as its limit, and run kernbench under
it, I see the following:

1. memrlimit determines that the limit is exceeded and fails the fork of the
new process.
2. The process that failed to fork encounters a page fault and faults in
find_vma.

I tried chasing the problem, but I am lost wondering how a page fault
(do_page_fault) can occur in a process that has not yet been created and is
going to fail with -ENOMEM. The interesting thing is that the OOPS occurs in
find_vma.

My trace so far:


limit exceeded
Pid: 3695, comm: sh Not tainted 2.6.27-rc1-mm1 #12

Call Trace:
 [802b0473] memrlimit_cgroup_charge_as+0x3a/0x3c
 [8023a82f] dup_mm+0xea/0x410
 [8023b648] copy_process+0xabe/0x12ef
 [8023c0df] do_fork+0x114/0x2d2
 [8025b42c] ? trace_hardirqs_on_caller+0xf9/0x124
 [8025b464] ? trace_hardirqs_on+0xd/0xf
 [805bda1f] ? _spin_unlock_irq+0x2b/0x30
 [805bd24e] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [8020bf4b] ? system_call_fastpath+0x16/0x1b
 [8020a44a] sys_clone+0x23/0x25
 [8020c2c7] ptregscall_common+0x67/0xb0

putting mm 88003d931400 3695 sh
copy_mm, retval -12
copy_process returning -12
copy_process returned fff4 -12
fork failed -12
general protection fault:  [1] copy_process returned 880037a11600 -13194
0462029312
SMP
last sysfs file: /sys/block/sda/size
CPU 2
Modules linked in: coretemp hwmon kvm_intel kvm rtc_cmos rtc_core rtc_lib mptsas
 mptscsih mptbase scsi_transport_sas uhci_hcd ohci_hcd ehci_hcd
Pid: 3695, comm: sh Not tainted 2.6.27-rc1-mm1 #12
RIP: 0010:[802954f8]  [802954f8] find_vma+0x2f/0x62
RSP: :88003544bee8  EFLAGS: 00010202
RAX: 6b6b6b6b6b6b6b6b RBX:  RCX: 8800399e34d8
RDX: 8800399e34d8 RSI: 003a2729ad22 RDI: 88003e5c8500
RBP: 88003544bee8 R08:  R09: 
R10: 88003e5c8568 R11: 0246 R12: 003a2729ad22
R13: 0014 R14: 88003544bf58 R15: 88003e8bac00
FS:  2b3b978f3f50() GS:8800bfd954b0() knlGS:
CS:  0010 DS:  ES:  CR0: 8005003b
CR2: 003a2729ad22 CR3: 3549f000 CR4: 26e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process sh (pid: 3695, threadinfo 88003544a000, task 88003e8bac00)
Stack:  88003544bf48 805bfec0  008cae50
 88003e5c8560 88003e5c8500 00030001 
 7fff131e72c0  008cae50 
Call Trace:
 [805bfec0] do_page_fault+0x36f/0x7ad
 [805bdd4d] error_exit+0x0/0xa9


Code: 85 ff 48 89 e5 74 55 eb 05 48 89 ca eb 47 48 8b 47 10 48 85 c0 74 0c 48 39
 70 10 76 06 48 39 70 08 76 39 48 8b 47 08 31 d2 eb 1d 48 39 70 e0 48 8d 48 d0
 76 0f 48 39 70 d8 76 ce 48 8b 40 10 48
RIP  [802954f8] find_vma+0x2f/0x62
 RSP 88003544bee8

---[ end trace 89156336afdfaec3 ]---

I hope that I'll be able to think more clearly on Monday, but it's hard to
say :)

-- 
Warm Regards,
Balbir Singh
Linux Technology Center
IBM, ISTL