Corey,
Are you getting:

perfmon/perfmon_file.c: In function 'pfm_buf_map_close':
perfmon/perfmon_file.c:137: warning: passing argument 1 of 'atomic_read' from incompatible pointer type

when compiling your kernel?

This needs to be investigated some more, because the warning is on a test that sets a flag related to the lock you are reporting on.

On Thu, Jan 8, 2009 at 12:27 AM, stephane eranian <eran...@googlemail.com> wrote:
> Corey,
>
> Let me take a look at this. This is some nasty code in there.
> But it is also old and we may be able to simplify it. I don't think
> it has to be that complicated. Problem is that the issue does
> not show up on x86.
>
>
> On Thu, Jan 8, 2009 at 12:10 AM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>> Ok, I have some more data about this lock-up problem. I turned on perfmon
>> debugging and saw that the last thing that perfmon did was to call
>> down_write() from pfm_smpl_buf_space_release.212. That code attempts to
>> acquire a lock, so I decided to turn on lock debugging in the kernel, and
>> got this output when I ran the test case:
>>
>> perfmon: pfm_smpl_buf_space_release.212: CPU2 [3318]: doing down_write
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 2.6.28-rc6-pfm2-09445-g4fca1a2-dirty #12
>> ---------------------------------------------
>> task_smpl/3318 is trying to acquire lock:
>> (&mm->mmap_sem){----}, at: [<c0000000003037d8>] .pfm_smpl_buf_space_release+0xa0/0x180
>>
>> but task is already holding lock:
>> (&mm->mmap_sem){----}, at: [<c000000000102b34>] .sys_munmap+0x54/0xa0
>>
>> other info that might help us debug this:
>> 1 lock held by task_smpl/3318:
>> #0: (&mm->mmap_sem){----}, at: [<c000000000102b34>] .sys_munmap+0x54/0xa0
>>
>> stack backtrace:
>> Call Trace:
>> [c00000000ca77380] [c000000000012254] .show_stack+0x94/0x198 (unreliable)
>> [c00000000ca77430] [c000000000012380] .dump_stack+0x28/0x3c
>> [c00000000ca774b0] [c0000000000a14f0] .validate_chain+0x690/0xdc0
>> [c00000000ca77570] [c0000000000a2404] .__lock_acquire+0x7e4/0x8bc
>> [c00000000ca77670] [c0000000000a2588] .lock_acquire+0xac/0xf8
>> [c00000000ca77740] [c0000000005cb630] .down_write+0x64/0xbc
>> [c00000000ca777d0] [c0000000003037d8] .pfm_smpl_buf_space_release+0xa0/0x180
>> [c00000000ca77870] [c00000000030d464] .pfm_smpl_buf_free+0x8c/0x104
>> [c00000000ca77900] [c00000000030f2a0] .pfm_free_context+0x40/0xc8
>> [c00000000ca77990] [c000000000307d5c] .__pfm_close+0x2f8/0x33c
>> [c00000000ca77a60] [c000000000308af8] .pfm_close+0x98/0xb4
>> [c00000000ca77af0] [c00000000012b56c] .__fput+0x16c/0x258
>> [c00000000ca77ba0] [c00000000012baa4] .fput+0x50/0x68
>> [c00000000ca77c30] [c0000000001003c4] .remove_vma+0x90/0xf8
>> [c00000000ca77cc0] [c0000000001015d8] .do_munmap+0x30c/0x358
>> [c00000000ca77d90] [c000000000102b48] .sys_munmap+0x68/0xa0
>> [c00000000ca77e30] [c0000000000084d4] syscall_exit+0x0/0x40
>>
>> Does this ring any bells with you?
>>
>> Thanks,
>>
>> - Corey
>>
>> "stephane eranian" <eran...@googlemail.com> wrote on 01/07/2009 12:03:24 PM:
>>
>>> Corey,
>>>
>>> I was expecting success with the program below if /tmp/foo exists.
>>>
>>> The perfmon code that handles all of this is generic, so there must be a
>>> race condition somewhere which is only exposed on Power.
>>>
>>> On Wed, Jan 7, 2009 at 8:02 PM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>>> > Thanks for the reply, Stephane. I tried the test case you suggested:
>>> >
>>> > main() {
>>> >     int fd;
>>> >     void *addr;
>>> >
>>> >     fd = open ("/tmp/foo", O_RDONLY);
>>> >     printf("fd = %d\n", fd);
>>> >     addr = mmap(NULL, 10, PROT_READ, MAP_PRIVATE, fd, 0);
>>> >     printf("addr = %p\n", addr);
>>> >     if (close(fd)) {
>>> >         printf("close failed\n");
>>> >     }
>>> >     if (munmap(addr, 10)) {
>>> >         printf("munmap failed\n");
>>> >     }
>>> > }
>>> >
>>> > and it worked fine. So apparently there is a problem related to
>>> > munmap'ing a perfmon fd on Power.
>>> > This will need more investigation, obviously.
>>> >
>>> > - Corey
>>> >
>>> > "stephane eranian" <eran...@googlemail.com> wrote on 01/06/2009 10:28:41 PM:
>>> >
>>> >> Corey,
>>> >>
>>> >> On Wed, Jan 7, 2009 at 3:24 AM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>>> >> >
>>> >> > Hello,
>>> >> >
>>> >> > I'd appreciate it if someone on this mailing list could try out the libpfm
>>> >> > example: task_smpl and see if it runs correctly for you on any other
>>> >> > architecture besides Power.
>>> >> >
>>> >> > When I run it on my Power5-based machine here, I get a system hang that
>>> >> > occurs when the munmap call is made. Looking at the code in the example, I
>>> >> > reversed the order of the close and munmap... so that the memory is unmapped
>>> >> > before the fd is closed, and this allows the test to run to completion
>>> >> > without error and causes no hang. I also tried commenting out the call to
>>> >> > pfm_start, to cut perfmon out of the loop for the most part, and the
>>> >> > behavior still reproduces - the system hangs unless I reverse those two
>>> >> > calls.
>>> >> >
>>> >> > When the system hangs like this, if I get it to go into Xmon, none of the
>>> >> > CPU stacks are interesting. They all appear to be idle.
>>> >> >
>>> >> > I run the test as follows:
>>> >> >
>>> >> > ./task_smpl /bin/sleep 3
>>> >> >
>>> >>
>>> >> This test runs fine on my x86-64 system (Core 2). The order of the
>>> >> close() vs munmap() should not matter. The calls can be made in any
>>> >> order. The perfmon context is destroyed when the last reference to the
>>> >> file descriptor disappears, mmap counts as 1. If you do close()
>>> >> followed by munmap(), the perfmon context is destroyed as part of the
>>> >> munmap(). This sequence should not hang for you.
>>> >> What happens if you do a similar sequence but
>>> >> just with a regular file:
>>> >> fd = open("/tmp/foo");
>>> >> addr = mmap(fd);
>>> >> close(fd);
>>> >> munmap(addr);
>>> >>
>>> >> The test runs to completion on both x86-64 and ia64:
>>> >>
>>> >> $ task_smpl /bin/sleep 3
>>> >> sycall base 295
>>> >> major version 2
>>> >> minor version 82
>>> >> [FIXED_CTRL(pmc16)=0xaa pmi0=1 en0=0x2 pmi1=1 en1=0x2 pmi2=1 en2=0x0]
>>> >> INSTRUCTIONS_RETIRED UNHALTED_CORE_CYCLES
>>> >> [FIXED_CTR0(pmd16)]
>>> >> [FIXED_CTR1(pmd17)]
>>> >> programming 1 PMCS and 2 PMDS
>>> >> buffer mapped @0x7f999029b000
>>> >> hdr_cur_offs=128 version=1.0
>>> >> task terminated
>>> >> entry 0 PID:32691 TID:32691 CPU:2 LAST_VAL:100000 IIP:0x7f66702246c2
>>> >> PMD16 :0x0000000000004130
>>> >> entry 1 PID:32691 TID:32691 CPU:2 LAST_VAL:100213 IIP:0x7f6670227560
>>> >> PMD16 :0x000000000000ef70
>>> >> entry 2 PID:32691 TID:32691 CPU:2 LAST_VAL:100060 IIP:0x7f6670233e52
>>> >> PMD16 :0x000000000000f384
>>> >> entry 3 PID:32691 TID:32691 CPU:2 LAST_VAL:100155 IIP:0xffffffff805c9e6f
>>> >> PMD16 :0x00000000000104fe
>>> >> 4 samples (4 in partial buffer) collected in 0 buffer overflows
>>> >> real 0h00m03.001s user 0h00m00.000s sys 0h00m00.001s
>>> >> $
>>> >
>>> >
>>
>>
>
_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel