Corey,


Are you getting:

perfmon/perfmon_file.c: In function 'pfm_buf_map_close':
perfmon/perfmon_file.c:137: warning: passing argument 1 of 'atomic_read' from incompatible pointer type


When compiling your kernel?

This needs more investigation, because the warning is on a test that
sets a flag related to the lock you are reporting on.


On Thu, Jan 8, 2009 at 12:27 AM, stephane eranian
<eran...@googlemail.com> wrote:
> Corey,
>
> Let me take a look at this. This is some nasty code in there.
> But it is also old and we may be able to simplify it. I don't think
> it has to be that complicated. Problem is that the issue does
> not show up on x86.
>
>
> On Thu, Jan 8, 2009 at 12:10 AM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>> Ok, I have some more data about this lock-up problem.  I turned on perfmon
>> debugging and saw that the last thing that perfmon did was to call
>> down_write() from pfm_smpl_buf_space_release.212.  That code attempts to
>> acquire a lock, so I decided to turn on lock debugging in the kernel, and
>> got this output when I ran the test case:
>>
>> perfmon: pfm_smpl_buf_space_release.212: CPU2 [3318]: doing down_write
>>
>> =============================================
>> [ INFO: possible recursive locking detected ]
>> 2.6.28-rc6-pfm2-09445-g4fca1a2-dirty #12
>> ---------------------------------------------
>> task_smpl/3318 is trying to acquire lock:
>>  (&mm->mmap_sem){----}, at: [<c0000000003037d8>] .pfm_smpl_buf_space_release+0xa0/0x180
>>
>> but task is already holding lock:
>>  (&mm->mmap_sem){----}, at: [<c000000000102b34>] .sys_munmap+0x54/0xa0
>>
>> other info that might help us debug this:
>> 1 lock held by task_smpl/3318:
>>  #0:  (&mm->mmap_sem){----}, at: [<c000000000102b34>] .sys_munmap+0x54/0xa0
>>
>> stack backtrace:
>> Call Trace:
>> [c00000000ca77380] [c000000000012254] .show_stack+0x94/0x198 (unreliable)
>> [c00000000ca77430] [c000000000012380] .dump_stack+0x28/0x3c
>> [c00000000ca774b0] [c0000000000a14f0] .validate_chain+0x690/0xdc0
>> [c00000000ca77570] [c0000000000a2404] .__lock_acquire+0x7e4/0x8bc
>> [c00000000ca77670] [c0000000000a2588] .lock_acquire+0xac/0xf8
>> [c00000000ca77740] [c0000000005cb630] .down_write+0x64/0xbc
>> [c00000000ca777d0] [c0000000003037d8] .pfm_smpl_buf_space_release+0xa0/0x180
>> [c00000000ca77870] [c00000000030d464] .pfm_smpl_buf_free+0x8c/0x104
>> [c00000000ca77900] [c00000000030f2a0] .pfm_free_context+0x40/0xc8
>> [c00000000ca77990] [c000000000307d5c] .__pfm_close+0x2f8/0x33c
>> [c00000000ca77a60] [c000000000308af8] .pfm_close+0x98/0xb4
>> [c00000000ca77af0] [c00000000012b56c] .__fput+0x16c/0x258
>> [c00000000ca77ba0] [c00000000012baa4] .fput+0x50/0x68
>> [c00000000ca77c30] [c0000000001003c4] .remove_vma+0x90/0xf8
>> [c00000000ca77cc0] [c0000000001015d8] .do_munmap+0x30c/0x358
>> [c00000000ca77d90] [c000000000102b48] .sys_munmap+0x68/0xa0
>> [c00000000ca77e30] [c0000000000084d4] syscall_exit+0x0/0x40
>>
>> Does this ring any bells with you?
>>
>> Thanks,
>>
>> - Corey
>>
>> "stephane eranian" <eran...@googlemail.com> wrote on 01/07/2009 12:03:24
>> PM:
>>
>>> Corey,
>>>
>>> I was expecting success with the program below if /tmp/foo exists.
>>>
>>> The perfmon code that handles all of this is generic, so there must be a
>>> race condition somewhere which is only exposed on Power.
>>>
>>> On Wed, Jan 7, 2009 at 8:02 PM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>>> > Thanks for the reply, Stephane.  I tried the test case you suggested:
>>> >
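>>> > #include <fcntl.h>
>>> > #include <stdio.h>
>>> > #include <sys/mman.h>
>>> > #include <unistd.h>
>>> >
>>> > int main(void) {
>>> >   int fd;
>>> >   void *addr;
>>> >
>>> >   fd = open("/tmp/foo", O_RDONLY);
>>> >   printf("fd = %d\n", fd);
>>> >   /* Map the file, then close the fd before unmapping. */
>>> >   addr = mmap(NULL, 10, PROT_READ, MAP_PRIVATE, fd, 0);
>>> >   printf("addr = %p\n", addr);
>>> >   if (close(fd)) {
>>> >      printf("close failed\n");
>>> >   }
>>> >   if (munmap(addr, 10)) {
>>> >      printf("munmap failed\n");
>>> >   }
>>> >   return 0;
>>> > }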
>>> >
>>> > and it worked fine.  So apparently there is a problem related to
>>> > munmap'ing a perfmon fd on Power.  This will need more investigation,
>>> > obviously.
>>> >
>>> > - Corey
>>> >
>>> > "stephane eranian" <eran...@googlemail.com> wrote on 01/06/2009
>> 10:28:41
>>> > PM:
>>> >
>>> >> Corey,
>>> >>
>>> >> On Wed, Jan 7, 2009 at 3:24 AM, Corey J Ashford <cjash...@us.ibm.com> wrote:
>>> >> >
>>> >> > Hello,
>>> >> >
>>> >> > I'd appreciate it if someone on this mailing list could try out the
>>> >> > libpfm example: task_smpl and see if it runs correctly for you on any
>>> >> > other architecture besides Power.
>>> >> >
>>> >> > When I run it on my Power5-based machine here, I get a system hang
>>> >> > that occurs when the munmap call is made.  Looking at the code in the
>>> >> > example, I reversed the order of the close and munmap... so that the
>>> >> > memory is unmapped before the fd is closed, and this allows the test
>>> >> > to run to completion without error and causes no hang.  I also tried
>>> >> > commenting out the call to pfm_start, to cut perfmon out of the loop
>>> >> > for the most part, and the behavior still reproduces - the system
>>> >> > hangs unless I reverse those two calls.
>>> >> >
>>> >> > When the system hangs like this, if I get it to go into Xmon, none of
>>> >> > the CPU stacks are interesting.  They all appear to be idle.
>>> >> >
>>> >> > I run the test as follows:
>>> >> >
>>> >> > ./task_smpl /bin/sleep 3
>>> >> >
>>> >>
>>> >> This test runs fine on my x86-64 system (Core 2). The order of the
>>> >> close() vs munmap() should not matter. The calls can be made in any
>>> >> order. The perfmon context is destroyed when the last reference to the
>>> >> file descriptor disappears; the mmap counts as one reference. If you do
>>> >> close() followed by munmap(), the perfmon context is destroyed as part
>>> >> of the munmap(). This sequence should not hang for you. What happens if
>>> >> you do a similar sequence but just with a regular file:
>>> >>     fd = open("/tmp/foo");
>>> >>     addr = mmap(fd);
>>> >>     close(fd);
>>> >>     munmap(addr);
>>> >>
>>> >> The test runs to completion on both x86-64 and ia64:
>>> >>
>>> >> $ task_smpl /bin/sleep 3
>>> >> sycall base 295
>>> >> major version 2
>>> >> minor version 82
>>> >> [FIXED_CTRL(pmc16)=0xaa pmi0=1 en0=0x2 pmi1=1 en1=0x2 pmi2=1 en2=0x0]
>>> >> INSTRUCTIONS_RETIRED UNHALTED_CORE_CYCLES
>>> >> [FIXED_CTR0(pmd16)]
>>> >> [FIXED_CTR1(pmd17)]
>>> >> programming 1 PMCS and 2 PMDS
>>> >> buffer mapped @0x7f999029b000
>>> >> hdr_cur_offs=128 version=1.0
>>> >> task terminated
>>> >> entry 0 PID:32691 TID:32691 CPU:2 LAST_VAL:100000 IIP:0x7f66702246c2
>>> >> PMD16 :0x0000000000004130
>>> >> entry 1 PID:32691 TID:32691 CPU:2 LAST_VAL:100213 IIP:0x7f6670227560
>>> >> PMD16 :0x000000000000ef70
>>> >> entry 2 PID:32691 TID:32691 CPU:2 LAST_VAL:100060 IIP:0x7f6670233e52
>>> >> PMD16 :0x000000000000f384
>>> >> entry 3 PID:32691 TID:32691 CPU:2 LAST_VAL:100155 IIP:0xffffffff805c9e6f
>>> >> PMD16 :0x00000000000104fe
>>> >> 4 samples (4 in partial buffer) collected in 0 buffer overflows
>>> >> real 0h00m03.001s user 0h00m00.000s sys 0h00m00.001s
>>> >> $
>>> >
>>> >
>>
>>
>

_______________________________________________
perfmon2-devel mailing list
perfmon2-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/perfmon2-devel
