On 30-Dec-10 03:35, Greg Walton wrote:
> On 30-Dec-10 03:11, Greg Walton wrote:
>> Ok, I've convinced myself that my previous idea is (at least really
>> close to) correct: messages after the first one that get written into
>> dispatch_buffer are not aligned.
>>
>> Here's some debug cut and paste showing the testcpg bus error, gdb prints
>> of the pointers/structs in question, and dmesg output showing the bus
>> error address, which matches the address accessed for dispatch_data->id.
>> After that there is some fprintf() debugging output that I inserted into
>> coroipcc.c and coroipcs.c to see what's happening to dispatch_buffer etc.
>>
>> No services are started in the corosync.conf file used.
>>
>> So now the question is: where do I add some code to pad the end of a
>> message in dispatch_buffer, or shift the start of one to an aligned address?
>>
>> r...@serva:/usr/local/src/cluster/flatiron# uname -a
>> Linux serva 2.6.32-5-kirkwood #1 Fri Nov 26 07:01:06 UTC 2010 armv5tel GNU/Linux
>> r...@serva:/usr/local/src/cluster/flatiron# corosync
>> r...@serva:/usr/local/src/cluster/flatiron# test/testcpg
>> Local node id is 3301c80a
>> membership list
>> node id 855754762 pid 4206
>> Type EXIT to finish
>>
>> ConfchgCallback: group 'GROUP'
>> joined node/pid 855754762/4206 reason: 1
>> nodes in group now 1
>> node/pid 855754762/4206
>> asdf
>> DeliverCallback: message (len=6)from node/pid 855754762/4206: 'asdf
>> '
>> asdf
>> Bus error (core dumped)
>> r...@serva:/usr/local/src/cluster/flatiron#
>> r...@serva:/usr/local/src/cluster/flatiron# gdb test/testcpg core
>> GNU gdb (GDB) 7.0.1-debian
>> Copyright (C) 2009 Free Software Foundation, Inc.
>> License GPLv3+: GNU GPL version 3 or later
>> <http://gnu.org/licenses/gpl.html>
>> This is free software: you are free to change and redistribute it.
>> There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
>> and "show warranty" for details.
>> This GDB was configured as "arm-linux-gnueabi".
>> For bug reporting instructions, please see:
>> <http://www.gnu.org/software/gdb/bugs/>...
>> Reading symbols from /usr/local/src/cluster/flatiron/test/testcpg...done.
>> Reading symbols from /usr/lib/libcpg.so.4...done.
>> Loaded symbols for /usr/lib/libcpg.so.4
>> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
>> Loaded symbols for /usr/lib/libcoroipcc.so.4
>> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
>> Loaded symbols for /lib/librt.so.1
>> Reading symbols from /lib/libpthread.so.0...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/libpthread.so.0
>> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libdl.so.2
>> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
>> Loaded symbols for /lib/libc.so.6
>> Reading symbols from /lib/ld-linux.so.3...(no debugging symbols
>> found)...done.
>> Loaded symbols for /lib/ld-linux.so.3
>> Core was generated by `test/testcpg'.
>> Program terminated with signal 7, Bus error.
>> #0  cpg_dispatch (handle=<value optimized out>,
>>     dispatch_types=<value optimized out>) at cpg.c:339
>> 339                     switch (dispatch_data->id) {
>> (gdb) bt
>> #0  cpg_dispatch (handle=<value optimized out>,
>>     dispatch_types=<value optimized out>) at cpg.c:339
>> #1  0x00008f10 in main (argc=<value optimized out>, argv=<value
>> optimized out>)
>>     at testcpg.c:237
>> (gdb) print dispatch_data
>> $1 = (coroipc_response_header_t *) 0x403b41a6
>> (gdb) print dispatch_data->id
>> $2 = 5
>> (gdb) print &dispatch_data->id
>> $3 = (int *) 0x403b41ae
>> (gdb) quit
>> r...@serva:/usr/local/src/cluster/flatiron# dmesg
>> [1968597.391683] Alignment trap: testcpg (4206) PC=0x40031c40 Instr=0xe5942008 Address=0x403b41ae FSR 0x001
>> r...@serva:/usr/local/src/cluster/flatiron# cat /tmp/corosync.debug
>> coroipcc.c circular_memory_map() - start of memory - *buf:0x403b4000
>> coroipcs.c shared_mem_dispatch_bytes_left n_read:0 n_write:0 bytes_left:1048575
>> coroipcc.c coroipc_dispatch_get() dispatch_buffer:0x403b4000 control_buffer->read:0
>> coroipcc.c coroipc_dispatch_get() dispatch_buffer[control_buffer->read]:0x403b4000
>> coroipcc.c coroipc_dispatch_put() addr:0x403b4000 read_idx:0 header->size:232 dispatch_size:1048576
>> coroipcc.c coroipc_dispatch_put() modulus calc:232
>> coroipcs.c shared_mem_dispatch_bytes_left n_read:232 n_write:232 bytes_left:1048575
>> coroipcc.c coroipc_dispatch_get() dispatch_buffer:0x403b40e8 control_buffer->read:232
>> coroipcc.c coroipc_dispatch_get() dispatch_buffer[control_buffer->read]:0x403b40e8
>> coroipcc.c coroipc_dispatch_put() addr:0x403b4000 read_idx:232 header->size:190 dispatch_size:1048576
>> coroipcc.c coroipc_dispatch_put() modulus calc:422
>> coroipcs.c shared_mem_dispatch_bytes_left n_read:422 n_write:422 bytes_left:1048575
>> coroipcc.c coroipc_dispatch_get() dispatch_buffer:0x403b41a6 control_buffer->read:422
>> coroipcc.c coroipc_dispatch_get() dispatch_buffer[control_buffer->read]:0x403b41a6
>> r...@serva:/usr/local/src/cluster/flatiron#
>> _______________________________________________
>> Openais mailing list
>> [email protected]
>> https://lists.linux-foundation.org/mailman/listinfo/openais
> Ok, last reply to myself for today, but here is an additional test that
> supports the misalignment idea:
>
> If I run testcpg and send only 2 chars, which ends up as len=4 for the
> message as written to dispatch_buffer (I'm assuming), or at least as some
> multiple of 4, then there is no alignment error.
>
> Here's the log: messages with len=4 succeed, then it fails on the first
> message after a non-4-byte message:
>
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> as
> DeliverCallback: message (len=4)from node/pid 855754762/5792: 'as
> '
> asd
> DeliverCallback: message (len=5)from node/pid 855754762/5792: 'asd
> '
> as
> Bus error (core dumped)

I have a working patch for this issue now!
It's more of a proof-of-concept patch than anything ready to actually commit.
Basically I just pretend to pad the message written to dispatch_buffer
to a multiple of 4 bytes by manipulating control_buffer->{write,read}.

I'd have included the patch here, but it was made against flatiron with svn
diff, and I see the current practice seems to be git against trunk. So
I'll get that worked up and submit it... tomorrow.

Greg Walton
