On 06/02/2011 08:16 PM, william felipe_welter wrote:
> Well,
>
> Now with this patch, the pacemakerd process starts and brings up its
> other processes (crmd, lrmd, pengine....), but after pacemakerd
> forks, the forked pacemakerd process dies due to "signal 10, Bus
> error". And in the log, the pacemaker processes (crmd, lrmd,
> pengine....) can't connect to the openais plugin (possibly because of
> the "death" of the pacemakerd process).
> But this time, when the forked pacemakerd dies, it generates a coredump.
>
> gdb -c "/usr/var/lib/heartbeat/cores/root/ pacemakerd 7986" -se /usr/sbin/pacemakerd :
> GNU gdb (GDB) 7.0.1-debian
> Copyright (C) 2009 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "sparc-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/sbin/pacemakerd...done.
> Reading symbols from /usr/lib64/libuuid.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib64/libuuid.so.1
> Reading symbols from /usr/lib/libcoroipcc.so.4...done.
> Loaded symbols for /usr/lib/libcoroipcc.so.4
> Reading symbols from /usr/lib/libcpg.so.4...done.
> Loaded symbols for /usr/lib/libcpg.so.4
> Reading symbols from /usr/lib/libquorum.so.4...done.
> Loaded symbols for /usr/lib/libquorum.so.4
> Reading symbols from /usr/lib64/libcrmcommon.so.2...done.
> Loaded symbols for /usr/lib64/libcrmcommon.so.2
> Reading symbols from /usr/lib/libcfg.so.4...done.
> Loaded symbols for /usr/lib/libcfg.so.4
> Reading symbols from /usr/lib/libconfdb.so.4...done.
> Loaded symbols for /usr/lib/libconfdb.so.4
> Reading symbols from /usr/lib64/libplumb.so.2...done.
> Loaded symbols for /usr/lib64/libplumb.so.2
> Reading symbols from /usr/lib64/libpils.so.2...done.
> Loaded symbols for /usr/lib64/libpils.so.2
> Reading symbols from /lib/libbz2.so.1.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libbz2.so.1.0
> Reading symbols from /usr/lib/libxslt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libxslt.so.1
> Reading symbols from /usr/lib/libxml2.so.2...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libxml2.so.2
> Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libc.so.6
> Reading symbols from /lib/librt.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/librt.so.1
> Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libdl.so.2
> Reading symbols from /lib/libglib-2.0.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libglib-2.0.so.0
> Reading symbols from /usr/lib/libltdl.so.7...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libltdl.so.7
> Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/ld-linux.so.2
> Reading symbols from /lib/libpthread.so.0...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpthread.so.0
> Reading symbols from /lib/libm.so.6...(no debugging symbols found)...done.
> Loaded symbols for /lib/libm.so.6
> Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
> Loaded symbols for /usr/lib/libz.so.1
> Reading symbols from /lib/libpcre.so.3...(no debugging symbols found)...done.
> Loaded symbols for /lib/libpcre.so.3
> Reading symbols from /lib/libnss_compat.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnss_compat.so.2
> Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnsl.so.1
> Reading symbols from /lib/libnss_nis.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnss_nis.so.2
> Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
> Loaded symbols for /lib/libnss_files.so.2
> Core was generated by `pacemakerd'.
> Program terminated with signal 10, Bus error.
> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
> 339 switch (dispatch_data->id) {
> (gdb) bt
> #0 cpg_dispatch (handle=17861288972693536769, dispatch_types=7986) at cpg.c:339
> #1 0xf6f100f0 in ?? ()
> #2 0xf6f100f4 in ?? ()
> Backtrace stopped: previous frame identical to this frame (corrupt stack?)
>
>
>
> I took a look at cpg.c and saw that dispatch_data was acquired
> by the coroipcc_dispatch_get function (defined in lib/coroipcc.c):
>
> do {
>         error = coroipcc_dispatch_get (
>                 cpg_inst->handle,
>                 (void **)&dispatch_data,
>                 timeout);
>
>
>
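Editorial aside on the backtrace above: the fault at `switch (dispatch_data->id)` is consistent with a misaligned load, although the thread does not prove that this is the cause. sparc requires naturally aligned loads, and a 4-byte read from an address that is not a multiple of 4 typically raises SIGBUS where x86 would tolerate it. The sketch below is not corosync code. `struct msg_header`, `read_header` and `align_up` are hypothetical names used only to illustrate two common defences: copying a header out with `memcpy`, and rounding offsets up to an alignment boundary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message header, for illustration only (not a corosync type). */
struct msg_header {
    int32_t size;
    int32_t id;
};

/* Copy a header out of a byte buffer at an arbitrary offset.
 * memcpy has no alignment requirement, so this is safe even when
 * buf + offset is not a multiple of 4. */
struct msg_header read_header(const unsigned char *buf, size_t offset)
{
    struct msg_header h;

    assert(buf != NULL);
    memcpy(&h, buf + offset, sizeof h);
    return h;
}

/* Round n up to the next multiple of align (align must be a power of two). */
size_t align_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}
```

A producer that places each message at `align_up(offset, 8)` keeps every header naturally aligned, so a direct `hdr->id` read stays legal on strict-alignment machines. The `memcpy` variant is the fallback when the layout cannot be changed.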
Try the recent patch sent to fix alignment.

Regards
-steve

>
> Resumed log:
> ...
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering f to 10
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 10 to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including f
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 10
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7991 for process lrmd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000100112 (was 00000000000000000000000000100102)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 10 to 11
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 11 to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 11
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7992 for process attrd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000101112 (was 00000000000000000000000000100112)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 11 to 12
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 12 to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 12
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7993 for process pengine
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000111112 (was 00000000000000000000000000101112)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 12 to 13
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 13 to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 13
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: start_child: Forked child 7994 for process crmd
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: update_node_processes: Node xxxxxxxxxx now has process list: 00000000000000000000000000111312 (was 00000000000000000000000000111112)
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 xxxxxxxxxx pacemakerd: [7986]: info: main: Starting mainloop
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 13 to 14
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 14 to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 14
> Jun 02 23:12:20 corosync [CPG ] got mcast request on 0x62500
> Jun 02 23:12:20 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:20 corosync [TOTEM ] Delivering 14 to 15
> Jun 02 23:12:20 corosync [TOTEM ] Delivering MCAST message with seq 15 to pending delivery queue
> Jun 02 23:12:20 corosync [TOTEM ] releasing messages up to and including 15
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: Invoked: /usr/lib64/heartbeat/stonithd
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/root
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: get_cluster_type: Cluster type is: 'openais'.
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: retrieveCib: Reading cluster configuration from: /usr/var/lib/heartbeat/crm/cib.xml (digest: /usr/var/lib/heartbeat/crm/cib.xml.sig)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: retrieveCib: Cluster configuration not found: /usr/var/lib/heartbeat/crm/cib.xml
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Primary configuration corrupt or unusable, trying backup...
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: get_last_sequence: Series file /usr/var/lib/heartbeat/crm/cib.last does not exist
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile: Backup file /usr/var/lib/heartbeat/crm/cib-99.raw not found
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: WARN: readCibXmlFile: Continuing with an empty configuration.
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <cib epoch="0" num_updates="0" admin_epoch="0" validate-with="pacemaker-1.2" >
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <configuration >
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <crm_config />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <nodes />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <resources />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <constraints />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </configuration>
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] <status />
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: readCibXmlFile[on-disk] </cib>
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: validate_with_relaxng: Creating RNG parser context
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: Doesn't exist (12)
> Jun 02 23:12:20 xxxxxxxxxx stonith-ng: [7989]: CRIT: main: Cannot sign in to the cluster... terminating
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: Invoked: /usr/lib64/heartbeat/crmd
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: Invoked: /usr/lib64/heartbeat/pengine
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: main: CRM Hg Version: e872eeb39a5f6e1fdb57c3108551a5353648c4f4
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Checking for old instances of pengine
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/pengine
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: enabling coredumps
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: info: crmd_init: Starting crmd
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/pengine
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: debug: main: run the loop...
> Jun 02 23:12:20 xxxxxxxxxx lrmd: [7991]: info: Started.
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: debug: main: Init server comms
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: s_crmd_fsa: Processing I_STARTUP: [ state=S_STARTING cause=C_STARTUP origin=crmd_init ]
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action: actions:trace: // A_LOG
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action: actions:trace: // A_STARTUP
> Jun 02 23:12:20 xxxxxxxxxx pengine: [7993]: info: main: Starting pengine
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Registering Signal Handlers
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_startup: Creating CIB and LRM objects
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: do_fsa_action: actions:trace: // A_CIB_START
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_rw
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_rw
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to command channel failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_callback
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_callback
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to callback channel failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to CIB failed: connection failed
> Jun 02 23:12:20 xxxxxxxxxx crmd: [7994]: debug: cib_native_signoff: Signing out of the CIB Service
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: debug: activateCibXml: Triggering CIB write for start op
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: startCib: CIB Initialization completed successfully
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: get_cluster_type: Cluster type is: 'openais'.
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: Doesn't exist (12)
> Jun 02 23:12:20 xxxxxxxxxx cib: [7990]: CRIT: cib_init: Cannot sign in to the cluster... terminating
> Jun 02 23:12:21 corosync [CPG ] exit_fn for conn=0x62500
> Jun 02 23:12:21 corosync [TOTEM ] mcasted message added to pending queue
> Jun 02 23:12:21 corosync [TOTEM ] Delivering 15 to 16
> Jun 02 23:12:21 corosync [TOTEM ] Delivering MCAST message with seq 16 to pending delivery queue
> Jun 02 23:12:21 corosync [CPG ] got procleave message from cluster node 1377289226
> Jun 02 23:12:21 corosync [TOTEM ] releasing messages up to and including 16
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: Invoked: /usr/lib64/heartbeat/attrd
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_log_init_worker: Changed active directory to /usr/var/lib/heartbeat/cores/hacluster
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Starting up
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: get_cluster_type: Cluster type is: 'openais'.
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: crm_cluster_connect: Connecting to cluster infrastructure: classic openais (with plugin)
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: init_ais_connection_classic: Creating connection to our Corosync plugin
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: init_ais_connection_classic: Connection to our AIS plugin (9) failed: Doesn't exist (12)
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: HA Signon failed
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Cluster connection active
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: info: main: Accepting attribute updates
> Jun 02 23:12:21 xxxxxxxxxx attrd: [7992]: ERROR: main: Aborting startup
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_rw
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Could not init comms on: /usr/var/run/crm/cib_rw
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: cib_native_signon_raw: Connection to command channel failed
> Jun 02 23:12:21 xxxxxxxxxx crmd: [7994]: debug: init_client_ipc_comms_nodispatch: Attempting to talk on: /usr/var/run/crm/cib_callback
> ...
>
>
> 2011/6/2 Steven Dake <sd...@redhat.com>:
>> On 06/01/2011 11:05 PM, william felipe_welter wrote:
>>> I recompiled my kernel without hugetlb, and the result is the same.
>>>
>>> My test program still prints:
>>> PATH=/dev/shm/teste123XXXXXX
>>> page size=20000
>>> fd=3
>>> ADDR_ORIG:0xe000a000 ADDR:0xffffffff
>>> Erro
>>>
>>> And Pacemaker still fails because of the mmap error:
>>> Could not initialize Cluster Configuration Database API instance error 2
>>>
>>
>> Give the patch I posted recently a spin - corosync WFM with this patch
>> on sparc64 with hugetlb set. Please report back results.
>>
>> Regards
>> -steve
>>
>>> To make sure that I have disabled hugetlb, here is my /proc/meminfo:
>>> MemTotal: 33093488 kB
>>> MemFree: 32855616 kB
>>> Buffers: 5600 kB
>>> Cached: 53480 kB
>>> SwapCached: 0 kB
>>> Active: 45768 kB
>>> Inactive: 28104 kB
>>> Active(anon): 18024 kB
>>> Inactive(anon): 1560 kB
>>> Active(file): 27744 kB
>>> Inactive(file): 26544 kB
>>> Unevictable: 0 kB
>>> Mlocked: 0 kB
>>> SwapTotal: 6104680 kB
>>> SwapFree: 6104680 kB
>>> Dirty: 0 kB
>>> Writeback: 0 kB
>>> AnonPages: 14936 kB
>>> Mapped: 7736 kB
>>> Shmem: 4624 kB
>>> Slab: 39184 kB
>>> SReclaimable: 10088 kB
>>> SUnreclaim: 29096 kB
>>> KernelStack: 7088 kB
>>> PageTables: 1160 kB
>>> Quicklists: 17664 kB
>>> NFS_Unstable: 0 kB
>>> Bounce: 0 kB
>>> WritebackTmp: 0 kB
>>> CommitLimit: 22651424 kB
>>> Committed_AS: 519368 kB
>>> VmallocTotal: 1069547520 kB
>>> VmallocUsed: 11064 kB
>>> VmallocChunk: 1069529616 kB
>>>
>>>
>>> 2011/6/1 Steven Dake <sd...@redhat.com>:
>>>> On 06/01/2011 07:42 AM, william felipe_welter wrote:
>>>>> Steven,
>>>>>
>>>>> cat /proc/meminfo
>>>>> ...
>>>>> HugePages_Total: 0
>>>>> HugePages_Free: 0
>>>>> HugePages_Rsvd: 0
>>>>> HugePages_Surp: 0
>>>>> Hugepagesize: 4096 kB
>>>>> ...
>>>>>
>>>>
>>>> It definitely requires a kernel compile and setting the config option to
>>>> off. I don't know the debian way of doing this.
>>>>
>>>> The only reason you may need this option is if you have very large
>>>> memory sizes, such as 48GB or more.
>>>>
>>>> Regards
>>>> -steve
>>>>
>>>>> It's 4MB..
>>>>>
>>>>> How can I disable hugetlb? (Passing CONFIG_HUGETLBFS=n to the kernel
>>>>> at boot?)
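Editorial aside: the check done by hand above with `cat /proc/meminfo` can also be done from C. This is a minimal sketch, assuming a Linux-style `/proc/meminfo` in the format quoted in this thread. The function names are hypothetical and are not part of corosync or pacemaker.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the Hugepagesize value in kB found in meminfo-style text,
 * or -1 if the field is absent (for example, a kernel built without
 * hugetlb support prints no HugePages_* or Hugepagesize lines). */
long hugepagesize_kb_from_text(const char *text)
{
    const char *p;
    long kb;

    assert(text != NULL);
    p = strstr(text, "Hugepagesize:");
    if (p == NULL)
        return -1;
    if (sscanf(p, "Hugepagesize: %ld kB", &kb) != 1)
        return -1;
    return kb;
}

/* Read /proc/meminfo (Linux-specific) and report Hugepagesize in kB,
 * or -1 if the file cannot be read or the field is missing. */
long hugepagesize_kb(void)
{
    char buf[8192];
    size_t n;
    FILE *f = fopen("/proc/meminfo", "r");

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return hugepagesize_kb_from_text(buf);
}
```

With the listing that shows `Hugepagesize: 4096 kB` the parser returns 4096. With the listing that has no HugePages lines it returns -1, which matches a kernel rebuilt without hugetlb.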
>>>>>
>>>>> 2011/6/1 Steven Dake <sd...@redhat.com>
>>>>>
>>>>> On 06/01/2011 01:05 AM, Steven Dake wrote:
>>>>> > On 05/31/2011 09:44 PM, Angus Salkeld wrote:
>>>>> >> On Tue, May 31, 2011 at 11:52:48PM -0300, william felipe_welter wrote:
>>>>> >>> Angus,
>>>>> >>>
>>>>> >>> I made a test program (based on the code in coroipcc.c), and now I am
>>>>> >>> sure that there are problems with the mmap system call on sparc.
>>>>> >>>
>>>>> >>> Source code of my test program:
>>>>> >>>
>>>>> >>> #include <stdlib.h>
>>>>> >>> #include <sys/mman.h>
>>>>> >>> #include <stdio.h>
>>>>> >>>
>>>>> >>> #define PATH_MAX 36
>>>>> >>>
>>>>> >>> int main()
>>>>> >>> {
>>>>> >>>
>>>>> >>>     int32_t fd;
>>>>> >>>     void *addr_orig;
>>>>> >>>     void *addr;
>>>>> >>>     char path[PATH_MAX];
>>>>> >>>     const char *file = "teste123XXXXXX";
>>>>> >>>     size_t bytes=10024;
>>>>> >>>
>>>>> >>>     snprintf (path, PATH_MAX, "/dev/shm/%s", file);
>>>>> >>>     printf("PATH=%s\n",path);
>>>>> >>>
>>>>> >>>     fd = mkstemp (path);
>>>>> >>>     printf("fd=%d \n",fd);
>>>>> >>>
>>>>> >>>
>>>>> >>>     addr_orig = mmap (NULL, bytes, PROT_NONE,
>>>>> >>>         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>>>> >>>
>>>>> >>>
>>>>> >>>     addr = mmap (addr_orig, bytes, PROT_READ | PROT_WRITE,
>>>>> >>>         MAP_FIXED | MAP_SHARED, fd, 0);
>>>>> >>>
>>>>> >>>     printf("ADDR_ORIG:%p ADDR:%p\n",addr_orig,addr);
>>>>> >>>
>>>>> >>>
>>>>> >>>     if (addr != addr_orig) {
>>>>> >>>         printf("Erro");
>>>>> >>>     }
>>>>> >>> }
>>>>> >>>
>>>>> >>> Results on x86:
>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>> >>> fd=3
>>>>> >>> ADDR_ORIG:0x7f867d8e6000 ADDR:0x7f867d8e6000
>>>>> >>>
>>>>> >>> Results on sparc:
>>>>> >>> PATH=/dev/shm/teste123XXXXXX
>>>>> >>> fd=3
>>>>> >>> ADDR_ORIG:0xf7f72000 ADDR:0xffffffff
>>>>> >>
>>>>> >> Note: 0xffffffff == MAP_FAILED
>>>>> >>
>>>>> >> (from man mmap)
>>>>> >> RETURN VALUE
>>>>> >>        On success, mmap() returns a pointer to the mapped area. On
>>>>> >>        error, the value MAP_FAILED (that is, (void *) -1) is returned,
>>>>> >>        and errno is set appropriately.
>>>>> >>
>>>>> >>>
>>>>> >>>
>>>>> >>> But I'm wondering whether it is really necessary to call mmap 2 times.
>>>>> >>> What is the reason to call mmap 2 times, the second time using the
>>>>> >>> address of the first?
>>>>> >>>
>>>>> >>>
>>>>> >> Well there are 3 calls to mmap()
>>>>> >> 1) one to allocate 2 * what you need (in pages)
>>>>> >> 2) maps the first half of the mem to a real file
>>>>> >> 3) maps the second half of the mem to the same file
>>>>> >>
>>>>> >> The point is when you write to an address over the end of the
>>>>> >> first half of memory it is taken care of by the third mmap, which maps
>>>>> >> the address back to the top of the file for you. This means you
>>>>> >> don't have to worry about ringbuffer wrapping, which can be a headache.
>>>>> >>
>>>>> >> -Angus
>>>>> >>
>>>>> >
>>>>> > Interesting that this mmap operation doesn't work on sparc linux.
>>>>> >
>>>>> > Not sure how I can help here - Next step would be a follow up with the
>>>>> > sparc linux mailing list. I'll do that and cc you on the message - see
>>>>> > if we get any response.
>>>>> >
>>>>> > http://vger.kernel.org/vger-lists.html
>>>>> >
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>>
>>>>> >>> 2011/5/31 Angus Salkeld <asalk...@redhat.com>
>>>>> >>>
>>>>> >>>> On Tue, May 31, 2011 at 06:25:56PM -0300, william felipe_welter wrote:
>>>>> >>>>> Thanks Steven,
>>>>> >>>>>
>>>>> >>>>> Now I'm trying to run on the MCP:
>>>>> >>>>> - Uninstall the pacemaker 1.0
>>>>> >>>>> - Compile and install 1.1
>>>>> >>>>>
>>>>> >>>>> But now I have problems initializing pacemakerd: Could not initialize
>>>>> >>>>> Cluster Configuration Database API instance error 2
>>>>> >>>>> Debugging with gdb I see that the error is in confdb.. more
>>>>> >>>>> specifically, the errors start in coroipcc.c at line:
>>>>> >>>>>
>>>>> >>>>>
>>>>> >>>>> 448 if (addr != addr_orig) {
>>>>> >>>>> 449 goto error_close_unlink; <- enter here
>>>>> >>>>> 450 }
>>>>> >>>>>
>>>>> >>>>> Any idea what can cause this?
>>>>> >>>>>
>>>>> >>>>
>>>>> >>>> I tried porting a ringbuffer (www.libqb.org) to sparc and had the same
>>>>> >>>> failure.
>>>>> >>>> There are 3 mmap() calls and on sparc the third one keeps failing.
>>>>> >>>>
>>>>> >>>> This is a common way of creating a ring buffer, see:
>>>>> >>>> http://en.wikipedia.org/wiki/Circular_buffer#Exemplary_POSIX_Implementation
>>>>> >>>>
>>>>> >>>> I couldn't get it working in the short time I tried. It's probably
>>>>> >>>> worth looking at the clib implementation to see why it's failing
>>>>> >>>> (I didn't get to that).
>>>>> >>>>
>>>>> >>>> -Angus
>>>>> >>>>
>>>>>
>>>>> Note, we sorted this out we believe. Your kernel has hugetlb enabled,
>>>>> probably with 4MB pages. This requires corosync to allocate 4MB pages.
>>>>>
>>>>> Can you verify your hugetlb settings?
>>>>>
>>>>> If you can turn this option off, you should have at least a working
>>>>> corosync.
>>>>>
>>>>> Regards
>>>>> -steve
>>>>>
>>>>> >>>>
>>>>> >>>> _______________________________________________
>>>>> >>>> Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
>>>>> >>>> http://oss.clusterlabs.org/mailman/listinfo/pacemaker
>>>>> >>>>
>>>>> >>>> Project Home: http://www.clusterlabs.org
>>>>> >>>> Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
>>>>> >>>> Bugs: http://developerbugs.linux-foundation.org/enter_bug.cgi?product=Pacemaker
>>>>> >>>>
>>>>> >>>
>>>>> >>> --
>>>>> >>> William Felipe Welter
>>>>> >>> ------------------------------
>>>>> >>> Consultant in Free Technologies
>>>>> >>> william.wel...@4linux.com.br
>>>>> >>> www.4linux.com.br
>>>>> >>
>>>>> >>> _______________________________________________
>>>>> >>> Openais mailing list
>>>>> >>> open...@lists.linux-foundation.org
>>>>> >>> https://lists.linux-foundation.org/mailman/listinfo/openais