Hello Zhang Huan,
I was finally able to reproduce the bug on my i386 server running RHEL 5.1.

As you identified, the bug is reproducible only when the openhpid process
runs in the background, either started through the service utility
("service openhpid start") or run directly in the background
("openhpid -c /etc/openhpi/openhpi.conf").

Testing and findings on openhpi.2.10.2:
=============================
When the hpitop client is run repeatedly, every alternate execution of the
client program reports this error:

Wrong version? 0x50 != 0x1
(This error is printed by the ReadMsg method of the strmsock object defined in
marshal/strmsock.cpp.)

The configuration file used for this test contained just one libipmi handler,
pointing to a non-existent machine. An extract is given below:

handler libipmi { // This is the handler to non existing machine
       entity_root = "{SYSTEM_CHASSIS,3}"
       name = "lan"
       addr = "10.10.181.9"        #ipaddress
       port = "623"
       auth_type = "md5"   # none, md2, md5 or straight
       auth_level = "admin" # operator or admin
       username = "root"
       password = "password"
}

When openhpid is started in the background, an IPMI connection error is
reported and openhpid keeps retrying discovery without success.

When the hpitop client is started, a stream socket connection is established
between the openhpid process and the hpitop client program.
Since openhpid is running in the background, file handles such as stderr,
stdout and stdin are not attached to any TTY.
However, to report the IPMI connection error, the IPMI plugin writes to stderr
using fprintf (plugins/ipmi/ipmi.c, line 604), and this error message ends up
in the data stream being read by the client:

Extract of ipmi.c:

 602         while (ipmi_handler->fully_up == 0) {
 603                 if (!ipmi_handler->connected) {
 604                         fprintf(stderr, "IPMI connection is down\n");
 605                         return SA_ERR_HPI_NO_RESPONSE;
 606                 }


(gdb) print data
$12 = 0xbfc14d2b "IPMI connection is down\n\001\021"
(gdb) print *data
$13 = 73 'I'

The above is the corrupted message content received on the hpitop client side
(captured in gdb).
Because of this, ReadMsg reports the wrong version error.
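
To illustrate how a stderr message can land in a client's socket, here is a
minimal sketch. It rests on an assumption that I have not verified in the
openhpid source: that descriptor 2 is closed when the daemon detaches, so the
next socket the process obtains is given number 2. The function name
stderr_leaks_into_socket is made up for this example, and socketpair() stands
in for the daemon's accept().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical demonstration, not OpenHPI code.
 * Closes descriptor 2, creates a socket pair (which then reuses number 2),
 * writes to stderr and reads the text back from the peer socket.
 * Returns the number of bytes the "client" end received, or -1 on failure.
 */
ssize_t stderr_leaks_into_socket(char *buf, size_t buflen)
{
    int sv[2];
    int saved, client;
    ssize_t n;

    assert(buf != NULL && buflen > 1);

    fflush(stderr);
    saved = dup(STDERR_FILENO);       /* keep the real stderr for later */
    if (saved < 0)
        return -1;
    close(STDERR_FILENO);             /* ASSUMPTION: what the daemon does */

    /* New descriptors always take the lowest free numbers. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        dup2(saved, STDERR_FILENO);
        close(saved);
        return -1;
    }
    if (sv[0] == STDERR_FILENO) {
        client = sv[1];
    } else if (sv[1] == STDERR_FILENO) {
        client = sv[0];
    } else {                          /* descriptor 2 was not reused */
        close(sv[0]);
        close(sv[1]);
        dup2(saved, STDERR_FILENO);
        close(saved);
        return -1;
    }

    /* Same call as ipmi.c line 604: it now writes into the socket. */
    fprintf(stderr, "IPMI connection is down\n");
    fflush(stderr);

    n = read(client, buf, buflen - 1);
    if (n >= 0)
        buf[n] = '\0';

    close(client);
    dup2(saved, STDERR_FILENO);       /* restore stderr; drops the socket end */
    close(saved);
    return n;
}
```

If the assumption holds, the peer reads the text ahead of any real message
header, which would match the gdb dump above, where the message is followed
by the bytes \001\021.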

I commented out line 604 in ipmi.c (the fprintf to stderr), rebuilt openhpi
and retested the above scenario. The problem is no longer seen on repeated
executions of the hpitop client program.
fprintf to stderr is used in multiple places in the ipmi plugin; changing
these fprintf calls to "dbg" will probably solve the problem.
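
As a rough sketch of what a syslog-based replacement for these fprintf calls
could look like: the names plugin_err and PLUGIN_ERR below are hypothetical
and are not the actual OpenHPI "dbg" definition, which I am not reproducing
here. Only the standard syslog(3) and vsnprintf calls are assumed.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/*
 * Hypothetical helper, not part of OpenHPI.
 * Formats "file:line: message" into out, sends it to syslog instead of
 * stderr, and returns the length of the stored text, or -1 on error.
 */
int plugin_err(char *out, size_t outlen, const char *file, int line,
               const char *fmt, ...)
{
    va_list ap;
    int prefix, body;

    assert(out != NULL && outlen > 0);

    prefix = snprintf(out, outlen, "%s:%d: ", file, line);
    if (prefix < 0 || (size_t)prefix >= outlen)
        return -1;

    va_start(ap, fmt);
    body = vsnprintf(out + prefix, outlen - (size_t)prefix, fmt, ap);
    va_end(ap);
    if (body < 0)
        return -1;

    syslog(LOG_ERR, "%s", out);   /* never touches descriptors 0, 1 or 2 */
    return (int)strlen(out);
}

/* Convenience wrapper so call sites stay as short as the old fprintf. */
#define PLUGIN_ERR(...)                                              \
    do {                                                             \
        char line_[256];                                             \
        plugin_err(line_, sizeof line_, __FILE__, __LINE__,          \
                   __VA_ARGS__);                                     \
    } while (0)
```

With something like this, line 604 would become
PLUGIN_ERR("IPMI connection is down"); and the message would go to the system
log rather than to whatever descriptor 2 happens to be.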

Testing and findings on openhpi.2.11.2:
=============================
The situation here is quite different: the "dbg" statements are defined to
write to stderr, and hence most of the dbg output gets mixed with the data
being read by the hpitop client program.

In 2.10.2, the dbg statements were written to syslog rather than stderr.

Any suggestions on a solution for openhpi trunk?
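
One possible direction, offered only as a sketch: keep descriptors 0, 1 and 2
pointed at /dev/null while the daemon runs in the background, so that stray
writes to stderr are discarded and a client socket can never be given one of
those numbers. The function names below are made up for illustration, and I
am assuming the daemon currently leaves these descriptors closed, which I
have not confirmed in the code.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of OpenHPI.
 * Points one descriptor at /dev/null. Returns 0 on success, -1 on error.
 */
int redirect_fd_to_devnull(int fd)
{
    int nullfd;

    assert(fd >= 0);

    nullfd = open("/dev/null", O_RDWR);
    if (nullfd < 0)
        return -1;
    if (nullfd != fd) {
        if (dup2(nullfd, fd) < 0) {   /* fd now refers to /dev/null */
            close(nullfd);
            return -1;
        }
        close(nullfd);
    }
    return 0;
}

/* Keeps stdin, stdout and stderr occupied after detaching from the TTY. */
int detach_stdio(void)
{
    int fd;

    for (fd = STDIN_FILENO; fd <= STDERR_FILENO; fd++) {
        if (redirect_fd_to_devnull(fd) < 0)
            return -1;
    }
    return 0;
}
```

detach_stdio() would be called once, right after the daemon goes into the
background and before it starts accepting connections. The daemon(3) library
call does the same redirection when its noclose argument is 0.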

Renier,

Could you please let me know whether I can take up this bug for fixing, if no
one else is already working on it?

Regards,
MS

-----Original Message-----
From: zhanghuan [mailto:[EMAIL PROTECTED]
Sent: Friday, May 09, 2008 6:58 AM
To: Sampathkumar, Raghavendra Mysore
Subject: Re: Regarding bug id 1939812 (openhpid doesnt work correctly for 
non-existent machine)

I am totally confused after some tests.
My original openhpid.conf is shown below:
# handler libsimulator {
#        entity_root = "{SYSTEM_CHASSIS,101}"
#        name = "simulator"
# }

handler libipmi {                      <= for the existing machine
        entity_root = "{SYSTEM_CHASSIS,200}"
        name = "lan"
        addr = "10.10.37.190"
        port = "623"
        auth_type = "md5"
        auth_level= "admin"
        username = "root"
        password = "111111"
}
handler libipmi {                      <= for the non-existent machine
        entity_root = "{SYSTEM_CHASSIS,201}"
        name = "lan"
        addr = "10.10.37.191"
        port = "623"
        auth_type = "md5"
        auth_level= "admin"
        username = "root"
        password = "111111"
}
# handler libipmi {
#         entity_root = "{SYSTEM_CHASSIS,300}"
#         name = "lan"
#         addr = "10.10.37.91"
#         port = "623"
#         auth_type = "md5"
#         auth_level= "admin"
#         username = "root"
#         password = "cmmrootpass"
# }
# domain 200 {
#       entity_pattern = "{SYSTEM_CHASSIS,200}*"
#       tag = "normal server" # Optional
#       ai_timeout = 0 # Optional. -1 means BLOCK. 0 is the default.
#       ai_readonly = 1 # Optional. 1 means yes (default), 0 means no.
# }
# domain 300 {
#       entity_pattern = "{SYSTEM_CHASSIS,300}*"
#       tag = "normal server2" # Optional
#       ai_timeout = 0 # Optional. -1 means BLOCK. 0 is the default.
#       ai_readonly = 1 # Optional. 1 means yes (default), 0 means no.
# }

Test steps:
I did the following tests on my FC8 machine, using the conf above:
1. Deleted all the commented lines: everything was OK, no error reported.
2. Restored the conf file and deleted all commented lines above the normal
   (uncommented) lines: everything was OK.
3. Restored the conf file and deleted all commented lines below the normal
   lines: everything was OK.
4. Restored the conf file: everything was OK!!! No error has been reported
   since.
5. Removed all commented lines and the handler for the existing machine,
   leaving just the handler for the non-existent machine: the error returned!
6. Repeated the above steps on CentOS 5.1: the error remains, no change.

It seems some subtle thing may trigger the error.


CentOS is very similar to Red Hat AS; CentOS 5.1 is just a re-packaging of
Red Hat AS 5.1 (they rebuild all packages, since Red Hat only provides the
src.rpm files).

We don't use SUSE. I will test on Red Hat AS once I find a machine running it.


Zhang Huan

-----Original Message-----
From: Sampathkumar, Raghavendra Mysore [mailto:[EMAIL PROTECTED]
Sent: Thursday, May 08, 2008 2:24 PM
To: [EMAIL PROTECTED]
Subject: RE: Regarding bug id 1939812 (openhpid doesnt work correctly for 
non-existent machine)

Hello,

The Linux distributions on which I am running openhpid are:

1. openhpi (2.10.2) runs on i686-redhat-linux-gnu (Red Hat Enterprise Linux
   Server release 5.1 (Tikanga))
2. openhpi (2.11.2) runs on x86_64-suse-linux (SUSE LINUX 10.1 (X86-64),
   VERSION = 10.1)

On both of these setups, the problem is not reproduced.

We do not have an FC8 setup in our lab, which I guess is a desktop version.
We will try to get this version and check the problem.

Meanwhile, I suggest you try to reproduce the problem on RHEL5 or SLES, if you
have any machines with these OS flavors.

Thanks.

Regards,
Raghavendra M.S.

-----Original Message-----
From: zhanghuan [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 07, 2008 6:56 PM
To: Sampathkumar, Raghavendra Mysore; [email protected]
Subject: Re: Regarding bug id 1939812 (openhpid doesnt work correctly for 
non-existent machine)

I did the following checks/tests:

1. Checked /etc/init.d/openhpid: only the absolute path of the "openhpid" file
differs. My openhpi was installed from an rpm which I built myself by running
"make rpm".

2. Erased openhpi and its subpackages, then ran "make install" to reinstall
it. openhpi.sh was also copied to /usr/local/etc/init.d/openhpid.
Started the openhpid service and ran hpitop: the problem remains.

3. Removed the entry for the existing machine from openhpi.conf and tested
again. hpitop reported the same problem. New finding!

4. Redid step 2 and step 3 using openhpi 2.11.1 (downloaded from the website):
the same problem!

5. Found another machine running CentOS 5.1 and tried its original openhpi
(version 2.8.1-2.el5.7): no error reported!

6. Erased openhpi and its subpackages, downloaded openhpi 2.11.1 from the
openhpi website and installed it with "make install". Tested again using
hpitop, and it reported the same problem.

Which Linux distro do you use? The bug is easily reproduced on my machine.

PS: The FC8 and CentOS 5.1 systems I used are both 32-bit. FC8 has been
fully updated.

I intended to test on FC4, but openhpi failed to build there. The message was:
cc1: warnings being treated as errors
el2event.c:5581: warning: initialization discards qualifiers from pointer target type


Zhang Huan

-----Original Message-----
From: Sampathkumar, Raghavendra Mysore [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 07, 2008 5:02 PM
To: [EMAIL PROTECTED]; [email protected]
Subject: RE: Regarding bug id 1939812 (openhpid doesnt work correctly for 
non-existent machine)

Hello,

I did the following to test the openhpid (run using "service" utility):

1. I copied the openhpid.sh script available in openhpid directory to 
/etc/init.d/openhpid

cp <openhpi_root_directory>/openhpid/openhpi.sh /etc/init.d/openhpid

2. The openhpi.conf file picked up by this script is 
/usr/local/etc/openhpi/openhpi.conf

Copied the openhpi.conf containing two libipmi handlers, one pointing to the
ATCA chassis and the other to a non-existent machine, to this directory

cp openhpi.conf.ipmi /usr/local/etc/openhpi/openhpi.conf

3. started the openhpid using service utility

service openhpid start
Starting openhpid:                                         [  OK  ]

4. Discovery completes successfully for the ATCA chassis and reports an error
for the non-existent machine.
openhpid continues to function with no issues.

5. The "hpitop -x" command was issued multiple times, and every time it ran
fine with no issues.
"Wrong version" was not reported even once.

6. Verified the above steps with openhpi-2.10.2 and openhpi trunk version 2.11.2

NOTE: I'm still running the OpenIPMI-2.0.10 version.

I suggest you try the "openhpid.sh" script for starting openhpid through the
service utility and check whether the problem is still reproducible.

In parallel, I shall update the bug with my findings.

Regards,
Raghavendra M.S.


-----Original Message-----
From: zhanghuan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 12:33 PM
To: Sampathkumar, Raghavendra Mysore; [email protected]
Subject: Re: Regarding bug id 1939812 (openhpid doesnt work correctly for 
non-existent machine)

In my previous test, openhpid was run as a daemon with "service openhpid
start". However, if I run openhpid directly with "openhpid -c xxx.conf" or
"openhpid -c xxx.conf -n", everything seems OK. Could it be that some
environment variables trigger the bug?!

I did all the tests on FC8, using OpenIPMI-2.0.11-3.fc8.


Zhang Huan
-----Original Message-----
From: Sampathkumar, Raghavendra Mysore [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 1:39 PM
To: [EMAIL PROTECTED]; [email protected]
Subject: Regarding bug id 1939812 (openhpid doesnt work correctly for non-existent 
machine)

Hello Zhang Huan,

This is regarding bug id 1939812 in source Bugzilla.

Bug Description: openhpid doesn't work correctly for non-existent machine

I retested the scenario which you mentioned in the bug. My findings are as
follows:

The OpenIPMI-2.0.10 library has been installed to enable the libipmi plugin
during openhpi compilation.

Two libipmi handlers are created in openhpi.conf, one pointing to the ATCA
chassis and the other pointing to a non-existent machine.

Extract of the openhpi.conf file:

handler libipmi { // This is the right handler to ATCA
       entity_root = "{SYSTEM_CHASSIS,3}"
       name = "lan"
       addr = "10.10.181.8"        #ipaddress
       port = "623"
       auth_type = "md5"   # none, md2, md5 or straight
       auth_level = "admin" # operator or admin
       username = "root"
       password = "password"
}

handler libipmi { // This is the handler to non existing machine
       entity_root = "{SYSTEM_CHASSIS,3}"
       name = "lan"
       addr = "10.10.181.9"        #ipaddress
       port = "623"
       auth_type = "md5"   # none, md2, md5 or straight
       auth_level = "admin" # operator or admin
       username = "root"
       password = "password"
}


Captured below is the console dump when openhpid(openhpi.2.10.2) is started 
with this configuration:

====================================================================================

#openhpid -c ./openhpi.conf.ipmi -n
 threaded.c:153:oh_threaded_init: Attempting to init event
 threaded.c:158:oh_threaded_init: Already supporting threads
 event.c:47:oh_event_init: Setting up event processing queue
 event.c:50:oh_event_init: Set up processing queue
 plugin.c:311:oh_load_plugin: Plugin libipmi already loaded. Not loading twice.
 config.c:737:oh_load_config: Done processing conf file.
Number of parse errors:0
 init.c:68:oh_init: Initialized UID.
 init.c:72:oh_init: Initialized handler table
 init.c:76:oh_init: Initialized domain table
 init.c:80:oh_init: Initialized session table
 config.c:772:oh_process_config: Loaded handler for plugin libipmi
 config.c:772:oh_process_config: Loaded handler for plugin libipmi
 domain.c:460:oh_create_domain: Domain 0 has been created.
 init.c:103:oh_init: Created DEFAULT domain
 threaded.c:169:oh_threaded_start: Starting discovery thread
 threaded.c:176:oh_threaded_start: Starting event threads
 init.c:132:oh_init: Set init state
 threaded.c:73:oh_discovery_thread_loop: Doing threaded discovery on all handlers
 ipmi.c:599:ipmi_discover_resources: ipmi discover_resources
 threaded.c:103:oh_evtpop_thread_loop: Thread processing events
INFO: lan 10.10.181.8 0 ipmi_lan.c(connection_up): Connection 0 to the BMC is up
INFO: lan 10.10.181.8 0 ipmi_lan.c(connection_up): Connection to the BMC restored
 threaded.c:129:oh_evtget_thread_loop: Thread Harvesting events
 event.c:128:oh_harvest_events: harvesting for 1
ipmi_connection.c:87 (IPMI domain Connection success)
atca_shelf_fru.c:892 (Record #0. MId = 0x157)
atca_shelf_fru.c:913 (Record #4 too short. len = 0xa)
atca_shelf_fru.c:371 (dismatch datalen(0xdd) and record struct(0xdd)  desk_num = 11)
ipmi_entity_event.c:803 (No res_info(0x5a07c0) for slot 73)
WARN: lan 10.10.181.8(7.1) entity.c(ipmi_entity_scan_sdrs): Entity has two different MCs in different SDRs, only using the first for presence.  MCs are lan 10.10.181.8(0.42)  and lan 10.10.181.8(0.62)
WARN: lan 10.10.181.8(7.1) entity.c(ipmi_entity_scan_sdrs): Entity has two different MCs in different SDRs, only using the first for presence.  MCs are lan 10.10.181.8(0.42)  and lan 10.10.181.8(0.64)
SEVR: lan 10.10.181.8(7.1) oem_atca.c(atca_entity_update_handler): Entity mismatch on fru 0, old entity was lan 10.10.181.8(r0.100.10.0)
 openhpid.cpp:283:main: openhpid started.

 openhpid.cpp:284:main: OPENHPI_CONF = ./openhpi.conf.ipmi

 openhpid.cpp:285:main: OPENHPI_DAEMON_PORT = 4743


SEVR: lan 10.10.181.8(0.20) oem_atca.c(atca_handle_new_mc): Could not find IPMC info
 event.c:394:oh_process_events: Event Type = HOTSWAP
 event.c:326:process_event: Processing event for domain 0
 event.c:212:process_hpi_event: Added event to EL
 event.c:222:process_hpi_event: Got session list for domain 0

.......................
<Discovery of the FRUs of the rightly configured ATCA is done >

..........................



ipmi_connection.c:84 (Failed to connect to IPMI domain. err = 0x16)
ipmi_connection.c:91 (All IPMI connections down
)
IPMI connection is down
 threaded.c:84:oh_discovery_thread_loop: Going to sleep
 event.c:113:harvest_events_for_handler: Handler is out of Events
 threaded.c:137:oh_evtget_thread_loop: Going to sleep
 threaded.c:141:oh_evtget_thread_loop: TIMEDOUT: Woke up, am looping again
 threaded.c:129:oh_evtget_thread_loop: Thread Harvesting events
 event.c:128:oh_harvest_events: harvesting for 1
 event.c:113:harvest_events_for_handler: Handler is out of Events
 event.c:128:oh_harvest_events: harvesting for 2
ipmi_connection.c:84 (Failed to connect to IPMI domain. err = 0x16)
ipmi_connection.c:91 (All IPMI connections down
)
 event.c:113:harvest_events_for_handler: Handler is out of Events
 threaded.c:137:oh_evtget_thread_loop: Going to sleep
 threaded.c:141:oh_evtget_thread_loop: TIMEDOUT: Woke up, am looping again
 threaded.c:129:oh_evtget_thread_loop: Thread Harvesting events
 event.c:128:oh_harvest_events: harvesting for 1

<Failure of connection to the non existing machine does not result in the 
failure of discovery>

====================================================================================

hpitop returns no error, however many times it is run.

I have also retested this scenario with the ipmidirect plugin, and no issues
were seen.

The whole scenario was also tested with the OpenHPI 2.11.1 code, and still no
issues were seen.

This indicates that there is no error in the OpenHPI framework at least, and
that the libipmi and libipmidirect plugins are also working fine.

I suspect there is an issue in the OpenIPMI library version which you have
installed, but this is just my guess.

Please let me know if I have missed anything in recreating the error scenario
which you encountered.

Thanks.

Regards,
Raghavendra M.S.

_______________________________________________
Openhpi-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openhpi-devel
