Hello Zhang Huan,
I was finally able to reproduce the bug on my i386 server running RHEL 5.1.
As you identified, this bug is reproducible only when the openhpid process
is run in the background, either by invoking openhpid through the service
utility (i.e. "service openhpid start") or by running openhpid in the
background directly (i.e. "openhpid -c /etc/openhpi/openhpi.conf").
Testing and findings on openhpi.2.10.2:
=============================
When the hpitop client is executed, every alternate run of the client
reports the error:
Wrong version? 0x50 != 0x1
(This error was printed by ReadMsg method of strmsock object defined in
marshal/strmsock.cpp file)
The configuration file used for this testing contained just one libipmi
handler pointing to a non-existent machine; an extract is attached below:
handler libipmi { // This is the handler for the non-existent machine
entity_root = "{SYSTEM_CHASSIS,3}"
name = "lan"
addr = "10.10.181.9" #ipaddress
port = "623"
auth_type = "md5" # none, md2, md5 or straight
auth_level = "admin" # operator or admin
username = "root"
password = "password"
}
When openhpid is started in the background, an IPMI connection error is
reported and openhpid continuously retries discovery with no success.
When the hpitop client is started, a stream socket connection is
established between the openhpid process and the hpitop client program.
Since openhpid is running in the background, the stderr, stdout and stdin
file descriptors are not attached to any TTY.
But to report the IPMI connection error, the IPMI plugin writes to stderr
using fprintf (plugins/ipmi/ipmi.c, line no 604), and this error message
gets into the data stream being read by the client:
Extract of ipmi.c:
602 while (ipmi_handler->fully_up == 0) {
603 if (!ipmi_handler->connected) {
604 fprintf(stderr, "IPMI connection is down\n");
605 return SA_ERR_HPI_NO_RESPONSE;
606 }
(gdb) print data
$12 = 0xbfc14d2b "IPMI connection is down\n\001\021"
(gdb) print *data
$13 = 73 'I'
Provided above is the corrupted message content received at the hpitop
client's side (captured in gdb).
Due to this problem, ReadMsg reports the wrong version error.
I commented out line no 604 in ipmi.c containing this fprintf to stderr,
rebuilt the entire openhpi tree and retested the above scenario; the problem
is no longer seen for repeated executions of the hpitop client.
fprintf to stderr is used in the ipmi plugin at multiple places; changing
these fprintf calls to "dbg" will probably solve the problem.
Testing and findings on openhpi.2.11.2:
=============================
The scenario here is quite different: the "dbg" statements themselves are
written to stderr, and hence most of the dbg output gets mixed with the data
being read by the hpitop client program.
In 2.10.2, the dbg statements were written to syslog rather than stderr.
Any suggestions on a solution for openhpi trunk?
Renier,
Can you please suggest whether I can take up this bug for fixing, if no one
else is already working on it?
Regards,
MS
-----Original Message-----
From: zhanghuan [mailto:[EMAIL PROTECTED]
Sent: Friday, May 09, 2008 6:58 AM
To: Sampathkumar, Raghavendra Mysore
Subject: Re: Regarding bug id 1939812 (openhpid doesn't work correctly for
non-existent machine)
I was totally confused after some tests.
My original openhpid.conf is shown below:
# handler libsimulator {
# entity_root = "{SYSTEM_CHASSIS,101}"
# name = "simulator"
# }
handler libipmi {    <= for the existing machine
entity_root = "{SYSTEM_CHASSIS,200}"
name = "lan"
addr = "10.10.37.190"
port = "623"
auth_type = "md5"
auth_level= "admin"
username = "root"
password = "111111"
}
handler libipmi {    <= for the non-existent machine
entity_root = "{SYSTEM_CHASSIS,201}"
name = "lan"
addr = "10.10.37.191"
port = "623"
auth_type = "md5"
auth_level= "admin"
username = "root"
password = "111111"
}
# handler libipmi {
# entity_root = "{SYSTEM_CHASSIS,300}"
# name = "lan"
# addr = "10.10.37.91"
# port = "623"
# auth_type = "md5"
# auth_level= "admin"
# username = "root"
# password = "cmmrootpass"
# }
# domain 200 {
# entity_pattern = "{SYSTEM_CHASSIS,200}*"
# tag = "normal server" # Optional
# ai_timeout = 0 # Optional. -1 means BLOCK. 0 is the default.
# ai_readonly = 1 # Optional. 1 means yes (default), 0 means no.
# }
# domain 300 {
# entity_pattern = "{SYSTEM_CHASSIS,300}*"
# tag = "normal server2" # Optional
# ai_timeout = 0 # Optional. -1 means BLOCK. 0 is the default.
# ai_readonly = 1 # Optional. 1 means yes (default), 0 means no.
# }
test steps:
I did the following tests on my FC8 machine, using the conf above:
1. Deleted all the commented lines; everything is OK, no error reported.
2. Restored the conf file. Deleted all commented lines above the normal
(uncommented) lines; everything is OK.
3. Restored the conf file. Deleted all commented lines below the normal
lines; everything is OK.
4. Restored the conf file; everything is OK!!! No error reported since.
5. Removed all commented lines and the handler for the existing machine,
leaving just the handler for the non-existent machine. The error returned!
6. Repeated the above steps on CentOS 5.1; the error remains, no changes.
It seems some subtle thing may trigger the error.
CentOS is very similar to Red Hat AS; CentOS 5.1 is just a re-package of
Red Hat AS 5.1 (they rebuild all packages, since Red Hat only provides the
src.rpm files).
We don't use SUSE; I will test on Red Hat AS after I find one.
Zhang Huan
-----Original Message-----
From: Sampathkumar, Raghavendra Mysore [mailto:[EMAIL PROTECTED]
Sent: May 8, 2008 14:24
To: [EMAIL PROTECTED]
Subject: RE: Regarding bug id 1939812 (openhpid doesn't work correctly for
non-existent machine)
Hello,
The Linux distros I'm running openhpid on are:
1. openhpi (2.10.2) on i686-redhat-linux-gnu (Red Hat Enterprise Linux
Server release 5.1 (Tikanga))
2. openhpi (2.11.2) on x86_64-suse-linux (SUSE LINUX 10.1 (X86-64)
VERSION = 10.1)
On both these setups, the problem is not reproduced.
We do not have an FC8 setup in our lab; I guess it is a desktop version. We
will try to get this version and check the problem.
Meanwhile, I would suggest you try to reproduce the problem on RHEL5 or SLES
if you have any machines with those OS flavors.
Thanks.
Regards,
Raghavendra M.S.
-----Original Message-----
From: zhanghuan [mailto:[EMAIL PROTECTED]
Sent: Wednesday, May 07, 2008 6:56 PM
To: Sampathkumar, Raghavendra Mysore; [email protected]
Subject: Re: Regarding bug id 1939812 (openhpid doesn't work correctly for
non-existent machine)
I did the following checks/tests:
1. Checked /etc/init.d/openhpid; only the absolute path of the "openhpid"
file differs. My openhpi is installed from an rpm, and I ran "make rpm" to
build it myself.
2. Erased openhpi and its subpackages, then ran "make install" to reinstall
it. openhpi.sh is also copied to /usr/local/etc/init.d/openhpid.
Started the openhpid service and ran hpitop; the problem remains.
3. Removed the existing entry from openhpi.conf and tested again. hpitop
reported the same problem. New finding!
4. Redid steps 2 and 3 using openhpi 2.11.1 (downloaded from the website);
the same problem!
5. Found another machine running CentOS 5.1 and tried its original openhpi
(version 2.8.1-2.el5.7). No error reported!
6. Erased openhpi and its subpackages, downloaded openhpi 2.11.1 from the
openhpi website, and "make install"ed it. Tested again using hpitop; it
reported the same problem.
What Linux distro do you use? The bug is easily reproduced on my machine.
PS: The FC8 and CentOS 5.1 systems I used are both 32-bit. FC8 has been
updated to the newest packages.
I intended to test on FC4, but openhpi failed to build. Message below:
cc1: warnings being treated as errors
el2event.c:5581: warning: initialization discards qualifiers from pointer
target type
Zhang Huan
-----Original Message-----
From: Sampathkumar, Raghavendra Mysore [mailto:[EMAIL PROTECTED]
Sent: May 7, 2008 17:02
To: [EMAIL PROTECTED]; [email protected]
Subject: RE: Regarding bug id 1939812 (openhpid doesn't work correctly for
non-existent machine)
Hello,
I did the following to test openhpid (run using the "service" utility):
1. Copied the openhpid.sh script available in the openhpid directory to
/etc/init.d/openhpid
cp <openhpi_root_directory>/openhpid/openhpi.sh /etc/init.d/openhpid
2. The openhpi.conf file picked up by this script is
/usr/local/etc/openhpi/openhpi.conf.
Copied the openhpi.conf containing the two libipmi handlers, one pointing to
the ATCA chassis and the other to the non-existent machine, to this directory:
cp openhpi.conf.ipmi /usr/local/etc/openhpi/openhpi.conf
3. Started openhpid using the service utility:
service openhpid start
Starting openhpid: [ OK ]
4. Discovery completes successfully for the ATCA chassis and reports an
error for the non-existent machine.
openhpid continues to function with no issues.
5. The "hpitop -x" command was issued multiple times, and every time it runs
fine with no issues.
"Wrong version" is not reported even once.
6. Verified the above steps with openhpi-2.10.2 and openhpi trunk version 2.11.2
NOTE: I'm still running OpenIPMI-2.0.10 version
I would suggest you try the "openhpid.sh" script to start openhpid using the
service utility and check whether the problem is still reproducible.
I shall in parallel update the bug with my findings.
Regards,
Raghavendra M.S.
-----Original Message-----
From: zhanghuan [mailto:[EMAIL PROTECTED]
Sent: Tuesday, May 06, 2008 12:33 PM
To: Sampathkumar, Raghavendra Mysore; [email protected]
Subject: Re: Regarding bug id 1939812 (openhpid doesn't work correctly for
non-existent machine)
In my previous test, openhpid was run as a daemon with "service openhpid
start".
However, if I run openhpid directly with "openhpid -c xxx.conf" or
"openhpid -c xxx.conf -n", everything seems OK. It seems that some
environment variables trigger the bug?!
I did all the tests on FC8, using OpenIPMI-2.0.11-3.fc8.
Zhang Huan
-----Original Message-----
From: Sampathkumar, Raghavendra Mysore [mailto:[EMAIL PROTECTED]
Sent: May 6, 2008 13:39
To: [EMAIL PROTECTED]; [email protected]
Subject: Regarding bug id 1939812 (openhpid doesn't work correctly for
non-existent machine)
Hello Zhang Huan,
This is regarding the bug id 1939812 in source Bugzilla.
Bug Description: openhpid doesn't work correctly for non-existent machine
I retested the scenario you mentioned in the bug; the following are my
findings:
The OpenIPMI-2.0.10 library was installed to enable the libipmi plugin
during the openhpi compilation.
Two libipmi handlers are created in openhpi.conf, one pointing to the ATCA
chassis and the other pointing to a non-existent machine.
Extract of the openhpi.conf file:
handler libipmi { // This is the handler for the ATCA chassis
entity_root = "{SYSTEM_CHASSIS,3}"
name = "lan"
addr = "10.10.181.8" #ipaddress
port = "623"
auth_type = "md5" # none, md2, md5 or straight
auth_level = "admin" # operator or admin
username = "root"
password = "password"
}
handler libipmi { // This is the handler for the non-existent machine
entity_root = "{SYSTEM_CHASSIS,3}"
name = "lan"
addr = "10.10.181.9" #ipaddress
port = "623"
auth_type = "md5" # none, md2, md5 or straight
auth_level = "admin" # operator or admin
username = "root"
password = "password"
}
Captured below is the console dump when openhpid (openhpi.2.10.2) is started
with this configuration:
====================================================================================
#openhpid -c ./openhpi.conf.ipmi -n
threaded.c:153:oh_threaded_init: Attempting to init event
threaded.c:158:oh_threaded_init: Already supporting threads
event.c:47:oh_event_init: Setting up event processing queue
event.c:50:oh_event_init: Set up processing queue
plugin.c:311:oh_load_plugin: Plugin libipmi already loaded. Not loading twice.
config.c:737:oh_load_config: Done processing conf file.
Number of parse errors:0
init.c:68:oh_init: Initialized UID.
init.c:72:oh_init: Initialized handler table
init.c:76:oh_init: Initialized domain table
init.c:80:oh_init: Initialized session table
config.c:772:oh_process_config: Loaded handler for plugin libipmi
config.c:772:oh_process_config: Loaded handler for plugin libipmi
domain.c:460:oh_create_domain: Domain 0 has been created.
init.c:103:oh_init: Created DEFAULT domain
threaded.c:169:oh_threaded_start: Starting discovery thread
threaded.c:176:oh_threaded_start: Starting event threads
init.c:132:oh_init: Set init state
threaded.c:73:oh_discovery_thread_loop: Doing threaded discovery on all
handlers
ipmi.c:599:ipmi_discover_resources: ipmi discover_resources
threaded.c:103:oh_evtpop_thread_loop: Thread processing events
INFO: lan 10.10.181.8 0 ipmi_lan.c(connection_up): Connection 0 to the BMC is up
INFO: lan 10.10.181.8 0 ipmi_lan.c(connection_up): Connection to the BMC
restored
threaded.c:129:oh_evtget_thread_loop: Thread Harvesting events
event.c:128:oh_harvest_events: harvesting for 1
ipmi_connection.c:87 (IPMI domain Connection success)
atca_shelf_fru.c:892 (Record #0. MId = 0x157)
atca_shelf_fru.c:913 (Record #4 too short. len = 0xa)
atca_shelf_fru.c:371 (dismatch datalen(0xdd) and record struct(0xdd) desk_num
= 11)
ipmi_entity_event.c:803 (No res_info(0x5a07c0) for slot 73)
WARN: lan 10.10.181.8(7.1) entity.c(ipmi_entity_scan_sdrs): Entity has two
different MCs in different SDRs, only using the first for presence. MCs are
lan 10.10.181.8(0.42) and lan 10.10.181.8(0.62)
WARN: lan 10.10.181.8(7.1) entity.c(ipmi_entity_scan_sdrs): Entity has two
different MCs in different SDRs, only using the first for presence. MCs are
lan 10.10.181.8(0.42) and lan 10.10.181.8(0.64)
SEVR: lan 10.10.181.8(7.1) oem_atca.c(atca_entity_update_handler): Entity
mismatch on fru 0, old entity was lan 10.10.181.8(r0.100.10.0)
openhpid.cpp:283:main: openhpid started.
openhpid.cpp:284:main: OPENHPI_CONF = ./openhpi.conf.ipmi
openhpid.cpp:285:main: OPENHPI_DAEMON_PORT = 4743
SEVR: lan 10.10.181.8(0.20) oem_atca.c(atca_handle_new_mc): Could not find IPMC
info
event.c:394:oh_process_events: Event Type = HOTSWAP
event.c:326:process_event: Processing event for domain 0
event.c:212:process_hpi_event: Added event to EL
event.c:222:process_hpi_event: Got session list for domain 0
.......................
<Discovery of the FRUs of the correctly configured ATCA chassis is done>
..........................
ipmi_connection.c:84 (Failed to connect to IPMI domain. err = 0x16)
ipmi_connection.c:91 (All IPMI connections down
)
IPMI connection is down
threaded.c:84:oh_discovery_thread_loop: Going to sleep
event.c:113:harvest_events_for_handler: Handler is out of Events
threaded.c:137:oh_evtget_thread_loop: Going to sleep
threaded.c:141:oh_evtget_thread_loop: TIMEDOUT: Woke up, am looping again
threaded.c:129:oh_evtget_thread_loop: Thread Harvesting events
event.c:128:oh_harvest_events: harvesting for 1
event.c:113:harvest_events_for_handler: Handler is out of Events
event.c:128:oh_harvest_events: harvesting for 2
ipmi_connection.c:84 (Failed to connect to IPMI domain. err = 0x16)
ipmi_connection.c:91 (All IPMI connections down
)
event.c:113:harvest_events_for_handler: Handler is out of Events
threaded.c:137:oh_evtget_thread_loop: Going to sleep
threaded.c:141:oh_evtget_thread_loop: TIMEDOUT: Woke up, am looping again
threaded.c:129:oh_evtget_thread_loop: Thread Harvesting events
event.c:128:oh_harvest_events: harvesting for 1
<Failure of the connection to the non-existent machine does not result in
failure of discovery>
====================================================================================
hpitop returns no error, however many times it is run.
I have also retested this scenario with the ipmidirect plugin, and no issues
are seen.
The whole scenario was also tested with the OpenHPI 2.11.1 code, and still
there were no issues.
Hence it indicates that there is no error in the OpenHPI framework at least;
the libipmi and libipmidirect plugins are also working fine.
I suspect there is an issue in the OpenIPMI library version you have
installed, but this is just my guess.
Please let me know if I have missed anything in recreating the error
scenario you encountered.
Thanks.
Regards,
Raghavendra M.S.
_______________________________________________
Openhpi-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openhpi-devel