I'm trying to deploy an OpenNebula installation on a cluster over the OrangeFS
file system. I've successfully set up this cluster in the past using local
storage, but now I'm testing performance over distributed storage. I've
configured the installation to use the parallel storage along with a MySQL
database hosted locally on the head node, but now I'm seeing a couple of errors.

Most of the time I get this:

/srv/cloud/one/bin/one: line 172: /srv/cloud/one/var/sched.pid: Input/output error
oned failed to start
/srv/cloud/one/bin/one: line 84: 28706 Terminated              $ONED -f 2>&1

There are no further errors in dmesg, oned.log or messages. Other times oned
will start, but then sched fails with this error:

/srv/cloud/one/bin/one: line 172: /srv/cloud/one/var/sched.pid: Input/output error
/srv/cloud/one/bin/one: line 112: 29006 Segmentation fault      (core dumped) $ONE_SCHEDULER -p $PORT -t 30 -m 300 -d 30 -h 1
cat: /srv/cloud/one/var/oned.pid: Input/output error

Again, nothing in dmesg or messages, but sched.log reports:

Sun Jan 29 11:40:27 2012 [POOL][E]: Could not retrieve pool info from ONE
Sun Jan 29 11:40:32 2012 [HOST][E]: Exception raised: Unable to transport XML to server and get XML response back.  libcurl failed to execute the HTTP POST transaction.  couldn't connect to host
Sun Jan 29 11:40:32 2012 [POOL][E]: Could not retrieve pool info from ONE

There are no errors in the OrangeFS logs and performance seems good, so I'm
assuming the file system is working. I've never seen this error in any of my
other OpenNebula clusters using local storage.
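
Since every failure so far involves the small .pid files under /srv/cloud/one/var, a more direct test of that assumption might be a loop that repeatedly writes and reads back a pid-style file on the OrangeFS mount. This is only a rough sketch: the function name check_small_file_io and the file name io_test.pid are hypothetical, and it assumes nothing beyond an ordinary POSIX-style mount point. It does not reproduce exactly what the `one` script does at line 172.

```python
import os
import sys
import tempfile


def check_small_file_io(directory, iterations=1000):
    """Repeatedly write and read back a small pid-style file in `directory`.

    Returns (failures, mismatches): the number of iterations that raised
    OSError (e.g. EIO, "Input/output error") and the number whose
    read-back contents differed from what was written.
    """
    path = os.path.join(directory, "io_test.pid")  # hypothetical test file name
    expected = f"{os.getpid()}\n"
    failures = 0
    mismatches = 0
    for _ in range(iterations):
        try:
            # Roughly mimic a shell "echo $PID > file" followed by "cat file".
            with open(path, "w") as f:
                f.write(expected)
            with open(path) as f:
                if f.read() != expected:
                    mismatches += 1
        except OSError:
            failures += 1
    # Best-effort cleanup; ignore errors if the file is already gone.
    try:
        os.remove(path)
    except OSError:
        pass
    return failures, mismatches


if __name__ == "__main__":
    # Pass the OrangeFS-backed directory (e.g. /srv/cloud/one/var) as the
    # first argument; with no argument the local temp directory is used.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    failed, mismatched = check_small_file_io(target)
    print(f"{failed} I/O errors, {mismatched} mismatched reads in {target}")
```

Running it once against the OrangeFS directory and once against a local directory would show whether the Input/output errors are specific to the parallel file system.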

Any ideas? I'm sure I'm forgetting some helpful details, but any thoughts would
be greatly appreciated.

Thanks,
Nick
_______________________________________________
Users mailing list
[email protected]
http://lists.opennebula.org/listinfo.cgi/users-opennebula.org