Hi all,

        We use the next branch of oVirt for preproduction. We have noticed
that a lot of bugs appear when the number of messages in qpidd increases. It
seems that qpidd is doing its job and that most of the issues are due to
Qmf::Query.

        For example, in db-omatic, lines 265-296:
    When you restart db-omatic and you have multiple nodes, multiple
threads are launched (line 266) that hang on:
qmf_host = @qmfc.objects(Qmf::Query.new(:class => "node"), 'hostname' => host_info['hostname'])

The function never returns, even though qpidd never stops answering the
requests made by ruby-qmf correctly.
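
One way to make such a hang visible instead of silent might be to bound the
call with Ruby's standard Timeout module. This is only a sketch under
assumptions: query_with_timeout is a hypothetical helper, the time limits are
arbitrary, and the blocks stand in for the real @qmfc.objects(...) call. Note
also that Timeout may not be able to interrupt a call that is blocked inside
native code, so it may not help if ruby-qmf hangs within its binding.

```ruby
require 'timeout'

# Hypothetical helper (not part of db-omatic or ruby-qmf): runs the given
# block and returns its value, or nil if it does not finish in `seconds`.
def query_with_timeout(seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  nil
end

# Stand-ins for the real query: one returns quickly, one "hangs".
fast = query_with_timeout(1) { :node_found }
slow = query_with_timeout(0.1) { sleep(5); :never_returned }
```

Here fast receives the block's value, while slow is nil because the block was
abandoned once the limit expired.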

A workaround for us consists of:
 - stopping libvirt-qpid on every node,
 - restarting db-omatic,
 - starting libvirt-qpid sequentially on every node.

Doing it this way works, and gives us a consistent db for db-omatic.
What do you think about replacing the Thread.new on line 266 with a begin? The
concurrency of the requests made to qpidd by db-omatic seems to be the origin
of the hang.
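
As a possible alternative to a plain begin (which would run the sleep inside
the main db-omatic loop), the checks could be serialized on a single worker
thread. The following is a minimal sketch, not the db-omatic code:
SerialHostChecker is a hypothetical class, and the block passed to it stands
in for the sleep, the Qmf::Query and the VM update.

```ruby
require 'thread'

# Hypothetical sketch: one worker thread drains a queue of host checks,
# so the checks run one at a time instead of one thread per host.
class SerialHostChecker
  def initialize(&check)
    @queue = Queue.new
    @check = check
    @worker = Thread.new do
      # A nil entry is used as the stop signal.
      while (host_info = @queue.pop)
        begin
          @check.call(host_info)
        rescue StandardError => e
          warn "check failed for #{host_info['hostname']}: #{e.message}"
        end
      end
    end
  end

  # Called from the main loop; returns immediately.
  def enqueue(host_info)
    @queue << host_info
  end

  # Processes everything already queued, then stops the worker.
  def shutdown
    @queue << nil
    @worker.join
  end
end

# Usage with a stand-in check that only records the hostname.
checked = []
checker = SerialHostChecker.new { |host_info| checked << host_info['hostname'] }
%w[node1 node2 node3].each { |name| checker.enqueue('hostname' => name) }
checker.shutdown
```

The trade-off is that with a 20 second sleep per host, the checks for N hosts
would take about 20 * N seconds in total, but only one request at a time would
reach qpidd from this code path.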

<code snippet of db-omatic lines 265-296>
if state == Host::STATE_AVAILABLE
    Thread.new do
        @logger.info "#{host_info['hostname']} has moved to available, sleeping for updates to vms."
        sleep(20)

        # At this point we want to set all domains that are
        # unreachable to stopped.  We're using a thread here to
        # sleep for 10 seconds outside of the main dbomatic loop.
        # If after 10 seconds with this host up there are still
        # domains set to 'unreachable', then we're going to guess
        # the node rebooted and so the domains should be set to
        # stopped.
        @logger.info "Checking for dead VMs on newly available host #{host_info['hostname']}."

        # Double check to make sure this host is still up.
        begin
            qmf_host = @qmfc.objects(Qmf::Query.new(:class => "node"), 'hostname' => host_info['hostname'])
            if !qmf_host
                @logger.info "Host #{host_info['hostname']} is not up after waiting 20 seconds, skipping dead VM check."
            else
                db_vm = Vm.find(:all, :conditions => ["host_id = ? AND state = ?", db_host.id, Vm::STATE_UNREACHABLE])
                db_vm.each do |vm|
                    @logger.info "Moving vm #{vm.description} in state #{vm.state} to state stopped."
                    set_vm_stopped(vm)
                    vm.save!
                end
            end
        rescue Exception => e # just log any errors here
            @logger.info "Exception checking for dead VMs (could be normal): #{e.message}"
            @logger.info e.backtrace
        end
    end
end
</code>



-- 
Pierre-Gilles Mialon
Responsable hébergement :: Head of Hosting services
[email protected] :: +33.1 58 18 65 46
Linagora :: http://www.linagora.com
27 rue de Berri :: 75008 PARIS


_______________________________________________
Ovirt-devel mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/ovirt-devel
