Re: [pve-devel] [PATCH v2 kvm] add -wait-for-cgroup option to deal with systemd-run race
also see: https://www.freedesktop.org/wiki/Software/systemd/writing-vm-managers/ > On June 1, 2016 at 6:22 PM Dietmar Maurerwrote: > > > > --- > > Changes since v2: > > removed leftovers from old patch > > fixed the option documentation (also a leftover from an patch) > > still a bit ugly... > > What about using the SD dbus interface inside kvm? > > https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/ > https://cgit.freedesktop.org/systemd/systemd/plain/src/run/run.c > > It is just an idea, and I am not sure if that is too complex to implement? > > ___ > pve-devel mailing list > pve-devel@pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v2 kvm] add -wait-for-cgroup option to deal with systemd-run race
> --- > Changes since v2: > removed leftovers from old patch > fixed the option documentation (also a leftover from an patch) still a bit ugly... What about using the SD dbus interface inside kvm? https://www.freedesktop.org/wiki/Software/systemd/ControlGroupInterface/ https://cgit.freedesktop.org/systemd/systemd/plain/src/run/run.c It is just an idea, and I am not sure if that is too complex to implement? ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
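The option being discussed in this thread (the actual QEMU patch appears further down in this digest) boils down to polling /proc/self/cgroup until systemd-run has moved the process into the expected scope. The following is only a rough Perl sketch of that polling idea, not the QEMU implementation; the controller string and scope path are made-up examples.

#!/usr/bin/perl
# Sketch: wait until this process shows up in the given cgroup (v1 syntax),
# mirroring the idea behind the -wait-for-cgroup option discussed above.
use strict;
use warnings;
use Time::HiRes qw(usleep);

sub wait_for_cgroup {
    my ($controller, $path) = @_; # e.g. ('cpu,cpuacct', '/qemu.slice/100.scope')

    while (1) {
	open(my $fh, '<', '/proc/self/cgroup')
	    or die "open /proc/self/cgroup failed: $!\n";
	while (my $line = <$fh>) {
	    chomp $line;
	    # each line: hierarchy-ID:controller-list:cgroup-path
	    my (undef, $ctrls, $cgpath) = split(/:/, $line, 3);
	    next if !defined($cgpath);
	    if ($ctrls eq $controller && $cgpath eq $path) {
		close($fh);
		return;
	    }
	}
	close($fh);
	usleep(50_000); # not moved yet, retry shortly
    }
}

wait_for_cgroup('cpu,cpuacct', '/qemu.slice/100.scope');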
Re: [pve-devel] qemu-server: address migrations issues
>>@Alexandre: do you really need the other direction? No. Only old->new. I don't think that user try to migrate from new->old anyway. - Mail original - De: "dietmar"À: "Thomas Lamprecht" , "pve-devel" , "aderumier" Envoyé: Mercredi 1 Juin 2016 16:49:45 Objet: Re: [pve-devel] qemu-server: address migrations issues > > The way that it also works between different installed qemu-server > > versions would need to give the destination node an additional flag that > > we want to use a TCP tunnel. > > > > @Dietmar should that be done? I mean if they update qemu-server it works > > without it also and as a emergency fallback insecure can also be used. > > I think migrate from old to new node should work. @Alexandre: do you really need the other direction? ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager 3/4] check if tabs exist before selecting them
- if (res && res[1]) { + if (res && res[1] && Ext.isArray(me.items)) { + me.items.forEach(function(item) { + if (item.itemId === res[1]) { + activeTab = res[1]; + } + }); + } else if (res && res[1] && me.items && me.items.itemId === res[1]) { activeTab = res[1]; } Isn't there a better way (ExtJS function) to lookup items? Well, at this point in time we are still inside the initComponent of the configpanel, so the component is not fully instantiated, thus most of the ExtJS methods like this.getComponent(id) (which I wanted to use) do not work. I can do more research, but I don't believe we have many options here ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] qemu-server: address migrations issues
> > The way that it also works between different installed qemu-server > > versions would need to give the destination node an additional flag that > > we want to use a TCP tunnel. > > > > @Dietmar should that be done? I mean if they update qemu-server it works > > without it also and as a emergency fallback insecure can also be used. > > I think migrate from old to new node should work. @Alexandre: do you really need the other direction? ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager 3/4] check if tabs exist before selecting them
> - if (res && res[1]) { > + if (res && res[1] && Ext.isArray(me.items)) { > + me.items.forEach(function(item) { > + if (item.itemId === res[1]) { > + activeTab = res[1]; > + } > + }); > + } else if (res && res[1] && me.items && me.items.itemId === > res[1]) { > activeTab = res[1]; > } Isn't there a better way (ExtJS function) to lookup items? ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager 2/4] fix jslint errors
applied ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH manager 1/4] hide snapshot tab with qemu templates
applied ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH manager 3/4] check if tabs exist before selecting them
when switching from a vm to a template, if you have a tab selected which does not exists in a template (for example console or task history) you break the site this patch checks if the wanted tab actually exists, and leave it on default (the first) when it does not Signed-off-by: Dominik Csapak--- www/manager6/panel/ConfigPanel.js | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/www/manager6/panel/ConfigPanel.js b/www/manager6/panel/ConfigPanel.js index 9afdd7d..489eb3b 100644 --- a/www/manager6/panel/ConfigPanel.js +++ b/www/manager6/panel/ConfigPanel.js @@ -23,7 +23,13 @@ Ext.define('PVE.panel.Config', { var state = sp.get(stateid); if (state && state.value) { var res = hsregex.exec(state.value); - if (res && res[1]) { + if (res && res[1] && Ext.isArray(me.items)) { + me.items.forEach(function(item) { + if (item.itemId === res[1]) { + activeTab = res[1]; + } + }); + } else if (res && res[1] && me.items && me.items.itemId === res[1]) { activeTab = res[1]; } } -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH manager 4/4] make the graphs on the summary pages use empty space
this patch lets the graphs flow if you have enough horizontal space Signed-off-by: Dominik Csapak--- this looks a bit weird on nodes because of the long status output, but some people requested this and i want to redo the whole status/notes area anyway www/manager6/lxc/Summary.js | 73 +--- www/manager6/node/Summary.js | 63 ++ www/manager6/qemu/Summary.js | 73 +--- 3 files changed, 120 insertions(+), 89 deletions(-) diff --git a/www/manager6/lxc/Summary.js b/www/manager6/lxc/Summary.js index 3f13ee5..5ab14cd 100644 --- a/www/manager6/lxc/Summary.js +++ b/www/manager6/lxc/Summary.js @@ -6,8 +6,7 @@ Ext.define('PVE.lxc.Summary', { scrollable: true, bodyStyle: 'padding:10px', defaults: { - style: {'padding-top':'10px'}, - width: 800 + style: {'padding-top':'10px'} }, initComponent: function() { @@ -54,7 +53,8 @@ Ext.define('PVE.lxc.Summary', { ptype: 'lazyitems', items: [ { - style: {'padding-top': '0px' }, + width: 800, + padding: '0 5 0 0', layout: { type: 'hbox', align: 'stretchmax' @@ -63,34 +63,45 @@ Ext.define('PVE.lxc.Summary', { items: [ statusview, notesview ] }, { - xtype: 'pveRRDChart', - title: gettext('CPU usage'), - pveSelNode: me.pveSelNode, - fields: ['cpu'], - fieldTitles: [gettext('CPU usage')], - store: rrdstore - }, - { - xtype: 'pveRRDChart', - title: gettext('Memory usage'), - pveSelNode: me.pveSelNode, - fields: ['maxmem', 'mem'], - fieldTitles: [gettext('Total'), gettext('RAM usage')], - store: rrdstore - }, - { - xtype: 'pveRRDChart', - title: gettext('Network traffic'), - pveSelNode: me.pveSelNode, - fields: ['netin','netout'], - store: rrdstore - }, - { - xtype: 'pveRRDChart', - title: gettext('Disk IO'), - pveSelNode: me.pveSelNode, - fields: ['diskread','diskwrite'], - store: rrdstore + xtype: 'container', + layout: { + type: 'column' + }, + defaults: { + padding: '0 5 10 0' + }, + items: [ + { + xtype: 'pveRRDChart', + title: gettext('CPU usage'), + pveSelNode: me.pveSelNode, + fields: ['cpu'], + fieldTitles: [gettext('CPU usage')], + store: rrdstore + }, + { + xtype: 'pveRRDChart', + title: gettext('Memory usage'), + pveSelNode: me.pveSelNode, + fields: ['maxmem', 'mem'], + fieldTitles: [gettext('Total'), gettext('RAM usage')], + store: rrdstore + }, + { + xtype: 'pveRRDChart', + title: gettext('Network traffic'), + pveSelNode: me.pveSelNode, + fields: ['netin','netout'], + store: rrdstore + }, + { + xtype: 'pveRRDChart', + title: gettext('Disk IO'), + pveSelNode: me.pveSelNode, + fields: ['diskread','diskwrite'], + store: rrdstore + } + ] } ] }, diff --git a/www/manager6/node/Summary.js b/www/manager6/node/Summary.js index 0c0483d..8128ec2 100644 --- a/www/manager6/node/Summary.js +++ b/www/manager6/node/Summary.js @@ -5,7 +5,6 @@ Ext.define('PVE.node.Summary', { scrollable: true, bodyStyle: 'padding:10px', defaults: { - width:
[pve-devel] [PATCH manager 1/4] hide snapshot tab with qemu templates
since we cannot create templates with existing snapshots, and we cannot take snapshots of templates, showing the tab on templates makes no sense Signed-off-by: Dominik Csapak--- www/manager6/qemu/Config.js | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/www/manager6/qemu/Config.js b/www/manager6/qemu/Config.js index 8b764b0..25f91ff 100644 --- a/www/manager6/qemu/Config.js +++ b/www/manager6/qemu/Config.js @@ -175,7 +175,7 @@ Ext.define('PVE.qemu.Config', { }); } - if (caps.vms['VM.Snapshot']) { + if (caps.vms['VM.Snapshot'] && !template) { me.items.push({ title: gettext('Snapshots'), xtype: 'pveQemuSnapshotTree', -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH manager 2/4] fix jslint errors
jslint does not like mixing statements and function calls Signed-off-by: Dominik Csapak--- www/manager6/panel/InputPanel.js | 10 +++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/www/manager6/panel/InputPanel.js b/www/manager6/panel/InputPanel.js index 9339eb4..e5afd9a 100644 --- a/www/manager6/panel/InputPanel.js +++ b/www/manager6/panel/InputPanel.js @@ -4,11 +4,15 @@ Ext.define('PVE.panel.InputPanel', { listeners: { activate: function() { // notify owning container that it should display a help button - this.onlineHelp && Ext.GlobalEvents.fireEvent('pveShowHelp', this.onlineHelp); + if (this.onlineHelp) { + Ext.GlobalEvents.fireEvent('pveShowHelp', this.onlineHelp); + } }, deactivate: function() { - this.onlineHelp && Ext.GlobalEvents.fireEvent('pveHideHelp', this.onlineHelp); - }, + if (this.onlineHelp) { + Ext.GlobalEvents.fireEvent('pveHideHelp', this.onlineHelp); + } + } }, border: false, -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] qemu-server: address migrations issues
On 06/01/2016 03:37 PM, Alexandre DERUMIER wrote: >>> We could actually support the old one for some time, even if it is broken >>> (not our fault) as we see if we should open a tunnel or not, i.e. if the >>> raddr is >>> tcp and localhost then open a tcp tunnel like we did, if its tcp and not >>> localhost its >>> an unsecure and if its unix the do like in this patch. >>> >>> comments? > yes, I think we should keep support for old migration, or I think users will > complain ;) In fact it works from a not yet updated node to a new one, you just have to install (update) qemu-server with this patch on both, then it works forth and back also for already running VMs. It's important to install it on all nodes though. To make it also work between differently installed qemu-server versions we would need to give the destination node an additional flag that we want to use a TCP tunnel. @Dietmar should that be done? I mean if they update qemu-server it works without it as well, and as an emergency fallback insecure migration can also be used. > > > - Mail original - > De: "Thomas Lamprecht"> À: "pve-devel" > Cc: "aderumier" > Envoyé: Mercredi 1 Juin 2016 15:03:05 > Objet: Re: [pve-devel] qemu-server: address migrations issues > > Hi, > > On 06/01/2016 02:50 PM, Alexandre DERUMIER wrote: >> Hi, >> >> I don't have read the whole patches, >> >> but does it break migration from oldserver to newserver (with new code) ? > Not with unsecure migrations. But yes with secure ones this would be the > case, they would not crash but simply not work as the tcp tunnel is not > made. > > We could actually support the old one for some time, even if it is broken > (not our fault) as we see if we should open a tunnel or not, i.e. if the > raddr is > tcp and localhost then open a tcp tunnel like we did, if its tcp and not > localhost its > an unsecure and if its unix the do like in this patch. > > comments? > > >> >> - Mail original - >> De: "Thomas Lamprecht" >> À: "pve-devel" >> Envoyé: Mercredi 1 Juin 2016 12:09:38 >> Objet: [pve-devel] qemu-server: address migrations issues >> >> Changes since v2: >> - openssh is able to use UNIX socket forwards since 5.7 (and jessie has 5.7) >> so use that instead of socat. >> - split the part of child collection apart as it does not have antything to >> do >> with the tunnel problem >> > ___ > pve-devel mailing list > pve-devel@pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] qemu-server: address migrations issues
>>We could actually support the old one for some time, even if it is broken >>(not our fault) as we see if we should open a tunnel or not, i.e. if the >>raddr is >>tcp and localhost then open a tcp tunnel like we did, if its tcp and not >>localhost its >>an unsecure and if its unix the do like in this patch. >> >>comments? yes, I think we should keep support for old migration, or I think users will complain ;) - Mail original - De: "Thomas Lamprecht"À: "pve-devel" Cc: "aderumier" Envoyé: Mercredi 1 Juin 2016 15:03:05 Objet: Re: [pve-devel] qemu-server: address migrations issues Hi, On 06/01/2016 02:50 PM, Alexandre DERUMIER wrote: > Hi, > > I don't have read the whole patches, > > but does it break migration from oldserver to newserver (with new code) ? Not with unsecure migrations. But yes with secure ones this would be the case, they would not crash but simply not work as the tcp tunnel is not made. We could actually support the old one for some time, even if it is broken (not our fault) as we see if we should open a tunnel or not, i.e. if the raddr is tcp and localhost then open a tcp tunnel like we did, if its tcp and not localhost its an unsecure and if its unix the do like in this patch. comments? > > > - Mail original - > De: "Thomas Lamprecht" > À: "pve-devel" > Envoyé: Mercredi 1 Juin 2016 12:09:38 > Objet: [pve-devel] qemu-server: address migrations issues > > Changes since v2: > - openssh is able to use UNIX socket forwards since 5.7 (and jessie has 5.7) > so use that instead of socat. > - split the part of child collection apart as it does not have antything to > do > with the tunnel problem > ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] qemu-server: address migrations issues
Hi, On 06/01/2016 02:50 PM, Alexandre DERUMIER wrote: > Hi, > > I don't have read the whole patches, > > but does it break migration from oldserver to newserver (with new code) ? Not with unsecure migrations. But yes, with secure ones this would be the case; they would not crash but simply not work, as the TCP tunnel is not made. We could actually support the old one for some time, even if it is broken (not our fault), as we can see if we should open a tunnel or not, i.e. if the raddr is tcp and localhost then open a TCP tunnel like we did, if it's tcp and not localhost it's an unsecure migration, and if it's unix then do it like in this patch. comments? > > > - Mail original - > De: "Thomas Lamprecht"> À: "pve-devel" > Envoyé: Mercredi 1 Juin 2016 12:09:38 > Objet: [pve-devel] qemu-server: address migrations issues > > Changes since v2: > - openssh is able to use UNIX socket forwards since 5.7 (and jessie has 5.7) > so use that instead of socat. > - split the part of child collection apart as it does not have antything to > do > with the tunnel problem > ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
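For clarity, the compatibility scheme sketched in this thread (TCP plus localhost means a secure TCP forward as before, TCP with a real address means unsecure, a UNIX path means the new tunnel) could be expressed roughly as the Perl snippet below. This is only an illustration of the decision, not the actual QemuMigrate.pm code, and the exact values an old destination node returns may differ.

# Sketch only: decide which kind of migration tunnel to use based on what the
# destination node returned as migration listen address.
sub tunnel_kind {
    my ($raddr) = @_;

    if ($raddr =~ m!^/!) {
	return 'unix';       # new node: UNIX socket path, forward it via ssh -L sock:sock
    } elsif ($raddr eq 'localhost' || $raddr eq '127.0.0.1') {
	return 'tcp-tunnel'; # old node bound to localhost: secure TCP forward as before
    } else {
	return 'insecure';   # bound to a real address: direct (unsecure) migration
    }
}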
[pve-devel] [PATCH ha-manager 6/6] relocate policy: do not try failed nodes again if possible
If the failure policy triggers more often than 2 times we used an already tried node again, even if there where other untried nodes. This does not make real sense as when it failed to start on a node a short time ago it probably will also fail now (e.g. storage is offline), whereas an untried node may have the chance to be fully able to start the service. Fix that by excluding those already tried nodes from the top priority node list in 'select_service_node' if there are other possible nodes to try. If there isn't any left we delete the last tried one, so that the service tries another one (we want to fulfill the relocation policy after all) our default relocation try is set to 1 so that should only happen if a user set it explicitly to an high value. We bound that to try_next as we only can trigger it if try_next is true, also select_service_node gets called in two places: * next_state_started: there we want to use this behaviour * recover_fenced service: there the try_next is always false as we just want to select a node to recover, if a relocation policy is then needed it is the duty for next_state_started to do so. So we are safe to do so. If all tries fail we place the service in the error state. Signed-off-by: Thomas Lamprecht--- changes since v1: * addressed what happens if all nodes available where tried. * added two more tests src/PVE/HA/Manager.pm | 25 +++- src/test/test-relocate-policy-default-group/README | 7 +++ .../test-relocate-policy-default-group/cmdlist | 4 ++ .../hardware_status| 5 ++ .../test-relocate-policy-default-group/log.expect | 53 + .../manager_status | 1 + .../service_config | 3 + src/test/test-relocate-policy1/README | 4 ++ src/test/test-relocate-policy1/cmdlist | 4 ++ src/test/test-relocate-policy1/hardware_status | 5 ++ src/test/test-relocate-policy1/log.expect | 68 ++ src/test/test-relocate-policy1/manager_status | 42 + src/test/test-relocate-policy1/service_config | 9 +++ src/test/test-resource-failure6/log.expect | 59 +++ 14 files changed, 286 insertions(+), 3 deletions(-) create mode 100644 src/test/test-relocate-policy-default-group/README create mode 100644 src/test/test-relocate-policy-default-group/cmdlist create mode 100644 src/test/test-relocate-policy-default-group/hardware_status create mode 100644 src/test/test-relocate-policy-default-group/log.expect create mode 100644 src/test/test-relocate-policy-default-group/manager_status create mode 100644 src/test/test-relocate-policy-default-group/service_config create mode 100644 src/test/test-relocate-policy1/README create mode 100644 src/test/test-relocate-policy1/cmdlist create mode 100644 src/test/test-relocate-policy1/hardware_status create mode 100644 src/test/test-relocate-policy1/log.expect create mode 100644 src/test/test-relocate-policy1/manager_status create mode 100644 src/test/test-relocate-policy1/service_config create mode 100644 src/test/test-resource-failure6/log.expect diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm index 6e30c39..888ca69 100644 --- a/src/PVE/HA/Manager.pm +++ b/src/PVE/HA/Manager.pm @@ -52,7 +52,7 @@ sub flush_master_status { } sub select_service_node { -my ($groups, $online_node_usage, $service_conf, $current_node, $try_next) = @_; +my ($groups, $online_node_usage, $service_conf, $current_node, $try_next, $tried_nodes) = @_; my $group = {}; # add all online nodes to default group to allow try_next when no group set @@ -100,6 +100,25 @@ sub select_service_node { my $top_pri = $pri_list[0]; +# do not try nodes where the service failed already +if ($try_next) 
{ + # first check if we have any untried node left + my $i = 0; + foreach my $node (@$tried_nodes) { + $i++ if $pri_groups->{$top_pri}->{$node}; + } + + if ($i < scalar(keys %{$pri_groups->{$top_pri}})){ + # we have another one left so delete the tried ones + foreach my $node (@$tried_nodes) { + delete $pri_groups->{$top_pri}->{$node}; + } + } else { + # else just delete the current we want to try another one after all + delete $pri_groups->{$top_pri}->{$current_node}; + } +} + my @nodes = sort { $online_node_usage->{$a} <=> $online_node_usage->{$b} || $a cmp $b } keys %{$pri_groups->{$top_pri}}; @@ -634,8 +653,8 @@ sub next_state_started { } } - my $node = select_service_node($self->{groups}, $self->{online_node_usage}, - $cd, $sd->{node}, $try_next); + my $node = select_service_node($self->{groups},
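To make the new selection behaviour easier to follow, here is a self-contained Perl illustration of the same logic using plain hashes instead of the real manager state; node names and data are made up for the example.

use strict;
use warnings;

# $candidates: nodes of the top priority group, $tried_nodes: nodes the
# service already failed to start on, $current_node: where it runs right now.
sub filter_tried_nodes {
    my ($candidates, $tried_nodes, $current_node) = @_;

    my $tried_in_group = grep { $candidates->{$_} } @$tried_nodes;

    if ($tried_in_group < scalar(keys %$candidates)) {
	# there is at least one untried node left, so drop all tried ones
	delete $candidates->{$_} for @$tried_nodes;
    } else {
	# all candidates were tried already, at least avoid the current node
	delete $candidates->{$current_node};
    }
    return $candidates;
}

# service failed on node2 and node3, so only node1 remains a candidate
my $left = filter_tried_nodes(
    { node1 => 1, node2 => 1, node3 => 1 }, ['node2', 'node3'], 'node3');
print join(', ', sort keys %$left), "\n";   # prints: node1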
[pve-devel] [PATCH ha-manager v2 5/6] Manager: record tried node on relocation policy
Instead of simply counting up an integer on each failed relocation trial record the already tried nodes. We still have the try count through the size of the array, so no information lost and no behavioural change. Use this for now to log on which nodes we failed to recover, may be useful for an user to see that those node fails, so that he can investigate for which reason and fix those. Further this prepares us for a more intelligent recovery node selection, as we can skip already tried nodes from the current recovery cycle. With the reuse of the relocate_trials to relocate_tried_nodes this can happen without any overhead (i.e. additional hash) in the manager status. Signed-off-by: Thomas Lamprecht--- no changes since v1 src/PVE/HA/Manager.pm | 27 +-- src/test/test-resource-failure2/log.expect | 1 + src/test/test-resource-failure5/log.expect | 2 +- 3 files changed, 19 insertions(+), 11 deletions(-) diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm index 1208720..6e30c39 100644 --- a/src/PVE/HA/Manager.pm +++ b/src/PVE/HA/Manager.pm @@ -349,8 +349,8 @@ sub manage { } # remove stale relocation try entries -foreach my $sid (keys %{$ms->{relocate_trial}}) { - delete $ms->{relocate_trial}->{$sid} if !$ss->{$sid}; +foreach my $sid (keys %{$ms->{relocate_tried_nodes}}) { + delete $ms->{relocate_tried_nodes}->{$sid} if !$ss->{$sid}; } $self->update_crm_commands(); @@ -589,31 +589,38 @@ sub next_state_started { } else { my $try_next = 0; + my $tried_nodes = $master_status->{relocate_tried_nodes}->{$sid} || []; if ($lrm_res) { + # add current service node to failed list + push @$tried_nodes, $sd->{node}; + my $ec = $lrm_res->{exit_code}; if ($ec == SUCCESS) { - $master_status->{relocate_trial}->{$sid} = 0; + if (scalar(@$tried_nodes) > 1) { + $haenv->log('info', "relocation policy successful for '$sid'," . + " tried nodes: " . join(', ', @$tried_nodes) ); + } + + delete $master_status->{relocate_tried_nodes}->{$sid}; } elsif ($ec == ERROR) { # apply our relocate policy if we got ERROR from the LRM - my $try = $master_status->{relocate_trial}->{$sid} || 0; - - if ($try < $cd->{max_relocate}) { + if (scalar(@$tried_nodes) <= $cd->{max_relocate}) { - $try++; # tell select_service_node to relocate if possible $try_next = 1; + $master_status->{relocate_tried_nodes}->{$sid} = $tried_nodes; $haenv->log('warning', "starting service $sid on node". " '$sd->{node}' failed, relocating service."); - $master_status->{relocate_trial}->{$sid} = $try; } else { - $haenv->log('err', "recovery policy for service". - " $sid failed, entering error state!"); + $haenv->log('err', "recovery policy for service $sid " . + "failed, entering error state. Tried nodes: ". 
+ join(', ', @$tried_nodes)); &$change_service_state($self, $sid, 'error'); return; diff --git a/src/test/test-resource-failure2/log.expect b/src/test/test-resource-failure2/log.expect index 604ad95..aa34e35 100644 --- a/src/test/test-resource-failure2/log.expect +++ b/src/test/test-resource-failure2/log.expect @@ -41,4 +41,5 @@ info201node1/lrm: got lock 'ha_agent_node1_lock' info201node1/lrm: status change wait_for_agent_lock => active info201node1/lrm: starting service fa:130 info201node1/lrm: service status fa:130 started +info220node1/crm: relocation policy successful for 'fa:130', tried nodes: node2, node1 info720 hardware: exit simulation - done diff --git a/src/test/test-resource-failure5/log.expect b/src/test/test-resource-failure5/log.expect index eb87f9f..a15603e 100644 --- a/src/test/test-resource-failure5/log.expect +++ b/src/test/test-resource-failure5/log.expect @@ -28,7 +28,7 @@ warn123node2/lrm: restart policy: retry number 1 for service 'fa:130' info143node2/lrm: starting service fa:130 warn143node2/lrm: unable to start service fa:130 err 143node2/lrm: unable to start service fa:130 on local node after 1 retries -err 160node1/crm: recovery policy for service fa:130 failed, entering error state! +err 160node1/crm: recovery policy for service fa:130 failed, entering error state. Tried nodes: node2
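A small illustration of the manager-status change described above: the former integer counter is now implicit in the length of the tried-node list. The data below is made up for the example.

use strict;
use warnings;

my $master_status = {
    # was: relocate_trial => { 'fa:130' => 2 }
    relocate_tried_nodes => { 'fa:130' => ['node2', 'node1'] },
};

my $tried = $master_status->{relocate_tried_nodes}->{'fa:130'};
my $try_count = scalar(@$tried);    # same information as the old counter
print "tried $try_count nodes: ", join(', ', @$tried), "\n";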
[pve-devel] [PATCH ha-manager 4/6] send email on fence failure and success
Fencing is something which should not happen often in the real world and has most time a really bad cause, thus send a email when starting to fence a node and on success to root@localhost to inform the cluster admin of said failures so he can check the hardware and cluster status as soon as possible. Signed-off-by: Thomas Lamprecht--- src/PVE/HA/NodeStatus.pm | 39 +- src/test/test-basic1/log.expect| 2 ++ src/test/test-basic2/log.expect| 1 + src/test/test-basic5/log.expect| 2 ++ src/test/test-shutdown1/log.expect | 2 ++ src/test/test-shutdown2/log.expect | 2 ++ src/test/test-shutdown3/log.expect | 2 ++ src/test/test-shutdown4/log.expect | 2 ++ 8 files changed, 51 insertions(+), 1 deletion(-) diff --git a/src/PVE/HA/NodeStatus.pm b/src/PVE/HA/NodeStatus.pm index d9ef912..632dbd4 100644 --- a/src/PVE/HA/NodeStatus.pm +++ b/src/PVE/HA/NodeStatus.pm @@ -3,6 +3,7 @@ package PVE::HA::NodeStatus; use strict; use warnings; +use JSON; use Data::Dumper; my $fence_delay = 60; @@ -169,6 +170,38 @@ sub update { } } +# assembles a commont text for fence emails +my $send_fence_state_email = sub { +my ($self, $subject_prefix, $subject, $node) = @_; + +my $haenv = $self->{haenv}; + +my $mail_text = $status, node_status => $self->{status} }; + +$mail_text .= to_json($data, { pretty => 1, canonical => 1}); + +$haenv->sendmail($mail_subject, $mail_text); +}; + + # start fencing sub fence_node { my ($self, $node) = @_; @@ -179,13 +212,17 @@ sub fence_node { if ($state ne 'fence') { &$set_node_state($self, $node, 'fence'); + my $msg = "Try to fence node '$node'"; + &$send_fence_state_email($self, 'FENCE', $msg, $node); } my $success = $haenv->get_ha_agent_lock($node); if ($success) { - $haenv->log("info", "fencing: acknowleged - got agent lock for node '$node'"); + my $msg = "fencing: acknowleged - got agent lock for node '$node'"; + $haenv->log("info", $msg); &$set_node_state($self, $node, 'unknown'); + &$send_fence_state_email($self, 'SUCEED', $msg, $node); } return $success; diff --git a/src/test/test-basic1/log.expect b/src/test/test-basic1/log.expect index 68df71b..250b918 100644 --- a/src/test/test-basic1/log.expect +++ b/src/test/test-basic1/log.expect @@ -36,6 +36,7 @@ info124node3/crm: status change slave => wait_for_quorum info125node3/lrm: status change active => lost_agent_lock info160node1/crm: service 'vm:103': state changed from 'started' to 'fence' info160node1/crm: node 'node3': state changed from 'unknown' => 'fence' +emai160node1/crm: FENCE: Try to fence node 'node3' info166 watchdog: execute power node3 off info165node3/crm: killed by poweroff info166node3/lrm: killed by poweroff @@ -43,6 +44,7 @@ info166 hardware: server 'node3' stopped by poweroff (watchdog) info240node1/crm: got lock 'ha_agent_node3_lock' info240node1/crm: fencing: acknowleged - got agent lock for node 'node3' info240node1/crm: node 'node3': state changed from 'fence' => 'unknown' +emai240node1/crm: SUCEED: fencing: acknowleged - got agent lock for node 'node3' info240node1/crm: recover service 'vm:103' from fenced node 'node3' to node 'node2' info240node1/crm: service 'vm:103': state changed from 'fence' to 'started' (node = node2) info243node2/lrm: starting service vm:103 diff --git a/src/test/test-basic2/log.expect b/src/test/test-basic2/log.expect index 72822ce..f20d09c 100644 --- a/src/test/test-basic2/log.expect +++ b/src/test/test-basic2/log.expect @@ -11,6 +11,7 @@ info 22node3/crm: node 'node2': state changed from 'online' => 'unknown' info 22node3/crm: got lock 'ha_agent_node1_lock' info 22node3/crm: fencing: 
acknowleged - got agent lock for node 'node1' info 22node3/crm: node 'node1': state changed from 'fence' => 'unknown' +emai 22node3/crm: SUCEED: fencing: acknowleged - got agent lock for node 'node1' info 22node3/crm: recover service 'vm:101' from fenced node 'node1' to node 'node3' info 22node3/crm: service 'vm:101': state changed from 'fence' to 'started' (node = node3) info 23node3/lrm: got lock 'ha_agent_node3_lock' diff --git a/src/test/test-basic5/log.expect b/src/test/test-basic5/log.expect index 54b579c..8131797 100644 --- a/src/test/test-basic5/log.expect +++ b/src/test/test-basic5/log.expect @@ -43,9 +43,11 @@ info222node3/crm: status change slave => master info222node3/crm: node 'node1': state changed from 'online' => 'unknown' info282node3/crm: service 'vm:101': state changed from 'started' to 'fence' info282node3/crm: node 'node1': state changed from 'unknown' =>
[pve-devel] [PATCH ha-manager 2/6] Sim/Hardware: check if testdir exists
avoids strange errors like "could not open dir/group.tmp.PID" Signed-off-by: Thomas Lamprecht --- src/PVE/HA/Sim/Hardware.pm | 3 +++ 1 file changed, 3 insertions(+) diff --git a/src/PVE/HA/Sim/Hardware.pm b/src/PVE/HA/Sim/Hardware.pm index a212671..be1037d 100644 --- a/src/PVE/HA/Sim/Hardware.pm +++ b/src/PVE/HA/Sim/Hardware.pm @@ -285,6 +285,9 @@ sub new { die "missing testdir" if !$testdir; +die "testdir '$testdir' does not exist or is not a directory!\n" +if !-d $testdir; + my $class = ref($this) || $this; my $self = bless {}, $class; -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH ha-manager 3/6] Env: add sendmail
Signed-off-by: Thomas Lamprecht--- src/PVE/HA/Env.pm | 6 ++ src/PVE/HA/Env/PVE2.pm | 9 + src/PVE/HA/Sim/Env.pm | 7 +++ 3 files changed, 22 insertions(+) diff --git a/src/PVE/HA/Env.pm b/src/PVE/HA/Env.pm index c7537b1..55f6684 100644 --- a/src/PVE/HA/Env.pm +++ b/src/PVE/HA/Env.pm @@ -132,6 +132,12 @@ sub log { return $self->{plug}->log($level, @args); } +sub sendmail { +my ($self, $subject, $text) = @_; + +return $self->{plug}->sendmail($subject, $text); +} + # acquire a cluster wide manager lock sub get_ha_manager_lock { my ($self) = @_; diff --git a/src/PVE/HA/Env/PVE2.pm b/src/PVE/HA/Env/PVE2.pm index 37823ee..ef6485d 100644 --- a/src/PVE/HA/Env/PVE2.pm +++ b/src/PVE/HA/Env/PVE2.pm @@ -226,6 +226,15 @@ sub log { syslog($level, $msg); } +sub sendmail { +my ($self, $subject, $text) = @_; + +my $mailfrom = 'root@' . $self->nodename(); +my $mailto = 'root@localhost'; + +PVE::Tools::sendmail($mailto, $subject, $text, undef, $mailfrom); +} + my $last_lock_status = {}; sub get_pve_lock { diff --git a/src/PVE/HA/Sim/Env.pm b/src/PVE/HA/Sim/Env.pm index 3715eff..4bb747a 100644 --- a/src/PVE/HA/Sim/Env.pm +++ b/src/PVE/HA/Sim/Env.pm @@ -224,6 +224,13 @@ sub log { printf("%-5s %5d %12s: $msg\n", $level, $time, "$self->{nodename}/$self->{log_id}"); } +sub sendmail { +my ($self, $subject, $text) = @_; + +# only log subject, not spam the regression logs +$self->log('email', $subject); +} + sub get_time { my ($self) = @_; -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH ha-manager 1/6] Manager, LRM: sort service keys for deterministic tests
Otherwise the regression tests produce indeterministic output. As the hashes would otherwise be traversed in random order it makes no real difference for the PVE2 environment, so just sort the keys when we add them to the cluster or spawn resource agent workers to avoid that problem. Signed-off-by: Thomas Lamprecht --- src/PVE/HA/LRM.pm | 2 +- src/PVE/HA/Manager.pm | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/PVE/HA/LRM.pm b/src/PVE/HA/LRM.pm index a49410f..26c5c89 100644 --- a/src/PVE/HA/LRM.pm +++ b/src/PVE/HA/LRM.pm @@ -376,7 +376,7 @@ sub run_workers { while (($haenv->get_time() - $starttime) < 5) { my $count = $self->check_active_workers(); - foreach my $sid (keys %{$self->{workers}}) { + foreach my $sid (sort keys %{$self->{workers}}) { last if $count >= $max_workers && $max_workers > 0; my $w = $self->{workers}->{$sid}; diff --git a/src/PVE/HA/Manager.pm b/src/PVE/HA/Manager.pm index 622ece8..1208720 100644 --- a/src/PVE/HA/Manager.pm +++ b/src/PVE/HA/Manager.pm @@ -360,7 +360,7 @@ sub manage { $self->recompute_online_node_usage(); - foreach my $sid (keys %$ss) { + foreach my $sid (sort keys %$ss) { my $sd = $ss->{$sid}; my $cd = $sc->{$sid} || { state => 'disabled' }; -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
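As a quick self-contained demonstration of why the sort is needed: Perl randomizes hash iteration order per process, so without it the regression test logs can list services in a different order on every run.

use strict;
use warnings;

my $ss = { 'vm:101' => {}, 'vm:102' => {}, 'fa:130' => {} };

# order may change between runs due to per-process hash seed randomization
print "unsorted: ", join(', ', keys %$ss), "\n";

# deterministic order, as now used in Manager.pm and LRM.pm
print "sorted:   ", join(', ', sort keys %$ss), "\n";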
[pve-devel] [ha-manager] various patches
I picked a few patches from my feature branches which aren't tied to those features, to get them upstream. Patches 5 and 6 are a v2 of a previously sent series regarding relocation policy improvements. cheers, Thomas ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu-server v2] fix #1010: whitelist options for permissions
instead of defaulting to VM.Config.Options for all options not checked seperately, we now have lists for the different config permissions and check them accordingly. for everything not given, we require root access this is important especially for usbN and hostpciN since they can change the host configuration Signed-off-by: Dominik Csapak--- changes from v1: * shifted numa\d+ options to cpu * parallel and serial can access the host, so require root * renamed remainingoptions to generaloptions PVE/API2/Qemu.pm | 84 ++-- 1 file changed, 75 insertions(+), 9 deletions(-) diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm index 3a3c71a..aadfde5 100644 --- a/PVE/API2/Qemu.pm +++ b/PVE/API2/Qemu.pm @@ -176,6 +176,64 @@ my $create_disks = sub { return $vollist; }; +my $cpuoptions = { +'cores' => 1, +'cpu' => 1, +'cpulimit' => 1, +'cpuunits' => 1, +'numa' => 1, +'smp' => 1, +'sockets' => 1, +'vpcus' => 1, +}; + +my $memoryoptions = { +'memory' => 1, +'balloon' => 1, +'shares' => 1, +}; + +my $hwtypeoptions = { +'acpi' => 1, +'hotplug' => 1, +'kvm' => 1, +'machine' => 1, +'scsihw' => 1, +'smbios1' => 1, +'tablet' => 1, +'vga' => 1, +'watchdog' => 1, +}; + +my $generaloptions = { +'agent' => 1, +'autostart' => 1, +'bios' => 1, +'description' => 1, +'keyboard' => 1, +'localtime' => 1, +'migrate_downtime' => 1, +'migrate_speed' => 1, +'name' => 1, +'onboot' => 1, +'ostype' => 1, +'protection' => 1, +'reboot' => 1, +'startdate' => 1, +'startup' => 1, +'tdf' => 1, +'template' => 1, +}; + +my $vmpoweroptions = { +'freeze' => 1, +}; + +my $diskoptions = { +'boot' => 1, +'bootdisk' => 1, +}; + my $check_vm_modify_config_perm = sub { my ($rpcenv, $authuser, $vmid, $pool, $key_list) = @_; @@ -184,22 +242,30 @@ my $check_vm_modify_config_perm = sub { foreach my $opt (@$key_list) { # disk checks need to be done somewhere else next if PVE::QemuServer::is_valid_drivename($opt); + next if $opt eq 'cdrom'; - if ($opt eq 'sockets' || $opt eq 'cores' || - $opt eq 'cpu' || $opt eq 'smp' || $opt eq 'vcpus' || - $opt eq 'cpulimit' || $opt eq 'cpuunits') { + if ($cpuoptions->{$opt} || $opt =~ m/^numa\d+$/) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.CPU']); - } elsif ($opt eq 'memory' || $opt eq 'balloon' || $opt eq 'shares') { + } elsif ($memoryoptions->{$opt}) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Memory']); - } elsif ($opt eq 'args' || $opt eq 'lock') { - die "only root can set '$opt' config\n"; - } elsif ($opt eq 'cpu' || $opt eq 'kvm' || $opt eq 'acpi' || $opt eq 'machine' || -$opt eq 'vga' || $opt eq 'watchdog' || $opt eq 'tablet' || $opt eq 'smbios1') { + } elsif ($hwtypeoptions->{$opt}) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.HWType']); + } elsif ($generaloptions->{$opt}) { + $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Options']); + # special case for startup since it changes host behaviour + if ($opt eq 'startup') { + $rpcenv->check_full($authuser, "/", ['Sys.Modify']); + } + } elsif ($vmpoweroptions->{$opt}) { + $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.PowerMgmt']); + } elsif ($diskoptions->{$opt}) { + $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Disk']); } elsif ($opt =~ m/^net\d+$/) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Network']); } else { - $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Options']); + # catches usb\d+, hostpci\d+, args, lock, etc. 
+ # new options will be checked here + die "only root can set '$opt' config\n"; } } -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
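The effect of the whitelists can be summed up in a small stand-alone sketch (a hypothetical helper for illustration only, not code from the patch): known options map to a specific privilege, netN/numaN are matched by pattern, and everything unknown falls through to root-only.

use strict;
use warnings;

# Hypothetical helper mirroring the dispatch pattern above (illustration only,
# the option list is abbreviated).
sub privilege_for_option {
    my ($opt) = @_;

    my $map = {
	cores  => 'VM.Config.CPU',
	memory => 'VM.Config.Memory',
	vga    => 'VM.Config.HWType',
	onboot => 'VM.Config.Options',
	freeze => 'VM.PowerMgmt',
	boot   => 'VM.Config.Disk',
    };

    return 'VM.Config.CPU' if $opt =~ m/^numa\d+$/;
    return 'VM.Config.Network' if $opt =~ m/^net\d+$/;
    return $map->{$opt}; # undef means: not whitelisted, root only
}

print privilege_for_option('memory') // 'root only', "\n";   # VM.Config.Memory
print privilege_for_option('hostpci0') // 'root only', "\n"; # root only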
Re: [pve-devel] [PATCH qemu-server] fix #1010: whitelist options for permissions
comments inline > + } elsif ($remainingoptions->{$opt} || $opt =~ > m/^(numa|parallell|serial)\d+$/) { parallell => parallel And those options allow access to host HW, so we need to restrict access. ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH qemu-server] fix #1010: whitelist options for permissions
applied ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH v3 2/3] migrate: use ssh over socat provided UNIX socks as tunnel
We cannot guarantee when the SSH forward Tunnel really becomes ready. The check with the mtunnel API call did not help for this prolem as it only checked that the SSH connection itself works and that the destination node has quorum but the forwarded tunnel itself was not checked. The Forward tunnel is a different channel in the SSH connection, independent of the SSH `qm mtunnel` channel, so only if that works it does not guarantees that our migration tunnel is up and ready. When the node(s) where under load, or when we did parallel migrations (migrateall), the migrate command was often started before a tunnel was open and ready to receive data. This led to a direct abortion of the migration and is the main cause in why parallel migrations often leave two thirds or more VMs on the source node. The issue was tracked down to SSH after debugging the QEMU process and enabling debug logging showed that the tunnel became often to late available and ready, or not at all. Fixing the TCP forward tunnel is quirky and not straight ahead, the only way SSH gives as a possibility is to use -N (no command) -f (background) and -o "ExitOnForwardFailure=yes", then it would wait in the foreground until the tunnel is ready and only then background itself. This is not quite the nicest way for our special use case and our code base. Waiting for the local port to become open and ready (through /proc/net/tcp[6]] as a proof of concept is not enough, even if the port is in the listening state and should theoretically accept connections this still failed often as the tunnel was not yet fully ready. Further another problem would still be open if we tried to patch the SSH Forward method we currently use - which we solve for free with the approach of this patch - namely the problem that the method to get an available port (next_migration_port) has a serious race condition which could lead to multiple use of the same port on a parallel migration (I observed this on my many test, seldom but if it happens its really bad). So lets now use UNIX sockets, which ssh supports since version 5.7. The end points are UNIX socket bound to the VMID - thus no port so no race and also no limitation of available ports (we reserved 50 for migration). The endpoints get created in /run/qemu-server/VMID.migrate and as KVM/QEMU is able to use UNIX socket just as well as TCP we have not to change much on the interaction with QEMU. QEMU is started with the migrate_incoming url at the local destination endpoint and creates the socket file, we then create a listening socket on the source side and connect over SSH to the destination. Now the migration can be started by issuing the migrate qmp command with an updated uri. Another small issue was that we used open2 to make the tunnel in a child process but never collected it when we closed tunnel, fix that. Signed-off-by: Thomas Lamprecht--- PVE/QemuMigrate.pm | 79 +- PVE/QemuServer.pm | 22 +-- 2 files changed, 68 insertions(+), 33 deletions(-) diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm index bfa2277..de6b13d 100644 --- a/PVE/QemuMigrate.pm +++ b/PVE/QemuMigrate.pm @@ -73,11 +73,11 @@ sub finish_command_pipe { } sub fork_tunnel { -my ($self, $nodeip, $lport, $rport) = @_; +my ($self, $sock_addr) = @_; -my @localtunnelinfo = $lport ? 
('-L' , "$lport:localhost:$rport" ) : (); +my @localtunnelinfo = ('-L' , "$sock_addr:$sock_addr" ); -my $cmd = [@{$self->{rem_ssh}}, @localtunnelinfo, 'qm', 'mtunnel' ]; +my $cmd = [@{$self->{rem_ssh}}, '-o ExitOnForwardFailure=yes', @localtunnelinfo, 'qm', 'mtunnel' ]; my $tunnel = $self->fork_command_pipe($cmd); @@ -97,12 +97,16 @@ sub fork_tunnel { $self->finish_command_pipe($tunnel); die "can't open migration tunnel - $err"; } + +$tunnel->{sock_addr} = $sock_addr; + return $tunnel; } sub finish_tunnel { -my ($self, $tunnel) = @_; +my ($self) = @_; +my $tunnel = $self->{tunnel}; my $writer = $tunnel->{writer}; eval { @@ -115,7 +119,18 @@ sub finish_tunnel { $self->finish_command_pipe($tunnel); -die $err if $err; +# ssh does not clean up on local host +my $cmd = ['rm', '-f', $tunnel->{sock_addr}]; # +PVE::Tools::run_command($cmd); + +# .. and just to be sure check on remote side +unshift @{$cmd}, @{$self->{rem_ssh}}; +PVE::Tools::run_command($cmd); + +if ($err) { + $self->log('err', $err); + $self->{errors} = 1; +} } sub lock_vm { @@ -329,6 +344,7 @@ sub phase2 { my $raddr; my $rport; +my $ruri; # the whole migration dst. URI (protocol:address[:port]) my $nodename = PVE::INotify::nodename(); ## start on remote node @@ -352,14 +368,22 @@ sub phase2 { # instead we pipe it through STDIN PVE::Tools::run_command($cmd, input => $spice_ticket, outfunc => sub { my $line =
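To make the moving parts easier to picture, here is a small Perl illustration of the pieces involved; the VMID, socket path and target host are examples, the real code builds them from the migration state. Both nodes use the same socket path, ssh forwards the local UNIX socket to the remote one, and QEMU on both ends simply speaks unix: URIs.

use strict;
use warnings;

my $vmid = 100;                                   # example VMID
my $sock_addr = "/run/qemu-server/$vmid.migrate"; # same path on both nodes

# destination node: kvm is started with "-incoming unix:$sock_addr" and
# creates the socket file there
my $incoming_uri = "unix:$sock_addr";

# source node: forward the local socket to the remote one; with
# ExitOnForwardFailure the tunnel setup fails loudly instead of silently
my @tunnel_cmd = ('ssh', '-o', 'ExitOnForwardFailure=yes',
    '-L', "$sock_addr:$sock_addr", 'root@targetnode', 'qm', 'mtunnel');

# source node: the "migrate" QMP command then gets the matching URI
my $migrate_uri = "unix:$sock_addr";

print "incoming: $incoming_uri\nmigrate:  $migrate_uri\n";
print "tunnel:   @tunnel_cmd\n";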
[pve-devel] [PATCH v3 3/3] migrate: add some more log output
Output all errors - if any - and add some log outputs on what we qmp commands we do with which parameters, may be helpful when debugging or analyzing a users problem. Also check if the queried status is defined, as on a error this may not be. Signed-off-by: Thomas Lamprecht--- PVE/QemuMigrate.pm | 14 ++ 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm index de6b13d..7f860d4 100644 --- a/PVE/QemuMigrate.pm +++ b/PVE/QemuMigrate.pm @@ -441,6 +441,7 @@ sub phase2 { $self->log('info', "migrate_set_downtime error: $@") if $@; } +$self->log('info', "set migration_caps"); eval { PVE::QemuServer::set_migration_caps($vmid); }; @@ -448,10 +449,12 @@ sub phase2 { #set cachesize 10% of the total memory my $cachesize = int($conf->{memory}*1048576/10); +$self->log('info', "set cachesize: $cachesize"); eval { - PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => $cachesize); + PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate-set-cache-size", value => int($cachesize)); }; - +$self->log('info', "migrate-set-cache-size error: $@") if $@; + if (PVE::QemuServer::vga_conf_has_spice($conf->{vga})) { my $rpcenv = PVE::RPCEnvironment::get(); my $authuser = $rpcenv->get_user(); @@ -472,6 +475,7 @@ sub phase2 { } +$self->log('info', "start migrate command to $ruri"); eval { PVE::QemuServer::vm_mon_cmd_nocheck($vmid, "migrate", uri => $ruri); }; @@ -496,6 +500,7 @@ sub phase2 { if (my $err = $@) { $err_count++; warn "query migrate failed: $err\n"; + $self->log('info', "query migrate failed: $err"); if ($err_count <= 5) { usleep(100); next; @@ -503,12 +508,12 @@ sub phase2 { die "too many query migrate failures - aborting\n"; } -if ($stat->{status} =~ m/^(setup)$/im) { +if (defined($stat->{status}) && $stat->{status} =~ m/^(setup)$/im) { sleep(1); next; } - if ($stat->{status} =~ m/^(active|completed|failed|cancelled)$/im) { + if (defined($stat->{status}) && $stat->{status} =~ m/^(active|completed|failed|cancelled)$/im) { $merr = undef; $err_count = 0; if ($stat->{status} eq 'completed') { @@ -521,6 +526,7 @@ sub phase2 { } if ($stat->{status} eq 'failed' || $stat->{status} eq 'cancelled') { + $self->log('info', "migration status error: $stat->{status}"); die "aborting\n" } -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH v3 1/3] migrate: collect migration tunnel child process
As we open2 it we also need to collect it to avoid zombies Omit timeout as its only used once and so we can use a static wait time. Signed-off-by: Thomas Lamprecht--- PVE/QemuMigrate.pm | 39 +++ 1 file changed, 19 insertions(+), 20 deletions(-) diff --git a/PVE/QemuMigrate.pm b/PVE/QemuMigrate.pm index a288627..bfa2277 100644 --- a/PVE/QemuMigrate.pm +++ b/PVE/QemuMigrate.pm @@ -5,6 +5,7 @@ use warnings; use PVE::AbstractMigrate; use IO::File; use IPC::Open2; +use POSIX qw( WNOHANG ); use PVE::INotify; use PVE::Tools; use PVE::Cluster; @@ -42,7 +43,10 @@ sub fork_command_pipe { } sub finish_command_pipe { -my ($self, $cmdpipe, $timeout) = @_; +my ($self, $cmdpipe) = @_; + +my $cpid = $cmdpipe->{pid}; +return undef if !$cpid; my $writer = $cmdpipe->{writer}; my $reader = $cmdpipe->{reader}; @@ -50,27 +54,22 @@ sub finish_command_pipe { $writer->close(); $reader->close(); -my $cpid = $cmdpipe->{pid}; - -if ($timeout) { - for (my $i = 0; $i < $timeout; $i++) { - return if !PVE::ProcFSTools::check_process_running($cpid); - sleep(1); +# collect child process +for (my $i = 1; $i < 20; $i++) { + my $waitpid = waitpid($cpid, WNOHANG); + last if (defined($waitpid) && ($waitpid == $cpid)); + + if ($i == 10) { + $self->log('info', "ssh tunnel still running - terminating now with SIGTERM"); + kill(15, $cpid); + } elsif ($i >= 15) { + $self->log('info', "ssh tunnel still running - terminating now with SIGKILL"); + kill(9, $cpid); } + sleep (1); } -$self->log('info', "ssh tunnel still running - terminating now with SIGTERM\n"); -kill(15, $cpid); - -# wait again -for (my $i = 0; $i < 10; $i++) { - return if !PVE::ProcFSTools::check_process_running($cpid); - sleep(1); -} - -$self->log('info', "ssh tunnel still running - terminating now with SIGKILL\n"); -kill 9, $cpid; -sleep 1; +delete $cmdpipe->{cpid}; } sub fork_tunnel { @@ -114,7 +113,7 @@ sub finish_tunnel { }; my $err = $@; -$self->finish_command_pipe($tunnel, 30); +$self->finish_command_pipe($tunnel); die $err if $err; } -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] Online Help for complex PVE panels V2
Changes since V1: * the help button was not hidden after switching InputPanels via direct tab clicks. To solve this, add a listener for 'deactivate' events, and react appropriately (1/5) Since the pve Wizard sends this event in all cases, remove the call to hide() in the wizard (2/5) * add in the pveHelpButton documentation header that this button is meant for modal windows. Components located in the workspace area can simply add a Standard ExtJS button in their toolbar if they want a help button in the future (1/5) ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-manager v2 4/5] Add custom syle to pveHelpButton
Next / OK are already displayed in blue, which is the 'call-to-action' color we use everywhere. To prevent stealing attention from these buttons, switch help button to grey --- www/css/ext6-pve.css | 4 www/manager6/button/HelpButton.js | 4 +++- 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/www/css/ext6-pve.css b/www/css/ext6-pve.css index 2f83863..b672892 100644 --- a/www/css/ext6-pve.css +++ b/www/css/ext6-pve.css @@ -363,3 +363,7 @@ left: -3px; top: 2px; } + +.pve-help-button .x-btn-inner { +color: black; +} diff --git a/www/manager6/button/HelpButton.js b/www/manager6/button/HelpButton.js index b00ff14..82f0935 100644 --- a/www/manager6/button/HelpButton.js +++ b/www/manager6/button/HelpButton.js @@ -5,7 +5,9 @@ Ext.define('PVE.button.Help', { extend: 'Ext.button.Button', alias: 'widget.pveHelpButton', text: gettext('Help'), -iconCls: 'fa fa-question-circle', +// make help button less flashy by styling it like toolbar buttons +iconCls: ' x-btn-icon-el-default-toolbar-small fa fa-question-circle', +cls: 'x-btn-default-toolbar-small pve-help-button', hidden: true, controller: { xclass: 'Ext.app.ViewController', -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-manager v2 2/5] Add support for onlineHelp in Creation Wizard
Inside a wizard, switching to a new tab will fire the 'activate' event to the new tab, causing the inputPanel of this tab to display its help in the wizard window. --- www/manager6/window/Wizard.js | 4 1 file changed, 4 insertions(+) diff --git a/www/manager6/window/Wizard.js b/www/manager6/window/Wizard.js index d6e22c1..3162843 100644 --- a/www/manager6/window/Wizard.js +++ b/www/manager6/window/Wizard.js @@ -65,6 +65,10 @@ Ext.define('PVE.window.Wizard', { region: 'south', margins: '0 5 5 5', items: [ + { + xtype: 'pveHelpButton', + itemId: 'help' + }, '->', { text: gettext('Back'), -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] qemu-server: address migrations issues
Changes since v2: - openssh is able to use UNIX socket forwards since 6.7 (and jessie has 6.7), so use that instead of socat. - split the child collection part out, as it does not have anything to do with the tunnel problem ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-manager v2 3/5] Add support for Help Button in Edit windows
--- www/manager6/window/Edit.js | 6 ++ 1 file changed, 6 insertions(+) diff --git a/www/manager6/window/Edit.js b/www/manager6/window/Edit.js index 28067a6..b231003 100644 --- a/www/manager6/window/Edit.js +++ b/www/manager6/window/Edit.js @@ -238,6 +238,12 @@ Ext.define('PVE.window.Edit', { me.buttons = [ submitBtn, resetBtn ]; } + if (items[0].onlineHelp) { + var helpButton = Ext.create('PVE.button.Help'); + me.buttons.unshift(helpButton, '->'); + Ext.GlobalEvents.fireEvent('pveShowHelp', items[0].onlineHelp); + } + Ext.applyIf(me, { modal: true, width: twoColumn ? colwidth*2 : colwidth, -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH pve-manager v2 1/5] Add help button for PVE
This help button is meant to be added on InputPanels, where a link to an online documentation chapter or subschapter is available. Clicking on the help button will open the help in a new browser tab. Original idea similar to the pfSense GUI. --- www/manager6/Makefile | 1 + www/manager6/button/HelpButton.js | 30 ++ www/manager6/panel/InputPanel.js | 14 +- 3 files changed, 44 insertions(+), 1 deletion(-) create mode 100644 www/manager6/button/HelpButton.js diff --git a/www/manager6/Makefile b/www/manager6/Makefile index 5b7f4bb..d6a3389 100644 --- a/www/manager6/Makefile +++ b/www/manager6/Makefile @@ -8,6 +8,7 @@ JSSRC= \ button/Button.js\ button/ConsoleButton.js \ button/Split.js \ + button/HelpButton.js\ qemu/SendKeyMenu.js \ qemu/CmdMenu.js \ qemu/TemplateMenu.js\ diff --git a/www/manager6/button/HelpButton.js b/www/manager6/button/HelpButton.js new file mode 100644 index 000..b00ff14 --- /dev/null +++ b/www/manager6/button/HelpButton.js @@ -0,0 +1,30 @@ +/* help button pointing to an online documentation + for components contained in a modal window + */ +Ext.define('PVE.button.Help', { +extend: 'Ext.button.Button', +alias: 'widget.pveHelpButton', +text: gettext('Help'), +iconCls: 'fa fa-question-circle', +hidden: true, +controller: { + xclass: 'Ext.app.ViewController', + listen: { + global: { + pveShowHelp: 'onPveShowHelp', + pveHideHelp: 'onPveHideHelp' + } + }, + onPveShowHelp: function(helpLink) { + this.getView().setHandler(function() { + var docsURI = window.location.origin + + '/pve-docs/' + helpLink; + window.open(docsURI); + }); + this.getView().show(); + }, + onPveHideHelp: function() { + this.getView().hide(); + } +} +}); \ No newline at end of file diff --git a/www/manager6/panel/InputPanel.js b/www/manager6/panel/InputPanel.js index 8ab1e33..9339eb4 100644 --- a/www/manager6/panel/InputPanel.js +++ b/www/manager6/panel/InputPanel.js @@ -1,9 +1,21 @@ Ext.define('PVE.panel.InputPanel', { extend: 'Ext.panel.Panel', alias: ['widget.inputpanel'], - +listeners: { + activate: function() { + // notify owning container that it should display a help button + this.onlineHelp && Ext.GlobalEvents.fireEvent('pveShowHelp', this.onlineHelp); + }, + deactivate: function() { + this.onlineHelp && Ext.GlobalEvents.fireEvent('pveHideHelp', this.onlineHelp); + }, +}, border: false, +// override this with an URL to a relevant chapter of the pve manual +// setting this will display a help button in our parent panel +onlineHelp: undefined, + // overwrite this to modify submit data onGetValues: function(values) { return values; -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH qemu-server] use the -wait-for-cgroup option on kvm
--- PVE/QemuServer.pm | 1 + 1 file changed, 1 insertion(+) diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm index b13dc71..d75ac98 100644 --- a/PVE/QemuServer.pm +++ b/PVE/QemuServer.pm @@ -2750,6 +2750,7 @@ sub config_to_command { push @$cmd, '-pidfile' , pidfile_name($vmid); push @$cmd, '-daemonize'; +push @$cmd, '-wait-for-cgroup', "cpu,cpuacct:/qemu.slice/$vmid.scope"; if ($conf->{smbios1}) { push @$cmd, '-smbios', "type=1,$conf->{smbios1}"; -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH kvm] add -wait-for-cgroup option to deal with systemd-run race
--- ...r-cgroup-option-to-deal-with-systemd-run-.patch | 161 + debian/patches/series | 1 + 2 files changed, 162 insertions(+) create mode 100644 debian/patches/pve/0046-add-wait-for-cgroup-option-to-deal-with-systemd-run-.patch diff --git a/debian/patches/pve/0046-add-wait-for-cgroup-option-to-deal-with-systemd-run-.patch b/debian/patches/pve/0046-add-wait-for-cgroup-option-to-deal-with-systemd-run-.patch new file mode 100644 index 000..1bc99ae --- /dev/null +++ b/debian/patches/pve/0046-add-wait-for-cgroup-option-to-deal-with-systemd-run-.patch @@ -0,0 +1,161 @@ +From 6d09e3d78a3ad75cda24ef1f7e78f047d437dd1a Mon Sep 17 00:00:00 2001 +From: Wolfgang Bumiller+Date: Tue, 31 May 2016 15:58:39 +0200 +Subject: [PATCH] add -wait-for-cgroup option to deal with systemd-run race + +--- + os-posix.c | 94 - + qemu-options.hx | 2 ++ + 2 files changed, 95 insertions(+), 1 deletion(-) + +diff --git a/os-posix.c b/os-posix.c +index e4da406..7de324b 100644 +--- a/os-posix.c b/os-posix.c +@@ -49,6 +49,7 @@ static struct passwd *user_pwd; + static const char *chroot_dir; + static int daemonize; + static int daemon_pipe; ++static GPtrArray *wait_cgroups; + + void os_setup_early_signal_handling(void) + { +@@ -156,6 +157,11 @@ void os_parse_cmd_args(int index, const char *optarg) + case QEMU_OPTION_daemonize: + daemonize = 1; + break; ++case QEMU_OPTION_wait_for_cgroup: ++if (!wait_cgroups) ++wait_cgroups = g_ptr_array_new(); ++g_ptr_array_add(wait_cgroups, g_strdup(optarg)); ++break; + #if defined(CONFIG_LINUX) + case QEMU_OPTION_enablefips: + fips_set_state(true); +@@ -202,8 +208,94 @@ static void change_root(void) + + } + ++static bool wait_for_cgroup(char *cgroup) ++{ ++static bool unified = false; ++ ++bool found = false; ++char *colon; ++const char *type; ++size_t type_len; ++FILE *fh; ++char *line = NULL; ++size_t n = 0; ++ssize_t linelen; ++ ++if (unified) { ++fprintf(stderr, "Mixed cgroup v1 and v2 parameters\n"); ++exit(1); ++} ++ ++colon = strchr(cgroup, ':'); ++if (!colon) { ++unified = true; ++type = ""; ++} else { ++type = cgroup; ++*colon = 0; ++cgroup = colon+1; ++} ++type_len = strlen(type); ++ ++while (!found) { ++fh = fopen("/proc/self/cgroup", "r"); ++if (!fh) { ++perror("open(/proc/self/cgroup)"); ++exit(1); ++} ++ ++while ((linelen = getline(, , fh)) != -1) { ++line[linelen-1] = 0; /* cut newline */ ++if (strncmp(line, "0::", 3) == 0) { ++/* This is a cgroup v2 file */ ++if (strcmp(line+3, cgroup) == 0) { ++found = true; ++} ++/* cgroup v2 only has one line */ ++break; ++} else { ++colon = strchr(line, ':'); ++if (!colon) { ++/* bad cgroup line */ ++continue; ++} ++colon += 1; ++if (strncmp(colon, type, type_len) != 0 || colon[type_len] != ':') { ++/* not the cgroup we're interested in */ ++continue; ++} ++if (strcmp(colon + type_len + 1, cgroup) == 0) { ++found = true; ++break; ++} ++} ++} ++ ++fclose(fh); ++free(line); ++line = NULL; ++if (!found) { ++usleep(5); ++} ++} ++ ++return found; ++} ++ + void os_daemonize(void) + { ++if (wait_cgroups) { ++char **cgroups, **i; ++ ++g_ptr_array_add(wait_cgroups, NULL); ++cgroups = (char**)g_ptr_array_free(wait_cgroups, FALSE); ++ ++for (i = cgroups; *i; ++i) { ++wait_for_cgroup(*i); ++} ++ ++g_strfreev(cgroups); ++} + if (daemonize) { + pid_t pid; + int fds[2]; +@@ -254,6 +346,7 @@ void os_daemonize(void) + + void os_setup_post(void) + { ++uint8_t status = 0; + int fd = 0; + + if (daemonize) { +@@ -271,7 +364,6 @@ void os_setup_post(void) + change_process_uid(); + + if (daemonize) { +-uint8_t status = 0; + ssize_t len; + + dup2(fd, 0); +diff 
--git a/qemu-options.hx b/qemu-options.hx +index a9ecd76..83ed899 100644 +--- a/qemu-options.hx b/qemu-options.hx +@@ -3119,6 +3119,8 @@ ETEXI + #ifndef _WIN32 + DEF("daemonize", 0, QEMU_OPTION_daemonize, \ + "-daemonize daemonize QEMU after initializing\n", QEMU_ARCH_ALL) ++DEF("wait-for-cgroup", HAS_ARG, QEMU_OPTION_wait_for_cgroup, \ ++"-wait-for-cgroup stay in foreground even as daemon\n", QEMU_ARCH_ALL)
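For readers skimming the flattened diff above: the new option simply polls /proc/self/cgroup until the process shows up in the cgroup that systemd-run is expected to create, and only then continues with daemonizing. A rough Perl equivalent of that polling loop, for illustration only (controller name, scope path and retry interval are made-up examples; the real patch also handles the single-line cgroup v2 format):

    use strict;
    use warnings;
    use Time::HiRes qw(usleep);

    # Block until /proc/self/cgroup lists $path for $controller.
    # cgroup v1 lines look like "<id>:<controller>:<path>".
    sub wait_for_cgroup {
        my ($controller, $path) = @_;
        while (1) {
            open(my $fh, '<', '/proc/self/cgroup') or die "open /proc/self/cgroup: $!\n";
            while (defined(my $line = <$fh>)) {
                chomp $line;
                my (undef, $ctrl, $cgpath) = split(/:/, $line, 3);
                next if !defined($cgpath);
                if ($ctrl eq $controller && $cgpath eq $path) {
                    close($fh);
                    return;
                }
            }
            close($fh);
            usleep(50_000); # not yet moved into the scope - retry
        }
    }

    wait_for_cgroup('name=systemd', '/system.slice/100.scope'); # example values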
Re: [pve-devel] [PATCH common 0/2] two more syscalls: openat() and mkdirat()
applied ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
Re: [pve-devel] [PATCH v2 1/2] migrate: use ssh over socat provided UNIX socks as tunnel
On 05/31/2016 07:29 PM, Dietmar Maurer wrote: >> Further another problem would still be open if we tried to patch the >> SSH Forward method we currently use - which we solve for free with >> the approach of this patch - namely the problem that the method >> to get an available port (next_migration_port) has a serious race >> condition > Why is there a race condition exactly? If so, we have to fix that. It's not directly in next_unused_port, as that is flock'ed; but if the program which requested a port takes too long to actually open it, the reservation may time out in next_unused_port and another program gets assigned the same port - then both may try to open/connect to it. As we did not have the SSH option ExitOnForwardFailure enabled, the second migration's ssh tunnel did not fail when it could not bind to the local port, and qemu then also writes to a port where it gets a connection refused (as the other migration is running on it). This may also cause trouble for other programs using (indirectly) next_unused_port, but at least the race condition should trigger only rarely; I'll look into a way to fix that after the v3 of this series. ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
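For context, the SSH-side fix mentioned above is to make a failed port forward fatal instead of silent. A sketch of such an invocation (host, port and the remote command are placeholders, not the exact arguments qemu-server builds):

    use strict;
    use warnings;

    my $lport = 60000;                     # e.g. a port handed out by next_migration_port
    my @cmd = (
        '/usr/bin/ssh',
        '-o', 'ExitOnForwardFailure=yes',  # exit instead of continuing if the local bind fails
        '-o', 'BatchMode=yes',
        '-L', "$lport:localhost:$lport",
        'root@target-node',                # placeholder host
        'sleep', '30',                     # placeholder: keep the tunnel open briefly
    );
    system(@cmd) == 0
        or die "ssh tunnel setup failed: $?\n";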
[pve-devel] [PATCH qemu-server] fix #1010: whitelist options for permissions
instead of defaulting to VM.Config.Options for all options not checked seperately, we now have lists for the different config permissions and check them accordingly. for everything not given, we require root access this is important especially for usbN and hostpciN since they can change the host configuration Signed-off-by: Dominik Csapak--- PVE/API2/Qemu.pm | 84 ++-- 1 file changed, 75 insertions(+), 9 deletions(-) diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm index 3a3c71a..fc22519 100644 --- a/PVE/API2/Qemu.pm +++ b/PVE/API2/Qemu.pm @@ -176,6 +176,64 @@ my $create_disks = sub { return $vollist; }; +my $cpuoptions = { +'cores' => 1, +'cpu' => 1, +'cpulimit' => 1, +'cpuunits' => 1, +'numa' => 1, +'smp' => 1, +'sockets' => 1, +'vpcus' => 1, +}; + +my $memoryoptions = { +'memory' => 1, +'balloon' => 1, +'shares' => 1, +}; + +my $hwtypeoptions = { +'acpi' => 1, +'hotplug' => 1, +'kvm' => 1, +'machine' => 1, +'scsihw' => 1, +'smbios1' => 1, +'tablet' => 1, +'vga' => 1, +'watchdog' => 1, +}; + +my $remainingoptions = { +'agent' => 1, +'autostart' => 1, +'bios' => 1, +'description' => 1, +'keyboard' => 1, +'localtime' => 1, +'migrate_downtime' => 1, +'migrate_speed' => 1, +'name' => 1, +'onboot' => 1, +'ostype' => 1, +'protection' => 1, +'reboot' => 1, +'startdate' => 1, +'startup' => 1, +'tdf' => 1, +'template' => 1, +}; + +my $vmpoweroptions = { +'freeze' => 1, +}; + +my $diskoptions = { +'boot' => 1, +'bootdisk' => 1, +}; + my $check_vm_modify_config_perm = sub { my ($rpcenv, $authuser, $vmid, $pool, $key_list) = @_; @@ -184,22 +242,30 @@ my $check_vm_modify_config_perm = sub { foreach my $opt (@$key_list) { # disk checks need to be done somewhere else next if PVE::QemuServer::is_valid_drivename($opt); + next if $opt eq 'cdrom'; - if ($opt eq 'sockets' || $opt eq 'cores' || - $opt eq 'cpu' || $opt eq 'smp' || $opt eq 'vcpus' || - $opt eq 'cpulimit' || $opt eq 'cpuunits') { + if ($cpuoptions->{$opt}) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.CPU']); - } elsif ($opt eq 'memory' || $opt eq 'balloon' || $opt eq 'shares') { + } elsif ($memoryoptions->{$opt}) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Memory']); - } elsif ($opt eq 'args' || $opt eq 'lock') { - die "only root can set '$opt' config\n"; - } elsif ($opt eq 'cpu' || $opt eq 'kvm' || $opt eq 'acpi' || $opt eq 'machine' || -$opt eq 'vga' || $opt eq 'watchdog' || $opt eq 'tablet' || $opt eq 'smbios1') { + } elsif ($hwtypeoptions->{$opt}) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.HWType']); + } elsif ($remainingoptions->{$opt} || $opt =~ m/^(numa|parallell|serial)\d+$/) { + $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Options']); + # special case for startup since it changes host behaviour + if ($opt eq 'startup') { + $rpcenv->check_full($authuser, "/", ['Sys.Modify']); + } + } elsif ($vmpoweroptions->{$opt}) { + $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.PowerMgmt']); + } elsif ($diskoptions->{$opt}) { + $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Disk']); } elsif ($opt =~ m/^net\d+$/) { $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Network']); } else { - $rpcenv->check_vm_perm($authuser, $vmid, $pool, ['VM.Config.Options']); + # catches usb\d+, hostpci\d+, args, lock, etc. + # new options will be checked here + die "only root can set '$opt' config\n"; } } -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
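To see the routing idea in isolation, here is a cut-down, standalone sketch of the same lookup; only a handful of option names are listed and the actual $rpcenv permission checks are reduced to returning the privilege name:

    use strict;
    use warnings;

    # Tiny subsets of the whitelists from the patch - just enough to show the routing.
    my $cpuoptions    = { cores => 1, sockets => 1, cpulimit => 1 };
    my $memoryoptions = { memory => 1, balloon => 1 };
    my $hwtypeoptions = { vga => 1, machine => 1, watchdog => 1 };

    sub required_priv {
        my ($opt) = @_;
        return 'VM.Config.CPU'     if $cpuoptions->{$opt};
        return 'VM.Config.Memory'  if $memoryoptions->{$opt};
        return 'VM.Config.HWType'  if $hwtypeoptions->{$opt};
        return 'VM.Config.Network' if $opt =~ m/^net\d+$/;
        return 'root only';        # usb\d+, hostpci\d+, args, lock, anything unknown
    }

    print required_priv('memory'), "\n";    # VM.Config.Memory
    print required_priv('hostpci0'), "\n";  # root only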
Re: [pve-devel] [PATCH_V4 pve-container 5/5] Add full clone with running CT
comments inline > Wolfgang Linkhat am 1. Juni 2016 um 09:22 geschrieben: > > > With this patch it is possible to make a full clone from an running container, > if the underlying Storage provides snapshots. > --- > src/PVE/API2/LXC.pm | 42 +- > 1 file changed, 41 insertions(+), 1 deletion(-) > > diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm > index 6fa1da0..c3b02c9 100644 > --- a/src/PVE/API2/LXC.pm > +++ b/src/PVE/API2/LXC.pm > @@ -1087,7 +1087,8 @@ __PACKAGE__->register_method({ > > my $clonefn = sub { > > - die "Clone can't be done online\n." if > PVE::LXC::check_running($vmid); > + my $running = PVE::LXC::check_running($vmid); > + > # do all tests after lock > # we also try to do all tests before we fork the worker > my $conf = PVE::LXC::Config->load_config($vmid); > @@ -1107,6 +1108,32 @@ __PACKAGE__->register_method({ > my $mountpoints = {}; > my $fullclone = {}; > my $vollist = []; > + my $clone_volumes = []; > + > + #create snapshot for full clone to minimize downtime > + if ($running && !$snapname) { > + $snapname = 'cloneCT'; > + > + PVE::Tools::run_command(['/usr/bin/lxc-freeze', '-n', $vmid]); > + > + foreach my $opt (keys %$conf) { > + my $value = $conf->{$opt}; > + > + next if $opt !~ /rootfs|mp\d+/; > + > + my $mp = $opt eq 'rootfs' ? > + PVE::LXC::Config->parse_ct_rootfs($value) : > + PVE::LXC::Config->parse_ct_mountpoint($value); > + > + die "For online copy a CT storage must provide snapshots!\n" > + if !PVE::Storage::volume_has_feature($storecfg, > 'snapshot', $mp->{volume}, $snapname, 0);; > + > + PVE::Storage::volume_snapshot($storecfg, $mp->{volume}, > $snapname); > + push @$clone_volumes, $mp->{volume}; > + } why don't you use PVE::LXC::Config->snapshot_create() here instead of replicating its functionality? it does freezing and syncing and all the snapshot related checks.. > + > + PVE::Tools::run_command(['/usr/bin/lxc-unfreeze', '-n', $vmid]); > + } > > foreach my $opt (keys %$oldconf) { > my $value = $oldconf->{$opt}; > @@ -1189,6 +1216,19 @@ __PACKAGE__->register_method({ > > PVE::AccessControl::add_vm_to_pool($newid, $pool) if $pool; > }; > + > + if (defined($snapname) && $snapname eq 'cloneCT') { > + foreach my $vol (@$clone_volumes) { > + > + eval { > + PVE::Storage::volume_snapshot_delete($storecfg, > $vol, $snapname); > + }; > + #warn only and remove all snapshots > + if (my $err = $@) { > + warn $err; > + } > + } > + } same as above, why don't you simply use PVE::LXC::Config->snapshot_delete() (with force)? > if (my $err = $@) { > unlink $conffile; > > -- > 2.1.4 > > > ___ > pve-devel mailing list > pve-devel@pve.proxmox.com > http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel > ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
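If the suggestion is picked up, the hand-rolled freeze/snapshot loop could shrink to roughly the following. This is only a sketch; the exact argument lists of snapshot_create()/snapshot_delete() are assumed, not verified against the module:

    # assumed sketch - signatures not verified
    if ($running && !$snapname) {
        $snapname = 'cloneCT';
        # freezes, syncs, snapshots all volumes and runs the usual checks
        PVE::LXC::Config->snapshot_create($vmid, $snapname);
    }

    # ... perform the full clone from $snapname ...

    if ($snapname && $snapname eq 'cloneCT') {
        # force-delete the temporary snapshot again; warn instead of die on failure
        eval { PVE::LXC::Config->snapshot_delete($vmid, $snapname, 1) };
        warn $@ if $@;
    }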
Re: [pve-devel] [PATCH_V4 pve-container 2/5] Add move_volume.
comments inline > Wolfgang Linkhat am 1. Juni 2016 um 09:22 geschrieben: > > > Now it is possible to move the volume to an other storage. > This works only when the CT is off, to keep the volume consistent. > --- > src/PVE/API2/LXC.pm | 116 > > src/PVE/CLI/pct.pm | 1 + > 2 files changed, 117 insertions(+) > > diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm > index 0bcadc3..71cf21d 100644 > --- a/src/PVE/API2/LXC.pm > +++ b/src/PVE/API2/LXC.pm > @@ -1374,4 +1374,120 @@ __PACKAGE__->register_method({ > return PVE::LXC::Config->lock_config($vmid, $code);; > }}); > > +__PACKAGE__->register_method({ > +name => 'move_ct_volume', > +path => '{vmid}/move_volume', > +method => 'POST', > +protected => 1, > +proxyto => 'node', > +description => "Move a rootfs-/mp-volume to a different storage", > +permissions => { > + description => "You need 'VM.Config.Disk' permissions on /vms/{vmid}, " > . > + "and 'Datastore.AllocateSpace' permissions on the storage.", > + check => > + [ 'and', > + ['perm', '/vms/{vmid}', [ 'VM.Config.Disk' ]], > + ['perm', '/storage/{storage}', [ 'Datastore.AllocateSpace' ]], > + ], > +}, > +parameters => { > + additionalProperties => 0, > + properties => { > + node => get_standard_option('pve-node'), > + vmid => get_standard_option('pve-vmid', { completion => > \::LXC::complete_ctid }), > + storage => get_standard_option('pve-storage-id', { > + description => "Target Storage.", > + default => 'local', > + completion => \::Storage::complete_storage_enabled, > + }), > + delete => { > + type => 'boolean', > + description => "Delete the original volume after successful > copy. By default the original disk is kept as unused disk.", > + optional => 1, > + default => 0, > + }, > + volume => { > + type => 'string', > + description => "Volume which will move.\n Format: > [rootfs|mp]", > + > + }, this format is not enforced anywhere, see below > + digest => { > + type => 'string', > + description => 'Prevent changes if current configuration file > has different SHA1 digest. 
This can be used to prevent concurrent > modifications.', > + maxLength => 40, > + optional => 1, > + } > + }, > +}, > +returns => { > + type => 'string', > +}, > +code => sub { > + my ($param) = @_; > + > + my $rpcenv = PVE::RPCEnvironment::get(); > + > + my $authuser = $rpcenv->get_user(); > + > + my $node = extract_param($param, 'node'); > + > + my $vmid = extract_param($param, 'vmid'); > + > + my $storage = extract_param($param, 'storage'); > + > + my $volume = extract_param($param, 'volume'); > + > + my $delete = extract_param($param, 'delete'); > + > + my $digest = extract_param($param, 'digest'); > + > + my $code = sub { > + > + my $conf = PVE::LXC::Config->load_config($vmid); > + PVE::LXC::Config->check_lock($conf); > + > + die "can't move volume: $volume if snapshot exists\n" > + if %{$conf->{snapshots}}; > + > + PVE::Tools::assert_if_modified($digest, $conf->{digest}); > + > + die "Move Volume can't be done online.\n" if > PVE::LXC::check_running($vmid); > + > + PVE::Cluster::log_msg('info', $authuser, "move_volume CT:$vmid > Volume:$volume to Storage:$storage"); > + my $realcmd = sub { > + > + my $storage_cfg = PVE::Storage::config(); > + > + my $mp; > + if ($volume eq 'rootfs') { > + $mp = PVE::LXC::Config->parse_ct_rootfs($conf->{$volume}); > + } else { > + $mp = > PVE::LXC::Config->parse_ct_mountpoint($conf->{$volume}); > + } > + no check for definedness of $conf->{$volume}, no check whether $volume is a valid mountpoint key other than rootfs (you can use PVE::LXC::Config->mountpoint_names() if you don't want to hardcode mp\d+) > + my $old_volid = $mp->{volume}; > + > + eval { > + $mp->{volume} = PVE::LXC::copy_volume($mp, $vmid, $vmid, > $storage, $storage_cfg, $conf); > + > + $conf->{$volume} = > PVE::LXC::Config->print_ct_mountpoint($mp, $volume eq 'rootfs'); > + }; > + if (my $err = $@) { > + die $err; > + } > + > + if ($delete) { > + PVE::Storage::vdisk_free($storage_cfg, $old_volid); > + } else { > + my $unused = PVE::LXC::Config->add_unused_volume($conf, > $old_volid); > + $conf->{$unused} = $old_volid; add_unused_volume already modifies the config hash, so a simple
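For illustration, the validation asked for above could be as small as this (the error wording is made up; mountpoint_names() is assumed to return 'rootfs' plus the mpN keys, as the review implies):

    # sketch of the missing parameter validation
    my %is_mountpoint = map { $_ => 1 } PVE::LXC::Config->mountpoint_names();

    die "'$volume' is not a valid mount point key (rootfs or mpN expected)\n"
        if !$is_mountpoint{$volume};

    die "CT $vmid has no '$volume' configured\n"
        if !defined($conf->{$volume});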
[pve-devel] [PATCH_V4 pve-container 3/5] disk cloning is complete, so we can remove the experimental flag
--- src/PVE/API2/LXC.pm | 6 -- 1 file changed, 6 deletions(-) diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm index 71cf21d..6fb0a62 100644 --- a/src/PVE/API2/LXC.pm +++ b/src/PVE/API2/LXC.pm @@ -1041,12 +1041,6 @@ __PACKAGE__->register_method({ "you clone a normal CT. For CT templates, we try to create a linked clone by default.", default => 0, }, - experimental => { - type => 'boolean', - description => "The clone feature is experimental, set this " . - "flag if you know what you are doing.", - default => 0, - }, # target => get_standard_option('pve-node', { # description => "Target node. Only allowed if the original VM is on shared storage.", # optional => 1, -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
[pve-devel] [PATCH_V4 pve-container 4/5] Prevent cloning a running CT.
If we make a linked clone the CT must be a template so it is not allowed to run. If we make a full clone, it is safer to have the CT offline. --- src/PVE/API2/LXC.pm | 11 +++ src/PVE/LXC.pm | 4 ++-- 2 files changed, 5 insertions(+), 10 deletions(-) diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm index 6fb0a62..6fa1da0 100644 --- a/src/PVE/API2/LXC.pm +++ b/src/PVE/API2/LXC.pm @@ -1085,20 +1085,15 @@ __PACKAGE__->register_method({ PVE::Cluster::check_cfs_quorum(); - my $running = PVE::LXC::check_running($vmid) || 0; - my $clonefn = sub { + die "Clone can't be done online\n." if PVE::LXC::check_running($vmid); # do all tests after lock # we also try to do all tests before we fork the worker my $conf = PVE::LXC::Config->load_config($vmid); PVE::LXC::Config->check_lock($conf); - my $verify_running = PVE::LXC::check_running($vmid) || 0; - - die "unexpected state change\n" if $verify_running != $running; - die "snapshot '$snapname' does not exist\n" if $snapname && !defined( $conf->{snapshots}->{$snapname}); @@ -1131,7 +1126,7 @@ __PACKAGE__->register_method({ } else { # not full means clone instead of copy die "Linked clone feature for '$volid' is not available\n" - if !PVE::Storage::volume_has_feature($storecfg, 'clone', $volid, $snapname, $running); + if !PVE::Storage::volume_has_feature($storecfg, 'clone', $volid, $snapname, 0); } $mountpoints->{$opt} = $mp; @@ -1177,7 +1172,7 @@ __PACKAGE__->register_method({ if ($fullclone->{$opt}) { print "create full clone of mountpoint $opt ($volid)\n"; - $newvolid = PVE::LXC::copy_volume($mp, $vmid, $newid, $storage, $storecfg, $conf); + $newvolid = PVE::LXC::copy_volume($mp, $vmid, $newid, $storage, $storecfg, $conf, $snapname); } else { print "create linked clone of mountpoint $opt ($volid)\n"; $newvolid = PVE::Storage::vdisk_clone($storecfg, $volid, $newid, $snapname); diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm index 4710215..38c8180 100644 --- a/src/PVE/LXC.pm +++ b/src/PVE/LXC.pm @@ -1359,7 +1359,7 @@ sub userns_command { } sub copy_volume { -my ($mp, $vmid, $newid, $storage, $storage_cfg, $conf) = @_; +my ($mp, $vmid, $newid, $storage, $storage_cfg, $conf, $snapname) = @_; die "Copy only mountpoints with type 'volume'" if $mp->{type} ne 'volume'; my $mount_dir = "/tmp/$vmid"; @@ -1393,7 +1393,7 @@ sub copy_volume { mkdir $src; #mount for copy mp - mountpoint_mount($mp, $src, $storage_cfg); + mountpoint_mount($mp, $src, $storage_cfg, $snapname); mountpoint_mount($new_mp, $dest, $storage_cfg); PVE::Tools::run_command(['/usr/bin/rsync', '--stats', '-X', '-A', '--numeric-ids', -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
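For orientation, the linked-clone prerequisite that this hunk now checks with a fixed running=0 looks like this in isolation (storage and volume ids are placeholders; the volume_has_feature() call is the one from the hunk above):

    my $storecfg = PVE::Storage::config();
    my $volid    = 'local-zfs:subvol-100-disk-1';   # placeholder volume id
    my $snapname = undef;                           # clone from the current state

    die "Linked clone feature for '$volid' is not available\n"
        if !PVE::Storage::volume_has_feature($storecfg, 'clone', $volid, $snapname, 0);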
[pve-devel] [PATCH_V4 pve-container 5/5] Add full clone with running CT
With this patch it is possible to make a full clone from an running container, if the underlying Storage provides snapshots. --- src/PVE/API2/LXC.pm | 42 +- 1 file changed, 41 insertions(+), 1 deletion(-) diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm index 6fa1da0..c3b02c9 100644 --- a/src/PVE/API2/LXC.pm +++ b/src/PVE/API2/LXC.pm @@ -1087,7 +1087,8 @@ __PACKAGE__->register_method({ my $clonefn = sub { - die "Clone can't be done online\n." if PVE::LXC::check_running($vmid); + my $running = PVE::LXC::check_running($vmid); + # do all tests after lock # we also try to do all tests before we fork the worker my $conf = PVE::LXC::Config->load_config($vmid); @@ -1107,6 +1108,32 @@ __PACKAGE__->register_method({ my $mountpoints = {}; my $fullclone = {}; my $vollist = []; + my $clone_volumes = []; + + #create snapshot for full clone to minimize downtime + if ($running && !$snapname) { + $snapname = 'cloneCT'; + + PVE::Tools::run_command(['/usr/bin/lxc-freeze', '-n', $vmid]); + + foreach my $opt (keys %$conf) { + my $value = $conf->{$opt}; + + next if $opt !~ /rootfs|mp\d+/; + +my $mp = $opt eq 'rootfs' ? + PVE::LXC::Config->parse_ct_rootfs($value) : + PVE::LXC::Config->parse_ct_mountpoint($value); + + die "For online copy a CT storage must provide snapshots!\n" + if !PVE::Storage::volume_has_feature($storecfg, 'snapshot', $mp->{volume}, $snapname, 0);; + + PVE::Storage::volume_snapshot($storecfg, $mp->{volume}, $snapname); + push @$clone_volumes, $mp->{volume}; + } + + PVE::Tools::run_command(['/usr/bin/lxc-unfreeze', '-n', $vmid]); + } foreach my $opt (keys %$oldconf) { my $value = $oldconf->{$opt}; @@ -1189,6 +1216,19 @@ __PACKAGE__->register_method({ PVE::AccessControl::add_vm_to_pool($newid, $pool) if $pool; }; + + if (defined($snapname) && $snapname eq 'cloneCT') { + foreach my $vol (@$clone_volumes) { + + eval { + PVE::Storage::volume_snapshot_delete($storecfg, $vol, $snapname); + }; + #warn only and remove all snapshots + if (my $err = $@) { + warn $err; + } + } + } if (my $err = $@) { unlink $conffile; -- 2.1.4 ___ pve-devel mailing list pve-devel@pve.proxmox.com http://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel
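The downtime-sensitive part is only the short freeze window around taking the snapshots. A condensed sketch of that bracket ($clone_volumes and $snapname as in the patch; the eval around the snapshot step is an addition of this sketch so that lxc-unfreeze always runs):

    PVE::Tools::run_command(['/usr/bin/lxc-freeze', '-n', $vmid]);
    eval {
        foreach my $volid (@$clone_volumes) {
            PVE::Storage::volume_snapshot($storecfg, $volid, $snapname);
        }
    };
    my $err = $@;
    PVE::Tools::run_command(['/usr/bin/lxc-unfreeze', '-n', $vmid]);
    die $err if $err;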
[pve-devel] [PATCH_V4 pve-container 1/5] Add full clone functionality
--- src/PVE/API2/LXC.pm | 21 +- src/PVE/LXC.pm | 62 + 2 files changed, 72 insertions(+), 11 deletions(-) diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm index 95932a9..0bcadc3 100644 --- a/src/PVE/API2/LXC.pm +++ b/src/PVE/API2/LXC.pm @@ -1079,6 +1079,7 @@ __PACKAGE__->register_method({ my $storage = extract_param($param, 'storage'); + die "Full clone requires a storage.\n"if $param->{full} && !$storage; my $localnode = PVE::INotify::nodename(); my $storecfg = PVE::Storage::config(); @@ -1132,10 +1133,6 @@ __PACKAGE__->register_method({ if ($mp->{type} eq 'volume') { my $volid = $mp->{volume}; if ($param->{full}) { - die "fixme: full clone not implemented"; - - die "Full clone feature for '$volid' is not available\n" - if !PVE::Storage::volume_has_feature($storecfg, 'copy', $volid, $snapname, $running); $fullclone->{$opt} = 1; } else { # not full means clone instead of copy @@ -1182,18 +1179,20 @@ __PACKAGE__->register_method({ foreach my $opt (keys %$mountpoints) { my $mp = $mountpoints->{$opt}; my $volid = $mp->{volume}; + my $newvolid; if ($fullclone->{$opt}) { - die "fixme: full clone not implemented\n"; + print "create full clone of mountpoint $opt ($volid)\n"; + $newvolid = PVE::LXC::copy_volume($mp, $vmid, $newid, $storage, $storecfg, $conf); } else { print "create linked clone of mountpoint $opt ($volid)\n"; - my $newvolid = PVE::Storage::vdisk_clone($storecfg, $volid, $newid, $snapname); - push @$newvollist, $newvolid; - $mp->{volume} = $newvolid; - - $newconf->{$opt} = PVE::LXC::Config->print_ct_mountpoint($mp, $opt eq 'rootfs'); - PVE::LXC::Config->write_config($newid, $newconf); + $newvolid = PVE::Storage::vdisk_clone($storecfg, $volid, $newid, $snapname); } + push @$newvollist, $newvolid; + $mp->{volume} = $newvolid; + + $newconf->{$opt} = PVE::LXC::Config->print_ct_mountpoint($mp, $opt eq 'rootfs'); + PVE::LXC::Config->write_config($newid, $newconf); } delete $newconf->{lock}; diff --git a/src/PVE/LXC.pm b/src/PVE/LXC.pm index 759ab6d..4710215 100644 --- a/src/PVE/LXC.pm +++ b/src/PVE/LXC.pm @@ -1358,5 +1358,67 @@ sub userns_command { return []; } +sub copy_volume { +my ($mp, $vmid, $newid, $storage, $storage_cfg, $conf) = @_; + +die "Copy only mountpoints with type 'volume'" if $mp->{type} ne 'volume'; +my $mount_dir = "/tmp/$vmid"; +my $dest = "$mount_dir\/dest"; +my $src = "$mount_dir\/src"; + +my $new_format = $storage_cfg->{ids}->{$storage}->{type} eq "zfspool" ? 'subvol' : 'raw'; + +my $new_volid = PVE::Storage::vdisk_alloc($storage_cfg, $storage, $newid, $new_format, undef, $mp->{size}/1024); + +#simplify mount struct. See mountpoint_mount(). 
+my $tmp_mp = $mp->{mp}; +$mp->{mp} = '/'; + +my $new_mp; +$new_mp->{mp} = '/'; +$new_mp->{volume} = $new_volid; +$new_mp->{size} = $mp->{size}; +$new_mp->{type} = "volume"; + +#get id's for unprivileged container +my (undef, $rootuid, $rootgid) = parse_id_maps($conf); + +my $path = PVE::Storage::path($storage_cfg, $new_volid, undef); + +eval { + mkfs($path, $rootuid, $rootgid) if $new_format ne "subvol"; + + mkdir $mount_dir; + mkdir $dest; + mkdir $src; + + #mount for copy mp + mountpoint_mount($mp, $src, $storage_cfg); + mountpoint_mount($new_mp, $dest, $storage_cfg); + + PVE::Tools::run_command(['/usr/bin/rsync', '--stats', '-X', '-A', '--numeric-ids', +'--whole-file', '--inplace', '--one-file-system', '-aH', +"$src\/.", $dest]); + +}; +my $err = $@; +eval { PVE::Tools::run_command(['/bin/umount', '--lazy', $src], errfunc => sub{})}; +warn "Can't umount $src\n" if $@; +eval { PVE::Tools::run_command(['/bin/umount', '--lazy', $dest], errfunc => sub{})}; +warn "Can't umount $src\n" if $@; + +rmdir $dest; +rmdir $src; +rmdir $mount_dir; + +#set it back to orgin value +$mp->{mp} = $tmp_mp; + +if ($err) { + PVE::Storage::vdisk_free($storage_cfg, $new_volid); + die $err; +} +return
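As a usage sketch (CT ids and the storage name are made up; the helper calls are the ones the patch itself uses), copying the rootfs of one container into a fresh volume for another would look like:

    use PVE::LXC;
    use PVE::LXC::Config;
    use PVE::Storage;

    # hypothetical ids and storage - only to show how copy_volume() is called
    my $storage_cfg = PVE::Storage::config();
    my $conf        = PVE::LXC::Config->load_config(100);
    my $mp          = PVE::LXC::Config->parse_ct_rootfs($conf->{rootfs});

    my $newvolid = PVE::LXC::copy_volume($mp, 100, 101, 'local-lvm', $storage_cfg, $conf);
    print "copied rootfs into $newvolid\n";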
[pve-devel] [PATCH_V4 pve-container 2/5] Add move_volume.
Now it is possible to move the volume to an other storage. This works only when the CT is off, to keep the volume consistent. --- src/PVE/API2/LXC.pm | 116 src/PVE/CLI/pct.pm | 1 + 2 files changed, 117 insertions(+) diff --git a/src/PVE/API2/LXC.pm b/src/PVE/API2/LXC.pm index 0bcadc3..71cf21d 100644 --- a/src/PVE/API2/LXC.pm +++ b/src/PVE/API2/LXC.pm @@ -1374,4 +1374,120 @@ __PACKAGE__->register_method({ return PVE::LXC::Config->lock_config($vmid, $code);; }}); +__PACKAGE__->register_method({ +name => 'move_ct_volume', +path => '{vmid}/move_volume', +method => 'POST', +protected => 1, +proxyto => 'node', +description => "Move a rootfs-/mp-volume to a different storage", +permissions => { + description => "You need 'VM.Config.Disk' permissions on /vms/{vmid}, " . + "and 'Datastore.AllocateSpace' permissions on the storage.", + check => + [ 'and', + ['perm', '/vms/{vmid}', [ 'VM.Config.Disk' ]], + ['perm', '/storage/{storage}', [ 'Datastore.AllocateSpace' ]], + ], +}, +parameters => { + additionalProperties => 0, + properties => { + node => get_standard_option('pve-node'), + vmid => get_standard_option('pve-vmid', { completion => \::LXC::complete_ctid }), + storage => get_standard_option('pve-storage-id', { + description => "Target Storage.", + default => 'local', + completion => \::Storage::complete_storage_enabled, + }), + delete => { + type => 'boolean', + description => "Delete the original volume after successful copy. By default the original disk is kept as unused disk.", + optional => 1, + default => 0, + }, + volume => { + type => 'string', + description => "Volume which will move.\n Format: [rootfs|mp]", + + }, + digest => { + type => 'string', + description => 'Prevent changes if current configuration file has different SHA1 digest. This can be used to prevent concurrent modifications.', + maxLength => 40, + optional => 1, + } + }, +}, +returns => { + type => 'string', +}, +code => sub { + my ($param) = @_; + + my $rpcenv = PVE::RPCEnvironment::get(); + + my $authuser = $rpcenv->get_user(); + + my $node = extract_param($param, 'node'); + + my $vmid = extract_param($param, 'vmid'); + + my $storage = extract_param($param, 'storage'); + + my $volume = extract_param($param, 'volume'); + + my $delete = extract_param($param, 'delete'); + + my $digest = extract_param($param, 'digest'); + + my $code = sub { + + my $conf = PVE::LXC::Config->load_config($vmid); + PVE::LXC::Config->check_lock($conf); + + die "can't move volume: $volume if snapshot exists\n" + if %{$conf->{snapshots}}; + + PVE::Tools::assert_if_modified($digest, $conf->{digest}); + + die "Move Volume can't be done online.\n" if PVE::LXC::check_running($vmid); + + PVE::Cluster::log_msg('info', $authuser, "move_volume CT:$vmid Volume:$volume to Storage:$storage"); + my $realcmd = sub { + + my $storage_cfg = PVE::Storage::config(); + + my $mp; + if ($volume eq 'rootfs') { + $mp = PVE::LXC::Config->parse_ct_rootfs($conf->{$volume}); + } else { + $mp = PVE::LXC::Config->parse_ct_mountpoint($conf->{$volume}); + } + + my $old_volid = $mp->{volume}; + + eval { + $mp->{volume} = PVE::LXC::copy_volume($mp, $vmid, $vmid, $storage, $storage_cfg, $conf); + + $conf->{$volume} = PVE::LXC::Config->print_ct_mountpoint($mp, $volume eq 'rootfs'); + }; + if (my $err = $@) { + die $err; + } + + if ($delete) { + PVE::Storage::vdisk_free($storage_cfg, $old_volid); + } else { + my $unused = PVE::LXC::Config->add_unused_volume($conf, $old_volid); + $conf->{$unused} = $old_volid; + } + PVE::LXC::Config->write_config($vmid, $conf); + }; + + return 
$rpcenv->fork_worker('move_volume', $vmid, $authuser, $realcmd); + }; + + return PVE::LXC::Config->lock_config($vmid, $code); + }}); 1; diff --git a/src/PVE/CLI/pct.pm b/src/PVE/CLI/pct.pm index ca87229..b6b834d 100755 --- a/src/PVE/CLI/pct.pm +++ b/src/PVE/CLI/pct.pm @@ -564,6 +564,7 @@ our $cmddef = { clone => [ "PVE::API2::LXC", 'clone_vm', ['vmid', 'newid'], { node => $nodename }, $upid_exit ], migrate => [