Re: [pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

2019-12-13 Thread Alwin Antreich
Some comments inline.

On Fri, Dec 13, 2019 at 03:56:42PM +0100, Aaron Lauterer wrote:
> It's possible to have a situation where the cluster network (used for
> inter-OSD traffic) is not configured on a node. The OSD can still be
> created but can't communicate.
> 
> This check will abort the creation if there is no IP within the subnet
> of the cluster network present on the node. If there is no dedicated
> cluster network, the public network is used. The chances of that not
> being configured are much lower, but better to be on the safe side and
> check it if there is no cluster network.
> 
> Signed-off-by: Aaron Lauterer 
> ---
>  PVE/API2/Ceph/OSD.pm | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index 5f70cf58..59cc9567 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
>   # extract parameter info and fail if a device is set more than once
>   my $devs = {};
>  
> + my $ceph_conf = cfs_read_file('ceph.conf');
The public/cluster networks could have been migrated into the MON DB. In
this case they would not appear in the ceph.conf.

ATM it might be unlikely, since there is an ugly warning with every
command execution. But it is still possible.
```
Configuration option 'cluster_network' may not be modified at runtime
```
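
A rough sketch of how a fallback to the MON-stored value could look (purely
illustrative, building on the $ceph_conf from the patch and assuming
PVE::Tools::run_command is usable here and a MON is reachable):
```
# illustrative only: prefer ceph.conf, otherwise ask the MONs
my $get_global_option = sub {
    my ($option) = @_;    # e.g. 'cluster_network' or 'public_network'

    my $value = $ceph_conf->{global}->{$option};
    return $value if defined($value) && length($value);

    # 'ceph config get' also returns options that only live in the MON DB
    my $out = '';
    eval {
	PVE::Tools::run_command(
	    ['ceph', 'config', 'get', 'mon', $option],
	    outfunc => sub { $out .= shift },
	);
    };
    $out =~ s/^\s+|\s+$//g;    # trim whitespace
    return length($out) ? $out : undef;
};
```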

> +
> + # check if network is configured
> + my $osd_network = $ceph_conf->{global}->{cluster_network}
> + // $ceph_conf->{global}->{public_network};
An OSD needs both networks: the public network for communication with the
MONs & clients, and the cluster network for replication. On our default
setup, both are the same network.

I have tested the OSD creation with the cluster network down. During
creation, it only needs the public network to create the OSD on the MON.
But the OSD can't start and therefore isn't placed on the CRUSH map.
Once it can start, it will be added to the correct location on the map.

IMHO, the code needs to check both.
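
A minimal sketch of what checking both could look like, reusing the helper
the patch already calls (structure and wording are of course up to you):
```
my $public_network = $ceph_conf->{global}->{public_network};
my $cluster_network = $ceph_conf->{global}->{cluster_network} // $public_network;

# checking the same subnet twice is harmless, so no dedup here
for my $cidr (grep { defined($_) } ($public_network, $cluster_network)) {
    die "No network interface configured for subnet $cidr. Check your network config.\n"
	if !@{PVE::Network::get_local_ip_from_cidr($cidr)};
}
```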

> + die "No network interface configured for subnet $osd_network. Check ".
> + "your network config.\n" if !@{PVE::Network::get_local_ip_from_cidr($osd_network)};
> +
>   # FIXME: rename params on next API compatibillity change (7.0)
>   $param->{wal_dev_size} = delete $param->{wal_size};
>   $param->{db_dev_size} = delete $param->{db_size};
> @@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
>   my $fsid = $monstat->{monmap}->{fsid};
>  $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>  
> - my $ceph_conf = cfs_read_file('ceph.conf');
>   my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>  
>   if (! -f $ceph_bootstrap_osd_keyring && $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
> -- 
> 2.20.1



[pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

2019-12-13 Thread Aaron Lauterer
It's possible to have a situation where the cluster network (used for
inter-OSD traffic) is not configured on a node. The OSD can still be
created but can't communicate.

This check will abort the creation if there is no IP within the subnet
of the cluster network present on the node. If there is no dedicated
cluster network, the public network is used. The chances of that not
being configured are much lower, but better to be on the safe side and
check it if there is no cluster network.

Signed-off-by: Aaron Lauterer 
---
 PVE/API2/Ceph/OSD.pm | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
index 5f70cf58..59cc9567 100644
--- a/PVE/API2/Ceph/OSD.pm
+++ b/PVE/API2/Ceph/OSD.pm
@@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
# extract parameter info and fail if a device is set more than once
my $devs = {};
 
+   my $ceph_conf = cfs_read_file('ceph.conf');
+
+   # check if network is configured
+   my $osd_network = $ceph_conf->{global}->{cluster_network}
+   // $ceph_conf->{global}->{public_network};
+   die "No network interface configured for subnet $osd_network. Check ".
+   "your network config.\n" if !@{PVE::Network::get_local_ip_from_cidr($osd_network)};
+
# FIXME: rename params on next API compatibillity change (7.0)
$param->{wal_dev_size} = delete $param->{wal_size};
$param->{db_dev_size} = delete $param->{db_size};
@@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
my $fsid = $monstat->{monmap}->{fsid};
 $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
 
-   my $ceph_conf = cfs_read_file('ceph.conf');
	my $ceph_bootstrap_osd_keyring = PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
 
	if (! -f $ceph_bootstrap_osd_keyring && $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
-- 
2.20.1




[pve-devel] [PATCH v2 qemu-server 3/3] apply_pending: handle errors gracefully and remove redundant config write calls

2019-12-13 Thread Oguz Bektas
wrap code in eval to handle errors gracefully.

that way we can replace the redundant write/load config calls with a single
write_config at the end to avoid unnecessary I/O.

Signed-off-by: Oguz Bektas 
---
 PVE/QemuServer.pm | 52 ++++++++++++++++++++++++++--------------------------
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index ed6b557..abb2d83 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4990,42 +4990,42 @@ sub vmconfig_apply_pending {
 
    my $pending_delete_hash = PVE::QemuConfig->parse_pending_delete($conf->{pending}->{delete});
 foreach my $opt (sort keys %$pending_delete_hash) {
-   die "internal error" if $opt =~ m/^unused/;
my $force = $pending_delete_hash->{$opt}->{force};
-   $conf = PVE::QemuConfig->load_config($vmid); # update/reload
-   if (!defined($conf->{$opt})) {
-   PVE::QemuConfig->remove_from_pending_delete($conf, $opt);
-   PVE::QemuConfig->write_config($vmid, $conf);
-   } elsif (is_valid_drivename($opt)) {
-   vmconfig_delete_or_detach_drive($vmid, $storecfg, $conf, $opt, $force);
-   PVE::QemuConfig->remove_from_pending_delete($conf, $opt);
-   delete $conf->{$opt};
-   PVE::QemuConfig->write_config($vmid, $conf);
+   eval {
+   die "internal error" if $opt =~ m/^unused/;
+   if (!defined($conf->{$opt})) {
+   PVE::QemuConfig->remove_from_pending_delete($conf, $opt);
+   } elsif (is_valid_drivename($opt)) {
+   vmconfig_delete_or_detach_drive($vmid, $storecfg, $conf, $opt, $force);
+   }
+   };
+   if (my $err = $@) {
+   $add_apply_error->($opt, $err);
} else {
PVE::QemuConfig->remove_from_pending_delete($conf, $opt);
-   delete $conf->{$opt};
-   PVE::QemuConfig->write_config($vmid, $conf);
+   delete $conf->{$opt};
}
 }
 
-$conf = PVE::QemuConfig->load_config($vmid); # update/reload
-
 foreach my $opt (keys %{$conf->{pending}}) { # add/change
-   $conf = PVE::QemuConfig->load_config($vmid); # update/reload
-
-   if (defined($conf->{$opt}) && ($conf->{$opt} eq $conf->{pending}->{$opt})) {
-   # skip if nothing changed
-   } elsif (is_valid_drivename($opt)) {
-   vmconfig_register_unused_drive($storecfg, $vmid, $conf, parse_drive($opt, $conf->{$opt}))
-   if defined($conf->{$opt});
-   $conf->{$opt} = $conf->{pending}->{$opt};
+   next if $opt eq 'delete'; # just to be sure
+   eval {
+   if (defined($conf->{$opt}) && ($conf->{$opt} eq $conf->{pending}->{$opt})) {
+   # skip if nothing changed
+   } elsif (is_valid_drivename($opt)) {
+   vmconfig_register_unused_drive($storecfg, $vmid, $conf, parse_drive($opt, $conf->{$opt}))
+   if defined($conf->{$opt});
+   }
+   };
+   if (my $err = $@) {
+   $add_apply_error->($opt, $err);
} else {
-   $conf->{$opt} = $conf->{pending}->{$opt};
+   PVE::QemuConfig->cleanup_pending($conf);
+   $conf->{$opt} = delete $conf->{pending}->{$opt};
}
-
-   delete $conf->{pending}->{$opt};
-   PVE::QemuConfig->write_config($vmid, $conf);
 }
+
+PVE::QemuConfig->write_config($vmid, $conf);
 }
 
 my $safe_num_ne = sub {
-- 
2.20.1
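
Stripped of the qemu-server specifics, the pattern this series converges on
looks roughly like this (a standalone sketch; apply_single_change is a
made-up placeholder, not the actual code):
```
# apply each pending change inside eval, collect per-option errors,
# and write the config to disk only once at the end
my $errors = {};
my $add_apply_error = sub {
    my ($opt, $msg) = @_;
    $errors->{$opt} = "unable to apply pending change $opt : $msg";
    warn $errors->{$opt};
};

foreach my $opt (keys %{$conf->{pending}}) {
    next if $opt eq 'delete';
    eval { apply_single_change($conf, $opt); };    # placeholder helper
    if (my $err = $@) {
	$add_apply_error->($opt, $err);
    } else {
	$conf->{$opt} = delete $conf->{pending}->{$opt};
    }
}

PVE::QemuConfig->write_config($vmid, $conf);    # one write instead of one per option
```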



[pve-devel] [PATCH v2 qemu-server 2/3] add error handling to vmconfig_apply_pending

2019-12-13 Thread Oguz Bektas
---
 PVE/API2/Qemu.pm  | 6 +++---
 PVE/QemuServer.pm | 9 ++++++++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/PVE/API2/Qemu.pm b/PVE/API2/Qemu.pm
index 3c7ef30..baa96f2 100644
--- a/PVE/API2/Qemu.pm
+++ b/PVE/API2/Qemu.pm
@@ -1241,13 +1241,13 @@ my $update_vm_api  = sub {
 
$conf = PVE::QemuConfig->load_config($vmid); # update/reload
 
+   my $errors = {};
if ($running) {
-   my $errors = {};
PVE::QemuServer::vmconfig_hotplug_pending($vmid, $conf, $storecfg, $modified, $errors);
-   raise_param_exc($errors) if scalar(keys %$errors);
} else {
-   PVE::QemuServer::vmconfig_apply_pending($vmid, $conf, $storecfg, $running);
+   PVE::QemuServer::vmconfig_apply_pending($vmid, $conf, $storecfg, $errors);
}
+   raise_param_exc($errors) if scalar(keys %$errors);
 
return;
};
diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 9c89be5..ed6b557 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4977,7 +4977,14 @@ sub vmconfig_delete_or_detach_drive {
 
 
 sub vmconfig_apply_pending {
-my ($vmid, $conf, $storecfg) = @_;
+my ($vmid, $conf, $storecfg, $errors) = @_;
+
+my $add_apply_error = sub {
+   my ($opt, $msg) = @_;
+   my $err_msg = "unable to apply pending change $opt : $msg";
+   $errors->{$opt} = $err_msg;
+   warn $err_msg;
+};
 
 # cold plug
 
-- 
2.20.1



[pve-devel] [PATCH v2 qemu-server 1/3] hotplug_pending: remove redundant write/load config calls

2019-12-13 Thread Oguz Bektas
instead of writing the config after every change, we can do it once for
all the changes at the end to avoid redundant I/O.

we also don't need to load_config after writing fastplug changes.

Signed-off-by: Oguz Bektas 
---
 PVE/QemuServer.pm | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/PVE/QemuServer.pm b/PVE/QemuServer.pm
index 09a1559..9c89be5 100644
--- a/PVE/QemuServer.pm
+++ b/PVE/QemuServer.pm
@@ -4779,7 +4779,6 @@ sub vmconfig_hotplug_pending {
 
 if ($changes) {
PVE::QemuConfig->write_config($vmid, $conf);
-   $conf = PVE::QemuConfig->load_config($vmid); # update/reload
 }
 
 my $hotplug_features = parse_hotplug_features(defined($conf->{hotplug}) ? $conf->{hotplug} : '1');
@@ -4839,11 +4838,8 @@ sub vmconfig_hotplug_pending {
if (my $err = $@) {
&$add_error($opt, $err) if $err ne "skip\n";
} else {
-   # save new config if hotplug was successful
delete $conf->{$opt};
PVE::QemuConfig->remove_from_pending_delete($conf, $opt);
-   PVE::QemuConfig->write_config($vmid, $conf);
-   $conf = PVE::QemuConfig->load_config($vmid); # update/reload
}
 }
 
@@ -4931,13 +4927,12 @@ sub vmconfig_hotplug_pending {
if (my $err = $@) {
&$add_error($opt, $err) if $err ne "skip\n";
} else {
-   # save new config if hotplug was successful
$conf->{$opt} = $value;
delete $conf->{pending}->{$opt};
-   PVE::QemuConfig->write_config($vmid, $conf);
-   $conf = PVE::QemuConfig->load_config($vmid); # update/reload
}
 }
+
+PVE::QemuConfig->write_config($vmid, $conf);
 }
 
 sub try_deallocate_drive {
-- 
2.20.1



Re: [pve-devel] [RFC manager 1/2] api: backup: Add endpoint for disk included status

2019-12-13 Thread Fabian Grünbichler
On December 12, 2019 11:27 am, Aaron Lauterer wrote:
> This endpoint provides information on whether disks and mountpoints of
> guests are included in a VZDump job.
> 
> The returned object is formatted to be used with the TreePanel of ExtJS.
> 
> Signed-off-by: Aaron Lauterer 
> ---
> 
> Unfortunately the logic of which disk / mp will be included is deeply
> engrained in the LXC and Qemu VZDump plugins and hard to use outside.
> Since AFAIU there are not that many cases, I implemented the checks
> again.

haven't taken a closer look at the rest, but what about implementing 
this via AbstractConfig?

either something similar to get_replicatable_volumes (but not just 
returning the volumes that get backed up, but the complete picture), or 
a helper that you can call for each disk/mountpoint that returns whether 
it will be included in a backup or not (optionally, with reason (e.g., 
config, default, bindmount, ...)). IIRC we want to put a general disk 
iterator into AbstractConfig anyway, meaning you'd just have to use the 
correct AbstractConfig implementation and have a unified interface that 
"does the right thing" ;)

this is a bit more work as you need to adapt the other callers/backup
code to use these new methods, but it would consolidate the logic and
defaults in one place, and prevent them from diverging in the future.
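
A very rough sketch of the second variant; every name below is hypothetical
and only meant to illustrate the suggested interface:
```
package PVE::AbstractConfig;

# hypothetical helper: tell whether a single disk/mountpoint ends up in a
# backup, and why. returns ($included, $reason), e.g. (1, 'config'),
# (0, 'bindmount'), ...
sub volume_included_in_backup {
    my ($class, $key, $volume) = @_;
    die "implement me";    # the Qemu/LXC subclasses know their own rules
}

# hypothetical wrapper around the planned generic volume iterator, so the
# backup code, this API endpoint and the GUI all ask the same place
sub foreach_backup_candidate {
    my ($class, $conf, $func) = @_;
    $class->foreach_volume($conf, sub {
	my ($key, $volume) = @_;
	my ($included, $reason) = $class->volume_included_in_backup($key, $volume);
	$func->($key, $volume, $included, $reason);
    });
}
```
The endpoint could then pick the matching subclass (PVE::QemuConfig or
PVE::LXC::Config) per guest and only map the results into the TreePanel
structure.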

>  PVE/API2/Backup.pm | 168 +
>  1 file changed, 168 insertions(+)
> 
> diff --git a/PVE/API2/Backup.pm b/PVE/API2/Backup.pm
> index 86377c0a..e7baaa3b 100644
> --- a/PVE/API2/Backup.pm
> +++ b/PVE/API2/Backup.pm
> @@ -324,4 +324,172 @@ __PACKAGE__->register_method({
>   die "$@" if ($@);
>  }});
>  
> +__PACKAGE__->register_method({
> +name => 'get_disk_status',
> +path => '{id}/included_disks',
> +method => 'GET',
> +protected => 1,
> +description => "Shows included guests and the backup status of their disks.",
> +permissions => {
> + check => ['perm', '/', ['Sys.Audit']],
> +},
> +parameters => {
> + additionalProperties => 0,
> + properties => {
> + id => $vzdump_job_id_prop
> + },
> +},
> +returns => {
> + type => 'object',
> + properties => {
> + children => {
> + type => 'array',
> + items => {
> + type => 'object',
> + properties => {
> + id => {
> + type => 'string',
> + description => 'ID of the guest.',
> + },
> + type => {
> + type => 'string',
> + description => 'Type of the guest, VM or CT.',
> + },
> + children => {
> + type => 'array',
> + description => 'The disks or mount points of the guest with the information if they will be included in backups.',
> + items => {
> + type => 'object',
> + properties => {
> + id => {
> + type => 'string',
> + description => 'Name of the disk or mount point.',
> + },
> + status => {
> + type => 'string',
> + description => 'The included status of the disk or mount point.',
> + },
> + },
> + },
> + },
> + },
> + },
> + },
> + },
> +},
> +code => sub {
> + my ($param) = @_;
> +
> + my $vzconf = cfs_read_file('vzdump.cron');
> + my $jobs = $vzconf->{jobs} || [];
> + my $job;
> +
> + foreach my $j (@$jobs) {
> + $job = $j if $j->{id} eq $param->{id};
> + }
> + raise_param_exc({ id => "No such job '$param->{id}'" }) if !$job;
> +
> + my @vmids;
> + my $vmlist = PVE::Cluster::get_vmlist();
> +
> + # get VMIDs from pool or selected individual guests
> + if ($job->{pool}) {
> + @vmids = @{PVE::API2Tools::get_resource_pool_guest_members($job->{pool})};
> + } elsif ($job->{vmid}) {
> + @vmids = PVE::Tools::split_list(extract_param($job, 'vmid'));
> + }
> +
> + # get excluded guests
> + my @exclude = PVE::Tools::split_list(extract_param($job, 'exclude'));
> + @exclude = @{PVE::VZDump::check_vmids(@exclude)};
> + @vmids = @{PVE::VZDump::check_vmids(@vmids)};
> +
> + my $excludehash = { map { $_ => 1 } @exclude };
> +
> + # no list of guests? (from pool or manually)
> + # get all except excluded
> + if (!@vmids) {
> + if ($job->{all} && $job->{node}) {
> + foreach my $id (keys %{ $vmlist->{ids} }) {
> + push