Re: [pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

2019-12-14 Thread Thomas Lamprecht
On 12/13/19 3:56 PM, Aaron Lauterer wrote:
> It's possible to have a situation where the cluster network (used for
> inter-OSD traffic) is not configured on a node. The OSD can still be
> created but can't communicate.
> 
> This check will abort the creation if there is no IP within the subnet
> of the cluster network present on the node. If there is no dedicated
> cluster network the public network is used. The chances of that not
> being configured is much lower but better be on the safe side and check
> it if there is no cluster network.
> 

As said in another mail, patch is OK semantically, thanks!
Some style nits inline, though.

> Signed-off-by: Aaron Lauterer 
> ---
>  PVE/API2/Ceph/OSD.pm | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index 5f70cf58..59cc9567 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
>   # extract parameter info and fail if a device is set more than once
>   my $devs = {};
>  
> + my $ceph_conf = cfs_read_file('ceph.conf');
> +
> + # check if network is configured
> + my $osd_network = $ceph_conf->{global}->{cluster_network}
> + // $ceph_conf->{global}->{public_network};
> + die "No network interface configured for subnet $osd_network. Check ".
> + "your network config.\n" if 
> !@{PVE::Network::get_local_ip_from_cidr($osd_network)};

I'd like to avoid a post-if if its statement goes over multiple lines, i.e.:

OK:
die "foo" if !check; 

OK:
die "a bit of a longer error message here, we need to be more yadda, yadda"
if !check;

*NOT* OK:
die "even a longer error message here, we need to be really really really ".
"more yadda, yadda"
if !check;


The check itself should ideally also not span multiple lines, but exceptions can
be OK for that.

Also @{function_call()} is something which should be rather avoided, if the
alternatives are somewhat easily to do, here I'd maybe do:

my $cluster_net_ips = PVE::Network::get_local_ip_from_cidr($osd_network);
if (scalar(@$cluster_net_ips) < 1) {
   die "..."
}

> +
>   # FIXME: rename params on next API compatibillity change (7.0)
>   $param->{wal_dev_size} = delete $param->{wal_size};
>   $param->{db_dev_size} = delete $param->{db_size};
> @@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
>   my $fsid = $monstat->{monmap}->{fsid};
>  $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>  
> - my $ceph_conf = cfs_read_file('ceph.conf');
>   my $ceph_bootstrap_osd_keyring = 
> PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>  
>   if (! -f $ceph_bootstrap_osd_keyring && 
> $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
> 


___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

2019-12-14 Thread Thomas Lamprecht
On 12/13/19 7:14 PM, Alwin Antreich wrote:
> Some comments inline.
> 
> On Fri, Dec 13, 2019 at 03:56:42PM +0100, Aaron Lauterer wrote:
>> It's possible to have a situation where the cluster network (used for
>> inter-OSD traffic) is not configured on a node. The OSD can still be
>> created but can't communicate.
>>
>> This check will abort the creation if there is no IP within the subnet
>> of the cluster network present on the node. If there is no dedicated
>> cluster network the public network is used. The chances of that not
>> being configured is much lower but better be on the safe side and check
>> it if there is no cluster network.
>>
>> Signed-off-by: Aaron Lauterer 
>> ---
>>  PVE/API2/Ceph/OSD.pm | 9 -
>>  1 file changed, 8 insertions(+), 1 deletion(-)
>>
>> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
>> index 5f70cf58..59cc9567 100644
>> --- a/PVE/API2/Ceph/OSD.pm
>> +++ b/PVE/API2/Ceph/OSD.pm
>> @@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
>>  # extract parameter info and fail if a device is set more than once
>>  my $devs = {};
>>  
>> +my $ceph_conf = cfs_read_file('ceph.conf');
> The public/cluster networks could have been migrated into the MON DB. In
> this case they would not appear in the ceph.conf.
> 
> ATM it might be unlikely, there is an ugly warning, with every command
> execution. But still possible.
> ```
> Configuration option 'cluster_network' may not be modified at runtime
> ```

This is for setups which are managed by PVE-Ceph, so no concern for that.
Further, as you mentioned, it's highly unlikely, and can be work-arounded
so no concern for now. We also have more places which such potential
"issues", this is something we should look at for the next Ceph release,
Octopus, see were they're going (ceph tends to change mind sometimes) and
so see if we want/need to cope with the config DB better.

> 
>> +
>> +# check if network is configured
>> +my $osd_network = $ceph_conf->{global}->{cluster_network}
>> +// $ceph_conf->{global}->{public_network};
> An OSD needs both networks. Public for communication with the MONS &
> clients. And the cluster network for replication. On our default setup,
> it's both the same network.
> 
> I have tested the OSD creation with the cluster network down. During
> creation, it only needs the public network to create the OSD on the MON.
> But the OSD can't start and therefore isn't placed on the CRUSH map.
> Once it can start, it will be added to the correct location on the map.
> 
> IMHO, the code needs to check both.

This checks if the OSD communication is up, and is thus a valid fix for #2496,
if the public network is down the OSD cannot be created anyway, so there's
already an error - but yes, we could also check for the public network, would
not harm, the patch is good as is on its own though.

> 
>> +die "No network interface configured for subnet $osd_network. Check ".
>> +"your network config.\n" if 
>> !@{PVE::Network::get_local_ip_from_cidr($osd_network)};
>> +
>>  # FIXME: rename params on next API compatibillity change (7.0)
>>  $param->{wal_dev_size} = delete $param->{wal_size};
>>  $param->{db_dev_size} = delete $param->{db_size};
>> @@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
>>  my $fsid = $monstat->{monmap}->{fsid};
>>  $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>>  
>> -my $ceph_conf = cfs_read_file('ceph.conf');
>>  my $ceph_bootstrap_osd_keyring = 
>> PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>>  
>>  if (! -f $ceph_bootstrap_osd_keyring && 
>> $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
>> -- 
>> 2.20.1



___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


Re: [pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

2019-12-13 Thread Alwin Antreich
Some comments inline.

On Fri, Dec 13, 2019 at 03:56:42PM +0100, Aaron Lauterer wrote:
> It's possible to have a situation where the cluster network (used for
> inter-OSD traffic) is not configured on a node. The OSD can still be
> created but can't communicate.
> 
> This check will abort the creation if there is no IP within the subnet
> of the cluster network present on the node. If there is no dedicated
> cluster network the public network is used. The chances of that not
> being configured is much lower but better be on the safe side and check
> it if there is no cluster network.
> 
> Signed-off-by: Aaron Lauterer 
> ---
>  PVE/API2/Ceph/OSD.pm | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
> index 5f70cf58..59cc9567 100644
> --- a/PVE/API2/Ceph/OSD.pm
> +++ b/PVE/API2/Ceph/OSD.pm
> @@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
>   # extract parameter info and fail if a device is set more than once
>   my $devs = {};
>  
> + my $ceph_conf = cfs_read_file('ceph.conf');
The public/cluster networks could have been migrated into the MON DB. In
this case they would not appear in the ceph.conf.

ATM it might be unlikely, there is an ugly warning, with every command
execution. But still possible.
```
Configuration option 'cluster_network' may not be modified at runtime
```

> +
> + # check if network is configured
> + my $osd_network = $ceph_conf->{global}->{cluster_network}
> + // $ceph_conf->{global}->{public_network};
An OSD needs both networks. Public for communication with the MONS &
clients. And the cluster network for replication. On our default setup,
it's both the same network.

I have tested the OSD creation with the cluster network down. During
creation, it only needs the public network to create the OSD on the MON.
But the OSD can't start and therefore isn't placed on the CRUSH map.
Once it can start, it will be added to the correct location on the map.

IMHO, the code needs to check both.

> + die "No network interface configured for subnet $osd_network. Check ".
> + "your network config.\n" if 
> !@{PVE::Network::get_local_ip_from_cidr($osd_network)};
> +
>   # FIXME: rename params on next API compatibillity change (7.0)
>   $param->{wal_dev_size} = delete $param->{wal_size};
>   $param->{db_dev_size} = delete $param->{db_size};
> @@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
>   my $fsid = $monstat->{monmap}->{fsid};
>  $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
>  
> - my $ceph_conf = cfs_read_file('ceph.conf');
>   my $ceph_bootstrap_osd_keyring = 
> PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
>  
>   if (! -f $ceph_bootstrap_osd_keyring && 
> $ceph_conf->{global}->{auth_client_required} eq 'cephx') {
> -- 
> 2.20.1

___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel


[pve-devel] [PATCH manager] API: OSD: Fix #2496 Check OSD Network

2019-12-13 Thread Aaron Lauterer
It's possible to have a situation where the cluster network (used for
inter-OSD traffic) is not configured on a node. The OSD can still be
created but can't communicate.

This check will abort the creation if there is no IP within the subnet
of the cluster network present on the node. If there is no dedicated
cluster network the public network is used. The chances of that not
being configured is much lower but better be on the safe side and check
it if there is no cluster network.

Signed-off-by: Aaron Lauterer 
---
 PVE/API2/Ceph/OSD.pm | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/PVE/API2/Ceph/OSD.pm b/PVE/API2/Ceph/OSD.pm
index 5f70cf58..59cc9567 100644
--- a/PVE/API2/Ceph/OSD.pm
+++ b/PVE/API2/Ceph/OSD.pm
@@ -275,6 +275,14 @@ __PACKAGE__->register_method ({
# extract parameter info and fail if a device is set more than once
my $devs = {};
 
+   my $ceph_conf = cfs_read_file('ceph.conf');
+
+   # check if network is configured
+   my $osd_network = $ceph_conf->{global}->{cluster_network}
+   // $ceph_conf->{global}->{public_network};
+   die "No network interface configured for subnet $osd_network. Check ".
+   "your network config.\n" if 
!@{PVE::Network::get_local_ip_from_cidr($osd_network)};
+
# FIXME: rename params on next API compatibillity change (7.0)
$param->{wal_dev_size} = delete $param->{wal_size};
$param->{db_dev_size} = delete $param->{db_size};
@@ -330,7 +338,6 @@ __PACKAGE__->register_method ({
my $fsid = $monstat->{monmap}->{fsid};
 $fsid = $1 if $fsid =~ m/^([0-9a-f\-]+)$/;
 
-   my $ceph_conf = cfs_read_file('ceph.conf');
my $ceph_bootstrap_osd_keyring = 
PVE::Ceph::Tools::get_config('ceph_bootstrap_osd_keyring');
 
if (! -f $ceph_bootstrap_osd_keyring && 
$ceph_conf->{global}->{auth_client_required} eq 'cephx') {
-- 
2.20.1


___
pve-devel mailing list
pve-devel@pve.proxmox.com
https://pve.proxmox.com/cgi-bin/mailman/listinfo/pve-devel