Could this be caused by a case mismatch between the MAC address as it
exists in the database and the MAC that comes from the agent?
When the interfaces are updated with data from the agent we attempt to
match the MAC to an existing interface (
https://github.com/stackforge/fuel-web/blob/master/nailgun/nailgun/network/manager.py#L682-L690).
If that doesn't work we attempt to match by name. Looking at the data that
comes from the agent the MAC is always capitalized while in the database
it's lower-case. It seems like checking the MAC will fail and we'll fall
through to matching by name.
If the interfaces haven't been reordered then it doesn't matter whether we
match by name or by MAC. However, if the order has changed we'll have an
issue: when the interfaces are matched by name they'll be updated with the
agent info. Because we matched by name, the name stays the same and we
update the MAC instead, which isn't what we want.
e.g.
First boot:
1 | eth0 | 00:aa
2 | eth1 | 00:bb
If the interface order is changed we'll have (as sent by the agent):
eth0 (00:BB)
eth1 (00:AA)
Because the MAC case doesn't match we'll end up matching by name. This
means we update the wrong database record. We have:
1 | eth0 | 00:bb
2 | eth1 | 00:aa
instead of the correct:
1 | eth1 | 00:aa
2 | eth0 | 00:bb
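The suspected fall-through can be sketched in a few lines: the agent reports MACs in upper case while the database stores them lower-case, so a naive equality check misses and we drop back to matching by name. Normalizing both sides before comparing avoids that. (A minimal sketch with illustrative data; `find_interface_by_mac` and `normalize_mac` are hypothetical helpers, not the actual nailgun functions.)

```python
def normalize_mac(mac):
    """Canonicalize a MAC address for comparison."""
    return mac.strip().lower()

def find_interface_by_mac(db_interfaces, agent_mac):
    """Return the DB interface whose MAC matches the agent's, or None."""
    target = normalize_mac(agent_mac)
    for iface in db_interfaces:
        if normalize_mac(iface["mac"]) == target:
            return iface
    return None

db = [
    {"name": "eth0", "mac": "00:aa"},
    {"name": "eth1", "mac": "00:bb"},
]

# The agent reports upper-case MACs, so a naive comparison misses:
assert "00:aa" != "00:AA"
# The normalized lookup still finds the right record:
assert find_interface_by_mac(db, "00:AA")["name"] == "eth0"
```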
On Thu, Nov 20, 2014 at 4:29 PM, Andrew Woodward xar...@gmail.com wrote:
In order for this to occur, the node has to be bootstrapped and discovered
by nailgun, added to a cluster, and then bootstrapped again (rebooted)
with the agent reporting a different NIC order?
I think the issue will only occur when networks are mapped to the
interfaces. In this case the root cause is that the ethX name is used as
the key attribute for updates, but really the MAC should be the key. If
we change this behavior, the update should work properly regardless of
the current interface name.
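The suggestion above could look roughly like this: key the agent update on the (case-normalized) MAC and treat the name as the mutable attribute. This is a sketch over illustrative dicts, not the actual nailgun models; `update_interfaces_by_mac` is a hypothetical helper.

```python
def update_interfaces_by_mac(db_interfaces, agent_interfaces):
    """Update DB records in place, matching on case-normalized MAC."""
    by_mac = {iface["mac"].lower(): iface for iface in db_interfaces}
    for agent_iface in agent_interfaces:
        db_iface = by_mac.get(agent_iface["mac"].lower())
        if db_iface is not None:
            # The MAC is the identity; the name is what may have changed.
            db_iface["name"] = agent_iface["name"]

db = [
    {"id": 1, "name": "eth0", "mac": "00:aa"},
    {"id": 2, "name": "eth1", "mac": "00:bb"},
]
agent = [
    {"name": "eth0", "mac": "00:BB"},
    {"name": "eth1", "mac": "00:AA"},
]
update_interfaces_by_mac(db, agent)

# Record 1 keeps MAC 00:aa and picks up its new name eth1, as desired:
assert db[0] == {"id": 1, "name": "eth1", "mac": "00:aa"}
assert db[1] == {"id": 2, "name": "eth0", "mac": "00:bb"}
```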
On Thu, Nov 20, 2014 at 12:01 PM, Dmitriy Shulyak dshul...@mirantis.com
wrote:
Hi folks,
There was an interesting investigation today into random NIC ordering for
nodes in the bootstrap stage, and in my opinion it requires a separate
thread. I will try to describe what the problem is and several ways to
solve it. Maybe I am missing a simpler way; if you see one, please
participate.
Link to LP bug: https://bugs.launchpad.net/fuel/+bug/1394466
When a node is booted for the first time it registers its interfaces in
nailgun; see a sample of the data (only the parts relevant to this
discussion):
- name: eth0
  ip: 10.0.0.3/24
  mac: 00:00:03
- name: eth1
  ip: None
  mac: 00:00:04
* eth0 is the admin network interface, which was used for the initial PXE boot
We have networks; for simplicity let's assume there are two:
- admin
- public
When the node is added to a cluster, you will in general see the
following schema:
- name: eth0
  ip: 10.0.0.3/24
  mac: 00:00:03
  networks:
  - admin
  - public
- name: eth1
  ip: None
  mac: 00:00:04
At this stage the node is still using the default system with the
bootstrap profile, so there is no custom system with udev rules, and on
the next reboot there is no way to guarantee that the network cards will
be discovered by the kernel in the same order. If the network cards are
discovered in an order different from the original and the NIC
configuration is updated, it is possible to end up with:
- name: eth0
  ip: None
  mac: 00:00:04
  networks:
  - admin
  - public
- name: eth1
  ip: 10.0.0.3/24
  mac: 00:00:03
Here you can see that the networks are still attached to eth0 (in the
DB), and of course this schema doesn't reflect the physical
infrastructure. I hope it is now clear what the problem is.
If you want to investigate it yourself, please find the DB dump in the
snapshot attached to the bug; you will be able to find the case described
here.
What happens next:
1. netcfg/choose_interface for the kernel is misconfigured; in my example
it will be 00:00:04, but it should be 00:00:03
2. the network configuration for l23network will simply be corrupted
So - possible solutions:
1. Reflect the node's interface ordering, with network reassignment -
hard and hackish
2. Do not update any interface info if networks are assigned to it; then
the udev rules will be applied and the NICs reordered into their original
state - I would say an easy and reliable solution
3. Create the cobbler system when the node is booted for the first time,
and add udev rules - this looks to me like the proper solution, but it
requires design
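For solution 3, the udev rules would essentially pin each ethX name to its MAC at first boot, so later reboots keep the original ordering. A minimal sketch of rendering such rules in the classic 70-persistent-net.rules style; the rule format and `persistent_net_rules` helper are assumptions for illustration, not what cobbler/nailgun actually generates.

```python
# One rule per NIC: match on the MAC, force the original name.
RULE_TEMPLATE = (
    'SUBSYSTEM=="net", ACTION=="add", '
    'ATTR{address}=="%(mac)s", NAME="%(name)s"'
)

def persistent_net_rules(interfaces):
    """Render udev rules pinning each interface name to its MAC."""
    return "\n".join(
        RULE_TEMPLATE % {"mac": iface["mac"].lower(), "name": iface["name"]}
        for iface in interfaces
    )

rules = persistent_net_rules([
    {"name": "eth0", "mac": "00:00:03"},
    {"name": "eth1", "mac": "00:00:04"},
])
print(rules)
```

With rules like these in place, the kernel's discovery order stops mattering: udev renames each NIC back to the name recorded at first boot.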
Please share your thoughts/ideas; AFAIK this issue is not rare on
large-scale deployments.
Thank you
___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
--
Andrew
Mirantis
Ceph community