Re: [lustre-discuss] Lustre 2.12 routing with MR and discovery off

2020-08-30 Thread Andreas Dilger
On Aug 26, 2020, at 4:37 PM, Faaland, Olaf P. wrote:
> 
> Does Lustre 2.12 require that routes for every intermediate network are 
> defined, on every node on a path?
> 
> For example, given this Lustre network, where:
>  A-D are nodes and 1-6 are addresses
>  network tcp2 has only routers, no clients and no servers
> 
> A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D
> 
> And configured routes:
> 
> A: options lnet routes="tcp3 2@tcp1"
> B: options lnet routes="tcp3 4@tcp2"
> C: options lnet routes="tcp1 3@tcp2"
> D: options lnet routes="tcp1 5@tcp3"
> 
> With Lustre <= 2.10 we configured only these routes.  The only nodes that 
> need to know tcp2 exist are attached to it, and so there are no routes to 
> tcp2 defined anywhere.
> 
> It looks to me like Lustre 2.12 attempts to send error notifications back to 
> the original sender, and so nodes A and D may end up receiving messages from 
> nids on tcp2.  This then requires nodes A and D to have routes to tcp2 
> defined, so they can reply to the messages.

This is interesting.  I'm not an LNet expert, but it seems strange to me that
nodes other than "B" and "C" should care about the state of connections within
@tcp2 if they are not endpoints.  They should never be sending messages directly
to those nodes, and the LNet routers B/C knowing which connections/peers are
working should be enough for them to make routing decisions for A and D.

Cheers, Andreas

___
lustre-discuss mailing list
lustre-discuss@lists.lustre.org
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org


Re: [lustre-discuss] Lustre 2.12 routing with MR and discovery off

2020-08-29 Thread Andreas Dilger
On Aug 26, 2020, at 16:37, Faaland, Olaf P. <faala...@llnl.gov> wrote:

> Does Lustre 2.12 require that routes for every intermediate network are
> defined, on every node on a path?
>
> For example, given this Lustre network, where:
>  A-D are nodes and 1-6 are addresses
>  network tcp2 has only routers, no clients and no servers
>
> A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D
>
> And configured routes:
>
> A: options lnet routes="tcp3 2@tcp1"
> B: options lnet routes="tcp3 4@tcp2"
> C: options lnet routes="tcp1 3@tcp2"
> D: options lnet routes="tcp1 5@tcp3"
>
> With Lustre <= 2.10 we configured only these routes.  The only nodes that need
> to know tcp2 exist are attached to it, and so there are no routes to tcp2
> defined anywhere.
>
> It looks to me like Lustre 2.12 attempts to send error notifications back to
> the original sender, and so nodes A and D may end up receiving messages from
> nids on tcp2.  This then requires nodes A and D to have routes to tcp2 defined,
> so they can reply to the messages.

Interesting.  I'm no LNet expert, but it seems strange to me that nodes other
than B and C should care about the state of connections within @tcp2 if they
are not endpoints themselves.  A and D should never be sending messages directly
to those nodes, and the LNet routers B/C knowing which connections to peers in
@tcp2 are working should be enough for them to make routing decisions for A
and D.

If B/C nodes are themselves unable to communicate with their peers, then _that_ 
should be sent back to A/D to indicate they cannot route packets to the target 
NID, but I wouldn't think A/D should get information about @tcp2 themselves?
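A toy illustration of the alternative suggested here, using hypothetical helper names (this is not how LNet 2.12 actually behaves): the router attached to the sender's own network reports the failure using its NID on that network, so the sender never has to answer a tcp2 NID.

```python
# Toy model only: hypothetical helpers, not LNet code.
# Idea: the router adjacent to the sender reports "cannot route to target"
# using its own NID on the sender's network, so no tcp2 NID reaches A or D.

def net_of(nid):
    """'2@tcp1' -> 'tcp1'."""
    return nid.split("@", 1)[1]

def notification_source(router_nids, sender_nid):
    """Return the router NID that shares a network with the sender, or None."""
    sender_net = net_of(sender_nid)
    for nid in router_nids:
        if net_of(nid) == sender_net:
            return nid
    return None

B_NIDS = ["2@tcp1", "3@tcp2"]  # router B, from the diagram
C_NIDS = ["4@tcp2", "5@tcp3"]  # router C, from the diagram

print(notification_source(B_NIDS, "1@tcp1"))  # 2@tcp1: A can answer this directly
print(notification_source(C_NIDS, "1@tcp1"))  # None: C has no NID on A's network
```

For A, only B can report this way, because C has no NID on tcp1; symmetrically, only C could report to D.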

Cheers, Andreas
--
Andreas Dilger
Principal Lustre Architect
Whamcloud


[lustre-discuss] Lustre 2.12 routing with MR and discovery off

2020-08-26 Thread Faaland, Olaf P.
Does Lustre 2.12 require that routes for every intermediate network are 
defined, on every node on a path?

For example, given this Lustre network, where:
  A-D are nodes and 1-6 are addresses
  network tcp2 has only routers, no clients and no servers

A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D

And configured routes:

A: options lnet routes="tcp3 2@tcp1"
B: options lnet routes="tcp3 4@tcp2"
C: options lnet routes="tcp1 3@tcp2"
D: options lnet routes="tcp1 5@tcp3"

With Lustre <= 2.10 we configured only these routes.  The only nodes that need 
to know tcp2 exist are attached to it, and so there are no routes to tcp2 
defined anywhere.
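To make the route tables above concrete, here is a toy Python model of the four nodes (an illustration only, not LNet code). It assumes each `routes` string uses the simple "net gateway" form, with multiple entries separated by `;`, and that a node forwards by looking up the destination's network in its own table. With only the four <= 2.10 routes, traffic between A and D still passes through B and C, and neither A nor D needs any entry for tcp2.

```python
# Toy model of the topology A(1) -tcp1- (2)B(3) -tcp2- (4)C(5) -tcp3- (6)D.
# Assumption: routes use the simple "net gateway" form, entries split on ';'.

def parse_routes(spec):
    """Turn 'tcp3 2@tcp1; tcp2 2@tcp1' into {'tcp3': '2@tcp1', 'tcp2': '2@tcp1'}."""
    table = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if entry:
            net, gateway = entry.split()
            table[net] = gateway
    return table

def net_of(nid):
    """'4@tcp2' -> 'tcp2'."""
    return nid.split("@", 1)[1]

# NIDs owned by each node (the addresses 1-6 from the diagram).
NIDS = {
    "A": ["1@tcp1"],
    "B": ["2@tcp1", "3@tcp2"],
    "C": ["4@tcp2", "5@tcp3"],
    "D": ["6@tcp3"],
}
OWNER = {nid: node for node, nids in NIDS.items() for nid in nids}

# Routes exactly as configured with Lustre <= 2.10: nothing points at tcp2.
ROUTES = {
    "A": parse_routes("tcp3 2@tcp1"),
    "B": parse_routes("tcp3 4@tcp2"),
    "C": parse_routes("tcp1 3@tcp2"),
    "D": parse_routes("tcp1 5@tcp3"),
}

def next_hop(node, dest_nid):
    """NID to hand the message to next, or None if no route is configured."""
    dest_net = net_of(dest_nid)
    if dest_net in {net_of(n) for n in NIDS[node]}:
        return dest_nid                # destination is on a directly attached net
    return ROUTES[node].get(dest_net)  # gateway NID, or None

def path(src, dest_nid, max_hops=8):
    """Nodes a message visits from src to dest_nid, or None if it gets stuck."""
    hops = [src]
    node = src
    while dest_nid not in NIDS[node]:
        nid = next_hop(node, dest_nid)
        if nid is None or len(hops) > max_hops:
            return None
        node = OWNER[nid]
        hops.append(node)
    return hops

print(path("A", "6@tcp3"))  # ['A', 'B', 'C', 'D']
print(path("D", "1@tcp1"))  # ['D', 'C', 'B', 'A']
```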

It looks to me like Lustre 2.12 attempts to send error notifications back to 
the original sender, and so nodes A and D may end up receiving messages from 
nids on tcp2.  This then requires nodes A and D to have routes to tcp2 defined, 
so they can reply to the messages.
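The reply problem can be expressed as a similar toy check (again a sketch under assumptions, not LNet's actual logic): a node can only answer a NID whose network is either directly attached or listed in its routes. If 2.12 does require it, the extra route on A would presumably be written `options lnet routes="tcp3 2@tcp1; tcp2 2@tcp1"`, or added at runtime with `lnetctl route add --net tcp2 --gateway 2@tcp1`; D would need the mirror-image entry `tcp2 5@tcp3`.

```python
# Toy check: can a node send (e.g. reply) to a NID with its current routes?
# Assumption: routes use the simple "net gateway" form, entries split on ';'.

def parse_routes(spec):
    table = {}
    for entry in spec.split(";"):
        entry = entry.strip()
        if entry:
            net, gateway = entry.split()
            table[net] = gateway
    return table

def net_of(nid):
    return nid.split("@", 1)[1]

def can_send(local_nets, routes, dest_nid):
    """True if dest_nid is on a local net or a route to its net exists."""
    dest_net = net_of(dest_nid)
    return dest_net in local_nets or dest_net in routes

A_NETS = {"tcp1"}
a_210 = parse_routes("tcp3 2@tcp1")                 # routes used with <= 2.10
a_extra = parse_routes("tcp3 2@tcp1; tcp2 2@tcp1")  # plus a route to tcp2 via B

# A message arriving at A from router C's tcp2 NID (4@tcp2):
print(can_send(A_NETS, a_210, "4@tcp2"))    # False: A has no route to tcp2
print(can_send(A_NETS, a_extra, "4@tcp2"))  # True: tcp2 reached via 2@tcp1
print(can_send(A_NETS, a_210, "6@tcp3"))    # True: ordinary traffic to D works
```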

Is that correct?

thanks,
Olaf
