Good morning Lightning world,

A minor report from the MPP trenches.

One of the resource limits that the C-Lightning MPP implementation is hitting 
is the limit on the number of HTLCs a channel can have.

For the 0.9.0 release, the initial consideration was that we can count the 
number of channels with outgoing capacity, and from there derive a number of 
HTLCs that the payer can afford to put on the network.
Very roughly speaking, for every channel with outgoing capacity, we allocate 10 
HTLCs.
This is the "limit" on our connectivity to the network: adding more HTLCs 
beyond this risks overloading our local connectivity and we would be unable to 
get good capacity.
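
As a rough illustration of the budget described above, here is a minimal 
sketch. The names (`HTLCS_PER_CHANNEL`, `payer_htlc_budget`) and the input 
layout are hypothetical and not taken from the C-Lightning source; only the 
"roughly 10 HTLCs per channel with outgoing capacity" figure comes from the 
description above.

```python
# Hypothetical sketch; not the actual C-Lightning code.
HTLCS_PER_CHANNEL = 10  # "very roughly speaking", per the description above


def payer_htlc_budget(outgoing_msat_by_channel):
    """Return the number of HTLCs the payer allows itself to put out.

    outgoing_msat_by_channel: iterable of spendable msat, one per local channel.
    Only channels that have some outgoing capacity are counted.
    """
    usable_channels = sum(1 for msat in outgoing_msat_by_channel if msat > 0)
    return usable_channels * HTLCS_PER_CHANNEL
```

With 16 channels that all have outgoing capacity, this gives the 160-HTLC 
budget mentioned in the issue report below.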

However, we neglected to consider the *incoming number-of-HTLCs limit* of the 
payee.
I believe this is the cause of this reported issue: 
https://github.com/ElementsProject/lightning/issues/3926
In the report, the payer node has 16 channels, and thus it allows up to 160 
HTLCs.
Initial splitting of the payment led to 51 starting splits, and since we do not 
implement re-merging of sub-payments, that number of splits can only grow.
Yet it seems that the payee had far fewer channels than the payer, and the much 
higher number of splits then could not fit in the incoming channels of the 
payee.

This is exacerbated by the use of the same failure code 
`temporary_channel_failure` for hitting both *msat capacity* and *number of 
HTLCs* limits.
Our assumption was that any such `temporary_channel_failure` was due only to 
*msat capacity* being hit.
We then annotated that channel with the smallest HTLC that failed to route 
through it, and did not route through that channel for sub-payments equal to or 
larger than that size.
This led our code into splitting the 51 payments even further into more, 
smaller payments, when it would have been objectively better to instead *merge* 
those payments (or not split up into such tiny pieces in the first place!).
Unfortunately, the local annotations would then be poisoned --- it would think 
that very small payments were failing because of the *msat capacity* of the 
channel being ridiculously low.
This ended up convincing the payment subsystem that it would be better to keep 
on splitting payments **even more**, leading to >100 payments outgoing, further 
preventing the receiver from being able to receive (because the problem was not 
*msat capacity* but rather *number of HTLCs*) and further crashing ourselves 
into the problem.
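
The feedback loop can be shown with a toy model. This is only a sketch under 
stated assumptions: `ChannelHint`, `simulate_splitting`, the "halve every 
part each round" policy, and the payee slot count are all hypothetical and 
much simpler than the real splitter. The one thing it takes from the text 
above is the assumption that every `temporary_channel_failure` means the 
*msat capacity* was too low.

```python
# Toy model of the hint poisoning; names and policy are illustrative only.
class ChannelHint:
    def __init__(self):
        # Smallest amount (msat) seen to fail; None means no failure yet.
        self.smallest_failed_msat = None

    def record_temporary_channel_failure(self, amount_msat):
        # Faulty assumption: the failure is always about msat capacity.
        if (self.smallest_failed_msat is None
                or amount_msat < self.smallest_failed_msat):
            self.smallest_failed_msat = amount_msat

    def usable_for(self, amount_msat):
        # Only amounts strictly smaller than the smallest failure are tried.
        return (self.smallest_failed_msat is None
                or amount_msat < self.smallest_failed_msat)


def simulate_splitting(total_msat, initial_parts, payee_htlc_slots, rounds):
    """Split further each round while the payee lacks HTLC slots."""
    hint = ChannelHint()
    parts = initial_parts
    history = []
    for _ in range(rounds):
        part_msat = total_msat // parts
        history.append((parts, part_msat))
        if parts <= payee_htlc_slots:
            break  # everything fits; no failure this round
        # The real cause is the HTLC count, but it is recorded as if the
        # channel could not carry part_msat ...
        hint.record_temporary_channel_failure(part_msat)
        # ... so the payer splits into even more, even smaller parts.
        parts *= 2
    return hint, history
```

Starting from 51 parts against a payee with (say) 30 free HTLC slots, the 
number of parts only moves away from what the payee can accept, while the 
recorded "capacity" of the channel keeps shrinking.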

So, I think it would be reasonable:

* To count the number of channels the payee has (if the receiver is not 
published, count the number of routehints in the invoice), use that as the 
basis for the number of HTLCs the receiver can accept, and take the lower of 
this and the limit derived from the outgoing channels of the payer.
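
A minimal sketch of the proposal above, for illustration only. The function 
names and arguments are hypothetical, and reusing the same factor of 10 HTLCs 
per channel on the payee side is my assumption here, not something already 
decided.

```python
# Hypothetical sketch of the proposed limit; not actual C-Lightning code.
HTLCS_PER_CHANNEL = 10  # assumed to apply to payer and payee alike


def payee_channel_estimate(published_channel_count, routehint_count):
    """Estimate how many channels the payee can receive on.

    published_channel_count: channels seen in gossip, or None if the
    payee is not published. In that case fall back to the number of
    routehints in the invoice.
    """
    if published_channel_count is not None:
        return published_channel_count
    return routehint_count


def htlc_budget(payer_channels, published_channel_count, routehint_count):
    """Take the lower of the payer-side and payee-side channel counts."""
    payee_channels = payee_channel_estimate(published_channel_count,
                                            routehint_count)
    return min(payer_channels, payee_channels) * HTLCS_PER_CHANNEL
```

For a payer with 16 channels paying a published payee with 2 channels, this 
would allow 20 HTLCs instead of 160.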


Overall, the issue is probably fixable if we consider the number of channels of 
the payer (as C-Lightning 0.9.0 does) ***and*** also the number of channels of 
the payee.
We can consider that, if we assume the rest of the public LN is well-connected, 
the only bottlenecks are at the payer and at the payee, and that for 
intermediate nodes there are a large number of alternate routes available.
So it may be sufficient to just limit based on the smaller of the number of 
payer-side channels and the number of payee-side channels.

Regards,
ZmnSCPxj
_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
