Re: [Lightning-dev] Fat Errors

2022-10-20 Thread Joost Jager
Ah, missed that. Thanks for the correction.
Joost.

On Thu, Oct 20, 2022 at 5:36 PM Bastien TEINTURIER wrote:

> Hi Joost,
>
> I need more time to review your proposed change, but I wanted to quickly
> correct a misunderstanding you had in quoting eclair's code:
>
> > Unfortunately it is possible for nodes on the route to hide themselves.
> > If they return random data as the failure message, the sender won't know
> > where the failure happened. Some senders then penalize all nodes that
> > were part of the route [4][5]. This may exclude perfectly reliable nodes
> > from being used for future payments.
>
> Eclair's code does not penalize nodes for future payment attempts in this
> case. It only ignores them for the retries of that particular payment.
>
> Cheers,
> Bastien

Re: [Lightning-dev] Fat Errors

2022-10-20 Thread Bastien TEINTURIER
Hi Joost,

I need more time to review your proposed change, but I wanted to quickly
correct a misunderstanding you had in quoting eclair's code:

> Unfortunately it is possible for nodes on the route to hide themselves.
> If they return random data as the failure message, the sender won't know
> where the failure happened. Some senders then penalize all nodes that
> were part of the route [4][5]. This may exclude perfectly reliable nodes
> from being used for future payments.

Eclair's code does not penalize nodes for future payment attempts in this
case. It only ignores them for the retries of that particular payment.

Cheers,
Bastien

On Wed, Oct 19, 2022 at 1:13 PM Joost Jager wrote:

> Hi list,
>
> I wanted to get back to a long-standing issue in Lightning: gaps in error
> attribution. I've posted about this before back in 2019 [1].
>
> Error attribution is important to properly penalize nodes after a payment
> failure occurs. The goal of the penalty is to give the next attempt a
> better chance at succeeding. In the happy failure flow, the sender is able
> to determine the origin of the failure and penalizes a single node or pair
> of nodes.
>
> Unfortunately it is possible for nodes on the route to hide themselves. If
> they return random data as the failure message, the sender won't know where
> the failure happened. Some senders then penalize all nodes that were part
> of the route [4][5]. This may exclude perfectly reliable nodes from being
> used for future payments. Other senders penalize no nodes at all [6][7],
> which allows the offending node to keep the disruption going.
>
> A special case of this is a final node sending back random data. Senders
> that penalize all nodes will keep looking for alternative routes. But
> because each alternative route still ends with that same final node, the
> sender will ultimately penalize all of its peers and possibly a lot of the
> rest of the network too.
>
> I can think of various reasons for exploiting this weakness. One is just
> plain grievance for whatever reason. Another one is to attract more traffic
> by getting competing routing nodes penalized. Or the goal could be to
> sufficiently mess up reputation tracking of a specific sender node to make
> it hard for that node to make further payments.
>
> Related to this are delays in the path. A node can delay propagating back
> a failure message and the sender won't be able to determine which node did
> it.
>
> The link at the top of this post [1] describes a way to address both
> unreadable failure messages as well as delays by letting each node on the
> route append a timestamp and hmac to the failure message. The great
> challenge is to do this in such a way that nodes don’t learn their position
> in the path.
>
> I'm revisiting this idea, and have prototyped various ways to implement
> it. In the remainder of this post, I will describe the variant that I
> thought works best (so far).
>
> # Failure message format
>
> The basic idea of the new format is to let each node (not just the error
> source) commit to the failure message when it passes it back by adding an
> hmac. The sender verifies all hmacs upon receipt of the failure message.
> This makes it impossible for any of the nodes to modify the failure message
> without revealing that they might have played a part in the modification.
> It won’t be possible for the sender to pinpoint an exact node, because
> either end of a communication channel may have modified the message.
> Pinpointing a pair of nodes, however, is good enough, and is commonly done
> for regular onion failures too.
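
To illustrate why per-hop hmacs narrow a modification down to a pair of nodes, here is a deliberately simplified Python model. It is not the proposed wire format. It assumes every hop commits with HMAC-SHA256 to the same plaintext message using a hypothetical per-hop key shared with the sender, and it leaves out per-hop encryption and the fixed `hmacs` layout. The function names are hypothetical.

```python
import hashlib
import hmac


def hop_hmac(key: bytes, message: bytes) -> bytes:
    """One hop's commitment (HMAC-SHA256) to the failure message it forwards."""
    return hmac.new(key, message, hashlib.sha256).digest()


def locate_tampering(keys, message, hmacs):
    """Verify every hop's hmac against the message the sender received.

    keys[i] and hmacs[i] belong to hop i, counted from the sender (hop 0 is
    the sender's direct peer; index -1 stands for the sender itself).
    Returns None if all hmacs verify, otherwise the pair of adjacent hops
    between which the message must have been modified.
    """
    for i, (key, tag) in enumerate(zip(keys, hmacs)):
        if not hmac.compare_digest(hop_hmac(key, message), tag):
            # Hops 0..i-1 committed to this exact message but hop i did not,
            # so either hop i forwarded something else or hop i-1 changed it.
            return (i - 1, i)
    return None


# Hypothetical per-hop keys for a four-hop route (hop 3 is the failure source).
keys = [bytes([i]) * 32 for i in range(4)]
original = b"temporary_channel_failure"
tampered = b"random garbage"

# Hops 3 and 2 commit to the original message; hop 1 swaps it and commits to
# its own version, and hop 0 honestly commits to what it received.
hmacs = [
    hop_hmac(keys[0], tampered),
    hop_hmac(keys[1], tampered),
    hop_hmac(keys[2], original),
    hop_hmac(keys[3], original),
]
print(locate_tampering(keys, tampered, hmacs))  # (1, 2)
```

Hop 1 committed to the swapped message, so its own hmac still verifies. The first failure is at hop 2, whose hmac covers the original, which is why the sender can only narrow the blame to the pair (1, 2).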
>
> On the highest level, the new failure message consists of three parts:
>
> `message` (var len) | `payloads` (fixed len) | `hmacs` (fixed len)
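
The layout above, together with the 9-byte `hop_payload` and the shift-right rule described in the bullets that follow, can be sketched in Python. This is a minimal sketch under stated assumptions: big-endian integers, a failure source that starts from an all-zero `payloads` array, no per-hop encryption of `payloads`, and an `hmacs` length passed in as a parameter because its exact layout is not given here. All helper names are hypothetical.

```python
import struct

HOP_PAYLOAD_LEN = 9                        # is_final (1 byte) | duration (8 bytes)
MAX_HOPS = 27                              # maximum route length
PAYLOADS_LEN = HOP_PAYLOAD_LEN * MAX_HOPS  # 243 bytes
HOP_PAYLOAD_FMT = ">?Q"                    # assumption: big-endian, no padding


def add_hop_payload(payloads, is_final, duration_ms):
    """Shift `payloads` 9 bytes to the right (dropping the right-most 9 bytes)
    and put this hop's entry in the 9 left-most bytes."""
    assert len(payloads) == PAYLOADS_LEN
    entry = struct.pack(HOP_PAYLOAD_FMT, is_final, duration_ms)
    return entry + payloads[:-HOP_PAYLOAD_LEN]


def split_failure(data, hmacs_len):
    """Split `message | payloads | hmacs`. The two trailing sections have a
    fixed length, so the variable-length message is whatever precedes them."""
    tail = PAYLOADS_LEN + hmacs_len
    if len(data) < tail:
        raise ValueError("failure message too short")
    end = len(data)
    return data[:end - tail], data[end - tail:end - hmacs_len], data[end - hmacs_len:]


def read_durations(payloads):
    """Sender side: entries run from hop 0 (the sender's peer) outwards;
    stop at the entry whose is_final flag marks the failure source."""
    durations = []
    for offset in range(0, PAYLOADS_LEN, HOP_PAYLOAD_LEN):
        is_final, duration_ms = struct.unpack_from(HOP_PAYLOAD_FMT, payloads, offset)
        durations.append(duration_ms)
        if is_final:
            return durations
    raise ValueError("no hop_payload has is_final set")


def slowest_pair(durations):
    """A hop holds the htlc at least as long as the next hop downstream, so the
    gap between consecutive durations is time spent by that pair of nodes and
    the channel between them. Needs at least two durations."""
    gaps = [durations[i] - durations[i + 1] for i in range(len(durations) - 1)]
    i = max(range(len(gaps)), key=lambda j: gaps[j])
    return (i, i + 1)


# Four-hop route, hop 3 is the failure source. Entries are added from the
# source back towards the sender; assumption: the source starts from zeros.
payloads = bytes(PAYLOADS_LEN)
for is_final, held_ms in [(True, 100), (False, 150), (False, 5150), (False, 5200)]:
    payloads = add_hop_payload(payloads, is_final, held_ms)

durations = read_durations(payloads)
print(durations)                # [5200, 5150, 150, 100]
print(slowest_pair(durations))  # (1, 2)
```

Changing a single reported duration only affects the two gaps on either side of that hop, and both of those pairs include the hop itself. That is why the sender can pinpoint a delay down to a pair of nodes but not to a single node.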
>
> * `message` is the standard onion failure message as described in [2], but
> without the hmac. The hmac is now part of `hmacs` and doesn't need to be
> repeated.
>
> * `payloads` is a fixed-length array that contains space for each node
> (`hop_payload`) on the route to add data to return to the sender. Ideally
> the contents and size of `hop_payload` are signaled so that future
> extensions don’t require all nodes to upgrade. For now, we’ll assume the
> following 9-byte format:
>
>   `is_final` (1 byte) | `duration` (8 bytes)
>
>   `is_final` indicates whether this node is the failure source. The sender
> uses `is_final` to determine when to stop the decryption/verification
> process.
>
>   `duration` is the time in milliseconds that the node held the htlc. By
> observing the series of reported durations, the sender is able to pinpoint
> a delay down to a pair of nodes.
>
>   The `hop_payload` is repeated 27 times (the maximum route length).
>
>   Every hop shifts `payloads` 9 bytes to the right and puts its own
> `hop_payload` in the 9 left-most bytes.
>
> * `hmacs` is a fixed-length array where nodes add their hmacs as the
> failure message travels back to the sender.
>
>   To keep things simple, I'll describe the