Re: [Lightning-dev] Fat Errors
Pushed a golang implementation of the fat errors here: https://github.com/lightningnetwork/lightning-onion/pull/60

Joost.

On Wed, Oct 19, 2022 at 1:12 PM Joost Jager wrote:
> Hi list,
>
> I wanted to get back to a long-standing issue in Lightning: gaps in error
> attribution. I've posted about this before back in 2019 [1].
>
> [...]
Re: [Lightning-dev] Fat Errors
Hi Thomas,

> This is a very interesting proposal that elegantly solves the problem,
> with however a very significant size increase. I can see two ways to keep
> the size small:
> - Each node just adds its hmac in a naive way, without deleting any part
> of the message to relay. You seem to have disqualified this option because
> it increases the size of the relayed message but I think it merits more
> consideration. It is much simpler and the size only grows linearly with
> the length of the route. An intermediate node could try to infer its
> position relative to the failing node (which should not be the recipient)
> but without knowing the original message size (which can easily be
> randomized by the final node), is that really such a problem? It may be
> but I would argue it's a good trade-off.

That would definitely make the solution a lot simpler. I think that increasing the message length still leaks some information, even with randomization by the final node. For example, if a routing node knows the minimum message length including the random bytes produced by the final node, and it sees a message of exactly that length, it must be the second-to-last hop. I tried to come up with something that stays within the current privacy guarantees, but it's fair to question the trade-off.

An advantage of the naive hmac append is also that each node can add a variable (tlv?) payload. In the fixed-size proposal that isn't possible, because nodes need to know exactly how many bytes to sign to cover a number of downstream hop payloads, and some form of signaling would be required to add flexibility there. A variable payload makes it easier to add extensions later on. It also helps with the randomization of the length. And intermediate nodes could choose to add some random bytes too, in an otherwise unused tlv record.

> - If we really want to keep the constant size property, as you've
> suggested we could use a low limit on the number of nodes. I would put the
> limit even lower, at 8 or less. We could still use longer routes but we
> would only get hmacs for the first 8 hops and revert to the legacy system
> if the failure happens after the first 8 hops. That way we keep the size
> low and 8 hops should be good enough for 99% of the payments, and even
> when there are more hops we would know that the first 7 hops are clean.

Sounds like a useful middle road. Each hop will just shift hmacs, and the ones further than 8 hops away will be shifted out completely. Yes, not bad.

The question that is still unanswered for me is how problematic a full-size fat error of 12 kB would really be. Of course small is better than big, but I wonder whether there would be an actual degradation of the UX or other significant negative effects in practice.

Joost

___
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
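For what it's worth, the size figures quoted in this thread are easy to reproduce. Assuming 32-byte hmacs (HMAC-SHA256, as elsewhere in the onion protocol) and the 9-byte `hop_payload` from the proposal, the triangular `hmacs` block holds n(n+1)/2 entries for a maximum route length of n — a quick sketch:

```go
package main

import "fmt"

// fatErrorOverhead returns the fixed overhead in bytes of the payloads
// and hmacs blocks for a given maximum route length, assuming the
// 9-byte hop_payload format and 32-byte hmacs. The hmacs block is
// triangular: n + (n-1) + ... + 1 = n*(n+1)/2 entries.
func fatErrorOverhead(maxHops int) int {
	payloads := maxHops * 9
	hmacs := maxHops * (maxHops + 1) / 2 * 32
	return payloads + hmacs
}

func main() {
	for _, n := range []int{8, 16, 27} {
		fmt.Printf("%2d hops: %5d bytes\n", n, fatErrorOverhead(n))
	}
	// 8 hops:  1224 bytes, 16 hops: 4496 bytes, 27 hops: 12339 bytes
}
```

This matches Rusty's ~4.5k figure for 16 hops and the ~12 kB full-size figure for 27 hops; the 8-hop limit suggested above would bring the overhead down to roughly 1.2 kB.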
Re: [Lightning-dev] Fat Errors
Hi Joost,

This is a very interesting proposal that elegantly solves the problem, with however a very significant size increase. I can see two ways to keep the size small:

- Each node just adds its hmac in a naive way, without deleting any part of the message to relay. You seem to have disqualified this option because it increases the size of the relayed message but I think it merits more consideration. It is much simpler and the size only grows linearly with the length of the route. An intermediate node could try to infer its position relative to the failing node (which should not be the recipient) but without knowing the original message size (which can easily be randomized by the final node), is that really such a problem? It may be but I would argue it's a good trade-off.

- If we really want to keep the constant size property, as you've suggested we could use a low limit on the number of nodes. I would put the limit even lower, at 8 or less. We could still use longer routes but we would only get hmacs for the first 8 hops and revert to the legacy system if the failure happens after the first 8 hops. That way we keep the size low and 8 hops should be good enough for 99% of the payments, and even when there are more hops we would know that the first 7 hops are clean.

Thanks again for your contribution, I hope we'll soon be able to attribute failures trustlessly.

Thomas

On Tue, Nov 1, 2022 at 10:10 PM Joost Jager wrote:
> Hey Rusty,
>
> Great to hear that you want to try to implement the proposal. I can
> polish my golang proof of concept code a bit and share if that's useful?
>
> [...]
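Thomas's first option — each node naively appending an hmac over everything it received — can be sketched as follows. This is a hypothetical illustration, not part of the proposal; key derivation is elided, and a placeholder per-hop key stands in for the key derived from the onion shared secret:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// naiveAppend sketches the simpler alternative: each node computes an
// hmac over the failure message as received (the message plus all hmacs
// added so far) and appends it, so the relayed message grows by 32
// bytes per hop instead of staying at a fixed size.
func naiveAppend(hopKey, received []byte) []byte {
	mac := hmac.New(sha256.New, hopKey)
	mac.Write(received)
	return append(received, mac.Sum(nil)...)
}

func main() {
	msg := []byte("failure message")
	for hop := 0; hop < 3; hop++ {
		// Hypothetical per-hop key; in practice this would be derived
		// from the shared secret established during onion construction.
		key := []byte{byte(hop)}
		msg = naiveAppend(key, msg)
	}
	fmt.Println(len(msg)) // 15 + 3*32 = 111
}
```

The linear growth (32 bytes per hop) is what leaks an upper bound on the distance to the error source, which is the privacy concern discussed above.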
Re: [Lightning-dev] Fat Errors
Hey Rusty,

Great to hear that you want to try to implement the proposal. I can polish my golang proof of concept code a bit and share if that's useful? It's just doing the calculation in isolation. My next step after that would be to see what it looks like integrated in lnd.

16 hops sounds fine to me too, but in general I am not too concerned about the size of the message. Maybe a scheme is possible where the sender signals the max number of hops, trading off size against privacy. Probably an unnecessary complication though.

I remember the prepay scheme, but it sounds quite a bit more invasive than just touching the encode/relay/decode of the failure message. You also won't have the timing information to identify slow nodes on the path.

Joost.

On Tue, Oct 25, 2022 at 9:58 PM Rusty Russell wrote:
> Joost Jager writes:
> > Hi list,
> >
> > I wanted to get back to a long-standing issue in Lightning: gaps in
> > error attribution. I've posted about this before back in 2019 [1].
>
> [...]
Re: [Lightning-dev] Fat Errors
Joost Jager writes:
> Hi list,
>
> I wanted to get back to a long-standing issue in Lightning: gaps in error
> attribution. I've posted about this before back in 2019 [1].

Hi Joost!

Thanks for writing this up fully. Core lightning also doesn't penalize properly, because of the attribution problem: solving this lets us penalize a channel, at least.

I want to implement this too, to make sure I understand it correctly, but having read it twice it seems reasonable.

How about 16 hops? It's the closest power of 2 to the legacy hop limit, and makes this 4.5k for payloads and hmacs.

There is, however, a completely different possibility if we want to use a pre-pay scheme, which I think I've described previously. You send N sats and a secp point; every chained secret returned earns the forwarder 1 sat [1]. The answers of course are placed in each layer of the onion. You know how far the onion got based on how much money you got back on failure [2], though the error message may be corrupted.

Cheers,
Rusty.

[1] Simplest is to truncate the point to a new secret key. Each node would apply a tweak for decorrelation ofc.
[2] The best scheme is that you don't get paid unless the next node decrypts, actually, but that needs more thought.
Re: [Lightning-dev] Fat Errors
Ah, missed that. Thanks for the correction.

Joost.

On Thu, Oct 20, 2022 at 5:36 PM Bastien TEINTURIER wrote:
> Hi Joost,
>
> I need more time to review your proposed change, but I wanted to quickly
> correct a misunderstanding you had in quoting eclair's code:
>
> > Unfortunately it is possible for nodes on the route to hide themselves.
> > If they return random data as the failure message, the sender won't know
> > where the failure happened. Some senders then penalize all nodes that
> > were part of the route [4][5]. This may exclude perfectly reliable nodes
> > from being used for future payments.
>
> Eclair's code does not penalize nodes for future payment attempts in this
> case. It only ignores them for the retries of that particular payment.
>
> Cheers,
> Bastien
>
> [...]
Re: [Lightning-dev] Fat Errors
Hi Joost,

I need more time to review your proposed change, but I wanted to quickly correct a misunderstanding you had in quoting eclair's code:

> Unfortunately it is possible for nodes on the route to hide themselves.
> If they return random data as the failure message, the sender won't know
> where the failure happened. Some senders then penalize all nodes that
> were part of the route [4][5]. This may exclude perfectly reliable nodes
> from being used for future payments.

Eclair's code does not penalize nodes for future payment attempts in this case. It only ignores them for the retries of that particular payment.

Cheers,
Bastien

On Wed, Oct 19, 2022 at 1:13 PM Joost Jager wrote:
> Hi list,
>
> I wanted to get back to a long-standing issue in Lightning: gaps in error
> attribution. I've posted about this before back in 2019 [1].
>
> [...]
[Lightning-dev] Fat Errors
Hi list,

I wanted to get back to a long-standing issue in Lightning: gaps in error attribution. I've posted about this before back in 2019 [1].

Error attribution is important to properly penalize nodes after a payment failure occurs. The goal of the penalty is to give the next attempt a better chance at succeeding. In the happy failure flow, the sender is able to determine the origin of the failure and penalizes a single node or pair of nodes.

Unfortunately it is possible for nodes on the route to hide themselves. If they return random data as the failure message, the sender won't know where the failure happened. Some senders then penalize all nodes that were part of the route [4][5]. This may exclude perfectly reliable nodes from being used for future payments. Other senders penalize no nodes at all [6][7], which allows the offending node to keep the disruption going.

A special case of this is a final node sending back random data. Senders that penalize all nodes will keep looking for alternative routes. But because each alternative route still ends with that same final node, the sender will ultimately penalize all of its peers and possibly a lot of the rest of the network too.

I can think of various reasons for exploiting this weakness. One is just plain grievance for whatever reason. Another one is to attract more traffic by getting competing routing nodes penalized. Or the goal could be to sufficiently mess up reputation tracking of a specific sender node to make it hard for that node to make further payments.

Related to this are delays in the path. A node can delay propagating back a failure message and the sender won't be able to determine which node did it.

The link at the top of this post [1] describes a way to address both unreadable failure messages as well as delays by letting each node on the route append a timestamp and hmac to the failure message. The great challenge is to do this in such a way that nodes don't learn their position in the path.
I'm revisiting this idea, and have prototyped various ways to implement it. In the remainder of this post, I will describe the variant that I thought works best (so far).

# Failure message format

The basic idea of the new format is to let each node (not just the error source) commit to the failure message when it passes it back by adding an hmac. The sender verifies all hmacs upon receipt of the failure message. This makes it impossible for any of the nodes to modify the failure message without revealing that they might have played a part in the modification. It won't be possible for the sender to pinpoint an exact node, because either end of a communication channel may have modified the message. Pinpointing a pair of nodes however is good enough, and is commonly done for regular onion failures too.

On the highest level, the new failure message consists of three parts:

`message` (var len) | `payloads` (fixed len) | `hmacs` (fixed len)

* `message` is the standard onion failure message as described in [2], but without the hmac. The hmac is now part of `hmacs` and doesn't need to be repeated.

* `payloads` is a fixed length array that contains space for each node (`hop_payload`) on the route to add data to return to the sender. Ideally the contents and size of `hop_payload` are signaled so that future extensions don't require all nodes to upgrade. For now, we'll assume the following 9-byte format:

`is_final` (1 byte) | `duration` (8 bytes)

`is_final` indicates whether this node is the failure source. The sender uses `is_final` to determine when to stop the decryption/verification process.

`duration` is the time in milliseconds that the node held the htlc. By observing the series of reported durations, the sender is able to pinpoint a delay down to a pair of nodes.

The `hop_payload` is repeated 27 times (the maximum route length).

Every hop shifts `payloads` 9 bytes to the right and puts its own `hop_payload` in the 9 left-most bytes.
* `hmacs` is a fixed length array where nodes add their hmacs as the failure message travels back to the sender.

To keep things simple, I'll describe the format as if the maximum route length was only three hops (instead of 27):

`hmac_0_2` | `hmac_0_1` | `hmac_0_0` | `hmac_1_1` | `hmac_1_0` | `hmac_2_0`

Because nodes don't know their position in the path, it's unclear to them what part of the failure message they are supposed to include in the hmac. They can't just include everything, because if part of that data is deleted later (to keep the message size fixed) it opens up the possibility for nodes to blame others.

The solution here is to provide hmacs for all possible positions. The last node that updated `hmacs` added `hmac_0_2`, `hmac_0_1` and `hmac_0_0` to the block. Each hmac corresponds to a presumed position in the path, where `hmac_0_2` is for the longest path (2 downstream hops) and `hmac_0_0` for the shortest (node is the error source). `hmac_x_y` is the hmac added by node `x` for the presumed position `y` hops away from the error source.
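The indexing above can be made concrete with a small sketch. Assuming the layout simply generalizes the three-hop example (node x's block follows node x-1's, with y descending within each block), the position of `hmac_x_y` in the flattened array is:

```go
package main

import "fmt"

// hmacIndex returns the position of hmac_x_y in the flattened hmacs
// array for a maximum route length of maxHops. This generalizes the
// three-hop example layout
//   hmac_0_2 | hmac_0_1 | hmac_0_0 | hmac_1_1 | hmac_1_0 | hmac_2_0
// where node x contributes maxHops-x hmacs and y descends within its block.
func hmacIndex(maxHops, x, y int) int {
	// Start of node x's block: the blocks of nodes 0..x-1 hold
	// maxHops + (maxHops-1) + ... + (maxHops-x+1) entries in total.
	blockStart := x*maxHops - x*(x-1)/2
	// Within the block, entries run from y = maxHops-1-x down to y = 0.
	return blockStart + (maxHops - 1 - x) - y
}

// totalHmacs returns the number of hmac slots:
// maxHops + (maxHops-1) + ... + 1 = maxHops*(maxHops+1)/2.
func totalHmacs(maxHops int) int {
	return maxHops * (maxHops + 1) / 2
}

func main() {
	// Reproduce the three-hop example ordering from the post.
	fmt.Println(hmacIndex(3, 0, 2), hmacIndex(3, 1, 1), hmacIndex(3, 2, 0)) // 0 3 5
	fmt.Println(totalHmacs(27))                                             // 378
}
```

With 32-byte hmacs and 27 hops, those 378 slots account for roughly 12 kB of the full-size fat error discussed later in the thread.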