[Lightning-dev] Onion messages rate-limiting
During the recent Oakland Dev Summit, some lightning engineers got together to discuss DoS protection for onion messages. Rusty proposed a very simple rate-limiting scheme that statistically propagates back to the correct sender, which we describe in detail below. You can also read this in gist format if that works better for you [1].

Nodes apply per-peer rate limits on _incoming_ onion messages that should be relayed (e.g. N per second with some burst tolerance). It is recommended to allow more onion messages from peers with whom you have channels, for example 10 per second when you have a channel and 1 per second when you don't.

When relaying an onion message, nodes keep track of where it came from (by using the `node_id` of the peer who sent that message). Nodes only need the last such `node_id` per outgoing connection, which ensures the memory footprint is very small. Also, this data doesn't need to be persisted.

Let's walk through an example to illustrate this mechanism:

* Bob receives an onion message from Alice that should be relayed to Carol
* After relaying that message, Bob stores Alice's `node_id` in its per-connection state with Carol
* Bob receives an onion message from Eve that should be relayed to Carol
* After relaying that message, Bob replaces Alice's `node_id` with Eve's `node_id` in its per-connection state with Carol
* Bob receives an onion message from Alice that should be relayed to Dave
* After relaying that message, Bob stores Alice's `node_id` in its per-connection state with Dave
* ...

We introduce a new message that will be sent when dropping an incoming onion message because it reached rate limits:

1. type: 515 (`onion_message_drop`)
2. data:
   * [`rate_limited`:`u8`]
   * [`shared_secret_hash`:`32*byte`]

Whenever an incoming onion message reaches the rate limit, the receiver sends `onion_message_drop` to the sender.
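The per-peer limiter plus last-sender tracking described above can be sketched in a few lines. All names are illustrative, and the token bucket is just one way to get "N per second with some burst tolerance"; the post doesn't mandate a particular limiter:

```python
import time

class TokenBucket:
    """Per-peer rate limiter with burst tolerance (rate msgs/sec, burst size)."""
    def __init__(self, rate, burst):
        self.rate, self.burst = rate, burst
        self.tokens, self.last = burst, time.monotonic()

    def allow(self):
        now = time.monotonic()
        # refill proportionally to elapsed time, capped at the burst size
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class OnionRelay:
    def __init__(self):
        self.limiters = {}     # incoming peer -> TokenBucket
        self.last_sender = {}  # outgoing peer -> node_id of last incoming peer

    def limiter_for(self, peer, has_channel):
        if peer not in self.limiters:
            # higher allowance for peers we have channels with
            rate = 10 if has_channel else 1
            self.limiters[peer] = TokenBucket(rate, burst=2 * rate)
        return self.limiters[peer]

    def relay(self, from_peer, to_peer, has_channel=True):
        if not self.limiter_for(from_peer, has_channel).allow():
            return "onion_message_drop"  # sent back to from_peer
        # remember only the *last* incoming peer per outgoing connection
        self.last_sender[to_peer] = from_peer
        return "relayed"

    def on_drop_received(self, from_peer):
        """Relay `onion_message_drop` backwards and halve that sender's limit."""
        upstream = self.last_sender.get(from_peer)
        if upstream is not None and upstream in self.limiters:
            self.limiters[upstream].rate /= 2  # doubled back after 30s elsewhere
        return upstream
```

Walking through the Bob example above: after relaying Alice's message to Carol, `last_sender["carol"]` is Alice; once Eve's message is relayed, it is replaced by Eve.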
The sender looks at its per-connection state to find where the message was coming from and relays `onion_message_drop` to the last sender, halving their rate limits with that peer. If the sender doesn't overflow the rate limit again, the receiver should double the rate limit after 30 seconds, until it reaches the default rate limit again. The flow will look like:

        Alice                    Bob                    Carol
          |                       |                       |
          |     onion_message     |                       |
          |---------------------->|                       |
          |                       |     onion_message     |
          |                       |---------------------->|
          |                       |  onion_message_drop   |
          |                       |<----------------------|
          |  onion_message_drop   |                       |
          |<----------------------|                       |
          |                       |                       |

The `shared_secret_hash` field contains a BIP 340 tagged hash of the Sphinx shared secret of the rate limiting peer (in the example above, Carol):

* `shared_secret_hash = SHA256(SHA256("onion_message_drop") || SHA256("onion_message_drop") || sphinx_shared_secret)`

This value is known by the node that created the onion message: if `onion_message_drop` propagates all the way back to them, it lets them know which part of the route is congested, allowing them to retry through a different path.

Whenever there is some latency between nodes and many onion messages, `onion_message_drop` may be relayed to the incorrect incoming peer (since we only store the `node_id` of the _last_ incoming peer in our outgoing connection state). The following example highlights this:

         Eve                     Bob                    Carol
          |     onion_message     |                       |
          |---------------------->|     onion_message     |
          |     onion_message     |---------------------->|
          |---------------------->|     onion_message     |
          |     onion_message     |---------------------->|
          |---------------------->|     onion_message     |
          |                       |---------------------->|
        Alice                     |                       |
          |     onion_message     |                       |
          |---------------------->|     onion_message     |
          |                       |---------------------->|
          |                       |  onion_message_drop   |
          |  onion_message_drop   |<----------------------|
          |<----------------------|                       |

In this example, Eve is spamming but `onion_message_drop` is propagated back to Alice instead. However, this scheme will _statistically_ penalize the right incoming peer (with a probability depending on the volume of onion messages that the spamming peer is generating compared to the volume of legitimate onion messages). It is an interesting research problem to find formulas for those probabilities to evaluate how efficient this will be against various
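The `shared_secret_hash` construction above is the standard BIP 340 tagged-hash pattern, and can be sketched as follows (the all-zero secret is a placeholder for the real Sphinx shared secret derived during onion processing):

```python
import hashlib

def tagged_hash(tag: str, msg: bytes) -> bytes:
    """BIP 340 tagged hash: SHA256(SHA256(tag) || SHA256(tag) || msg)."""
    tag_hash = hashlib.sha256(tag.encode()).digest()
    return hashlib.sha256(tag_hash + tag_hash + msg).digest()

# the rate-limiting peer hashes its Sphinx shared secret for this message
sphinx_shared_secret = bytes(32)  # placeholder value
shared_secret_hash = tagged_hash("onion_message_drop", sphinx_shared_secret)
assert len(shared_secret_hash) == 32
```

Because the message originator also knows each hop's Sphinx shared secret, it can recompute this hash for every hop and match a returning `onion_message_drop` to the congested hop.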
Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality
On Sun, Jun 05, 2022 at 02:29:28PM, ZmnSCPxj via Lightning-dev wrote:

Just sharing my thoughts on this.

> Introduction
>
>           Optimize for reliability+
>           uncertainty+fee+drain+uptime...
>                  .--~~--.
>                 /        \
>                /          \
>               /            \
>              /              \
>             /                \
>         _--'                  `--_
>      Just                       Just
>      optimize                   optimize
>      for                        for
>      low fee                    low fee

I think ideally you want to optimise for some combination of fee, speed and reliability (both likelihood of a clean failure that you can retry and of generating stuck payments). As Matt/Peter suggest in another thread, maybe for some uses you can accept low speed for low fees, while in others you'd rather pay more and get near-instant results. I think drain should just go to fee, and uncertainty/uptime are just ways of estimating reliability.

It might be reasonable to generate local estimates for speed/reliability by regularly sending onion messages or designed-to-fail htlcs.

Sorry if that makes me a midwit :)

> Rene Pickhardt also presented the idea of leaking friend-of-a-friend
> balances, to help payers increase their payment reliability.

I think foaf (as opposed to global) gossip of *fee rates* is a very interesting approach to trying to give nodes more *current* information, without flooding the entire network with more traffic than it can cope with.

> Now we can consider that *every channel is a marketplace*.
> What is being sold is the sats inside the channel.

(Really, the marketplace is a channel pair (the incoming channel and the outgoing channel), and what's being sold is their relative balance)

> So my concrete proposal is that we can do the same friend-of-a-friend
> balance leakage proposed by Rene, except we leak it using *existing*
> mechanisms --- i.e. gossiping a `channel_update` with new feerates
> adjusted according to the supply on the channel --- rather than having
> a new message to leak friend-of-a-friend balance directly.

+42

> Because we effectively leak the balance of channels by the feerates on
> the channel, this totally leaks the balance of channels.
I don't think this is true -- you ideally want to adjust fees not to maintain a balanced channel (50% on each side), but a balanced *flow* (1:1 incoming/outgoing payment volume) -- it doesn't really matter if you get the balanced flow that results in an average of a 50:50, 80:20 or 20:80 ratio of channel balances (at least, it doesn't as long as your channel capacity is 10 or 100 times the payment size, and your variance is correspondingly low).

Further, you have two degrees of freedom when setting fee rates: one is how balanced the flows are, which controls how long your channel can remain useful, but the other is how *much* flow there is -- if halving your fee rate more than doubles the flow rate in sats/hour, then that will still increase your profit. That also doesn't leak balance information.

> ### Inverting The Filter: Feerate Cards
>
> Basically, a feerate card is a mapping between a probability-of-success
> range and a feerate.
>
> * 00%->25%: -10ppm
> * 26%->50%: 1ppm
> * 51%->75%: 5ppm
> * 76%->100%: 50ppm

Feerate cards don't really make sense to me; "probability of success" isn't a real measure the payer can use -- naively, if it were, they could just retry at 1ppm 10 times and get to 95% chances of success. But if they can afford to retry (background rebalancing?), they might as well just try at -10ppm, 1ppm, 5ppm, 10ppm (or perhaps with a binary search?), and see if they're lucky; but if they want a 1s response time, and can't afford retries, what good is even a 75% chance of success if that's the individual success rate on each hop of their five hop path?

And if you're not just going by odds of having to retry, then you need to get some current information about the channel to plug into the formula; but if you're getting *current* information, why not let that information be the feerate directly?
> More concretely, we set some high feerate, impose some kind of constant
> "gravity" that pulls down the feerate over time, then we measure the
> relative loss of outgoing liquidity to serve as "lift" to the feerate.

If your current fee rate is F (ppm), and your current volume (flow) is V (sats forwarded per hour), then your profit is FV. If dropping your fee rate by dF (<0) results in an increase of V by dV (>0), then you want:

   (F+dF)(V+dV) > FV
   FV + VdF + FdV + dFdV > FV
   FdV > -VdF
   dV/dF < -V/F          (flip the inequality because dF is negative)
   (dV/V)/(dF/F) < -1    (fee-elasticity of volume is in the elastic region)

(< -1 == elastic == flow changes more than the fee does == drop the fee rate;
 > -1 == inelastic == flow changes less than the fee does == raise the fee rate;
 = -1 == unit elastic ==
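The profit inequality above can be checked numerically; this tiny helper evaluates the full (non-linearized) condition directly, with made-up numbers for the flow response:

```python
def should_drop_fee(F, V, dF, dV):
    """True if (F+dF)(V+dV) > F*V, i.e. the fee change raises profit.
    For small dF < 0 this is the elastic region: (dV/V)/(dF/F) < -1."""
    return (F + dF) * (V + dV) > F * V

# halving the fee while exactly doubling flow leaves profit unchanged
# (the boundary case): 50 * 2000 == 100 * 1000
assert not should_drop_fee(F=100, V=1000, dF=-50, dV=1000)
# flow more than doubles -> elastic, dropping the fee raises profit
assert should_drop_fee(F=100, V=1000, dF=-50, dV=1100)
```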
Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation
Thanks for this Alex. Here's a transcript of your recent presentation at Bitcoin++ on Minisketch and Lightning gossip:

https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip/

Having followed Gleb's work on using Minisketch for Erlay in Bitcoin Core [0] for a while now I was especially interested in how the challenges of using Minisketch for Lightning gossip (node_announcement, channel_announcement, channel_update messages) would differ to the challenges of using Minisketch for transaction relay on the base layer.

I guess one of the major differences is full nodes are trying to verify a block every 10 minutes (on average) and so there is a sense of urgency to get the transactions of the next block to be mined. With Lightning gossip unless you are planning to send a payment (or route a payment) across a certain route you are less concerned about learning about the current state of the network urgently. If a new channel pops up you might choose not to route through it regardless given its "newness" and its lack of track record of successfully routing payments. There are parts of the network you care less about (if they can't help you get to your regular destinations say) whereas with transaction relay you have to care about all transactions (paying a sufficient fee rate).

"The problem that Bitcoin faced with transaction relay was pretty similar but there are a few differences. For one, any time you introduce that short hash function that produces a 64 bit fingerprint you have to be concerned with collisions between hash functions. Someone could potentially take advantage of that and grind out a hash that would resolve to the same fingerprint."

Could you elaborate on this? Why are hash collisions a concern for Lightning gossip and not for Erlay? Is it not a DoS vector for both?

It seems you are leaning towards per-peer sketches with inventory sets (like Erlay) rather than global sketches.
This makes sense to me and seems to be moving in a direction where your peer connections are more stable as you are storing data on what your peer's understanding of the network is. There could even be centralized APIs which allow you to compare your current understanding of the network to the centralized service's understanding. (Of course we don't want to have to rely on centralized services or bake them into the protocol if you don't want to use them.) Erlay falls back to flooding if the set reconciliation algorithm doesn't work which I'm assuming you'll do with Lightning gossip.

I was also surprised to hear that channel_update made up 97 percent of gossip messages. Isn't it recommended that you don't make too many changes to your channel as it is likely to result in failed routed payments and being dropped as a routing node for future payments? It seems that this advice isn't being followed if there are so many channel_update messages being sent around. I almost wonder if Lightning implementations should include user prompts like "Are you sure you want to update your channel given this may affect your routing success?" :)

Thanks
Michael

P.S. Are we referring to "routing nodes" as "forwarding nodes" now? I've noticed "forwarding nodes" being used more recently on this list.

[0]: https://github.com/bitcoin/bitcoin/pull/21515

--
Michael Folkson
Email: michaelfolkson at protonmail.com
Keybase: michaelfolkson
PGP: 43ED C999 9F85 1D40 EAF4 9835 92D6 0159 214C FEE3

--- Original Message ---
On Thursday, April 14th, 2022 at 22:00, Alex Myers wrote:

> Hello lightning developers,
>
> I've been investigating set reconciliation as a means to reduce bandwidth
> and redundancy of gossip message propagation. This builds on some earlier
> work from Rusty using the minisketch library [1]. The idea is that each
> node will build a sketch representing its own gossip set.
> Alice's node will encode and transmit this sketch to Bob's node, where it
> will be merged with his own sketch, and the differences produced. These
> differences should ideally be exactly the latest missing gossip of both
> nodes. Due to size constraints, the set differences will necessarily be
> encoded, but Bob's node will be able to identify which gossip Alice is
> missing, and may then transmit exactly those messages.
>
> This process is relatively straightforward, with the caveat that the sets
> must otherwise match very closely (each sketch has a maximum capacity for
> differences.) The difficulty here is that each node and lightning
> implementation may have its own rules for gossip acceptance and
> propagation. Depending on their gossip partners, not all gossip may
> propagate to the entire network.
>
> Core-lightning implements rate limiting for incoming channel updates and
> node announcements. The default rate limit is 1 per day, with a burst of 4.
> I analyzed my node's gossip over a 14 day period, and found that, of all
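As a toy illustration of the sketch-merging idea quoted above: a capacity-one "sketch" can be a plain XOR of 64-bit fingerprints, so merging Alice's and Bob's sketches cancels everything they share and leaves exactly the one difference. Real minisketch generalizes this algebraically so a size-c sketch recovers up to c differences; this sketch is only a conceptual stand-in, not the library's actual encoding:

```python
from functools import reduce

def sketch(fingerprints):
    """Capacity-one 'sketch': XOR of 64-bit gossip fingerprints.
    Merging two sketches (XOR) cancels shared entries."""
    return reduce(lambda a, b: a ^ b, fingerprints, 0)

alice = {0xAAAA, 0xBBBB, 0xCCCC}
bob   = {0xAAAA, 0xBBBB, 0xCCCC, 0xDDDD}  # Bob has one extra gossip message
merged = sketch(alice) ^ sketch(bob)
assert merged == 0xDDDD  # the single difference is recovered exactly
```

If the sets differ by more than the sketch's capacity, recovery fails — which is why the sets "must otherwise match very closely" and why a flooding fallback is still needed.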
Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality
Good morning aj,

> On Sun, Jun 05, 2022 at 02:29:28PM, ZmnSCPxj via Lightning-dev wrote:
>
> Just sharing my thoughts on this.
>
> > Introduction
> >
> >           Optimize for reliability+
> >           uncertainty+fee+drain+uptime...
> >                  .--~~--.
> >                 /        \
> >                /          \
> >               /            \
> >              /              \
> >             /                \
> >         _--'                  `--_
> >      Just                       Just
> >      optimize                   optimize
> >      for                        for
> >      low fee                    low fee
>
> I think ideally you want to optimise for some combination of fee, speed
> and reliability (both likelihood of a clean failure that you can retry
> and of generating stuck payments). As Matt/Peter suggest in another
> thread, maybe for some uses you can accept low speed for low fees,
> while in others you'd rather pay more and get near-instant results. I
> think drain should just go to fee, and uncertainty/uptime are just ways
> of estimating reliability.
>
> It might be reasonable to generate local estimates for speed/reliability
> by regularly sending onion messages or designed-to-fail htlcs.
>
> Sorry if that makes me a midwit :)

Actually feerate cards help with this; it just requires an economic insight to translate probability-of-success to an actual cost that the payer incurs.

> > ### Inverting The Filter: Feerate Cards
> >
> > Basically, a feerate card is a mapping between a probability-of-success
> > range and a feerate.
> >
> > * 00%->25%: -10ppm
> > * 26%->50%: 1ppm
> > * 51%->75%: 5ppm
> > * 76%->100%: 50ppm
>
> Feerate cards don't really make sense to me; "probability of success"
> isn't a real measure the payer can use -- naively, if it were, they could
> just retry at 1ppm 10 times and get to 95% chances of success. But if
> they can afford to retry (background rebalancing?), they might as well
> just try at -10ppm, 1ppm, 5ppm, 10ppm (or perhaps with a binary search?),
> and see if they're lucky; but if they want a 1s response time, and can't
> afford retries, what good is even a 75% chance of success if that's the
> individual success rate on each hop of their five hop path?
The economic insight here is this:

* The payer wants to pay because it values a service / product more highly than the sats they are spending.
* There is a subjective difference in value between the service / product being bought and the amount to be spent.
* In short, if the payment succeeds and the service / product is acquired, then the payer perceives itself as richer (increased utilons) by that subjective difference.
* If payment fails, then the payer incurs an opportunity cost, as it is unable to utilize the difference in subjective value between the service / product and the sats being spent.
* Thus, the subjective difference in value between the service / product being bought, and the sats to be paid, is the cost of payment failure.
* That difference in value is the "fee budget" that Lightning Network payment algorithms all require as an argument.
  * If the LN fee total is greater than the fee budget, the payment algorithm will reject that path outright.
  * If the LN fee total is greater than the subjective difference in value between the service / product being bought and the amount to be delivered at the destination, then the payer gets negative utility and would prefer not to continue paying --- which is exactly what the payment algorithm does, it rejects such paths.

Therefore the fee budget is the cost of failure.

We can now use the left-hand side of the feerate card table, by multiplying `100% - middle_probability_of_success` (i.e. probability of failure) by the fee budget (i.e. cost of failure), and getting the cost-of-failure-for-this-entry. We then evaluate the feerate card by plugging this in to each entry of the feerate card, and picking which entry gives the lowest total fee. This is then added as a fee in payment algorithms, thus translated down to "just optimize for low fee".

If the above logic seems dubious, consider this:

* Nodes utilizing wall strategies and doing lots of rebalancing put low limits on the fee budget of the rebalancing cost.
  * These nodes are willing to try lots of possible routes, hoping to nab the liquidity of a low-fee node on the cheap in order to resell it later.
  * i.e. those nodes are fine with taking a long time to successfully route a payment from themselves to themselves; they absolutely insist on low fees or else they will not earn anything.
  * Such nodes are fine with low probability of success.
  * Being fine with low probability of success means that the effect of the left-hand side of the feerate card is smaller and such nodes will tend to get the low probability of success entries.
* Buyers getting FOMOed into buying some neat new widget want to get their grubby hands on the widget ASAP.
  * These nodes are willing to pay a premium to get the neat new widget RIGHT NOW.
  * i.e. these nodes will be willing to provide a higher fee budget.
  * Being fine with a higher fee budget means that the effect of the left-hand side of the feerate card is larger and such nodes
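The entry-selection rule ZmnSCPxj describes — add `(1 - middle_probability_of_success) * fee_budget` to each entry's fee and pick the minimum — can be sketched as follows. The midpoint probabilities chosen here are illustrative (quartile midpoints of the example card):

```python
def evaluate_feerate_card(card, fee_budget_ppm):
    """Pick the card entry minimizing fee + expected cost of failure,
    where cost of failure = (1 - mid_success_probability) * fee_budget.
    `card` is a list of (midpoint probability, fee in ppm) entries."""
    def total_cost(entry):
        p_mid, fee_ppm = entry
        return fee_ppm + (1 - p_mid) * fee_budget_ppm
    return min(card, key=total_cost)

# quartile midpoints for the 00-25 / 26-50 / 51-75 / 76-100 ranges
card = [(0.125, -10), (0.375, 1), (0.625, 5), (0.875, 50)]

# a tight fee budget favours the cheap, low-probability entry...
low = evaluate_feerate_card(card, fee_budget_ppm=20)
# ...a generous budget favours the expensive, high-probability entry
high = evaluate_feerate_card(card, fee_budget_ppm=500)
```

With a 20ppm budget the -10ppm entry wins (total cost 7.5ppm); with a 500ppm budget the 50ppm entry wins (total cost 112.5ppm) — matching the intuition in the bullets above about rebalancers versus FOMO buyers.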
Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation
Hi Michael,

Thanks for the transcript and the questions, especially those you asked in Gleb's original Erlay presentation. I tried to cover a lot of ground in only 30 minutes and the finer points may have suffered.

The most significant difference in concern between bitcoin transaction relay and lightning gossip may be one of privacy: source nodes of Bitcoin transactions have an interest in privacy (avoid trivially triangulating the source.) Lightning gossip is already signed by and linked to a node ID - the source is completely transparent by nature. The lack of a timing concern would allow for a global sketch where it would have been infeasible for Erlay (among other reasons such as DoS.)

> Why are hash collisions a concern for Lightning gossip and not for Erlay?
> Is it not a DoS vector for both?

If lightning gossip were encoded for minisketch entries with the short_channel_id, it would create a unique fingerprint by default thanks to referencing the unique funding transaction on chain - no hashing required. This was Rusty's original concept and what I had been proceeding with. However, given the ongoing privacy discussion and desire to eventually decouple lightning channels from their layer one funding transaction (gossip v2), I think we should prepare for a future in which channels are not explicitly linked to a SCID. That means hashing just as in Erlay and the same DoS vector would be present. Salting with a per-peer shared secret works here, but the solution is driven back toward inventory sets.

> It seems you are leaning towards per-peer sketches with inventory sets
> (like Erlay) rather than global sketches.

Yes. There are pros and cons to each method, but most critically, this would be compatible with eventual removal of the SCID.

> Erlay falls back to flooding if the set reconciliation algorithm doesn't
> work which I'm assuming you'll do with Lightning gossip.
Fallback will take some consideration (Erlay's bisect is an elegant feature), but yes, flooding is still the ultimate fallback.

> I was also surprised to hear that channel_update made up 97 percent of
> gossip messages. Isn't it recommended that you don't make too many changes
> to your channel as it is likely to result in failed routed payments and
> being dropped as a routing node for future payments? It seems that this
> advice isn't being followed if there are so many channel_update messages
> being sent around. I almost wonder if Lightning implementations should
> include user prompts like "Are you sure you want to update your channel
> given this may affect your routing success?" :)

Running the numbers, I currently see 15,761 public nodes on the network and 148,295 half channels. Those each need refreshed gossip every two weeks. By default that would result in 90% channel updates. That we're seeing roughly three times as many channel updates vs node announcements compared to what's strictly required is maybe not that surprising. I agree, there would be a benefit to nodes taking a more active role in tracking calls to broadcast gossip.

Thanks,
Alex

--- Original Message ---
On Wednesday, June 29th, 2022 at 6:09 AM, Michael Folkson wrote:

> Thanks for this Alex.
>
> Here's a transcript of your recent presentation at Bitcoin++ on Minisketch
> and Lightning gossip:
>
> https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip/
>
> Having followed Gleb's work on using Minisketch for Erlay in Bitcoin Core
> [0] for a while now I was especially interested in how the challenges of
> using Minisketch for Lightning gossip (node_announcement,
> channel_announcement, channel_update messages) would differ to the
> challenges of using Minisketch for transaction relay on the base layer.
> I guess one of the major differences is full nodes are trying to verify a
> block every 10 minutes (on average) and so there is a sense of urgency to
> get the transactions of the next block to be mined. With Lightning gossip
> unless you are planning to send a payment (or route a payment) across a
> certain route you are less concerned about learning about the current
> state of the network urgently. If a new channel pops up you might choose
> not to route through it regardless given its "newness" and its lack of
> track record of successfully routing payments. There are parts of the
> network you care less about (if they can't help you get to your regular
> destinations say) whereas with transaction relay you have to care about
> all transactions (paying a sufficient fee rate).
>
> "The problem that Bitcoin faced with transaction relay was pretty similar
> but there are a few differences. For one, any time you introduce that
> short hash function that produces a 64 bit fingerprint you have to be
> concerned with collisions between hash functions. Someone could
> potentially take advantage of that and grind out a hash that would
> resolve to the same
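The per-peer salting Alex mentions earlier in this message can be sketched as below. This is a toy stand-in using truncated SHA256 (Erlay's BIP 330 specifies a salted SipHash construction, but the principle is the same): without the peers' shared salt, an attacker can't grind a message that collides in the 64-bit fingerprint space of a specific reconciliation session:

```python
import hashlib

def short_fingerprint(salt: bytes, gossip_msg: bytes) -> int:
    """64-bit sketch entry: truncated hash keyed by a per-peer shared salt.
    Without knowing `salt`, an attacker cannot precompute collisions."""
    digest = hashlib.sha256(salt + gossip_msg).digest()
    return int.from_bytes(digest[:8], "big")

salt_ab = b"per-peer shared secret"  # e.g. derived from the peers' ECDH secret
fp = short_fingerprint(salt_ab, b"channel_update ...")
assert 0 <= fp < 2**64
# a different per-peer salt yields a different fingerprint for the same message
assert short_fingerprint(b"another peer's salt", b"channel_update ...") != fp
```

The trade-off Alex notes follows directly: because fingerprints are now peer-specific, a node can no longer maintain one global sketch and must keep per-peer inventory sets instead.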
Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality
On Wed, Jun 29, 2022 at 12:38:17PM, ZmnSCPxj wrote:

> > > ### Inverting The Filter: Feerate Cards
> > >
> > > Basically, a feerate card is a mapping between a probability-of-success
> > > range and a feerate.
> > >
> > > * 00%->25%: -10ppm
> > > * 26%->50%: 1ppm
> > > * 51%->75%: 5ppm
> > > * 76%->100%: 50ppm

> The economic insight here is this:
>
> * The payer wants to pay because it values a service / product more highly
>   than the sats they are spending.
> * If payment fails, then the payer incurs an opportunity cost, as it is
>   unable to utilize the difference in subjective value between the
>   service / product and the sats being spent.

(If payment fails, the only opportunity cost they incur is that they can't use the funds that they locked up for the payment. The opportunity cost is usually considered to occur when the payment succeeds: at that point you've lost the ability to use those funds for any other purpose)

> * Thus, the subjective difference in value between the service / product
>   being bought, and the sats to be paid, is the cost of payment failure.

If you couldn't successfully route the payment at any price, you never had the opportunity to buy whatever the thing was.

> We can now use the left-hand side of the feerate card table, by multiplying
> `100% - middle_probability_of_success` (i.e. probability of failure) by the
> fee budget (i.e. cost of failure), and getting the
> cost-of-failure-for-this-entry.

I don't think that makes much sense; your expected gain if you just try one option is:

   (1-p)*0 + p*cost*(benefit/cost - fee)

where p is the probability of success that corresponds with the fee.
I don't think you can do that calculation with a range; if I fix the probabilities as:

   12.5%  -10ppm
   27.5%    1ppm
   62.5%    5ppm
   87.5%   50ppm

then that approach chooses:

   -10 ppm if the benefit/cost is in (-10ppm, 8.77ppm)
     5 ppm if the benefit/cost is in [8.77ppm, 162.52ppm)
    50 ppm if the benefit/cost is >= 162.52ppm

so for that policy, one of those entries is already irrelevant.

But that just feels super unrealistic to me. If your benefit is 8ppm, and you try at -10ppm, and that fails, why wouldn't you try again at 5ppm? That means the real calculation is:

   p1*(benefit/cost - fee1)
     + (p2-p1)*(benefit/cost - fee2 - retry_delay)
     - (1-p2)*(2*retry_delay)

Which is:

   p2*(benefit/cost) - p1*fee1 - (p2-p1)*fee2 - (2-p1-p2)*retry_delay

My feeling is that the retry_delay factor's going to dominate...

That's also only considering one hop; to get the entire path, you need them all to succeed, giving an expected benefit (for a particular combination of rate card entries) of:

   (p1*p2*p3*p4*p5)*cost*(benefit/cost - (fee1 + fee2 + fee3 + fee4 + fee5))

And (p1*..*p5) is going to be pretty small in most cases -- 5 hops at 87.5% each already gets you down to only a 51% total chance of success. And there's an exponential explosion of combinations; if each of the 5 hops has 4 options on their rate card, that's up to 1024 different options to be evaluated...

> We then evaluate the fee card by plugging this in to each entry of the
> feerate card, and picking which entry gives the lowest total fee.

I don't think that combines hops correctly. For example, if the rate cards for hop1 and hop2 are both:

    10%  10ppm
   100%  92ppm

and your expected benefit/cost is 200ppm (so 100ppm per hop), then treated individually you get:

    10%*(100ppm - 10ppm) =  9ppm  <-- this one!
   100%*(100ppm - 92ppm) =  8ppm

but treated together, you get:

     1%*(200ppm -  20ppm) =  1.8ppm
    10%*(200ppm - 102ppm) =  9.8ppm  (twice)
   100%*(200ppm - 184ppm) = 16ppm    <-- this one!
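aj's two-hop numbers can be reproduced by brute-forcing every combination of per-hop rate-card entries, which also illustrates the combinatorial blow-up he mentions (4 entries on each of 5 hops is 4^5 = 1024 combinations). Names here are illustrative:

```python
from itertools import product

def _ev(combo, benefit_ppm):
    """Expected value of one combination: p_total * (benefit - total_fee)."""
    p, fee = 1.0, 0
    for p_i, fee_i in combo:
        p *= p_i
        fee += fee_i
    return p * (benefit_ppm - fee)

def best_combo(cards, benefit_ppm):
    """Exhaustively evaluate every combination of per-hop rate-card entries."""
    best = max(product(*cards), key=lambda combo: _ev(combo, benefit_ppm))
    return best, _ev(best, benefit_ppm)

card = [(0.10, 10), (1.00, 92)]
combo, ev = best_combo([card, card], benefit_ppm=200)
# both hops at 92ppm win: 100% * (200 - 184) = 16ppm expected value,
# even though each hop treated individually would pick its 10ppm entry
```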
> This is then added as a fee in payment algorithms, thus translated down to
> "just optimize for low fee".

You're not optimising for low fee though, you're optimising for maximal expected value, assuming you can't retry. But you can retry, and probably in reality also want to minimise the chance of failure up to some threshold. For example: if I buy a coffee with lightning every week day for a year, that's 250 days, so maybe I'd like to choose a fee so that my payment failure rate is <0.4%, to avoid embarrassment and holding up the queue.

> * Nodes utilizing wall strategies and doing lots of rebalancing put low
>   limits on the fee budget of the rebalancing cost.
> * These nodes are willing to try lots of possible routes, hoping to nab
>   the liquidity of a low-fee node on the cheap in order to resell it later.
> * Such nodes are fine with low probability of success.

Sure. But in that case, they don't care about delays, so why wouldn't they just try the lowest fee rates all the time, no matter what their expected value is? They can retry once an hour indefinitely, and eventually they should get lucky, if the rate card's even remotely accurate. (Though chances are they won't get -10ppm lucky for the entire path)

Finding out that you're paying 50ppm at the exact same time someone else is
Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality
Good morning aj,

> On Wed, Jun 29, 2022 at 12:38:17PM, ZmnSCPxj wrote:
>
> > > > ### Inverting The Filter: Feerate Cards
> > > >
> > > > Basically, a feerate card is a mapping between a probability-of-success
> > > > range and a feerate.
> > > >
> > > > * 00%->25%: -10ppm
> > > > * 26%->50%: 1ppm
> > > > * 51%->75%: 5ppm
> > > > * 76%->100%: 50ppm
>
> > The economic insight here is this:
> >
> > * The payer wants to pay because it values a service / product more
> >   highly than the sats they are spending.
> > * If payment fails, then the payer incurs an opportunity cost, as it is
> >   unable to utilize the difference in subjective value between the
> >   service / product and the sats being spent.
>
> (If payment fails, the only opportunity cost they incur is that they
> can't use the funds that they locked up for the payment. The opportunity
> cost is usually considered to occur when the payment succeeds: at that
> point you've lost the ability to use those funds for any other purpose)

I think you misunderstand me completely.

The "payment fails" term here means that *all possible routes below the fee budget have failed*, i.e. a complete payment failure that will cause your `pay` command to error out with a frownie face. In that case, the payer is unable to purchase the service or product.

The opportunity cost they lose is the lack of the service or product; they keep the value of the sats that did not get paid, but lose the value of the service or product they wanted to buy in the first place. In that case, the payer loses the subjective difference in value between the service / product, and the sats they would have paid.

Regards,
ZmnSCPxj

___
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Hi Rusty,

Thanks for the feedback!

> This is over-design: if you fail to get reliable gossip, your routing will
> suffer anyway. Nothing new here.

Idk, it's pretty simple: you're already watching for closes, so if a close looks a certain way, it's a splice. When you see that, you can even take note of the _new_ channel size (funds added/removed) and update your pathfinding/blindedpaths/hophints accordingly.

If this is an over-designed solution, then I'd categorize _only_ waiting N blocks as wishful thinking, given we have effectively no guarantees w.r.t how long it'll take a message to propagate.

If by routing you mean a routing node then: no, a routing node doesn't even really need the graph at all to do their job. If by routing you mean a sender, then imo still no: you don't necessarily need _all_ gossip, just the latest policies of the nodes you route most frequently to. On top of that, since you can get the latest policy each time you incur a routing failure, as you make payments, you'll get the latest policies of the nodes you care about over time.

Also consider that you might fail to get "reliable" gossip simply due to your peer neighborhood aggressively rate limiting gossip (they only allow 1 update a day for a node, you updated your fee, oops, no splice msg for you).

So it appears you don't agree that the "wait N blocks before you close your channels" approach isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?

From my PoV, the whole point of even signalling that a splice is ongoing is for the senders/receivers: they can continue to send/recv payments over the channel while the splice is in process. It isn't that a node isn't getting any gossip, it's that if the node fails to obtain the gossip message within the N block period of time, then the channel has effectively closed from their PoV, and it may be an hour+ until it's seen as a usable (new) channel again.
If there isn't a 100% reliable way to signal that a splice is in progress, then this disincentivizes its usage, as routers can lose out on potential fee revenue, and senders/receivers may grow to favor only very long-lived channels. IMO _only_ having a gossip message simply isn't enough: there are no real guarantees w.r.t _when_ all relevant parties will get your gossip message. So why not give them a 100% reliable on-chain signal that says: something is in progress here, stay tuned for the gossip message, whenever you receive it. -- Laolu On Tue, Jun 28, 2022 at 6:40 PM Rusty Russell wrote: > Hi Roasbeef, > > This is over-design: if you fail to get reliable gossip, your routing > will suffer anyway. Nothing new here. > > And if you *know* you're missing gossip, you can simply delay onchain > closures for longer: since nodes should respect the old channel ids for > a while anyway. > > Matt's proposal to simply defer treating onchain closes is elegant and > minimal. We could go further and relax requirements to detect onchain > closes at all, and optionally add a perm close message. > > Cheers, > Rusty. > > Olaoluwa Osuntokun writes: > > Hi y'all, > > > > This mail was inspired by this [1] spec PR from Lisa. At a high level, it > > proposes that nodes add a delay between the time they see a channel > closed on > > chain, to when they remove it from their local channel graph. The motive > > here is to give the gossip message that indicates a splice is in process > > "enough" time to propagate through the network. If a node can see this > > message before/during the splicing operation, then they'll be able to relate > > the old and the new channels, meaning it's usable again by > senders/receivers > > _before_ the entire chain of transactions confirms on chain. > > > > IMO, this sort of arbitrary delay (expressed in blocks) won't actually > > address the issue in practice. The proposal suffers from the following > > issues: > > > > 1. 12 blocks is chosen arbitrarily. 
If for w/e reason an announcement > > takes longer than 2 hours to reach the "economic majority" of > > senders/receivers, then the channel won't be able to mask the splicing > > downtime. > > > > 2. Gossip propagation delay and offline peers. These days most nodes > > throttle gossip pretty aggressively. As a result, a pair of nodes doing > > several in-flight splices (inputs become double spent or something, so > > they need to try a bunch) might end up being rate limited within the > > network, causing the splice update msg to be lost or delayed > significantly > > (IIRC CLN resets these values after 24 hours). On top of that, if a > peer > > is offline for too long (think mobile senders), then they may miss the > > update altogether as most nodes don't do a full historical > > _channel_update_ dump anymore. > > > > In order to resolve these issues, I think instead we need to rely on the > > primary splicing signal being sourced from the chain itself. In other > words, > > if I see a channel close, and a closing transaction "looks"
Re: [Lightning-dev] Onion messages rate-limiting
Hi t-bast, Happy to see this finally written up! With this, we have two classes of proposals for rate limiting onion messaging: 1. Back propagation based rate limiting as described here. 2. Allowing nodes to express a per-message cost for their forwarding services, which is described here [1]. I still need to digest everything proposed here, but personally I'm more optimistic about the 2nd category than the 1st. One issue I see w/ the first category is that a single party can flood the network and cause nodes to trigger their rate limits, which then affects the usability of the onion messages for all other well-behaving parties. As an example, this might mean I can't fetch invoices, give up after a period of time (how long?), then resort to a direct connection (perceived payment latency accumulated along the way). With the 2nd route, if an attacker floods the network, they need to directly pay for the forwarding usage themselves, though they may also directly cause nodes to adjust their forwarding rate accordingly. However in this case, the attacker has incurred a concrete cost, and even if the rates rise, then those that really need the service (notifying an LSP that a user is online or w/e) can continue to pay that new rate. In other words, by _pricing_ the resource utilization, demand preferences can be exchanged, leading to more efficient long term resource allocation. W.r.t this topic, one event that imo is worth pointing out is that a very popular onion routing system, Tor, has been facing a severe DDoS attack that has lasted weeks and isn't yet fully resolved [2]. The ongoing flooding attack on Tor has actually started to affect LN (iirc over half of all public routing nodes w/ an advertised address are tor-only), and other related systems like Umbrel that 100% rely on tor for network traversal. Funnily enough, Tor developers have actually suggested adding some PoW to attempt to mitigate DDoS attacks [3]. 
In that same post they throw around the idea of using anonymous tokens to allow nodes to give them to "good" clients, which is pretty similar to my lofty Forwarding Pass idea as it relates to onion messaging, and also general HTLC jamming mitigation. In summary, we're not the first to attempt to tackle the problem of rate limiting relayed message spam in an anonymous/pseudonymous network, and we can probably learn a lot from what is and isn't working w.r.t how Tor handles things. As you note near the end of your post, this might just be the first avenue in a long line of research to best figure out how to handle the spam concerns introduced by onion messaging. From my PoV, it still seems to be an open question if the same network can be _both_ a reliable micro-payment system _and_ also a reliable arbitrary message transport layer. I guess only time will tell... > The `shared_secret_hash` field contains a BIP 340 tagged hash Any reason to use the tagged hash here vs just a plain ol' HMAC? Under the hood, they have a pretty similar construction [4]. [1]: https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003498.html [2]: https://status.torproject.org/issues/2022-06-09-network-ddos/ [3]: https://blog.torproject.org/stop-the-onion-denial/ [4]: https://datatracker.ietf.org/doc/html/rfc2104 -- Laolu On Wed, Jun 29, 2022 at 1:28 AM Bastien TEINTURIER wrote: > During the recent Oakland Dev Summit, some lightning engineers got together > to discuss DoS > protection for onion messages. Rusty proposed a very simple rate-limiting > scheme that > statistically propagates back to the correct sender, which we describe in > details below. > > You can also read this in gist format if that works better for you [1]. > > Nodes apply per-peer rate limits on _incoming_ onion messages that should be > relayed (e.g. > N/seconds with some burst tolerance). 
It is recommended to allow more onion > messages from > peers with whom you have channels, for example 10/seconds when you have a > channel and 1/second > when you don't. > > When relaying an onion message, nodes keep track of where it came from (by > using the `node_id` of > the peer who sent that message). Nodes only need the last such `node_id` per > outgoing connection, > which ensures the memory footprint is very small. Also, this data doesn't > need to be persisted. > > Let's walk through an example to illustrate this mechanism: > > * Bob receives an onion message from Alice that should be relayed to Carol > * After relaying that message, Bob stores Alice's `node_id` in its > per-connection state with Carol > * Bob receives an onion message from Eve that should be relayed to Carol > * After relaying that message, Bob replaces Alice's `node_id` with Eve's > `node_id` in its > per-connection state with Carol > * Bob receives an onion message from Alice that should be relayed to Dave > * After relaying that message, Bob stores Alice's `node_id` in its > per-connection state with Dave > * ... > > We introduce a new
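The relayer behavior quoted above (per-peer token-bucket limits plus a single last-sender slot per outgoing connection, with rate halving when an `onion_message_drop` comes back) can be sketched roughly as follows. This is a minimal illustration, not any implementation's actual code; the class and method names, and the burst sizes, are made up:

```python
import time

class TokenBucket:
    """Allow `rate` messages per second with some burst tolerance."""
    def __init__(self, rate, burst):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

class OnionRelayer:
    """Per-peer limits plus one last-sender slot per outgoing connection."""
    def __init__(self):
        self.limiters = {}     # incoming peer node_id -> TokenBucket
        self.last_sender = {}  # outgoing peer node_id -> last incoming node_id

    def relay(self, from_id, to_id, has_channel):
        # 10/second with a channel, 1/second without, per the proposal.
        rate = 10.0 if has_channel else 1.0
        limiter = self.limiters.setdefault(from_id, TokenBucket(rate, burst=2 * rate))
        if not limiter.allow():
            return False  # drop; send onion_message_drop back to from_id
        self.last_sender[to_id] = from_id  # overwrites any previous sender
        return True

    def handle_drop(self, to_id):
        # onion_message_drop arrived from outgoing peer `to_id`: blame the
        # last incoming sender on that connection and halve their rate.
        culprit = self.last_sender.get(to_id)
        if culprit in self.limiters:
            self.limiters[culprit].rate /= 2
        return culprit
```

Note that `last_sender` holds only one `node_id` per outgoing connection, which is exactly why blame propagates only statistically, as the Eve/Alice example in the proposal shows.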
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Adding a noticeable on-chain signal runs counter to the goal of the move to taproot / gossip v2, which is to make lightning's onchain footprint indistinguishable from any other onchain usage. I'm admittedly a bit confused as to why onchain signals are even being seriously proposed. Aside from "infallibility", is there another reason for suggesting we add an onchain detectable signal for this? Seems heavy handed imo, given that the severity of a comms failure is pretty minimal (*potential* for lost routing fees). > So it appears you don't agree that the "wait N blocks before you close your channels" isn't a fool proof solution? Why 12 blocks, why not 15? Or 144? fwiw I seem to remember seeing that it takes ~an hour for gossip to propagate (no link sorry). Given that, 2x an hour or 12 blocks is a reasonable first estimate. I trust we'll have time to tune this after we've had some real-world experience with them. Further, we can always add more robust signaling later, if lost routing fees turn out to be a huge issue. Finally, worth noting that Alex Myers' minisketch project may well help/improve gossip reconciliation efficiency to the point where gossip reliability is less of an issue. ~nifty
Re: [Lightning-dev] Onion messages rate-limiting
Thanks Bastien for writing this up! This is a pretty trivial and straightforward way to rate-limit onion messages in a way that allows legitimate users to continue using the system in spite of some bad actors trying (and failing, due to being rate-limited) to DoS the network. I do think any spec for this shouldn't make any recommendations about willingness to relay onion messages for anonymous no-channel third parties, if anything deliberately staying mum on it and allowing nodes to adapt policy (and probably rate-limit no-channel third-parties before they rate limit any peer they have a channel with). Ultimately, we have to assume that nodes will attempt to send onion messages by routing through the existing channel graph, so there's little reason to worry too much about ensuring ability to relay for anonymous parties. Better yet, as Val points out, requiring a channel to relay onion messages puts a very real, nontrivial (in a world of msats) cost on getting an onion messaging channel. Moreover, with backpressure the ability to DoS onion message links isn't denominated in number of messages, but instead in number of channels you are able to create, making the backpressure system equivalent to today's HTLC DoS considerations, whereas explicit payment allows an attacker to pay much less to break the system. As for the proposal to charge for onion messages, I'm still not at all sure where it's coming from. It seems to flow from a classic "have a hammer (a system to make micropayments for things), better turn this problem into a nail (by making users pay for it)" approach, but it doesn't actually solve the problem at hand. Even if you charge for onion messages, users may legitimately want to send a bunch of payments in bulk, and trivially overflow a home or Tor node's bandwidth. The only response to that, whether it's a DoS attack or a legitimate user, is to rate-limit, and to rate-limit in a way that tells the user sending the messages to back off! 
Sure, you could do that by failing onion messages with an error that updates the fee you charge, but you're ultimately doing a poor-man's (or, I suppose, rich-man's) version of what Bastien proposes, not adding some fundamental difference. Ultimately, paying suffers from the standard PoW-for-spam issue - you cannot assign a reasonable cost that an attacker cares about without impacting the system's usability due to said cost. Indeed, making it expensive enough to mount a months-long DDoS without impacting legitimate users may well be pretty easy - at 1msat per relay of a 1366 byte onion message you can only saturate an average home user's 30Mbps connection for 30 minutes before you rack up a dollar in costs - but if your concern is whether someone can reasonably trivially take out the network for minutes at a time to make it have perceptibly high failure rates, no reasonable cost scheme will work. Quite the opposite - the only reasonable way to respond to a spike in traffic while maintaining QoS is to rate-limit by inbound edge! Ultimately, what we have here is a networking problem that has to be solved with networking solutions, not a costing problem, which can be solved with payment. I can only assume that the desire to add a cost to onion messages ultimately stems from a desire to ensure every possible avenue for value extraction is given to routing nodes, but I think that desire is misplaced in this case - the cost of bandwidth is diminutive compared to other costs of routing node operation, especially when you consider sensible rate-limits as proposed in Bastien's email. Indeed, if anyone were proposing rate-limits which would allow anything close to enough bandwidth usage to cause "lightning is turning into Tor and has Tor's problems" to be a legitimate concern, I'd totally agree we should charge for its use. But no one is, nor has anyone ever seriously, to my knowledge, proposed such a thing. 
If lightning messages get deployed and start eating up even single Mbps's on a consistent basis on nodes, we can totally revisit this, it's not like we are shutting the door to any possible costing system if it becomes necessary, but rate-limiting has to happen either way, so we should start there and see if we need costing, not jump to costing on day one, hampering utility. Matt On 6/29/22 8:22 PM, Olaoluwa Osuntokun wrote: Hi t-bast, Happy to see this finally written up! With this, we have two classes of proposals for rate limiting onion messaging: 1. Back propagation based rate limiting as described here. 2. Allowing nodes to express a per-message cost for their forwarding services, which is described here [1]. I still need to digest everything proposed here, but personally I'm more optimistic about the 2nd category than the 1st. One issue I see w/ the first category is that a single party can flood the network and cause nodes to trigger their rate limits, which then affects the usability of the
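Matt's back-of-the-envelope figure above checks out. Here is a quick sanity calculation of it; the 1 msat fee is the hypothetical from his email, and the dollar conversion assumes a rough mid-2022 exchange rate:

```python
# Sanity-check the saturation cost quoted above: 1366-byte onion messages
# over a 30 Mbps link for 30 minutes at a hypothetical 1 msat relay fee.
LINK_BPS = 30_000_000      # 30 Mbps home connection, in bits/second
MSG_BYTES = 1366           # onion message size from the email above
FEE_MSAT = 1               # assumed per-relay fee
SECONDS = 30 * 60          # 30 minutes of saturation

msgs = LINK_BPS / 8 / MSG_BYTES * SECONDS   # roughly 4.9 million messages
cost_sat = msgs * FEE_MSAT / 1000           # roughly 4,900 sat
print(f"{msgs:,.0f} messages cost {cost_sat:,.0f} sat")
```

At roughly $20k/BTC (an assumption about mid-2022 prices), ~4,900 sat is indeed about a dollar, matching the email's claim.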
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Had another thought: if you've seen a chain close but also have a gossip message that indicates this is a splice, you SHOULD propagate that gossip more urgently/widely than any other gossip you've got. Adding an urgency metric to gossip is fuzzy to enforce... *handwaves*. You *do* get the onchain signal, we just change the behavior of the secondary information system instead of embedding the info into the chain. "Spamming" gossip with splices is expensive -- there's a real-world cost (onchain fees) to closing a channel (the signal to promote/prioritize a gossip msg) which cuts down on the ability to send out these 'urgent' messages with any frequency. ~nifty On Wed, Jun 29, 2022 at 7:43 PM lisa neigut wrote: > Adding a noticeable on-chain signal runs counter to the goal of the move > to taproot / gossip v2, which is to make lightning's onchain footprint > indistinguishable from > any other onchain usage. > > I'm admittedly a bit confused as to why onchain signals are even being > seriously > proposed. Aside from "infallibility", is there another reason for > suggesting > we add an onchain detectable signal for this? Seems heavy handed imo, > given > that the severity of a comms failure is pretty minimal (*potential* for > lost routing fees). > > > So it appears you don't agree that the "wait N blocks before you close > your > channels" isn't a fool proof solution? Why 12 blocks, why not 15? Or 144? > > fwiw I seem to remember seeing that it takes ~an hour for gossip to > propagate > (no link sorry). Given that, 2x an hour or 12 blocks is a reasonable first > estimate. > I trust we'll have time to tune this after we've had some real-world > experience with them. > > Further, we can always add more robust signaling later, if lost routing > fees turns > out to be a huge issue. > > Finally, worth noting that Alex Myer's minisketch project may well > help/improve gossip > reconciliation efficiency to the point where gossip reliability is less > of an issue. 
> > ~nifty
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Olaoluwa Osuntokun writes: > Hi Rusty, > > Thanks for the feedback! > >> This is over-design: if you fail to get reliable gossip, your routing will >> suffer anyway. Nothing new here. > > Idk, it's pretty simple: you're already watching for closes, so if a close > looks a certain way, it's a splice. When you see that, you can even take > note of the _new_ channel size (funds added/removed) and update your > pathfinding/blindedpaths/hophints accordingly. Why spam the chain? > If this is an over-designed solution, that I'd categorize _only_ waiting N > blocks as wishful thinking, given we have effectively no guarantees w.r.t > how long it'll take a message to propagate. Sure, it's a simplification on "wait 6 blocks plus 30 minutes". > If by routing you mean a sender, then imo still no: you don't necessarily > need _all_ gossip, just the latest policies of the nodes you route most > frequently to. On top of that, since you can get the latest policy each time > you incur a routing failure, as you make payments, you'll get the latest > policies of the nodes you care about over time. Also consider that you might > fail to get "reliable" gossip, simply just due to your peer neighborhood > aggressively rate limiting gossip (they only allow 1 update a day for a > node, you updated your fee, oops, no splice msg for you). There's no ratelimiting on new channel announcements? > So it appears you don't agree that the "wait N blocks before you close your > channels" isn't a fool proof solution? Why 12 blocks, why not 15? Or 144? Because it's simple. > From my PoV, the whole point of even signalling that a splice is ongoing, > is for the senders/receivers: they can continue to send/recv payments over > the channel while the splice is in process. 
It isn't that a node isn't > getting any gossip, it's that if the node fails to obtain the gossip message > within the N block period of time, then the channel has effectively closed > from their PoV, and it may be an hour+ until it's seen as a usable (new) > channel again. Sure. If you want to not forget channels at all on close, that works too. > If there isn't a 100% reliable way to signal that a splice is in progress, > then this disincentives its usage, as routers can lose out on potential fee > revenue, and senders/receivers may grow to favor only very long lived > channels. IMO _only_ having a gossip message simply isn't enough: there're > no real guarantees w.r.t _when_ all relevant parties will get your gossip > message. So why not give them a 100% reliable on chain signal that: > something is in progress here, stay tuned for the gossip message, whenever > you receive that. That's not 100% reliable at all. How long do you want to wait for the new gossip? Just treat every close as signalling "stay tuned for the gossip message". That's reliable. And simple. Cheers, Rusty.
Re: [Lightning-dev] Onion messages rate-limiting
Heya Laolu, From my PoV, adding prepayments to onion messages is putting the cart before the horse a bit, think there's a good amount of recourse before resorting to that. Seems there are two cases to address here: 1. People are trying to stream GoT over lightning In this case, just rate limiting should disrupt their viewing experience such that it becomes unusable. Don’t think LN can be compared to Tor here because they explicitly want to support this case and we don’t. 2. An attacker is trying to flood the network with OMs In this case, IMO LN also can’t be compared to Tor because you can limit your OMs to channel partners only, and this in itself provides a “proof of work” that an attacker can’t surmount without actually opening channels. Another huge win of backpressure is that it only needs to happen in DoS situations, meaning it doesn’t have to impact users in the normal case. Cheers —Val --- Original Message --- On Wednesday, June 29th, 2022 at 8:22 PM, Olaoluwa Osuntokun wrote: > Hi t-bast, > > Happy to see this finally written up! With this, we have two classes of > proposals for rate limiting onion messaging: > > 1. Back propagation based rate limiting as described here. > > 2. Allowing nodes to express a per-message cost for their forwarding > services, which is described here [1]. > > I still need to digest everything proposed here, but personally I'm more > optimistic about the 2nd category than the 1st. > > One issue I see w/ the first category is that a single party can flood the > network and cause nodes to trigger their rate limits, which then affects the > usability of the onion messages for all other well-behaving parties. An > example, this might mean I can't fetch invoices, give up after a period of > time (how long?), then result to a direct connection (perceived payment > latency accumulated along the way). 
> > With the 2nd route, if an attacker floods the network, they need to directly > pay for the forwarding usage themselves, though they may also directly cause > nodes to adjust their forwarding rate accordingly. However in this case, the > attacker has incurred a concrete cost, and even if the rates rise, then > those that really need the service (notifying an LSP that a user is online > or w/e) can continue to pay that new rate. In other words, by _pricing_ the > resource utilization, demand preferences can be exchanged, leading to more > efficient long term resource allocation. > > W.r.t this topic, one event that imo is worth pointing out is that a very > popular onion routing system, Tor, has been facing a severe DDoS attack that > has lasted weeks, and isn't yet fully resolved [2]. The on going flooding > attack on Tor has actually started to affect LN (iirc over half of all > public routing nodes w/ an advertised address are tor-only), and other > related systems like Umbrel that 100% rely on tor for networking traversal. > Funnily enough, Tor developers have actually suggested adding some PoW to > attempt to mitigate DDoS attacks [3]. In that same post they throw around > the idea of using anonymous tokens to allow nodes to give them to "good" > clients, which is pretty similar to my lofty Forwarding Pass idea as relates > to onion messaging, and also general HTLC jamming mitigation. > > In summary, we're not the first to attempt to tackle the problem of rate > limiting relayed message spam in an anonymous/pseudonymous network, and we > can probably learn a lot from what is and isn't working w.r.t how Tor > handles things. As you note near the end of your post, this might just be > the first avenue in a long line of research to best figure out how to handle > the spam concerns introduced by onion messaging. 
From my PoV, it still seems > to be an open question if the same network can be _both_ a reliable > micro-payment system _and_ also a reliable arbitrary message transport > layer. I guess only time will tell... > >> The `shared_secret_hash` field contains a BIP 340 tagged hash > > Any reason to use the tagged hash here vs just a plain ol HMAC? Under the > hood, they have a pretty similar construction [4]. > > [1]: > https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003498.html > [2]: https://status.torproject.org/issues/2022-06-09-network-ddos/ > [3]: https://blog.torproject.org/stop-the-onion-denial/ > [4]: https://datatracker.ietf.org/doc/html/rfc2104 > > -- Laolu > > On Wed, Jun 29, 2022 at 1:28 AM Bastien TEINTURIER wrote: > >> During the recent Oakland Dev Summit, some lightning engineers got together >> to discuss DoS >> protection for onion messages. Rusty proposed a very simple rate-limiting >> scheme that >> statistically propagates back to the correct sender, which we describe in >> details below. >> >> You can also read this in gist format if that works better for you [1]. >> >> Nodes apply per-peer rate limits on _incoming_ onion messages that should be >> relayed (e.g. >> N/seconds with some burst