[Lightning-dev] Onion messages rate-limiting

2022-06-29 Thread Bastien TEINTURIER
During the recent Oakland Dev Summit, some lightning engineers got
together to discuss DoS
protection for onion messages. Rusty proposed a very simple
rate-limiting scheme that
statistically propagates back to the correct sender, which we describe
in detail below.

You can also read this in gist format if that works better for you [1].

Nodes apply per-peer rate limits on _incoming_ onion messages that
should be relayed (e.g.
N per second with some burst tolerance). It is recommended to allow more
onion messages from
peers with whom you have channels, for example 10 per second when you
have a channel and 1 per second
when you don't.
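
To make the rate limit concrete, here is a minimal token-bucket sketch
(Python; the rates, burst size and structure are illustrative choices,
not anything specified):

    import time

    class PeerRateLimiter:
        # Per-peer token bucket for incoming onion messages that should be relayed.
        def __init__(self, rate_per_second, burst):
            self.rate = rate_per_second       # e.g. 10.0 with a channel, 1.0 without
            self.capacity = burst             # burst tolerance
            self.tokens = float(burst)
            self.last = time.monotonic()

        def allow(self):
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1.0:
                self.tokens -= 1.0
                return True
            return False                      # over the limit: drop the message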

When relaying an onion message, nodes keep track of where it came from
(by using the `node_id` of
the peer who sent that message). Nodes only need the last such
`node_id` per outgoing connection,
which ensures the memory footprint is very small. Also, this data
doesn't need to be persisted.

Let's walk through an example to illustrate this mechanism:

* Bob receives an onion message from Alice that should be relayed to Carol
* After relaying that message, Bob stores Alice's `node_id` in its
per-connection state with Carol
* Bob receives an onion message from Eve that should be relayed to Carol
* After relaying that message, Bob replaces Alice's `node_id` with
Eve's `node_id` in its
per-connection state with Carol
* Bob receives an onion message from Alice that should be relayed to Dave
* After relaying that message, Bob stores Alice's `node_id` in its
per-connection state with Dave
* ...
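
To make the bookkeeping above concrete, a minimal sketch of that
per-connection state (Python; the structure and names are illustrative):

    # Remember only the last incoming peer per outgoing connection,
    # overwriting it every time we relay an onion message.
    last_incoming_for = {}   # outgoing peer node_id -> last incoming peer node_id

    def on_relay(incoming_node_id, outgoing_node_id):
        last_incoming_for[outgoing_node_id] = incoming_node_id

    # The walkthrough above:
    on_relay("alice", "carol")   # state: {carol: alice}
    on_relay("eve", "carol")     # state: {carol: eve}
    on_relay("alice", "dave")    # state: {carol: eve, dave: alice}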

We introduce a new message that will be sent when dropping an incoming
onion message because it
reached rate limits:

1. type: 515 (`onion_message_drop`)
2. data:
   * [`rate_limited`:`u8`]
   * [`shared_secret_hash`:`32*byte`]

Whenever an incoming onion message reaches the rate limit, the
receiver sends `onion_message_drop`
to the sender. The sender looks at its per-connection state to find
where the message was coming
from and relays `onion_message_drop` to the last sender, halving their
rate limits with that peer.

If the sender doesn't overflow the rate limit again, the receiver
should double the rate limit
after 30 seconds, until it reaches the default rate limit again.
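
Combining the two sketches above, the drop handling described here could
look roughly like this (Python; the helper names, the rate floor and the
"overflowed" tracking are assumptions for illustration only):

    DEFAULT_RATE = 10.0   # msgs/s for peers we have a channel with (1.0 otherwise)
    MIN_RATE = 0.1        # assumed floor so a peer is never silenced completely

    rate_of = {}          # peer node_id -> currently allowed rate
    overflowed = set()    # peers that hit their limit since the last adjustment
                          # (the limiter above would add a peer here when allow() fails)

    def on_onion_message_drop(from_peer, relay_drop):
        # The drop arrived on an outgoing connection: halve the rate limit of
        # whoever we last relayed towards that connection, and pass the drop on.
        upstream = last_incoming_for.get(from_peer)
        if upstream is not None:
            rate_of[upstream] = max(rate_of.get(upstream, DEFAULT_RATE) / 2.0, MIN_RATE)
            relay_drop(upstream)

    def every_30_seconds():
        # Peers that stopped overflowing get their limit doubled back,
        # capped at the default.
        for peer in rate_of:
            if peer not in overflowed:
                rate_of[peer] = min(rate_of[peer] * 2.0, DEFAULT_RATE)
        overflowed.clear()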

The flow will look like:

Alice                       Bob                       Carol
  |                          |                          |
  |      onion_message       |                          |
  |------------------------->|                          |
  |                          |      onion_message       |
  |                          |------------------------->|
  |                          |    onion_message_drop    |
  |                          |<-------------------------|
  |    onion_message_drop    |                          |
  |<-------------------------|                          |

The `shared_secret_hash` field contains a BIP 340 tagged hash of the
Sphinx shared secret of the
rate limiting peer (in the example above, Carol):

* `shared_secret_hash = SHA256(SHA256("onion_message_drop") ||
SHA256("onion_message_drop") || sphinx_shared_secret)`

This value is known by the node that created the onion message: if
`onion_message_drop` propagates
all the way back to them, it lets them know which part of the route is
congested, allowing them
to retry through a different path.

Whenever there is some latency between nodes and many onion messages,
`onion_message_drop` may
be relayed to the incorrect incoming peer (since we only store the
`node_id` of the _last_ incoming
peer in our outgoing connection state). The following example highlights this:

 Eve                        Bob                       Carol
  |      onion_message       |                          |
  |------------------------->|      onion_message       |
  |      onion_message       |------------------------->|
  |------------------------->|      onion_message       |
  |      onion_message       |------------------------->|
  |------------------------->|      onion_message       |
  |                          |------------------------->|
Alice                        |    onion_message_drop    |
  |      onion_message       |<------------+------------|
  |------------------------->|      onion_message       |
  |                          |-------------|----------->|
  |                          |             |            |
  |                          |             |            |
  |                          |             |            |
  |    onion_message_drop    |<------------+            |
  |<-------------------------|                          |

In this example, Eve is spamming but `onion_message_drop` is
propagated back to Alice instead.
However, this scheme will _statistically_ penalize the right incoming
peer (with a probability
depending on the volume of onion messages that the spamming peer is
generating compared to the
volume of legitimate onion messages).

It is an interesting research problem to find formulas for those
probabilities to evaluate how
efficient this will be against various 

Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality

2022-06-29 Thread Anthony Towns
On Sun, Jun 05, 2022 at 02:29:28PM +, ZmnSCPxj via Lightning-dev wrote:

Just sharing my thoughts on this.

> Introduction
> 
>             Optimize for reliability+
>           uncertainty+fee+drain+uptime...
>                   .--~~--.
>                  /        \
>                 /          \
>                /            \
>               /              \
>              /                \
>          _--'                  `--_
>      Just                          Just
>    optimize                      optimize
>      for                           for
>     low fee                       low fee

I think ideally you want to optimise for some combination of fee, speed
and reliability (both likelihood of a clean failure that you can retry
and of generating stuck payments). As Matt/Peter suggest in another
thread, maybe for some uses you can accept low speed for low fees,
while in others you'd rather pay more and get near-instant results. I
think drain should just go to fee, and uncertainty/uptime are just ways
of estimating reliability.

It might be reasonable to generate local estimates for speed/reliability
by regularly sending onion messages or designed-to-fail htlcs.

Sorry if that makes me a midwit :)

> Rene Pickhardt also presented the idea of leaking friend-of-a-friend 
> balances, to help payers increase their payment reliability.

I think foaf (as opposed to global) gossip of *fee rates* is a very
interesting approach to trying to give nodes more *current* information,
without flooding the entire network with more traffic than it can
cope with.

> Now we can consider that *every channel is a marketplace*.
> What is being sold is the sats inside the channel.

(Really, the marketplace is a channel pair (the incoming channel and
the outgoing channel), and what's being sold is their relative balance)

> So my concrete proposal is that we can do the same friend-of-a-friend balance 
> leakage proposed by Rene, except we leak it using *existing* mechanisms --- 
> i.e. gossiping a `channel_update` with new feerates adjusted according to the 
> supply on the channel --- rather than having a new message to leak 
> friend-of-a-friend balance directly.

+42

> Because we effectively leak the balance of channels by the feerates on the 
> channel, this totally leaks the balance of channels.

I don't think this is true -- you ideally want to adjust fees not to
maintain a balanced channel (50% on each side), but a balanced *flow*
(1:1 incoming/outgoing payment volume) -- it doesn't really matter if
you get the balanced flow that results in an average of a 50:50, 80:20
or 20:80 ratio of channel balances (at least, it doesn't as long as your
channel capacity is 10 or 100 times the payment size, and your variance
is correspondingly low).

Further, you have two degrees of freedom when setting fee rates: one
is how balanced the flows are, which controls how long your channel can
remain useful, but the other is how *much* flow there is -- if halving
your fee rate doubles the flow rate in sats/hour, then that will still
increase your profit. That also doesn't leak balance information.

> ### Inverting The Filter: Feerate Cards
> Basically, a feerate card is a mapping between a probability-of-success range 
> and a feerate.
> * 00%->25%: -10ppm
> * 26%->50%: 1ppm
> * 51%->75%: 5ppm
> * 76%->100%: 50ppm

Feerate cards don't really make sense to me; "probability of success"
isn't a real measure the payer can use -- naively, if it were, they could
just retry at 1ppm 10 times and get to 95% chances of success. But if
they can afford to retry (background rebalancing?), they might as well
just try at -10ppm, 1ppm, 5ppm, 10ppm (or perhaps with a binary search?),
and see if they're lucky; but if they want a 1s response time, and can't
afford retries, what good is even a 75% chance of success if that's the
individual success rate on each hop of their five hop path?

And if you're not just going by odds of having to retry, then you need to
get some current information about the channel to plug into the formula;
but if you're getting *current* information, why not let that information
be the feerate directly?

> More concretely, we set some high feerate, impose some kind of constant 
> "gravity" that pulls down the feerate over time, then we measure the relative 
> loss of outgoing liquidity to serve as "lift" to the feerate.

If your current fee rate is F (ppm), and your current volume (flow) is V
(sats forwarded per hour), then your profit is FV. If dropping your fee
rate by dF (<0) results in an increase of V by dV (>0), then you want:

   (F+dF)(V+dV) > FV
   FV + VdF + FdV + dFdV > FV
   FdV > -VdF          (dropping the second-order dFdV term)
   dV/dF < -V/F        (flip the inequality because dF is negative)

   (dV/V)/(dF/F) < -1  (fee-elasticity of volume is in the elastic region)

(<-1 == elastic == flow changes more than the fee does == drop the fee
rate; >-1 == inelastic == flow changes less than the fee does == raise
the fee rate; =-1 == unit elastic == profit is unchanged either way)
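
As a numeric sanity check of that condition (Python; the numbers are made up):

    def should_lower_fee(F, V, dF, dV):
        # True if the observed fee-elasticity of volume is in the elastic region,
        # i.e. the flow change outweighs the fee change and profit goes up.
        elasticity = (dV / V) / (dF / F)
        return elasticity < -1

    # Fee 100ppm, flow 1,000,000 sat/h; a 10% fee cut grows flow by 30%.
    print(should_lower_fee(F=100, V=1_000_000, dF=-10, dV=300_000))   # True
    # Old profit proportional to 100 * 1.0M; new to 90 * 1.3M, so lowering the fee pays.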

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-06-29 Thread Michael Folkson via Lightning-dev
Thanks for this Alex.

Here's a transcript of your recent presentation at Bitcoin++ on Minisketch and 
Lightning gossip:

https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip/

Having followed Gleb's work on using Minisketch for Erlay in Bitcoin Core [0] 
for a while now I was especially interested in how the challenges of using 
Minisketch for Lightning gossip (node_announcement, channel_announcement, 
channel_update messages) would differ from the challenges of using Minisketch for 
transaction relay on the base layer.

I guess one of the major differences is full nodes are trying to verify a block 
every 10 minutes (on average) and so there is a sense of urgency to get the 
transactions of the next block to be mined. With Lightning gossip unless you 
are planning to send a payment (or route a payment) across a certain route you 
are less concerned about learning about the current state of the network 
urgently. If a new channel pops up you might choose not to route through it 
regardless given its "newness" and its lack of track record of successfully 
routing payments. There are parts of the network you care less about (if they 
can't help you get to your regular destinations say) whereas with transaction 
relay you have to care about all transactions (paying a sufficient fee rate).

"The problem that Bitcoin faced with transaction relay was pretty similar but 
there are a few differences. For one, any time you introduce that short hash 
function that produces a 64 bit fingerprint you have to be concerned with 
collisions between hash functions. Someone could potentially take advantage of 
that and grind out a hash that would resolve to the same fingerprint."

Could you elaborate on this? Why are hash collisions a concern for Lightning 
gossip and not for Erlay? Is it not a DoS vector for both?

It seems you are leaning towards per-peer sketches with inventory sets (like 
Erlay) rather than global sketches. This makes sense to me and seems to be 
moving in a direction where your peer connections are more stable as you are 
storing data on what your peer's understanding of the network is. There could 
even be centralized APIs which allow you to compare your current understanding 
of the network to the centralized service's understanding. (Of course we don't 
want to have to rely on centralized services or bake them into the protocol if 
you don't want to use them.) Erlay falls back to flooding if the set 
reconciliation algorithm doesn't work which I'm assuming you'll do with 
Lightning gossip.

I was also surprised to hear that channel_update made up 97 percent of gossip 
messages. Isn't it recommended that you don't make too many changes to your channel 
as it is likely to result in failed routed payments and being dropped as a 
routing node for future payments? It seems that this advice isn't being 
followed if there are so many channel_update messages being sent around. I 
almost wonder if Lightning implementations should include user prompts like 
"Are you sure you want to update your channel given this may affect your 
routing success?" :)

Thanks
Michael

P.S. Are we referring to "routing nodes" as "forwarding nodes" now? I've 
noticed "forwarding nodes" being used more recently on this list.

[0]: https://github.com/bitcoin/bitcoin/pull/21515

--
Michael Folkson
Email: michaelfolkson at [protonmail.com](http://protonmail.com/)
Keybase: michaelfolkson
PGP: 43ED C999 9F85 1D40 EAF4 9835 92D6 0159 214C FEE3

--- Original Message ---
On Thursday, April 14th, 2022 at 22:00, Alex Myers  wrote:

> Hello lightning developers,
>
> I’ve been investigating set reconciliation as a means to reduce bandwidth and 
> redundancy of gossip message propagation. This builds on some earlier work 
> from Rusty using the minisketch library [1]. The idea is that each node will 
> build a sketch representing it’s own gossip set. Alice’s node will encode and 
> transmit this sketch to Bob’s node, where it will be merged with his own 
> sketch, and the differences produced. These differences should ideally be 
> exactly the latest missing gossip of both nodes. Due to size constraints, the 
> set differences will necessarily be encoded, but Bob’s node will be able to 
> identify which gossip Alice is missing, and may then transmit exactly those 
> messages.
>
> This process is relatively straightforward, with the caveat that the sets 
> must otherwise match very closely (each sketch has a maximum capacity for 
> differences.) The difficulty here is that each node and lightning 
> implementation may have its own rules for gossip acceptance and propagation. 
> Depending on their gossip partners, not all gossip may propagate to the 
> entire network.
>
> Core-lightning implements rate limiting for incoming channel updates and node 
> announcements. The default rate limit is 1 per day, with a burst of 4. I 
> analyzed my node’s gossip over a 14 day period, and found that, of all 
> 

Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality

2022-06-29 Thread ZmnSCPxj via Lightning-dev
Good morning aj,

> On Sun, Jun 05, 2022 at 02:29:28PM +, ZmnSCPxj via Lightning-dev wrote:
>
> Just sharing my thoughts on this.
>
> > Introduction
> > 
> >             Optimize for reliability+
> >           uncertainty+fee+drain+uptime...
> >                   .--~~--.
> >                  /        \
> >                 /          \
> >                /            \
> >               /              \
> >              /                \
> >          _--'                  `--_
> >      Just                          Just
> >    optimize                      optimize
> >      for                           for
> >     low fee                       low fee
>
>
> I think ideally you want to optimise for some combination of fee, speed
> and reliability (both likelihood of a clean failure that you can retry
> and of generating stuck payments). As Matt/Peter suggest in another
> thread, maybe for some uses you can accept low speed for low fees,
> while in others you'd rather pay more and get near-instant results. I
> think drain should just go to fee, and uncertainty/uptime are just ways
> of estimating reliability.
>
> It might be reasonable to generate local estimates for speed/reliability
> by regularly sending onion messages or designed-to-fail htlcs.
>
> Sorry if that makes me a midwit :)

Actually feerate cards help with this; it just requires an economic insight to 
translate probability-of-success to an actual cost that the payer incurs.


> > ### Inverting The Filter: Feerate Cards
> > Basically, a feerate card is a mapping between a probability-of-success 
> > range and a feerate.
> > * 00%->25%: -10ppm
> > * 26%->50%: 1ppm
> > * 51%->75%: 5ppm
> > * 76%->100%: 50ppm
>
>
> Feerate cards don't really make sense to me; "probability of success"
> isn't a real measure the payer can use -- naively, if it were, they could
> just retry at 1ppm 10 times and get to 95% chances of success. But if
> they can afford to retry (background rebalancing?), they might as well
> just try at -10ppm, 1ppm, 5ppm, 10ppm (or perhaps with a binary search?),
> and see if they're lucky; but if they want a 1s response time, and can't
> afford retries, what good is even a 75% chance of success if that's the
> individual success rate on each hop of their five hop path?

The economic insight here is this:

* The payer wants to pay because it values a service / product more highly than 
the sats they are spending.
* There is a subjective difference in value between the service / product being 
bought and the amount to be spent.
  * In short, if the payment succeeds and the service / product is acquired, 
then the payer perceives itself as richer (increased utilons) by that 
subjective difference.
* If payment fails, then the payer incurs an opportunity cost, as it is unable 
to utilize the difference in subjective value between the service / product and 
the sats being spent.
  * Thus, the subjective difference in value between the service / product 
being bought, and the sats to be paid, is the cost of payment failure.
* That difference in value is the "fee budget" that Lightning Network payment 
algorithms all require as an argument.
  * If the LN fee total is greater than the fee budget, the payment algorithm 
will reject that path outright.
  * If the LN fee total is greater than the subjective difference in value 
between the service / product being bought and the amount to be delivered at 
the destination, then the payer gets negative utility and would prefer not to 
continue paying --- which is exactly what the payment algorithm does, it 
rejects such paths.

Therefore the fee budget is the cost of failure.

We can now use the left-hand side of the feerate card table, by multiplying 
`100% - middle_probability_of_success` (i.e. probability of failure) by the fee 
budget (i.e. cost of failure), and getting the cost-of-failure-for-this-entry.
We then evaluate the fee card by plugging this in to each entry of the feerate 
card, and picking which entry gives the lowest total fee.
This is then added as a fee in payment algorithms, thus translated down to 
"just optimize for low fee".

If the above logic seems dubious, consider this:

* Nodes utilizing wall strategies and doing lots of rebalancing put low limits 
on the fee budget of the rebalancing cost.
  * These nodes are willing to try lots of possible routes, hoping to nab the 
liquidity of a low-fee node on the cheap in order to resell it later.
  * i.e. those nodes are fine with taking a long time to successfully route a 
payment from themselves to themselves; they absolutely insist on low fees or 
else they will not earn anything.
  * Such nodes are fine with low probability of success.
  * Being fine with low probability of success means that the effect of the 
left-hand side of the feerate card is smaller and such nodes will tend to get 
the low probability of success entries.
* Buyers getting FOMOed into buying some neat new widget want to get their 
grubby hands on the widget ASAP.
  * These nodes are willing to pay a premium to get the neat new widget RIGHT 
NOW.
  * i.e. these nodes will be willing to provide a higher fee budget.
  * Being fine with a higher fee budget means that the effect of the left-hand 
side of the feerate card is larger and such nodes 

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-06-29 Thread Alex Myers
Hi Michael,

Thanks for the transcript and the questions, especially those you asked in 
Gleb's original Erlay presentation.

I tried to cover a lot of ground in only 30 minutes and the finer points may 
have suffered. The most significant difference in concern between bitcoin 
transaction relay and lightning gossip may be one of privacy: Source nodes of 
Bitcoin transactions have an interest in privacy (avoid trivially triangulating 
the source.) Lightning gossip is already signed by and linked to a node ID - 
the source is completely transparent by nature. The lack of a timing concern 
would allow for a global sketch where it would have been infeasible for Erlay 
(among other reasons such as DoS.)

> Why are hash collisions a concern for Lightning gossip and not for Erlay? Is 
> it not a DoS vector for both?

If lightning gossip were encoded for minisketch entries with the 
short_channel_id, it would create a unique fingerprint by default thanks to 
referencing the unique funding transaction on chain - no hashing required. This 
was Rusty's original concept and what I had been proceeding with. However, 
given the ongoing privacy discussion and desire to eventually decouple 
lightning channels from their layer one funding transaction (gossip v2), I 
think we should prepare for a future in which channels are not explicitly 
linked to a SCID. That means hashing just as in Erlay and the same DoS vector 
would be present. Salting with a per-peer shared secret works here, but the 
solution is driven back toward inventory sets.
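
A sketch of that salting (Python; the hash construction and the exact
encoding of the gossip message are assumptions for illustration, not a
proposal):

    import hashlib

    def sketch_entry(per_peer_secret, raw_gossip_msg):
        # Derive a per-peer 64-bit fingerprint so an attacker cannot grind a
        # collision that holds for every peer at once.
        digest = hashlib.sha256(per_peer_secret + raw_gossip_msg).digest()
        return int.from_bytes(digest[:8], "big")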

> It seems you are leaning towards per-peer sketches with inventory sets (like 
> Erlay) rather than global sketches.

Yes. There are pros and cons to each method, but most critically, this would be 
compatible with eventual removal of the SCID.

> Erlay falls back to flooding if the set reconciliation algorithm doesn't work 
> which I'm assuming you'll do with Lightning gossip.

Fallback will take some consideration (Erlay's bisect is an elegant feature), 
but yes, flooding is still the ultimate fallback.

> I was also surprised to hear that channel_update made up 97 percent of gossip 
> messages. Isn't it recommended that you don't make too many changes to your 
> channel as it is likely to result in failed routed payments and being dropped 
> as a routing node for future payments? It seems that this advice isn't being 
> followed if there are so many channel_update messages being sent around. I 
> almost wonder if Lightning implementations should include user prompts like 
> "Are you sure you want to update your channel given this may affect your 
> routing success?" :)

Running the numbers, I currently see 15,761 public nodes on the network and 
148,295 half channels. Those each need refreshed gossip every two weeks. By 
default that would result in 90% channel updates. That we're seeing roughly 
three times as many channel updates vs node announcements compared to what's 
strictly required is maybe not that surprising. I agree, there would be a 
benefit to nodes taking a more active role in tracking calls to broadcast 
gossip.
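
(The arithmetic behind that ~90% figure, for those following along:)

    nodes = 15_761           # public nodes, one node_announcement refresh each
    half_channels = 148_295  # channel directions, one channel_update refresh each
    print(half_channels / (half_channels + nodes))   # ~0.90, i.e. ~90% channel_update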

Thanks,
Alex

Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality

2022-06-29 Thread Anthony Towns
On Wed, Jun 29, 2022 at 12:38:17PM +, ZmnSCPxj wrote:
> > > ### Inverting The Filter: Feerate Cards
> > > Basically, a feerate card is a mapping between a probability-of-success 
> > > range and a feerate.
> > > * 00%->25%: -10ppm
> > > * 26%->50%: 1ppm
> > > * 51%->75%: 5ppm
> > > * 76%->100%: 50ppm
> The economic insight here is this:
> * The payer wants to pay because it values a service / product more highly 
> than the sats they are spending.

> * If payment fails, then the payer incurs an opportunity cost, as it is 
> unable to utilize the difference in subjective value between the service / 
> product and the sats being spent.

(If payment fails, the only opportunity cost they incur is that they
can't use the funds that they locked up for the payment. The opportunity
cost is usually considered to occur when the payment succeeds: at that
point you've lost the ability to use those funds for any other purpose)

>   * Thus, the subjective difference in value between the service / product 
> being bought, and the sats to be paid, is the cost of payment failure.

If you couldn't successfully route the payment at any price, you never
had the opportunity to buy whatever the thing was.

> We can now use the left-hand side of the feerate card table, by multiplying 
> `100% - middle_probability_of_success` (i.e. probability of failure) by the 
> fee budget (i.e. cost of failure), and getting the 
> cost-of-failure-for-this-entry.

I don't think that makes much sense; your expected gain if you just try
one option is:

 (1-p)*0 + p*cost*(benefit/cost - fee)
 
where p is the probability of success that corresponds with the fee.

I don't think you can do that calculation with a range; if I fix the
probabilities as:

  12.5%   -10ppm
  27.5%     1ppm
  62.5%     5ppm
  87.5%    50ppm

then that approach chooses:

  -10 ppm if the benefit/cost is in (-10ppm, 8.77ppm)
5 ppm if the benefit/cost is in [8.77ppm, 162.52ppm)
   50 ppm if the benefit/cost is >= 162.52ppm

so for that policy, one of those entries is already irrelevant.
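
Mechanically, that selection can be reproduced like this (Python; the
probabilities are the fixed ones above, the benefit values are arbitrary
samples):

    card = [(0.125, -10), (0.275, 1), (0.625, 5), (0.875, 50)]

    def best_entry(benefit_ppm):
        # expected gain of a single try at one entry: p * (benefit - fee)
        return max(card, key=lambda e: e[0] * (benefit_ppm - e[1]))

    for b in (5, 20, 200):
        print(b, best_entry(b))
    # The 1ppm entry is never selected, which is the "irrelevant entry" above.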

But that just feels super unrealistic to me. If your benefit is 8ppm,
and you try at -10ppm, and that fails, why wouldn't you try again at
5ppm? That means the real calculation is:

   p1*(benefit/cost - fee1) 
   + (p2-p1)*(benefit/cost - fee2 - retry_delay)
   - (1-p2)*(2*retry_delay)

Which is:

   p2*(benefit/cost)
 - p1*fee1 - (p2-p1)*fee2
 - (2-p1-p2)*retry_delay

My feeling is that the retry_delay factor's going to dominate...

That's also only considering one hop; to get the entire path, you
need them all to succeed, giving an expected benefit (for a particular
combination of rate card entries) of:

  (p1*p2*p3*p4*p5)*cost*(benefit/cost - (fee1 + fee2 + fee3 + fee4 + fee5))

And (p1*..*p5) is going to be pretty small in most cases -- 5 hops at
87.5% each already gets you down to only a 51% total chance of success.
And there's an exponential explosion of combinations, if each of the
5 hops has 4 options on their rate card, that's up to 1024 different
options to be evaluated...

> We then evaluate the fee card by plugging this in to each entry of the 
> feerate card, and picking which entry gives the lowest total fee.

I don't think that combines hops correctly. For example, if the rate
cards for hop1 and hop2 are both:

   10%  10ppm
  100%  92ppm

and your expected benefit/cost is 200ppm (so 100ppm per hop), then
treated individually you get:

   10%*(100ppm - 10ppm) = 9ppm  <-- this one!
  100%*(100ppm - 92ppm) = 8ppm

but treated together, you get:

1%*(200ppm -  20ppm) =  1.8ppm
   10%*(200ppm - 102ppm) =  9.8ppm (twice)
  100%*(200ppm - 184ppm) = 16ppm <-- this one!
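
A quick check of those numbers (Python):

    card = [(0.10, 10), (1.00, 92)]   # the same two-entry rate card on both hops
    benefit = 200                     # ppm, over the whole two-hop path

    combos = [(p1 * p2, f1 + f2) for (p1, f1) in card for (p2, f2) in card]
    best = max(combos, key=lambda pf: pf[0] * (benefit - pf[1]))
    print(best)   # (1.0, 184): once combined, the 92ppm entry wins on both hops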

> This is then added as a fee in payment algorithms, thus translated down to 
> "just optimize for low fee".

You're not optimising for low fee though, you're optimising for
maximal expected value, assuming you can't retry. But you can retry,
and probably in reality also want to minimise the chance of failure up
to some threshold.

For example: if I buy a coffee with lightning every week day for a year,
that's 250 days, so maybe I'd like to choose a fee so that my payment
failure rate is <0.4%, to avoid embarrassment and holding up the queue.

> * Nodes utilizing wall strategies and doing lots of rebalancing put low 
> limits on the fee budget of the rebalancing cost.
>   * These nodes are willing to try lots of possible routes, hoping to nab the 
> liquidity of a low-fee node on the cheap in order to resell it later.
>   * Such nodes are fine with low probability of success.

Sure. But in that case, they don't care about delays, so why wouldn't they
just try the lowest fee rates all the time, no matter what their expected
value is? They can retry once an hour indefinitely, and eventually they
should get lucky, if the rate card's even remotely accurate. (Though
chances are they won't get -10ppm lucky for the entire path)

Finding out that you're paying 50ppm at the exact same time someone else
is 

Re: [Lightning-dev] Solving the Price Of Anarchy Problem, Or: Cheap AND Reliable Payments Via Forwarding Fee Economic Rationality

2022-06-29 Thread ZmnSCPxj via Lightning-dev
Good morning aj,


> On Wed, Jun 29, 2022 at 12:38:17PM +, ZmnSCPxj wrote:
>
> > > > ### Inverting The Filter: Feerate Cards
> > > > Basically, a feerate card is a mapping between a probability-of-success 
> > > > range and a feerate.
> > > > * 00%->25%: -10ppm
> > > > * 26%->50%: 1ppm
> > > > * 51%->75%: 5ppm
> > > > * 76%->100%: 50ppm
> > The economic insight here is this:
> > * The payer wants to pay because it values a service / product more 
> > highly than the sats they are spending.
>
> > * If payment fails, then the payer incurs an opportunity cost, as it is 
> > unable to utilize the difference in subjective value between the service / 
> > product and the sats being spent.
>
>
> (If payment fails, the only opportunity cost they incur is that they
> can't use the funds that they locked up for the payment. The opportunity
> cost is usually considered to occur when the payment succeeds: at that
> point you've lost the ability to use those funds for any other purpose)

I think you misunderstand me completely.

The "payment fails" term here means that *all possible routes below the fee 
budget have failed*, i.e. a complete payment failure that will cause your `pay` 
command to error out with a frownie face.

In that case, the payer is unable to purchase the service or product.

The opportunity cost they lose is the lack of the service or product; they keep 
the value of the sats that did not get paid, but lose the value of the service 
or product they wanted to buy in the first place.

In that case, the payer loses the subjective difference in value between the 
service / product, and the sats they would have paid.

Regards,
ZmnSCPxj


Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals

2022-06-29 Thread Olaoluwa Osuntokun
Hi Rusty,

Thanks for the feedback!

> This is over-design: if you fail to get reliable gossip, your routing will
> suffer anyway.  Nothing new here.

Idk, it's pretty simple: you're already watching for closes, so if a close
looks a certain way, it's a splice. When you see that, you can even take
note of the _new_ channel size (funds added/removed) and update your
pathfinding/blindedpaths/hophints accordingly.

If this is an over-designed solution, then I'd categorize _only_ waiting N
blocks as wishful thinking, given we have effectively no guarantees w.r.t
how long it'll take a message to propagate.

If by routing you mean a routing node then: no, a routing node doesn't even
really need the graph at all to do their job.

If by routing you mean a sender, then imo still no: you don't necessarily
need _all_ gossip, just the latest policies of the nodes you route most
frequently to. On top of that, since you can get the latest policy each time
you incur a routing failure, as you make payments, you'll get the latest
policies of the nodes you care about over time. Also consider that you might
fail to get "reliable" gossip, simply just due to your peer neighborhood
aggressively rate limiting gossip (they only allow 1 update a day for a
node, you updated your fee, oops, no splice msg for you).

So it appears you don't agree that the "wait N blocks before you close your
channels" isn't a fool proof solution? Why 12 blocks, why not 15? Or 144?

From my PoV, the whole point of even signalling that a splice is ongoing,
is for the sender's/receivers: they can continue to send/recv payments over
the channel while the splice is in process. It isn't that a node isn't
getting any gossip, it's that if the node fails to obtain the gossip message
within the N block period of time, then the channel has effectively closed
from their PoV, and it may be an hour+ until it's seen as a usable (new)
channel again.

If there isn't a 100% reliable way to signal that a splice is in progress,
then this disincentives its usage, as routers can lose out on potential fee
revenue, and sends/receivers may grow to favor only very long lived
channels. IMO _only_ having a gossip message simply isn't enough: there're
no real guarantees w.r.t _when_ all relevant parties will get your gossip
message. So why not give them a 100% reliable on chain signal that:
something is in progress here, stay tuned for the gossip message, whenever
you receive that.

-- Laolu


On Tue, Jun 28, 2022 at 6:40 PM Rusty Russell  wrote:

> Hi Roasbeef,
>
> This is over-design: if you fail to get reliable gossip, your routing
> will suffer anyway.  Nothing new here.
>
> And if you *know* you're missing gossip, you can simply delay onchain
> closures for longer: since nodes should respect the old channel ids for
> a while anyway.
>
> Matt's proposal to simply defer treating onchain closes is elegant and
> minimal.  We could go further and relax requirements to detect onchain
> closes at all, and optionally add a perm close message.
>
> Cheers,
> Rusty.
>
> Olaoluwa Osuntokun  writes:
> > Hi y'all,
> >
> > This mail was inspired by this [1] spec PR from Lisa. At a high level, it
> > proposes the nodes add a delay between the time they see a channel
> closed on
> > chain, to when they remove it from their local channel graph. The motive
> > here is to give the gossip message that indicates a splice is in process,
> > "enough" time to propagate through the network. If a node can see this
> > message before/during the splicing operation, then they'll be able relate
> > the old and the new channels, meaning it's usable again by
> senders/receiver
> > _before_ the entire chain of transactions confirms on chain.
> >
> > IMO, this sort of arbitrary delay (expressed in blocks) won't actually
> > address the issue in practice. The proposal suffers from the following
> > issues:
> >
> >   1. 12 blocks is chosen arbitrarily. If for w/e reason an announcement
> >   takes longer than 2 hours to reach the "economic majority" of
> >   senders/receivers, then the channel won't be able to mask the splicing
> >   downtime.
> >
> >   2. Gossip propagation delay and offline peers. These days most nodes
> >   throttle gossip pretty aggressively. As a result, a pair of nodes doing
> >   several in-flight splices (inputs become double spent or something, so
> >   they need to try a bunch) might end up being rate limited within the
> >   network, causing the splice update msg to be lost or delayed
> significantly
> >   (IIRC CLN resets these values after 24 hours). On top of that, if a
> peer
> >   is offline for too long (think mobile senders), then they may miss the
> >   update all together as most nodes don't do a full historical
> >   _channel_update_ dump anymore.
> >
> > In order to resolve these issues, I think instead we need to rely on the
> > primary splicing signal being sourced from the chain itself. In other
> words,
> > if I see a channel close, and a closing transaction "looks" 

Re: [Lightning-dev] Onion messages rate-limiting

2022-06-29 Thread Olaoluwa Osuntokun
Hi t-bast,

Happy to see this finally written up! With this, we have two classes of
proposals for rate limiting onion messaging:

  1. Back propagation based rate limiting as described here.

  2. Allowing nodes to express a per-message cost for their forwarding
  services, which is described here [1].

I still need to digest everything proposed here, but personally I'm more
optimistic about the 2nd category than the 1st.

One issue I see w/ the first category is that a single party can flood the
network and cause nodes to trigger their rate limits, which then affects the
usability of the onion messages for all other well-behaving parties. An
example, this might mean I can't fetch invoices, give up after a period of
time (how long?), then resort to a direct connection (perceived payment
latency accumulated along the way).

With the 2nd route, if an attacker floods the network, they need to directly
pay for the forwarding usage themselves, though they may also directly cause
nodes to adjust their forwarding rate accordingly. However in this case, the
attacker has incurred a concrete cost, and even if the rates rise, then
those that really need the service (notifying an LSP that a user is online
or w/e) can continue to pay that new rate. In other words, by _pricing_ the
resource utilization, demand preferences can be exchanged, leading to more
efficient long term resource allocation.

W.r.t this topic, one event that imo is worth pointing out is that a very
popular onion routing system, Tor, has been facing a severe DDoS attack that
has lasted weeks, and isn't yet fully resolved [2]. The ongoing flooding
attack on Tor has actually started to affect LN (iirc over half of all
public routing nodes w/ an advertised address are tor-only), and other
related systems like Umbrel that 100% rely on tor for networking traversal.
Funnily enough, Tor developers have actually suggested adding some PoW to
attempt to mitigate DDoS attacks [3]. In that same post they throw around
the idea of using anonymous tokens to allow nodes to give them to "good"
clients, which is pretty similar to my lofty Forwarding Pass idea as relates
to onion messaging, and also general HTLC jamming mitigation.

In summary, we're not the first to attempt to tackle the problem of rate
limiting relayed message spam in an anonymous/pseudonymous network, and we
can probably learn a lot from what is and isn't working w.r.t how Tor
handles things. As you note near the end of your post, this might just be
the first avenue in a long line of research to best figure out how to handle
the spam concerns introduced by onion messaging. From my PoV, it still seems
to be an open question if the same network can be _both_ a reliable
micro-payment system _and_ also a reliable arbitrary message transport
layer. I guess only time will tell...

> The `shared_secret_hash` field contains a BIP 340 tagged hash

Any reason to use the tagged hash here vs just a plain ol HMAC? Under the
hood, they have a pretty similar construction [4].

[1]:
https://lists.linuxfoundation.org/pipermail/lightning-dev/2022-February/003498.html
[2]: https://status.torproject.org/issues/2022-06-09-network-ddos/
[3]: https://blog.torproject.org/stop-the-onion-denial/
[4]: https://datatracker.ietf.org/doc/html/rfc2104

-- Laolu




Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals

2022-06-29 Thread lisa neigut
Adding a noticeable on-chain signal runs counter to the goal of the move
to taproot / gossip v2, which is to make lightning's onchain footprint
indistinguishable from
any other onchain usage.

I'm admittedly a bit confused as to why onchain signals are even being
seriously
 proposed. Aside from "infallibility", is there another reason for
suggesting
we add an onchain detectable signal for this? Seems heavy handed imo, given
that the severity of a comms failure is pretty minimal (*potential* for
lost routing fees).

> So it appears you don't agree that the "wait N blocks before you close
your
channels" isn't a fool proof solution? Why 12 blocks, why not 15? Or 144?

fwiw I seem to remember seeing that it takes  ~an hour for gossip to
propagate
(no link sorry). Given that, 2x an hour or 12 blocks is a reasonable first
estimate.
I trust we'll have time to tune this after we've had some real-world
experience with them.

Further, we can always add more robust signaling later, if lost routing
fees turns
out to be a huge issue.

Finally, worth noting that Alex Myer's minisketch project may well
help/improve gossip
reconciliation efficiency to the point where gossip reliability is less
of an issue.

~nifty


Re: [Lightning-dev] Onion messages rate-limiting

2022-06-29 Thread Matt Corallo
Thanks Bastien for writing this up! This is a pretty trivial and straightforward way to rate-limit 
onion messages in a way that allows legitimate users to continue using the system in spite of some 
bad actors trying (and failing, due to being rate-limited) to DoS the network.


I do think any spec for this shouldn't make any recommendations about willingness to relay onion 
messages for anonymous no-channel third parties, if anything deliberately staying mum on it and 
allowing nodes to adapt policy (and probably rate-limit no-channel third-parties before they rate 
limit any peer they have a channel with). Ultimately, we have to assume that nodes will attempt to 
send onion messages by routing through the existing channel graph, so there's little reason to worry 
too much about ensuring ability to relay for anonymous parties.


Better yet, as Val points out, requiring a channel to relay onion messages puts a very real, 
nontrivial (in a world of msats) cost to getting an onion messaging channel. Better yet, with 
backpressure ability to DoS onion message links isn't denominated in number of messages, but instead 
in number of channels you are able to create, making the backpressure system equivalent to today's 
HTLC DoS considerations, whereas explicit payment allows an attacker to pay much less to break the 
system.


As for the proposal to charge for onion messages, I'm still not at all sure where it's coming from. 
It seems to flow from a classic "have a hammer (a system to make micropayments for things), better 
turn this problem into a nail (by making users pay for it)" approach, but it doesn't actually solve 
the problem at hand.


Even if you charge for onion messages, users may legitimately want to send a bunch of payments in 
bulk, and trivially overflow a home or Tor nodes' bandwidth. The only response to that, whether it's 
a DoS attack or a legitimate user, is to rate-limit, and to rate-limit in a way that tells the user 
sending the messages to back off! Sure, you could do that by failing onion messages with an error 
that updates the fee you charge, but you're ultimately doing a poor-man's (or, I suppose, 
rich-man's) version of what Bastien proposes, not adding some fundamental difference.


Ultimately, paying suffers from the standard PoW-for-spam issue - you cannot assign a reasonable 
cost that an attacker cares about without impacting the system's usability due to said cost. Indeed, 
making it expensive enough to mount a months-long DDoS without impacting legitimate users may be pretty 
easy - at 1msat per relay of a 1366 byte onion message you can only saturate an average home users' 
30Mbps connection for 30 minutes before you rack up a dollar in costs, but if your concern is 
whether someone can reasonably trivially take out the network for minutes at a time to make it have 
perceptibly high failure rates, no reasonable cost scheme will work. Quite the opposite - the only 
reasonable way to respond is to respond to a spike in traffic while maintaining QoS is to rate-limit 
by inbound edge!
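
For reference, the arithmetic behind that figure (Python; the ~$20k/BTC
exchange rate is my assumption for mid-2022, not something stated above):

    bytes_per_sec = 30e6 / 8          # a 30 Mbps home connection
    msg_bytes = 1366                  # onion message size used above
    msgs = bytes_per_sec / msg_bytes * 30 * 60   # 30 minutes of saturation
    cost_sat = msgs * 1 / 1000                   # at 1 msat per relayed message
    print(cost_sat)                              # ~4,941 sat
    print(cost_sat * 20_000 / 100_000_000)       # ~$0.99 at ~$20,000/BTC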


Ultimately, what we have here is a networking problem, that has to be solved with networking 
solutions, not a costing problem, which can be solved with payment. I can only assume that the 
desire to add a cost to onion messages ultimately stems from a desire to ensure every possible 
avenue for value extraction is given to routing nodes, but I think that desire is misplaced in this 
case - the cost of bandwidth is diminutive compared to other costs of routing node operation, 
especially when you consider sensible rate-limits as proposed in Bastien's email.


Indeed, if anyone were proposing rate-limits which would allow anything close to enough bandwidth 
usage to cause "lightning is turning into Tor and has Tor's problems" to be a legitimate concern I'd 
totally agree we should charge for its use. But no one is, nor has anyone ever seriously, to my 
knowledge, proposed such a thing. If lightning messages get deployed and start eating up even single 
Mbps's on a consistent basis on nodes, we can totally revisit this, its not like we are shutting the 
door to any possible costing system if it becomes necessary, but rate-limiting has to happen either 
way, so we should start there and see if we need costing, not jump to costing on day one, hampering 
utility.


Matt


Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals

2022-06-29 Thread lisa neigut
Had another thought: if you've seen a chain close but also have a gossip
message that
indicates this is a splice, you SHOULD propagate that gossip more
urgently/widely than
any other gossip you've got. Adding an urgency metric to gossip is fuzzy to
enforce... *handwaves*.

You *do* get the onchain signal, we just change the behavior of the
secondary information system
instead of embedding the info into the chain..

"Spamming" gossip with splices expensive -- there's a real-world cost
(onchain fees) to
closing a channel (the signal to promote/prioritize a gossip msg) which
cuts down on the ability to send out these 'urgent' messages with any
frequency.

~nifty



Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals

2022-06-29 Thread Rusty Russell
Olaoluwa Osuntokun  writes:
> Hi Rusty,
>
> Thanks for the feedback!
>
>> This is over-design: if you fail to get reliable gossip, your routing will
>> suffer anyway.  Nothing new here.
>
> Idk, it's pretty simple: you're already watching for closes, so if a close
> looks a certain way, it's a splice. When you see that, you can even take
> note of the _new_ channel size (funds added/removed) and update your
> pathfinding/blindedpaths/hophints accordingly.

Why spam the chain?

> If this is an over-designed solution, then I'd categorize _only_ waiting N
> blocks as wishful thinking, given we have effectively no guarantees w.r.t
> how long it'll take a message to propagate.

Sure, it's a simplification on "wait 6 blocks plus 30 minutes".

> If by routing you mean a sender, then imo still no: you don't necessarily
> need _all_ gossip, just the latest policies of the nodes you route most
> frequently to. On top of that, since you can get the latest policy each time
> you incur a routing failure, as you make payments, you'll get the latest
> policies of the nodes you care about over time. Also consider that you might
> fail to get "reliable" gossip, simply just due to your peer neighborhood
> aggressively rate limiting gossip (they only allow 1 update a day for a
> node, you updated your fee, oops, no splice msg for you).

There's no ratelimiting on new channel announcements?

> So it appears you don't agree that the "wait N blocks before you close your
> channels" isn't a fool proof solution? Why 12 blocks, why not 15? Or 144?

Because it's simple.

> From my PoV, the whole point of even signalling that a splice is ongoing,
> is for the sender's/receivers: they can continue to send/recv payments over
> the channel while the splice is in process. It isn't that a node isn't
> getting any gossip, it's that if the node fails to obtain the gossip message
> within the N block period of time, then the channel has effectively closed
> from their PoV, and it may be an hour+ until it's seen as a usable (new)
> channel again.

Sure.  If you want to not forget channels at all on close, that works too.

> If there isn't a 100% reliable way to signal that a splice is in progress,
> then this disincentives its usage, as routers can lose out on potential fee
> revenue, and sends/receivers may grow to favor only very long lived
> channels. IMO _only_ having a gossip message simply isn't enough: there're
> no real guarantees w.r.t _when_ all relevant parties will get your gossip
> message. So why not give them a 100% reliable on chain signal that:
> something is in progress here, stay tuned for the gossip message, whenever
> you receive that.

That's not 100% reliable at all.  How long do you want to wait for the new
gossip?

Just treat every close as signalling "stay tuned for the gossip
message".  That's reliable.  And simple.

Cheers,
Rusty.


Re: [Lightning-dev] Onion messages rate-limiting

2022-06-29 Thread vwallace via Lightning-dev
Heya Laolu,

From my PoV, adding prepayments to onion messages is putting the cart before 
the horse a bit, think there's a good amount of recourse before resorting to 
that.

Seems there are two cases to address here:

1. People are trying to stream GoT over lightning

In this case, just rate limiting should disrupt their viewing experience such 
that it becomes unusable. Don’t think LN can be compared to Tor here because 
they explicitly want to support this case and we don’t.

2. An attacker is trying to flood the network with OMs

In this case, IMO LN also can’t be compared to Tor because you can limit your 
OMs to channel partners only, and this in itself provides a “proof of work” 
that an attacker can’t surmount without actually opening channels.

Another huge win of backpressure is that it only needs to happen in DoS 
situations, meaning it doesn’t have to impact users in the normal case.

Cheers —Val
