Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-06-30 Thread Michael Folkson via Lightning-dev
Awesome, thanks Alex. Just one follow up.

> Running the numbers, I currently see 15,761 public nodes on the network and
> 148,295 half channels. Those each need refreshed gossip every two weeks; by
> default, channel updates would therefore account for 90% of the gossip.

And the rationale for each channel needing refreshed gossip every 2 weeks is to 
inform the network that the channel is still active (i.e. not disabled) and its 
parameters haven't changed?

(I did look it up in BOLT 7 [0] but it wasn't clear to me that a channel would 
be assumed to be inactive/disabled if there wasn't a channel_update for 2 
weeks.)

That seems like a lot of gossip to me if the recommended behavior of routing
nodes is to maintain ~100 percent uptime and to change the channel's parameters
only when absolutely necessary. I guess the alternative of significantly fewer
gossip messages and a potential uptick in failed routes would be worse though.

[0]: 
https://github.com/lightning/bolts/blob/master/07-routing-gossip.md#rationale-4

--
Michael Folkson
Email: michaelfolkson at protonmail.com
Keybase: michaelfolkson
PGP: 43ED C999 9F85 1D40 EAF4 9835 92D6 0159 214C FEE3

--- Original Message ---
On Wednesday, June 29th, 2022 at 7:07 PM, Alex Myers  
wrote:

> Hi Michael,
>
> Thanks for the transcript and the questions, especially those you asked in 
> Gleb's original Erlay presentation.
>
> I tried to cover a lot of ground in only 30 minutes and the finer points may 
> have suffered. The most significant difference in concern between bitcoin 
> transaction relay and lightning gossip may be one of privacy: Source nodes of 
> Bitcoin transactions have an interest in privacy (avoid trivially 
> triangulating the source.) Lightning gossip is already signed by and linked 
> to a node ID - the source is completely transparent by nature. The lack of a 
> timing concern would allow for a global sketch where it would have been 
> infeasible for Erlay (among other reasons such as DoS.)
>
>> Why are hash collisions a concern for Lightning gossip and not for Erlay? Is 
>> it not a DoS vector for both?
>
> If lightning gossip were encoded for minisketch entries with the 
> short_channel_id, it would create a unique fingerprint by default thanks to 
> referencing the unique funding transaction on chain - no hashing required. 
> This was Rusty's original concept and what I had been proceeding with. 
> However, given the ongoing privacy discussion and desire to eventually 
> decouple lightning channels from their layer one funding transaction (gossip 
> v2), I think we should prepare for a future in which channels are not 
> explicitly linked to a SCID. That means hashing just as in Erlay and the same 
> DoS vector would be present. Salting with a per-peer shared secret works 
> here, but the solution is driven back toward inventory sets.
>
>> It seems you are leaning towards per-peer sketches with inventory sets (like 
>> Erlay) rather than global sketches.
>
> Yes. There are pros and cons to each method, but most critically, this would 
> be compatible with eventual removal of the SCID.
>
>> Erlay falls back to flooding if the set reconciliation algorithm doesn't 
>> work which I'm assuming you'll do with Lightning gossip.
>
> Fallback will take some consideration (Erlay's bisect is an elegant feature), 
> but yes, flooding is still the ultimate fallback.
>
>> I was also surprised to hear that channel_update made up 97 percent of 
> gossip messages. Isn't it recommended that you don't make too many changes to
>> your channel as it is likely to result in failed routed payments and being 
>> dropped as a routing node for future payments? It seems that this advice 
>> isn't being followed if there are so many channel_update messages being sent 
>> around. I almost wonder if Lightning implementations should include user 
>> prompts like "Are you sure you want to update your channel given this may 
>> affect your routing success?" :)
>
> Running the numbers, I currently see 15,761 public nodes on the network and
> 148,295 half channels. Those each need refreshed gossip every two weeks; by
> default, channel updates would therefore account for 90% of the gossip. That
> we're seeing roughly three times as many channel updates relative to node
> announcements as strictly required is maybe not that surprising. I agree,
> there would be a benefit to nodes taking a more active role in tracking calls
> to broadcast gossip.
>
> Thanks,
> Alex
>
> --- Original Message ---
> On Wednesday, June 29th, 2022 at 6:09 AM, Michael Folkson 
>  wrote:
>
>> Thanks for this Alex.
>>
>> Here's a transcript of your recent presentation at Bitcoin++ on Minisketch 
>> and Lightning gossip:
>>
>> https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip/
>>
>> Having followed Gleb's work on using Minisketch for Erlay in Bitcoin Core 
>> [0] for a while now I was especially interested in how the challenges of 
>> 

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-06-29 Thread Alex Myers
Hi Michael,

Thanks for the transcript and the questions, especially those you asked in 
Gleb's original Erlay presentation.

I tried to cover a lot of ground in only 30 minutes and the finer points may 
have suffered. The most significant difference in concern between bitcoin 
transaction relay and lightning gossip may be one of privacy: Source nodes of 
Bitcoin transactions have an interest in privacy (avoid trivially triangulating 
the source.) Lightning gossip is already signed by and linked to a node ID - 
the source is completely transparent by nature. The lack of a timing concern 
would allow for a global sketch where it would have been infeasible for Erlay 
(among other reasons such as DoS.)

> Why are hash collisions a concern for Lightning gossip and not for Erlay? Is 
> it not a DoS vector for both?

If lightning gossip were encoded for minisketch entries with the 
short_channel_id, it would create a unique fingerprint by default thanks to 
referencing the unique funding transaction on chain - no hashing required. This 
was Rusty's original concept and what I had been proceeding with. However, 
given the ongoing privacy discussion and desire to eventually decouple 
lightning channels from their layer one funding transaction (gossip v2), I 
think we should prepare for a future in which channels are not explicitly 
linked to a SCID. That means hashing just as in Erlay and the same DoS vector 
would be present. Salting with a per-peer shared secret works here, but the 
solution is driven back toward inventory sets.
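To make the salting concrete, here is a rough sketch of the entry computation
(the helper name and the choice of a keyed blake2b hash are assumptions for
illustration, not a concrete proposal):

    import hashlib

    def sketch_entry(gossip_msg: bytes, peer_salt: bytes) -> int:
        """Map a raw gossip message to a salted 64-bit sketch entry.

        peer_salt stands in for a per-peer shared secret, so an attacker
        cannot grind a single collision that poisons every sketch at once.
        """
        digest = hashlib.blake2b(gossip_msg, digest_size=8,
                                 key=peer_salt).digest()
        return int.from_bytes(digest, "big")

Since the salt differs per peer, the same message maps to a different entry in
each peer's sketch - which is exactly why this drags the design back toward
per-peer inventory sets.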

> It seems you are leaning towards per-peer sketches with inventory sets (like 
> Erlay) rather than global sketches.

Yes. There are pros and cons to each method, but most critically, this would be 
compatible with eventual removal of the SCID.

> Erlay falls back to flooding if the set reconciliation algorithm doesn't work 
> which I'm assuming you'll do with Lightning gossip.

Fallback will take some consideration (Erlay's bisect is an elegant feature), 
but yes, flooding is still the ultimate fallback.

> I was also surprised to hear that channel_update made up 97 percent of gossip 
> messages. Isn't it recommended that you don't make too many changes to your
> channel as it is likely to result in failed routed payments and being dropped 
> as a routing node for future payments? It seems that this advice isn't being 
> followed if there are so many channel_update messages being sent around. I 
> almost wonder if Lightning implementations should include user prompts like 
> "Are you sure you want to update your channel given this may affect your 
> routing success?" :)

Running the numbers, I currently see 15,761 public nodes on the network and
148,295 half channels. Those each need refreshed gossip every two weeks; by
default, channel updates would therefore account for 90% of the gossip. That
we're seeing roughly three times as many channel updates relative to node
announcements as strictly required is maybe not that surprising. I agree, there
would be a benefit to nodes taking a more active role in tracking calls to
broadcast gossip.
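Worked out, assuming one node_announcement per node and one channel_update per
half channel per two-week refresh (the 97% figure is the one quoted from the
presentation earlier in the thread):

    nodes = 15_761           # public node_announcements
    half_channels = 148_295  # channel_updates, one per channel direction
    refresh_total = nodes + half_channels
    print(half_channels / refresh_total)  # ~0.904 -> "90% channel updates"
    # observed update:announcement ratio (97:3) over the required (~90:10):
    print((97 / 3) / (90.4 / 9.6))        # ~3.4 -> "roughly three times"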

Thanks,
Alex

--- Original Message ---
On Wednesday, June 29th, 2022 at 6:09 AM, Michael Folkson 
 wrote:

> Thanks for this Alex.
>
> Here's a transcript of your recent presentation at Bitcoin++ on Minisketch 
> and Lightning gossip:
>
> https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip/
>
> Having followed Gleb's work on using Minisketch for Erlay in Bitcoin Core [0] 
> for a while now I was especially interested in how the challenges of using 
> Minisketch for Lightning gossip (node_announcement, channel_announcement, 
> channel_update messages) would differ from the challenges of using Minisketch
> for transaction relay on the base layer.
>
> I guess one of the major differences is full nodes are trying to verify a 
> block every 10 minutes (on average) and so there is a sense of urgency to get 
> the transactions of the next block to be mined. With Lightning gossip unless 
> you are planning to send a payment (or route a payment) across a certain 
> route you are less concerned about learning about the current state of the 
> network urgently. If a new channel pops up you might choose not to route 
> through it regardless given its "newness" and its lack of track record of 
> successfully routing payments. There are parts of the network you care less 
> about (if they can't help you get to your regular destinations say) whereas 
> with transaction relay you have to care about all transactions (paying a 
> sufficient fee rate).
>
> "The problem that Bitcoin faced with transaction relay was pretty similar but 
> there are a few differences. For one, any time you introduce that short hash
> function that produces a 64 bit fingerprint you have to be concerned with 
> collisions between hash functions. Someone could potentially take advantage 
> of that and grind out a hash that would resolve to the same 

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-06-29 Thread Michael Folkson via Lightning-dev
Thanks for this Alex.

Here's a transcript of your recent presentation at Bitcoin++ on Minisketch and 
Lightning gossip:

https://btctranscripts.com/bitcoinplusplus/2022/2022-06-07-alex-myers-minisketch-lightning-gossip/

Having followed Gleb's work on using Minisketch for Erlay in Bitcoin Core [0] 
for a while now I was especially interested in how the challenges of using 
Minisketch for Lightning gossip (node_announcement, channel_announcement, 
channel_update messages) would differ from the challenges of using Minisketch for
transaction relay on the base layer.

I guess one of the major differences is full nodes are trying to verify a block 
every 10 minutes (on average) and so there is a sense of urgency to get the 
transactions of the next block to be mined. With Lightning gossip unless you 
are planning to send a payment (or route a payment) across a certain route you 
are less concerned about learning about the current state of the network 
urgently. If a new channel pops up you might choose not to route through it 
regardless given its "newness" and its lack of track record of successfully 
routing payments. There are parts of the network you care less about (if they 
can't help you get to your regular destinations say) whereas with transaction 
relay you have to care about all transactions (paying a sufficient fee rate).

"The problem that Bitcoin faced with transaction relay was pretty similar but 
there are a few differences. For one, any time you introduce that short hash
function that produces a 64 bit fingerprint you have to be concerned with 
collisions between hash functions. Someone could potentially take advantage of 
that and grind out a hash that would resolve to the same fingerprint."

Could you elaborate on this? Why are hash collisions a concern for Lightning 
gossip and not for Erlay? Is it not a DoS vector for both?

It seems you are leaning towards per-peer sketches with inventory sets (like 
Erlay) rather than global sketches. This makes sense to me and seems to be 
moving in a direction where your peer connections are more stable as you are 
storing data on what your peer's understanding of the network is. There could 
even be centralized APIs which allow you to compare your current understanding 
of the network to the centralized service's understanding. (Of course we don't 
want to have to rely on centralized services or bake them into the protocol if 
you don't want to use them.) Erlay falls back to flooding if the set 
reconciliation algorithm doesn't work which I'm assuming you'll do with 
Lightning gossip.

I was also surprised to hear that channel_update made up 97 percent of gossip 
messages. Isn't it recommended that you don't make too many changes to your channel
as it is likely to result in failed routed payments and being dropped as a 
routing node for future payments? It seems that this advice isn't being 
followed if there are so many channel_update messages being sent around. I 
almost wonder if Lightning implementations should include user prompts like 
"Are you sure you want to update your channel given this may affect your 
routing success?" :)

Thanks
Michael

P.S. Are we referring to "routing nodes" as "forwarding nodes" now? I've 
noticed "forwarding nodes" being used more recently on this list.

[0]: https://github.com/bitcoin/bitcoin/pull/21515

--
Michael Folkson
Email: michaelfolkson at protonmail.com
Keybase: michaelfolkson
PGP: 43ED C999 9F85 1D40 EAF4 9835 92D6 0159 214C FEE3

--- Original Message ---
On Thursday, April 14th, 2022 at 22:00, Alex Myers  wrote:

> Hello lightning developers,
>
> I’ve been investigating set reconciliation as a means to reduce bandwidth and 
> redundancy of gossip message propagation. This builds on some earlier work 
> from Rusty using the minisketch library [1]. The idea is that each node will 
> build a sketch representing its own gossip set. Alice’s node will encode and
> transmit this sketch to Bob’s node, where it will be merged with his own 
> sketch, and the differences produced. These differences should ideally be 
> exactly the latest missing gossip of both nodes. Due to size constraints, the 
> set differences will necessarily be encoded, but Bob’s node will be able to 
> identify which gossip Alice is missing, and may then transmit exactly those 
> messages.
>
> This process is relatively straightforward, with the caveat that the sets 
> must otherwise match very closely (each sketch has a maximum capacity for 
> differences.) The difficulty here is that each node and lightning 
> implementation may have its own rules for gossip acceptance and propagation. 
> Depending on their gossip partners, not all gossip may propagate to the 
> entire network.
>
> Core-lightning implements rate limiting for incoming channel updates and node 
> announcements. The default rate limit is 1 per day, with a burst of 4. I 
> analyzed my node’s gossip over a 14 day period, and found that, of all 
> 

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-05-26 Thread Matt Corallo




On 5/26/22 8:59 PM, Alex Myers wrote:
>> Ah, this is an additional proposal on top, and requires a gossip "hard fork", which means your
>> new protocol would only work for taproot channels, and any old/unupgraded channels will have to
>> be propagated via the old mechanism. I'd kinda prefer to be able to rip out the old gossip sync
>> code sooner than a few years from now :(.
>
> I viewed it as a soft fork, where if you want to use set reconciliation, anything added to the
> set would be subject to a constricted ruleset - in this case the gossip must be accompanied by a
> blockheight tlv (or otherwise reference a blockheight) and it must not replace a message in the
> current 100 block range.
>
> It doesn't necessarily have to reference blockheight, but that would simplify many edge cases.
> The key is merely that a node is responsible for limiting its own gossip to a predefined
> interval, and it must be easily verifiable for any other nodes building and reconciling sketches.
> Given that we have access to a timechain, this just made the most sense.

Ah, good point, you can just add it as a TLV. It still implies that "old-gossip" can't go away for a
long time - ~everyone has to upgrade, so we'll have two parallel systems. Worse, people are relying
on the old behavior and some nodes may avoid upgrading to avoid the new rate-limits :(.

>>> If some nodes have 600000 and others have 600099 (because you broke the
>>> ratelimiting recommendation, and propagated both approx the same
>>> time), then the network will split, sure.
>>
>> Right, so what do you do in that case, though? AFAIU, in your proposed sync mechanism if a node
>> does this once, you're stuck with all of your gossip reconciliations with every peer "wasting"
>> one difference "slot" for a day or however long it takes before the peer does a sane update. In
>> my proposed alternative it only appears once and then you move on (or maybe once more on
>> startup, but we can maybe be willing to take on some extra cost there?).
>
> This case may not be all that difficult. Easiest answer is you offer a spam proof to your peer.
> Send both messages, signed by the offending node as proof they violated the set reconciliation
> rate limit, then remove both from your sketch. You may want to keep the evidence in your data
> store, at least until it's superseded by the next valid update, but there's no reason it must
> occupy a slot of the sketch. Meanwhile, feel free to use the message as you wish, just keep both
> out of the sketch. It's not perfect, but the sketch capacity is not compromised and the second
> incidence of spam should not propagate at all. (It may be possible to keep one, but this is the
> simplest answer.)

Right, well if we're gonna start adding "spam-proofs" we shouldn't start talking about complexity of
tracking the changed-set :p.

Worse, unlike tracking the changed-set as proposed this protocol is a ton of unused code to handle
an edge case we should only rarely hit...in other words code that will almost certainly be buggy,
untested, and fail if people start hitting it. In general, I'm not a huge fan of protocols with any
more usually-unused code than is strictly necessary.

This also doesn't capture things like channel_update extensions - BOLTs today say a recipient "MAY
choose NOT to for messages longer than the minimum expected length" - so now we'd need to remove
that (and I guess have a fixed "maximum length" for channel updates that everyone agrees
to)...basically we have to have exact consensus on valid channel updates across nodes.

>> Heh, I'm surprised you'd complain about this - IIUC your existing gossip storage system keeps
>> this as a side-effect so it'd be a single integer for y'all :p. In any case, if it makes the
>> protocol a chunk more efficient I don't see why it's a big deal to keep track of the set of
>> which invoices have changed recently, you could even make it super efficient by just saying
>> "anything more recent than timestamp X except a few exceptions that we got with some lag against
>> the update timestamp".
>
> The benefit of a single global sketch is less overhead in adding additional gossip peers, though
> looking at the numbers, sketch decoding time seems to be the more significant driving factor than
> rebuilding sketches (when they're incremental.) I also like maximizing the utility of the sketch
> by adding the full gossip store if possible.

Note that the alternative here does not prevent you from having a single global sketch. You can keep
a rolling global sketch that you send to all your peers at once, it would just be a bit of a
bandwidth burst when they all request a few channel updates/announcements from you.

More generally, I'm somewhat surprised to hear a performance concern here - I can't imagine we'd be
including any more entries in such a sketch than Bitcoin Core does transactions to relay to peers,
and this is exactly the design direction they went in (because of basically the same concerns).

> I still think getting

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-05-26 Thread Alex Myers
> > The update contains a block number. Let's say we allow an update every
> > 100 blocks. This must be <= current block height (and presumably, newer
> > than height - 2016).
> >
> > If you send an update number 600000, and then 600100, it will propagate.
> > 600099 will not.
>
>
> Ah, this is an additional proposal on top, and requires a gossip "hard fork", 
> which means your new
> protocol would only work for taproot channels, and any old/unupgraded 
> channels will have to be
> propagated via the old mechanism. I'd kinda prefer to be able to rip out the 
> old gossip sync code
> sooner than a few years from now :(.

I viewed it as a soft fork, where if you want to use set reconciliation, 
anything added to the set would be subject to a constricted ruleset - in this 
case the gossip must be accompanied by a blockheight tlv (or otherwise 
reference a blockheight) and it must not replace a message in the current 100 
block range.

It doesn't necessarily have to reference blockheight, but that would simplify 
many edge cases.  The key is merely that a node is responsible for limiting 
its own gossip to a predefined interval, and it must be easily verifiable for
any other nodes building and reconciling sketches.  Given that we have access 
to a timechain, this just made the most sense.
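As a rough illustration of the constricted ruleset, here is a hypothetical
check a node could run before admitting a replacement into its sketch (the TLV
encoding is a placeholder - no type number is assigned anywhere, and a real
encoding would use BOLT BigSize type/length rather than single bytes):

    import struct

    RATE_LIMIT_BLOCKS = 100

    def blockheight_tlv(height: int) -> bytes:
        tlv_type = 1                       # placeholder, not an assigned type
        value = struct.pack(">I", height)  # u32 block height
        return bytes([tlv_type, len(value)]) + value

    def admissible(new_height: int, prev_height: int) -> bool:
        # "must not replace a message in the current 100 block range"
        return new_height >= prev_height + RATE_LIMIT_BLOCKS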

> > If some nodes have 600000 and others have 600099 (because you broke the
> > ratelimiting recommendation, and propagated both approx the same
> > time), then the network will split, sure.
>
>
> Right, so what do you do in that case, though? AFAIU, in your proposed sync 
> mechanism if a node does
> this once, you're stuck with all of your gossip reconciliations with every 
> peer "wasting" one
> difference "slot" for a day or however long it takes before the peer does a 
> sane update. In my
> proposed alternative it only appears once and then you move on (or maybe once 
> more on startup, but
> we can maybe be willing to take on some extra cost there?).

This case may not be all that difficult. Easiest answer is you offer a spam 
proof to your peer.  Send both messages, signed by the offending node as proof 
they violated the set reconciliation rate limit, then remove both from your 
sketch. You may want to keep the evidence in your data store, at least until
it's superseded by the next valid update, but there's no reason it must occupy
a slot of the sketch.  Meanwhile, feel free to use the message as you wish, 
just keep both out of the sketch. It's not perfect, but the sketch capacity is 
not compromised and the second incidence of spam should not propagate at all. 
(It may be possible to keep one, but this is the simplest answer.)
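A sketch of what such a proof might carry (hypothetical structure - in
practice the two raw signed channel_updates are themselves the proof):

    from dataclasses import dataclass

    RATE_LIMIT_BLOCKS = 100  # the agreed interval assumed above

    @dataclass
    class SpamProof:
        update_a: bytes  # full signed channel_update
        update_b: bytes  # second update for the same channel
        height_a: int
        height_b: int

        def violates_limit(self) -> bool:
            # signature and same-channel checks omitted in this sketch;
            # two valid updates inside one interval prove the violation
            return abs(self.height_a - self.height_b) < RATE_LIMIT_BLOCKS

Either message alone is valid gossip; together they are self-authenticating
evidence, since both carry the offending node's signature.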

> Heh, I'm surprised you'd complain about this - IIUC your existing gossip 
> storage system keeps this
> as a side-effect so it'd be a single integer for y'all :p. In any case, if it 
> makes the protocol a
> chunk more efficient I don't see why its a big deal to keep track of the set 
> of which invoices have
> changed recently, you could even make it super efficient by just saying 
> "anything more recent than
> timestamp X except a few exceptions that we got with some lag against the 
> update timestamp".

The benefit of a single global sketch is less overhead in adding additional 
gossip peers, though looking at the numbers, sketch decoding time seems to be 
the more significant driving factor than rebuilding sketches (when they're 
incremental.) I also like maximizing the utility of the sketch by adding the 
full gossip store if possible.

I still think getting the rate-limit responsibility to the originating node 
would be a win in either case. It will chew into sketch capacity regardless.

-Alex


--- Original Message ---
On Thursday, May 26th, 2022 at 5:19 PM, Matt Corallo  
wrote:


>
> On 5/26/22 1:25 PM, Rusty Russell wrote:
>
> > Matt Corallo lf-li...@mattcorallo.com writes:
> >
> > > > > I agree there should be some rough consensus, but rate-limits are a 
> > > > > locally-enforced thing, not a
> > > > > global one. There will always be races and updates you reject that 
> > > > > your peers dont, no matter the
> > > > > rate-limit, and while I agree we should have guidelines, we can't 
> > > > > "just make them the same" - it
> > > > > both doesn't solve the problem and means we can't change them in the 
> > > > > future.
> > > >
> > > > Sure it does! It severely limits the set divergence to race conditions
> > > > (down to block height divergence, in practice).
> > >
> > > Huh? There's always some line you draw, if an update happens right on the 
> > > line (which they almost
> > > certainly often will because people want to update, and they'll update 
> > > every X hours to whatever the
> > > rate limit is), then ~half the network will accept the update and half 
> > > won't. I don't see how you
> > > solve this problem.
> >
> > The update contains a block number. Let's say we allow an update every
> > 100 blocks. This must be <= current block height (and 

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-05-26 Thread Matt Corallo




On 5/26/22 1:25 PM, Rusty Russell wrote:
> Matt Corallo  writes:
>>>> I agree there should be *some* rough consensus, but rate-limits are a
>>>> locally-enforced thing, not a global one. There will always be races and
>>>> updates you reject that your peers dont, no matter the rate-limit, and
>>>> while I agree we should have guidelines, we can't "just make them the
>>>> same" - it both doesn't solve the problem and means we can't change them
>>>> in the future.
>>>
>>> Sure it does!  It severely limits the set divergence to race conditions
>>> (down to block height divergence, in practice).
>>
>> Huh? There's always some line you draw, if an update happens right on the
>> line (which they almost certainly often will because people want to update,
>> and they'll update every X hours to whatever the rate limit is), then ~half
>> the network will accept the update and half won't. I don't see how you
>> solve this problem.
>
> The update contains a block number.  Let's say we allow an update every
> 100 blocks.  This must be <= current block height (and presumably, newer
> than height - 2016).
>
> If you send an update number 600000, and then 600100, it will propagate.
> 600099 will not.

Ah, this is an additional proposal on top, and requires a gossip "hard fork", which means your new
protocol would only work for taproot channels, and any old/unupgraded channels will have to be
propagated via the old mechanism. I'd kinda prefer to be able to rip out the old gossip sync code
sooner than a few years from now :(.

> If some nodes have 600000 and others have 600099 (because you broke the
> ratelimiting recommendation, *and* propagated both approx the same
> time), then the network will split, sure.

Right, so what do you do in that case, though? AFAIU, in your proposed sync mechanism if a node does
this once, you're stuck with all of your gossip reconciliations with every peer "wasting" one
difference "slot" for a day or however long it takes before the peer does a sane update. In my
proposed alternative it only appears once and then you move on (or maybe once more on startup, but
we can maybe be willing to take on some extra cost there?).

>>> Maybe.  What's a "non-update" based sketch?  Some huge percentage of
>>> gossip is channel_update, so it's kind of the thing we want?
>>
>> Oops, maybe we're on *very* different pages, here - I mean doing sketches
>> based on "the things that I received since the last sync, ie all the gossip
>> updates from the last hour" vs doing sketches based on "the things I have,
>> ie my full gossip store".
>
> But that requires state.  Full store requires none, keeping it
> super-simple

Heh, I'm surprised you'd complain about this - IIUC your existing gossip storage system keeps this
as a side-effect so it'd be a single integer for y'all :p. In any case, if it makes the protocol a
chunk more efficient I don't see why it's a big deal to keep track of the set of which invoices have
changed recently, you could even make it super efficient by just saying "anything more recent than
timestamp X *except* a few exceptions that we got with some lag against the update timestamp".

Better, the state is global, not per-peer, and a small fraction of the total state of the gossip
store anyway, so it's not like it's introducing some new giant or non-constant-factor blowup.
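For concreteness, the state in question could be as small as a watermark plus
an exception list - a hedged sketch, not any implementation's actual code:

    class RecentUpdates:
        """Track 'things received since the last sync' globally,
        with no per-peer bookkeeping."""

        def __init__(self, now: int):
            self.watermark = now     # everything newer counts as recent
            self.stragglers = set()  # updates that arrived with lag

        def on_update(self, update_id: int, update_timestamp: int):
            if update_timestamp <= self.watermark:
                # timestamped before the watermark but received just now
                self.stragglers.add(update_id)

        def roll(self, new_watermark: int):
            # after a sync round: advance the watermark, drop exceptions
            self.watermark = new_watermark
            self.stragglers.clear()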


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-05-26 Thread Rusty Russell
Matt Corallo  writes:
>>> I agree there should be *some* rough consensus, but rate-limits are a 
>>> locally-enforced thing, not a
>>> global one. There will always be races and updates you reject that your 
>>> peers dont, no matter the
>>> rate-limit, and while I agree we should have guidelines, we can't "just 
>>> make them the same" - it
>>> both doesn't solve the problem and means we can't change them in the future.
>> 
>> Sure it does!  It severely limits the set divergence to race conditions
>> (down to block height divergence, in practice).
>
> Huh? There's always some line you draw, if an update happens right on the 
> line (which they almost 
> certainly often will because people want to update, and they'll update every 
> X hours to whatever the 
> rate limit is), then ~half the network will accept the update and half won't. 
> I don't see how you 
> solve this problem.

The update contains a block number.  Let's say we allow an update every
100 blocks.  This must be <= current block height (and presumably, newer
than height - 2016).

If you send an update number 600000, and then 600100, it will propagate.
600099 will not.

If some nodes have 600000 and others have 600099 (because you broke the
ratelimiting recommendation, *and* propagated both approx the same
time), then the network will split, sure.

We could be fascist and penalize nodes which do this, but that's
overkill unless it actually happens a lot.

Nodes which want to keep a potential update "up their sleeve" will
backdate updates by 101 blocks (everyone should do this, in fact).

As I said, this has a problem with block height differences, but that's
explicitly included in the messages so you can ignore and wait if you
want.  Again, may not be a problem in practice.
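To make the rule concrete, a minimal sketch of the acceptance check described
above (the constant and names are illustrative only):

    RATE_LIMIT_BLOCKS = 100  # one update allowed per 100 blocks
    MAX_AGE_BLOCKS = 2016

    def accept_update(height: int, prev_height: int, chain_tip: int) -> bool:
        if height > chain_tip:                    # not from the future
            return False
        if height <= chain_tip - MAX_AGE_BLOCKS:  # not too stale
            return False
        # a node keeping an update "up its sleeve" backdates by 101 blocks,
        # so its replacement is immediately admissible
        return height >= prev_height + RATE_LIMIT_BLOCKS

    assert accept_update(600100, 600000, 600105)      # propagates
    assert not accept_update(600099, 600000, 600105)  # will not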

>> Maybe.  What's a "non-update" based sketch?  Some huge percentage of
>> gossip is channel_update, so it's kind of the thing we want?
>
> Oops, maybe we're on *very* different pages, here - I mean doing sketches 
> based on "the things that 
> I received since the last sync, ie all the gossip updates from the last hour" 
> vs doing sketches 
> based on "the things I have, ie my full gossip store".

But that requires state.  Full store requires none, keeping it
super-simple

Though Alex has an idea for an "include even the expired entries" then
"regenerate every N blocks" which avoids the problem that each change is
two deltas (one remove, one add), at cost of some complexity.

Cheers,
Rusty.


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-05-26 Thread Matt Corallo

Oops, sorry, I don't really monitor the dev lists but once every few months so 
this fell off my plate :/

On 4/28/22 6:11 PM, Rusty Russell wrote:
> Matt Corallo  writes:
> OK, let's step back.  Unlike Bitcoin, we can use a single sketch for
> *all* peers.  This is because we *can* encode enough information that
> you can get useful info from the 64 bit id, and because it's expensive
> to create them so you can't spam.

Yep, makes sense.

> The more boutique per-peer handling we need, the further it gets from
> this ideal.

>> The second potential thing I think you might have meant here I don't see as
>> an issue at all? You can simply...let the sketch include one channel update
>> that you ignored? See above discussion of similar rate-limits.
>
> No, you need to get all the ignored ones somehow?  There's so much cruft
> in the sketch you can't decode it.  Now you need to remember the ones
> you ratelimited, and try to match others' ratelimiting.

Right, you'd end up downloading the thing you rate-limited, but only once (possibly per-peer). If
you use the total-sync approach you'd download it on every sync, vs a "only updates" approach you'd
do it once.

>> I agree there should be *some* rough consensus, but rate-limits are a
>> locally-enforced thing, not a global one. There will always be races and
>> updates you reject that your peers dont, no matter the rate-limit, and while
>> I agree we should have guidelines, we can't "just make them the same" - it
>> both doesn't solve the problem and means we can't change them in the future.
>
> Sure it does!  It severely limits the set divergence to race conditions
> (down to block height divergence, in practice).

Huh? There's always some line you draw, if an update happens right on the line (which they almost
certainly often will because people want to update, and they'll update every X hours to whatever the
rate limit is), then ~half the network will accept the update and half won't. I don't see how you
solve this problem.

> Maybe.  What's a "non-update" based sketch?  Some huge percentage of
> gossip is channel_update, so it's kind of the thing we want?

Oops, maybe we're on *very* different pages, here - I mean doing sketches based on "the things that
I received since the last sync, ie all the gossip updates from the last hour" vs doing sketches
based on "the things I have, ie my full gossip store".


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-28 Thread Rusty Russell
Matt Corallo  writes:
> On 4/26/22 11:53 PM, Rusty Russell wrote:
>> Matt Corallo  writes:
>>>> This same problem will occur if *anyone* does ratelimiting, unless
>>>> *everyone* does.  And with minisketch, there's a good reason to do so.
>>>
>>> None of this seems like a good argument for *not* taking the "send updates 
>>> since the last sync in
>>> the minisketch" approach to reduce the damage inconsistent policies
>>> cause, though?
>> 
>> You can't do this, with minisketch.  You end up having to keep all the
>> ratelimited differences you're ignoring *per peer*, and then cancelling
>> them out of the minisketch on every receive or send.
>
> Hmm? I'm a bit confused, let me attempt to restate to make sure we're on the 
> same page. What I 
> *think* you said here is: "If you have a node which is rejecting a large 
> percentage *channel*'s 
> updates (on a per-channel, not per-update basis), and it tries to sync, 
> you'll end up having to keep 
> some huge set of 'I dont want any more updates for that channel' on a 
> per-peer basis"? Or maybe you 
> might have said "When you rate-limit, you have to tell your peer that you 
> rate-limited a channel 
> update and that it shouldn't add that update to its next sketch"?

OK, let's step back.  Unlike Bitcoin, we can use a single sketch for
*all* peers.  This is because we *can* encode enough information that
you can get useful info from the 64 bit id, and because it's expensive
to create them so you can't spam.

The more boutique per-peer handling we need, the further it gets from
this ideal.
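For scale, a BCH-based sketch of b-bit elements occupies b * capacity bits no
matter how large the underlying set is, which is what makes one global sketch
cheap to serve to every peer (illustrative numbers):

    ELEMENT_BITS = 64

    def sketch_size_bytes(capacity: int) -> int:
        # room to recover `capacity` differences of 64-bit ids
        return capacity * ELEMENT_BITS // 8

    print(sketch_size_bytes(128))  # 1024 bytes, independent of store size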

> The second potential thing I think you might have meant here I don't see as 
> an issue at all? You can 
> simply...let the sketch include one channel update that you ignored? See 
> above discussion of similar 
> rate-limits.

No, you need to get all the ignored ones somehow?  There's so much cruft
in the sketch you can't decode it.  Now you need to remember the ones
you ratelimited, and try to match others' ratelimiting.

>> So you end up doing what LND and core-lightning do, which is "pick 3
>> peers to gossip with" and tell everyone else to shut up.
>> 
>> Yet the point of minisketch is robustness; you can (at cost of 1 message
>> per minute) keep in sync with an arbitrary number of peers.
>> 
>> So, we might as well define a preferred ratelimit, so nodes know that
>> spamming past a certain point is not going to propagate.  At the moment,
>> LND has no effective ratelimit at all, so it's a race to the bottom.
>
> I agree there should be *some* rough consensus, but rate-limits are a 
> locally-enforced thing, not a 
> global one. There will always be races and updates you reject that your peers 
> dont, no matter the 
> rate-limit, and while I agree we should have guidelines, we can't "just make 
> them the same" - it 
> both doesn't solve the problem and means we can't change them in the future.

Sure it does!  It severely limits the set divergence to race conditions
(down to block height divergence, in practice).

> Ultimately, an updates-based sync is more robust in such a case - if there's
> some race and your peer 
> accepts something you don't it may mean one more entry in the sketch one 
> time, but it won't hang 
> around forever.
>
>> We need that limit eventually, this just makes it more of a priority.
>> 
>>> I'm not really
>>> sure in a world where you do "update-based-sketch" gossip sync you're any 
>>> worse off than today even
>>> with different rate-limit policies, though I obviously agree there are 
>>> substantial issues with the
>>> massively inconsistent rate-limit policies we see today.
>> 
>> You can't really do it, since rate-limited junk overwhelms the sketch
>> really fast :(
>
> How is this any better in a non-update-based-sketch? The only way to address 
> it is to have a bigger 
> sketch, which you can do no matter the thing you're building the sketch over.
>
> Maybe let's schedule a call to get on the same page, throwing text at each
> other will likely not move 
> very quickly.

Maybe.  What's a "non-update" based sketch?  Some huge percentage of
gossip is channel_update, so it's kind of the thing we want?

Cheers,
Rusty.


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-27 Thread Matt Corallo




On 4/26/22 11:53 PM, Rusty Russell wrote:
> Matt Corallo  writes:
>>> This same problem will occur if *anyone* does ratelimiting, unless
>>> *everyone* does.  And with minisketch, there's a good reason to do so.
>>
>> None of this seems like a good argument for *not* taking the "send updates
>> since the last sync in the minisketch" approach to reduce the damage
>> inconsistent policies cause, though?
>
> You can't do this, with minisketch.  You end up having to keep all the
> ratelimited differences you're ignoring *per peer*, and then cancelling
> them out of the minisketch on every receive or send.

Hmm? I'm a bit confused, let me attempt to restate to make sure we're on the same page. What I
*think* you said here is: "If you have a node which is rejecting a large percentage *channel*'s
updates (on a per-channel, not per-update basis), and it tries to sync, you'll end up having to keep
some huge set of 'I dont want any more updates for that channel' on a per-peer basis"? Or maybe you
might have said "When you rate-limit, you have to tell your peer that you rate-limited a channel
update and that it shouldn't add that update to its next sketch"?

Either way, I don't think it's all that interesting an issue. The first case is definitely an issue,
but is an issue in both a new-data-only sketch and all-data sketch world, and is not completely
solved with identical rate-limits in any case. It can be largely addressed by sane software defaults
and roughly-similar rate-limits, though, and because it's a per-channel, not per-update issue I'm
much less concerned.

The second potential thing I think you might have meant here I don't see as an issue at all? You can
simply...let the sketch include one channel update that you ignored? See above discussion of similar
rate-limits.

> So you end up doing what LND and core-lightning do, which is "pick 3
> peers to gossip with" and tell everyone else to shut up.
>
> Yet the point of minisketch is robustness; you can (at cost of 1 message
> per minute) keep in sync with an arbitrary number of peers.
>
> So, we might as well define a preferred ratelimit, so nodes know that
> spamming past a certain point is not going to propagate.  At the moment,
> LND has no effective ratelimit at all, so it's a race to the bottom.

I agree there should be *some* rough consensus, but rate-limits are a locally-enforced thing, not a
global one. There will always be races and updates you reject that your peers dont, no matter the
rate-limit, and while I agree we should have guidelines, we can't "just make them the same" - it
both doesn't solve the problem and means we can't change them in the future.

Ultimately, an updates-based sync is more robust in such a case - if there's some race and your peer
accepts something you don't it may mean one more entry in the sketch one time, but it won't hang
around forever.

> We need that limit eventually, this just makes it more of a priority.
>
>> I'm not really sure in a world where you do "update-based-sketch" gossip
>> sync you're any worse off than today even with different rate-limit
>> policies, though I obviously agree there are substantial issues with the
>> massively inconsistent rate-limit policies we see today.
>
> You can't really do it, since rate-limited junk overwhelms the sketch
> really fast :(

How is this any better in a non-update-based-sketch? The only way to address it is to have a bigger
sketch, which you can do no matter the thing you're building the sketch over.

Maybe let's schedule a call to get on the same page, throwing text at each other will likely not
move very quickly.


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-24 Thread Matt Corallo



On 4/22/22 6:40 PM, Rusty Russell wrote:
> Matt Corallo  writes:
>>> Allowing only 1 a day, ended up with 18% of channels hitting the spam
>>> limit.  We cannot fit that many channel differences inside a set!
>>>
>>> Perhaps Alex should post his more detailed results, but it's pretty
>>> clear that we can't stay in sync with this many differences :(
>>
>> Right, the fact that most nodes don't do any limiting at all and y'all have a *very* aggressive
>> (by comparison) limit is going to be an issue in any context.
>
> I'm unable to find the post years ago where I proposed this limit
> and nobody had major objections.  I just volunteered to go first :)

I'm not trying to argue the number is good or bad, only that being several orders of magnitude away
from everything else is going to lead to rejections.

>> We could set some guidelines and improve things, but luckily regular-update-sync bypasses some
>> of these issues anyway - if we sync once per block and your limit is once per block, getting
>> 1000 updates per block for some channel doesn't result in multiple failures in the sync. Sure,
>> multiple peers sending different updates for that channel can still cause some failures, but
>> it's still much better.
>
> Nodes will want to aggressively spam as much as they can, so I think we
> need a widely-agreed limit.  I don't really care what it is, but
> somewhere between per 1 and 1000 blocks makes sense?

I don't really disagree, but my point is that we should strive for the sync system to not need to
care about this number as much as possible. Because views of the rate limits are a local view, not a
global view, you'll always end up with things on the edge getting rejected during sync, and, worse,
when we eventually want to change the limit, we'd be hosed.

>>> But we might end up with a gossip2 if we want to enable taproot, and use
>>> blockheight as timestamps, in which case we could probably just support
>>> that one operation (and maybe a direct query op).
>>>
>>>> Like eclair, we don’t bother to rate limit and don’t see any issues with it,
>>>> though we will skip relaying outbound updates if we’re saturating outbound
>>>> connections.
>>>
>>> Yeah, we did as a trial, and in some cases it's become limiting.  In
>>> particular, people restarting their LND nodes once a day resulting in 2
>>> updates per day (which, in 0.11.0, we now allow).
>>
>> What do you mean "it's become limiting"? As in you hit some reasonably-low CPU/disk/bandwidth
>> limit in doing this? We have a pretty aggressive bandwidth limit for this kinda stuff (well,
>> indirect bandwidth limit) and it very rarely hits in my experience (unless the peer is very
>> overloaded and not responding to pings, which is a somewhat separate thing...)
>
> By rejecting more than 1 per day, some LND nodes had 50% of their
> channels left disabled :(
>
> This same problem will occur if *anyone* does ratelimiting, unless
> *everyone* does.  And with minisketch, there's a good reason to do so.

None of this seems like a good argument for *not* taking the "send updates since the last sync in
the minisketch" approach to reduce the damage inconsistent policies cause, though? I'm not really
sure in a world where you do "update-based-sketch" gossip sync you're any worse off than today even
with different rate-limit policies, though I obviously agree there are substantial issues with the
massively inconsistent rate-limit policies we see today.


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-22 Thread Rusty Russell
Matt Corallo  writes:
>> Allowing only 1 a day, ended up with 18% of channels hitting the spam
>> limit.  We cannot fit that many channel differences inside a set!
>> 
>> Perhaps Alex should post his more detailed results, but it's pretty
>> clear that we can't stay in sync with this many differences :(
>
> Right, the fact that most nodes don't do any limiting at all and y'all have a 
> *very* aggressive (by 
> comparison) limit is going to be an issue in any context.

I'm unable to find the post years ago where I proposed this limit
and nobody had major objections.  I just volunteered to go first :)

> We could set some guidelines and improve 
> things, but luckily regular-update-sync bypasses some of these issues anyway 
> - if we sync once per 
> block and your limit is once per block, getting 1000 updates per block for 
> some channel doesn't 
> result in multiple failures in the sync. Sure, multiple peers sending 
> different updates for that 
> channel can still cause some failures, but it's still much better.

Nodes will want to aggressively spam as much as they can, so I think we
need a widely-agreed limit.  I don't really care what it is, but
somewhere between per 1 and 1000 blocks makes sense?

Normally I'd suggest a burst, but that's bad for consensus: better to
say "just create your update N-6 blocks behind so you can always create a
new one 6 blocks behind".

>>> gossip queries  is broken in at least five ways.
>> 
>> Naah, it's perfect if you simply want to ask "give me updates since XXX"
>> to get you close enough on reconnect to start using set reconciliation.
>> This might allow us to remove some of the other features?
>
> Sure, but that's *just* the "gossip_timestamp_filter" message, there's 
> several other messages and a 
> whole query system that we can throw away if we just want that message :)

I agree.  Removing features would be nice :)

>> But we might end up with a gossip2 if we want to enable taproot, and use
>> blockheight as timestamps, in which case we could probably just support
>> that one operation (and maybe a direct query op).
>> 
>>> Like eclair, we don’t bother to rate limit and don’t see any issues with 
>>> it, though we will skip relaying outbound updates if we’re saturating 
>>> outbound connections.
>> 
>> Yeah, we did as a trial, and in some cases it's become limiting.  In
>> particular, people restarting their LND nodes once a day resulting in 2
>> updates per day (which, in 0.11.0, we now allow).
>
> What do you mean "its become limiting"? As in you hit some reasonably-low 
> CPU/disk/bandwidth limit 
> in doing this? We have a pretty aggressive bandwidth limit for this kinda 
> stuff (well, indirect 
> bandwidth limit) and it very rarely hits in my experience (unless the peer is 
> very overloaded and 
> not responding to pings, which is a somewhat separate thing...)

By rejecting more than 1 per day, some LND nodes had 50% of their
channels left disabled :(

This same problem will occur if *anyone* does ratelimiting, unless
*everyone* does.  And with minisketch, there's a good reason to do so.

Cheers,
Rusty.


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-22 Thread Matt Corallo



On 4/21/22 7:20 PM, Rusty Russell wrote:
> Matt Corallo  writes:
>> Sure, if you’re rejecting a large % of channel updates in total
>> you’re gonna end up hitting degenerate cases, but we can consider
>> tuning the sync frequency if that becomes an issue.
>
> Let's be clear: it's a problem.
>
> Allowing only 1 a day, ended up with 18% of channels hitting the spam
> limit.  We cannot fit that many channel differences inside a set!
>
> Perhaps Alex should post his more detailed results, but it's pretty
> clear that we can't stay in sync with this many differences :(

Right, the fact that most nodes don't do any limiting at all and y'all have a *very* aggressive (by
comparison) limit is going to be an issue in any context. We could set some guidelines and improve
things, but luckily regular-update-sync bypasses some of these issues anyway - if we sync once per
block and your limit is once per block, getting 1000 updates per block for some channel doesn't
result in multiple failures in the sync. Sure, multiple peers sending different updates for that
channel can still cause some failures, but it's still much better.

>> gossip queries  is broken in at least five ways.
>
> Naah, it's perfect if you simply want to ask "give me updates since XXX"
> to get you close enough on reconnect to start using set reconciliation.
> This might allow us to remove some of the other features?

Sure, but that's *just* the "gossip_timestamp_filter" message, there's several other messages and a
whole query system that we can throw away if we just want that message :)

> But we might end up with a gossip2 if we want to enable taproot, and use
> blockheight as timestamps, in which case we could probably just support
> that one operation (and maybe a direct query op).
>
>> Like eclair, we don’t bother to rate limit and don’t see any issues with it,
>> though we will skip relaying outbound updates if we’re saturating outbound
>> connections.
>
> Yeah, we did as a trial, and in some cases it's become limiting.  In
> particular, people restarting their LND nodes once a day resulting in 2
> updates per day (which, in 0.11.0, we now allow).

What do you mean "it's become limiting"? As in you hit some reasonably-low CPU/disk/bandwidth limit
in doing this? We have a pretty aggressive bandwidth limit for this kinda stuff (well, indirect
bandwidth limit) and it very rarely hits in my experience (unless the peer is very overloaded and
not responding to pings, which is a somewhat separate thing...)


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-22 Thread Matt Corallo



On 4/22/22 9:15 AM, Alex Myers wrote:
> Hi Matt,
>
> Appreciate your responses.  Hope you'll bear with me as I'm a bit new to this.
>
>> Instead of trying to make sure everyone’s gossip acceptance matches exactly, which as you point
>> out seems like a quagmire, why not (a) do a sync on startup and (b) do syncs of the *new*
>> things.
>
> I'm not opposed to this technique, and maybe it ends up as a better solution.  The rationale for
> not going full Erlay approach was that it's far less overhead to maintain a single sketch than to
> maintain a per-peer sketch and associated state for every gossip peer.  In this way there's very
> little cost to adding additional gossip peers, which further encourages propagation and
> convergence of the gossip network.

I'm not sure what you mean by per-node state here - I'd think you can implement it with a simple
"list of updates that happened since time X" data, instead of having to maintain per-peer state.

> IIUC Erlay's design was concerned for privacy of originating nodes.  Lightning gossip is public
> by nature, so I'm not sure we should constrain ourselves to the same design route without trying
> the alternative first.

Part of the design of Erlay, especially the insight of syncing updates instead of full mempools, was
actually this precise issue - Bitcoin Core nodes differ in policy for a number of reasons
(especially across updates), and thus syncing the full mempool will result in degenerate cases of
trying over and over and over again to sync stuff your peer is rejecting. At least if I recall
correctly.

>> if we're gonna add a minisketch-based sync anyway, please let's also use it for initial sync
>> after restart
>
> This was out of the scope of what I had in mind, but I will give this some thought. I could see
> how a block_height reference coupled with set reconciliation could provide some better options
> here. This may not be all that difficult to shoe-horn in.
>
> Regardless of single sketch or per-peer set reconciliation, it should be easier to implement with
> tighter rules on rate-limiting. (Keep in mind, the node's graph can presumably be updated
> independently of the gossip it rebroadcasts if desired.) As a thought experiment, if we consider
> a CLN-LDK set reconciliation, and that each node is gossiping with 5 other peers in an evenly
> spaced frequency, we would currently see 42.8 commonly accepted channel_updates over an average
> 60s window along with 11 more updates which LDK accepts and CLN rejects (spam.)[1] Assuming the
> other 5 peers have shared 5/6ths of this gossip before the CLN/LDK set reconciliation, we're left
> with CLN seeing 7 updates to reconcile, while LDK sees 18.  Already we've lost 60% efficiency due
> to lack of a common rate-limit heuristic.

I do not believe that we will ever form a strong agreement on exactly what the rate-limits should
be. And even if we do, we still have the issue of upgrades, where a simple change to the rate-limits
causes sync to suddenly blow up and hit degenerate cases all over the place. Unless we can make the
sync system relatively robust against slightly different policies, I think we're kinda screwed.

Worse, what happens if someone sends updates at exactly the limit of the rate-limiters? Presumably
people will do this because "that's what the limit is and I want to send updates as often as I can
because...". Now you'll still have similar issues, I believe.

> I understand gossip traffic is manageable now, but I'm not sure it will be that long before it
> becomes an issue. Furthermore, any particular set reconciliation technique would benefit from a
> simple common rate-limit heuristic, not to mention originating nodes, who may not currently
> realize their channel updates are being rejected by a portion of the network due to differing
> criteria across implementations.

Yes, I agree there is definitely a concern with differing criteria resulting in nodes not realizing
their gossip is not propagating. I agree guidelines would be nice, but guidelines don't solve the
issue for sync, sadly, I think. Luckily lightning does provide a mechanism to bypass the rejection -
send an update back with an HTLC failure. If you're trying to route an HTLC and a node has new
parameters for you, it'll helpfully let you know when you try to use the old parameters.


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-22 Thread Alex Myers
Hi Matt,

Appreciate your responses. Hope you'll bear with me as I'm a bit new to this.

> Instead of trying to make sure everyone’s gossip acceptance matches exactly, 
> which as you point it seems like a quagmire, why not (a) do a sync on startup 
> and (b) do syncs of the *new* things.

I'm not opposed to this technique, and maybe it ends up as a better solution. 
The rationale for not going the full Erlay approach was that it's far less 
overhead to maintain a single sketch than to maintain a per-peer sketch and 
associated state for every gossip peer. In this way there's very little cost to 
adding additional gossip peers, which further encourages propagation and 
convergence of the gossip network.
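
To illustrate the cost difference, a toy comparison (Python, with plain sets
standing in for sketches and all names my own):

    class GlobalSketchNode:
        def __init__(self):
            self.sketch = set()              # stand-in for one global minisketch

        def on_gossip(self, fingerprint):
            self.sketch.add(fingerprint)     # O(1) work, however many peers

    class PerPeerSketchNode:
        def __init__(self, peers):
            self.sketches = {p: set() for p in peers}   # state per gossip peer

        def on_gossip(self, fingerprint, from_peer):
            for peer, sketch in self.sketches.items():
                if peer != from_peer:        # track what each peer still needs
                    sketch.add(fingerprint)  # O(#peers) work per message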

IIUC Erlay's design was concerned for privacy of originating nodes. Lightning 
gossip is public by nature, so I'm not sure we should constrain ourselves to 
the same design route without trying the alternative first.

> if we're gonna add a minisketch-based sync anyway, please let's also use it 
> for initial sync after restart

This was out of the scope of what I had in mind, but I will give this some 
thought. I could see how a block_height reference coupled with set 
reconciliation could provide some better options here. This may not be all that 
difficult to shoe-horn in.
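
A rough sketch of the shape that could take (Python; the (height, raw) gossip 
representation and fingerprint64 are placeholders of my own, and a real 
implementation would feed the fingerprints into a fixed-capacity minisketch 
rather than keep an explicit set):

    import hashlib

    def fingerprint64(raw_msg: bytes) -> int:
        """64-bit fingerprint of a gossip message."""
        return int.from_bytes(hashlib.sha256(raw_msg).digest()[:8], "big")

    def sync_set_since(gossip_msgs, since_height):
        """Fingerprints of all gossip at or after since_height. Both peers
        build this for an agreed height, exchange sketches of it, and the
        decoded symmetric difference is exactly the gossip to transmit."""
        return {fingerprint64(raw)
                for height, raw in gossip_msgs
                if height >= since_height}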

Regardless of single sketch or per-peer set reconciliation, it should be easier 
to implement with tighter rules on rate-limiting. (Keep in mind, the node's 
graph can presumably be updated independently of the gossip it rebroadcasts if 
desired.) As a thought experiment, if we consider a CLN-LDK set reconciliation, 
and that each node is gossiping with 5 other peers in an evenly spaced 
frequency, we would currently see 42.8 commonly accepted channel_updates over 
an average 60s window along with 11 more updates which LDK accepts and CLN 
rejects (spam.)[1] Assuming the other 5 peers have shared 5/6ths of this gossip 
before the CLN/LDK set reconciliation, we're left with CLN seeing 7 updates to 
reconcile, while LDK sees 18. Already we've lost 60% efficiency due to lack of 
a common rate-limit heuristic.
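
Making that arithmetic explicit (Python; the even 6-way split of gossip 
across peers is my simplification of the scenario above):

    common_per_min = 42.8   # channel_updates accepted by both CLN and LDK
    spam_per_min = 11.0     # accepted by LDK, rate-limited away by CLN

    # With 5 of 6 peers having already shared 5/6ths of the common gossip,
    # roughly 1/6th remains to reconcile with this peer.
    cln_side = common_per_min / 6                 # ~7 updates CLN is missing
    ldk_side = common_per_min / 6 + spam_per_min  # ~18 updates on LDK's side

    # The sketch must cover the full symmetric difference (~25 entries),
    # most of it contributed by the rate-limit disagreement.
    wasted = spam_per_min / ldk_side              # ~61% of LDK's side
    print(f"CLN ~{cln_side:.0f}, LDK ~{ldk_side:.0f}, wasted {wasted:.0%}")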

I understand gossip traffic is manageable now, but I'm not sure it will be that 
long before it becomes an issue. Furthermore, any particular set reconciliation 
technique would benefit from a simple common rate-limit heuristic, not to 
mention originating nodes, who may not currently realize their channel updates 
are being rejected by a portion of the network due to differing criteria across 
implementations.

Thanks,
Alex

[1] https://github.com/endothermicdev/lnspammityspam/blob/main/sampleoutput.txt

--- Original Message ---
On Thursday, April 21st, 2022 at 3:47 PM, Matt Corallo lf-li...@mattcorallo.com 
wrote:

> On 4/21/22 1:31 PM, Alex Myers wrote:
>
>> Hello Bastien,
>>
>> Thank you for your feedback. I hope you don't mind I let it percolate for a 
>> while.
>>
>>> Eclair doesn't do any rate-limiting. We wanted to "feel the pain" before 
>>> adding anything, and to be honest we haven't really felt it yet.
>>
>> I understand the “feel the pain first” approach, but attempting set 
>> reconciliation has forced me to
>> confront the issue a bit early.
>>
>> My thoughts on sync were that set-reconciliation would only be used once a 
>> node had fully synced
>> gossip through traditional means (initial_routing_sync / gossip_queries.) 
>> There should be many
>> levers to pull in order to help maintain sync after this. I'm going to have 
>> to experiment with them
>> a bit before I can claim they are sufficient, but I'm optimistic.
>
> Please, no. initial_routing_sync was removed from most implementations (it 
> sucks) and gossip queries 
> is broken in at least five ways. Maybe we can recover it by adding yet more 
> extensions, but if we're 
> gonna add a minisketch-based sync anyway, please let's also use it for initial 
> sync after restart 
> (unless you have no channels at all, in which case let's maybe revive 
> initial_routing_sync...)
>
> Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-21 Thread Rusty Russell
Matt Corallo  writes:
> Sure, if you’re rejecting a large % of channel updates in total
> you’re gonna end up hitting degenerate cases, but we can consider
> tuning the sync frequency if that becomes an issue.

Let's be clear: it's a problem.

Allowing only 1 a day ended up with 18% of channels hitting the spam
limit.  We cannot fit that many channel differences inside a set!

Perhaps Alex should post his more detailed results, but it's pretty
clear that we can't stay in sync with this many differences :(
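
For a sense of scale (my own back-of-the-envelope: ~150,000 half-channels is 
an assumed round figure, and a minisketch of capacity c over b-bit elements 
occupies c*b bits on the wire):

    half_channels = 150_000                  # assumption for illustration
    over_limit = int(0.18 * half_channels)   # 18% hitting the spam limit
    entry_bits = 64                          # one sketch entry per difference

    sketch_bytes = over_limit * entry_bits // 8
    print(f"{over_limit} differences -> {sketch_bytes / 1024:.0f} KiB sketch")
    # ~211 KiB of sketch just to absorb the rate-limit disagreement, before
    # any genuinely new gossip - those differences simply don't fit in a set.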

> gossip queries is broken in at least five ways.

Naah, it's perfect if you simply want to ask "give me updates since XXX"
to get you close enough on reconnect to start using set reconciliation.
This might allow us to remove some of the other features?

But we might end up with a gossip2 if we want to enable taproot, and use
blockheight as timestamps, in which case we could probably just support
that one operation (and maybe a direct query op).

> Like eclair, we don’t bother to rate limit and don’t see any issues with it, 
> though we will skip relaying outbound updates if we’re saturating outbound 
> connections.

Yeah, we did this as a trial, and in some cases it's become limiting.  In
particular, people restarting their LND nodes once a day results in 2
updates per day (which, in 0.11.0, we now allow).

Cheers,
Rusty.


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-21 Thread Matt Corallo



On 4/21/22 1:31 PM, Alex Myers wrote:

> Hello Bastien,
>
> Thank you for your feedback. I hope you don't mind I let it percolate for a
> while.
>
>> Eclair doesn't do any rate-limiting. We wanted to "feel the pain" before
>> adding anything, and to be honest we haven't really felt it yet.
>
> I understand the “feel the pain first” approach, but attempting set reconciliation has forced me to
> confront the issue a bit early.
>
> My thoughts on sync were that set-reconciliation would only be used once a node had fully synced
> gossip through traditional means (initial_routing_sync / gossip_queries.) There should be many
> levers to pull in order to help maintain sync after this. I'm going to have to experiment with them
> a bit before I can claim they are sufficient, but I'm optimistic.

Please, no. initial_routing_sync was removed from most implementations (it sucks) and gossip queries
is broken in at least five ways. Maybe we can recover it by adding yet more extensions, but if we're
gonna add a minisketch-based sync anyway, please let's also use it for initial sync after restart
(unless you have no channels at all, in which case let's maybe revive initial_routing_sync...)


Matt


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-21 Thread Alex Myers
Hello Bastien,

Thank you for your feedback. I hope you don't mind I let it percolate for a 
while.

> Eclair doesn't do any rate-limiting. We wanted to "feel the pain" before
> adding anything, and to be honest we haven't really felt it yet.

I understand the “feel the pain first” approach, but attempting set 
reconciliation has forced me to confront the issue a bit early.

My thoughts on sync were that set-reconciliation would only be used once a node 
had fully synced gossip through traditional means (initial_routing_sync / 
gossip_queries.) There should be many levers to pull in order to help maintain 
sync after this. I'm going to have to experiment with them a bit before I can 
claim they are sufficient, but I'm optimistic.

> One thing that may help here from an implementation's point of view is to 
> avoid
> sending a disabled channel update every time a channel goes offline. What
> eclair does to avoid spamming is to only send a disabled channel update when
> someone actually tries to use that channel. Of course, if people choose this
> offline node in their route, you don't have a choice and will need to send a
> disabled channel update, but we've observed that many channels come back
> online before we actually need to use them, so we're saving two channel 
> updates
> (one to disable the channel and one to re-enable it). I think all 
> implementations
> should do this. Is that the case today?
> We could go even further, and when we receive an htlc that should be relayed
> to an offline node, wait a bit to give them an opportunity to come online 
> instead
> of failing the htlc and sending a disabled channel update. Eclair currently
> doesn't do that, but it would be very easy to add.

Core-Lightning also delays sending disabled channel updates in an effort to 
minimize unnecessary gossip. I hadn’t considered an additional delay before 
failing an htlc on a disabled channel. That will be interesting to explore in 
the context of transient disconnects of Tor v3 nodes.
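
A toy model of that delayed-disable behaviour (Python; the channel record, 
grace period, and broadcast stub are illustrative, not any implementation's 
actual structures):

    import time

    GRACE_SECONDS = 60   # illustrative wait before giving up on an offline peer

    class ChannelState:
        def __init__(self):
            self.peer_online = True
            self.disable_sent = False

    def broadcast_disabled_update(chan: ChannelState):
        print("broadcast channel_update with the disable flag set")  # stub

    def on_peer_disconnect(chan: ChannelState):
        # No channel_update(disabled) yet - a flapping Tor peer may return
        # before anyone actually tries to route through this channel.
        chan.peer_online = False

    def on_relay_attempt(chan: ChannelState, recheck_online) -> bool:
        """Called when an htlc wants to cross this channel; recheck_online
        re-tests peer liveness after the grace period."""
        if chan.peer_online:
            return True
        time.sleep(GRACE_SECONDS)       # give the peer a chance to reconnect
        if recheck_online():
            chan.peer_online = True
            return True
        if not chan.disable_sent:       # only now tell the network
            broadcast_disabled_update(chan)
            chan.disable_sent = True
        return False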

I like the idea of a block_height in the channel update tlv. That would be 
sufficient to enable a simple rate-limit heuristic for this application anyway. 
Allowing leeway for the chain tip is no problem. I would also expect most 
implementations to hold a couple updates in reserve, defaulting to predated 
updates when available. This would allow a “burst” functionality similar to the 
current LND/CLN rate-limit, but the responsibility is now placed on the 
originating node to provide that allowance.
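
A sketch of such a heuristic (Python; reading the "one per block_height << 5" 
shorthand as one update per 32-block bucket is my interpretation, and the 
leeway value is illustrative):

    BLOCKS_PER_UPDATE = 32   # one channel_update allowed per 32-block bucket
    TIP_LEEWAY = 2           # originator's chain tip may lead ours slightly

    last_bucket = {}         # (short_channel_id, direction) -> last bucket used

    def accept_update(scid, direction, update_height, our_tip_height) -> bool:
        if update_height > our_tip_height + TIP_LEEWAY:
            return False     # claims a block height we can't verify yet
        bucket = update_height // BLOCKS_PER_UPDATE
        key = (scid, direction)
        if key in last_bucket and bucket <= last_bucket[key]:
            return False     # already accepted an update for this bucket
        last_bucket[key] = bucket
        return True

The "couple updates in reserve" above then amounts to pre-signing updates for 
earlier, unused buckets and releasing them together when a burst is needed.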

Cheers,

Alex

--- Original Message ---
On Friday, April 15th, 2022 at 2:15 AM, Bastien TEINTURIER  
wrote:

> Good morning Alex,
>
>> I’ve been investigating set reconciliation as a means to reduce bandwidth
>> and redundancy of gossip message propagation.
>
> Cool project, glad to see someone working on it! The main difficulty here will
> indeed be to ensure that the number of differences between sets is bounded.
> We will need to maintain a mechanism to sync the whole graph from scratch
> for new nodes, so the minisketch diff must be efficient enough otherwise nodes
> will just fall back to a full sync way too often (which would waste a lot of
> bandwidth).
>
>> Picking several offending channel ids, and digging further, the majority of
>> these appear to be flapping due to Tor or otherwise intermittent connections.
>
> One thing that may help here from an implementation's point of view is to 
> avoid
> sending a disabled channel update every time a channel goes offline. What
> eclair does to avoid spamming is to only send a disabled channel update when
> someone actually tries to use that channel. Of course, if people choose this
> offline node in their route, you don't have a choice and will need to send a
> disabled channel update, but we've observed that many channels come back
> online before we actually need to use them, so we're saving two channel 
> updates
> (one to disable the channel and one to re-enable it). I think all 
> implementations
> should do this. Is that the case today?
>
> We could go even further, and when we receive an htlc that should be relayed
> to an offline node, wait a bit to give them an opportunity to come online 
> instead
> of failing the htlc and sending a disabled channel update. Eclair currently 
> doesn't
> do that, but it would be very easy to add.
>
>> - A common listing of current default rate limits across lightning network 
>> implementations.
>
> Eclair doesn't do any rate-limiting. We wanted to "feel the pain" before 
> adding
> anything, and to be honest we haven't really felt it yet.
>
>> which will use a common, simple heuristic to accept or reject a gossip 
>> message.
>
>> (Think one channel update per block, or perhaps one per block_height << 5.)
>
> I think it would be easy to come to agreement between implementations and
> restrict channel updates to at most one every N blocks. We simply need to add
> the `block_height` in a tlv in `channel_update` and then we'll be able to 
> actually rate-limit based on it.

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-21 Thread Greg Sanders
I think I mentioned this out of band to Alex, but (b) is what Erlay's
proposal does for Bitcoin gossip, so it's worth studying up.

On Thu, Apr 21, 2022 at 9:18 AM Matt Corallo 
wrote:

> Instead of trying to make sure everyone’s gossip acceptance matches
> exactly, which as you point out seems like a quagmire, why not (a) do a sync
> on startup and (b) do syncs of the *new* things. This way you aren’t stuck
> staring at the same channels every time you do a sync. Sure, if you’re
> rejecting a large % of channel updates in total you’re gonna end up hitting
> degenerate cases, but we can consider tuning the sync frequency if that
> becomes an issue.
>
> Like eclair, we don’t bother to rate limit and don’t see any issues with
> it, though we will skip relaying outbound updates if we’re saturating
> outbound connections.
>
> On Apr 14, 2022, at 17:06, Alex Myers  wrote:
>
>
>
> Hello lightning developers,
>
>
> I’ve been investigating set reconciliation as a means to reduce bandwidth
> and redundancy of gossip message propagation. This builds on some earlier work
> from Rusty using the minisketch library [1]. The idea is that each node
> will build a sketch representing its own gossip set. Alice’s node will
> encode and transmit this sketch to Bob’s node, where it will be merged with
> his own sketch, and the differences produced. These differences should
> ideally be exactly the latest missing gossip of both nodes. Due to size
> constraints, the set differences will necessarily be encoded, but Bob’s
> node will be able to identify which gossip Alice is missing, and may then
> transmit exactly those messages.
>
>
> This process is relatively straightforward, with the caveat that the sets
> must otherwise match very closely (each sketch has a maximum capacity for
> differences.) The difficulty here is that each node and lightning
> implementation may have its own rules for gossip acceptance and
> propagation. Depending on their gossip partners, not all gossip may
> propagate to the entire network.
>
>
> Core-lightning implements rate limiting for incoming channel updates and
> node announcements. The default rate limit is 1 per day, with a burst of
> 4. I analyzed my node’s gossip over a 14 day period, and found that, of
> all publicly broadcasting half-channels, 18% of them fell afoul of our
> spam-limiting rules at least once. [2]
>
>
> Picking several offending channel ids, and digging further, the majority
> of these appear to be flapping due to Tor or otherwise intermittent
> connections. Well connected nodes may be more susceptible to this due to more
> frequent routing attempts, and failures resulting in a returned channel
> update (which otherwise might not have been broadcast.) A slight
> relaxation of the rate limit resolves the majority of these cases.
>
>
> A smaller subset of channels broadcast frequent channel updates with minor
> adjustments to htlc_maximum_msat and fee_proportional_millionths
> parameters. These nodes appear to be power users, with many channels and
> large balances. I assume this is automated channel management at work.
>
>
> Core-Lightning has updated rate-limiting in the upcoming release to
> achieve a higher acceptance of incoming gossip; however, it seems that a
> broader discussion of rate limits may now be worthwhile. A few immediate
> ideas:
>
> - A common listing of current default rate limits across lightning
> network implementations.
>
> - Internal checks of RPC input to limit or warn of network propagation
> issues if certain rates are exceeded.
>
> - A commonly adopted rate-limit standard.
>
>
> My aim is a set reconciliation gossip type, which will use a common,
> simple heuristic to accept or reject a gossip message. (Think one channel
> update per block, or perhaps one per block_height << 5.) See my github
> for my current draft. [3] This solution allows tighter consensus, yet suffers
> from the same problem as original anti-spam measures – it remains
> somewhat arbitrary. I would like to start a conversation regarding gossip
> propagation, channel_update and node_announcement usage, and perhaps even
> bandwidth goals for syncing gossip in the future (how about a million
> channels?) This would aid in the development of gossip set
> reconciliation, but could also benefit current node connection and
> routing reliability more generally.
>
>
> Thanks,
>
> Alex
>
>
> [1] https://github.com/sipa/minisketch
>
> [2]
> https://github.com/endothermicdev/lnspammityspam/blob/main/sampleoutput.txt
>
> [3]
> https://github.com/endothermicdev/lightning-rfc/blob/gossip-minisketch/07-routing-gossip.md#set-reconciliation

Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-21 Thread Matt Corallo
Instead of trying to make sure everyone’s gossip acceptance matches exactly, 
which as you point out seems like a quagmire, why not (a) do a sync on startup 
and (b) do syncs of the *new* things. This way you aren’t stuck staring at the 
same channels every time you do a sync. Sure, if you’re rejecting a large % of 
channel updates in total you’re gonna end up hitting degenerate cases, but we 
can consider tuning the sync frequency if that becomes an issue.

Like eclair, we don’t bother to rate limit and don’t see any issues with it, 
though we will skip relaying outbound updates if we’re saturating outbound 
connections.

> On Apr 14, 2022, at 17:06, Alex Myers  wrote:
> 
> 
> Hello lightning developers,
> 
> I’ve been investigating set reconciliation as a means to reduce bandwidth and 
> redundancy of gossip message propagation. This builds on some earlier work 
> from Rusty using the minisketch library [1]. The idea is that each node will 
> build a sketch representing its own gossip set. Alice’s node will encode and 
> transmit this sketch to Bob’s node, where it will be merged with his own 
> sketch, and the differences produced. These differences should ideally be 
> exactly the latest missing gossip of both nodes. Due to size constraints, the 
> set differences will necessarily be encoded, but Bob’s node will be able to 
> identify which gossip Alice is missing, and may then transmit exactly those 
> messages.
> 
> This process is relatively straightforward, with the caveat that the sets 
> must otherwise match very closely (each sketch has a maximum capacity for 
> differences.) The difficulty here is that each node and lightning 
> implementation may have its own rules for gossip acceptance and propagation. 
> Depending on their gossip partners, not all gossip may propagate to the 
> entire network.
> 
> Core-lightning implements rate limiting for incoming channel updates and node 
> announcements. The default rate limit is 1 per day, with a burst of 4. I 
> analyzed my node’s gossip over a 14 day period, and found that, of all 
> publicly broadcasting half-channels, 18% of them fell afoul of our 
> spam-limiting rules at least once. [2]
> 
> Picking several offending channel ids, and digging further, the majority of 
> these appear to be flapping due to Tor or otherwise intermittent connections. 
> Well connected nodes may be more susceptible to this due to more frequent 
> routing attempts, and failures resulting in a returned channel update (which 
> otherwise might not have been broadcast.) A slight relaxation of the rate 
> limit resolves the majority of these cases.
> 
> A smaller subset of channels broadcast frequent channel updates with minor 
> adjustments to htlc_maximum_msat and fee_proportional_millionths parameters. 
> These nodes appear to be power users, with many channels and large balances. 
> I assume this is automated channel management at work.
> 
> Core-Lightning has updated rate-limiting in the upcoming release to achieve a 
> higher acceptance of incoming gossip; however, it seems that a broader 
> discussion of rate limits may now be worthwhile. A few immediate ideas:
> - A common listing of current default rate limits across lightning network 
> implementations.
> - Internal checks of RPC input to limit or warn of network propagation issues 
> if certain rates are exceeded.
> - A commonly adopted rate-limit standard.
> 
> My aim is a set reconciliation gossip type, which will use a common, simple 
> heuristic to accept or reject a gossip message. (Think one channel update per 
> block, or perhaps one per block_height << 5.) See my github for my current 
> draft. [3] This solution allows tighter consensus, yet suffers from the same 
> problem as original anti-spam measures – it remains somewhat arbitrary. I 
> would like to start a conversation regarding gossip propagation, 
> channel_update and node_announcement usage, and perhaps even bandwidth goals 
> for syncing gossip in the future (how about a million channels?) This would 
> aid in the development of gossip set reconciliation, but could also benefit 
> current node connection and routing reliability more generally.
> 
> Thanks,
> Alex
> 
> [1] https://github.com/sipa/minisketch
> [2] 
> https://github.com/endothermicdev/lnspammityspam/blob/main/sampleoutput.txt
> [3] 
> https://github.com/endothermicdev/lightning-rfc/blob/gossip-minisketch/07-routing-gossip.md#set-reconciliation
> 


Re: [Lightning-dev] Gossip Propagation, Anti-spam, and Set Reconciliation

2022-04-15 Thread Bastien TEINTURIER
Good morning Alex,

> I’ve been investigating set reconciliation as a means to reduce bandwidth
> and redundancy of gossip message propagation.

Cool project, glad to see someone working on it! The main difficulty here will
indeed be to ensure that the number of differences between sets is bounded.
We will need to maintain a mechanism to sync the whole graph from scratch
for new nodes, so the minisketch diff must be efficient enough otherwise nodes
will just fall back to a full sync way too often (which would waste a lot of
bandwidth).

> Picking several offending channel ids, and digging further, the majority of these
> appear to be flapping due to Tor or otherwise intermittent connections.

One thing that may help here from an implementation's point of view is to avoid
sending a disabled channel update every time a channel goes offline. What
eclair does to avoid spamming is to only send a disabled channel update when
someone actually tries to use that channel. Of course, if people choose this
offline node in their route, you don't have a choice and will need to send a
disabled channel update, but we've observed that many channels come back
online before we actually need to use them, so we're saving two channel updates
(one to disable the channel and one to re-enable it). I think all
implementations should do this. Is that the case today?

We could go even further, and when we receive an htlc that should be relayed
to an offline node, wait a bit to give them an opportunity to come online
instead of failing the htlc and sending a disabled channel update. Eclair
currently doesn't do that, but it would be very easy to add.

> - A common listing of current default rate limits across lightning network
> implementations.

Eclair doesn't do any rate-limiting. We wanted to "feel the pain" before adding
anything, and to be honest we haven't really felt it yet.

> which will use a common, simple heuristic to accept or reject a gossip
> message.
> (Think one channel update per block, or perhaps one per block_height << 5.)

I think it would be easy to come to agreement between implementations and
restrict channel updates to at most one every N blocks. We simply need to add
the `block_height` in a tlv in `channel_update` and then we'll be able to
actually rate-limit based on it. Given how much time it takes to upgrade most
of the network, it may be a good idea to add the `block_height` tlv now in the
spec, and act on it later? Unless your work requires bigger changes in channel
update, in which case it will probably be a new message.

Note that it will never be completely accurate though, as different nodes can
have different blockchain tips. My nodes may be one or two blocks late compared
to the node that emits the channel update. We need to allow a bit of leeway
there.

Cheers,
Bastien




On Thu, Apr 14, 2022 at 23:06, Alex Myers  wrote:

> Hello lightning developers,
>
>
> I’ve been investigating set reconciliation as a means to reduce bandwidth
> and redundancy of gossip message propagation. This builds on some earlier work
> from Rusty using the minisketch library [1]. The idea is that each node
> will build a sketch representing its own gossip set. Alice’s node will
> encode and transmit this sketch to Bob’s node, where it will be merged with
> his own sketch, and the differences produced. These differences should
> ideally be exactly the latest missing gossip of both nodes. Due to size
> constraints, the set differences will necessarily be encoded, but Bob’s
> node will be able to identify which gossip Alice is missing, and may then
> transmit exactly those messages.
>
>
> This process is relatively straightforward, with the caveat that the sets
> must otherwise match very closely (each sketch has a maximum capacity for
> differences.) The difficulty here is that each node and lightning
> implementation may have its own rules for gossip acceptance and
> propagation. Depending on their gossip partners, not all gossip may
> propagate to the entire network.
>
>
> Core-lightning implements rate limiting for incoming channel updates and
> node announcements. The default rate limit is 1 per day, with a burst of
> 4. I analyzed my node’s gossip over a 14 day period, and found that, of
> all publicly broadcasting half-channels, 18% of them fell afoul of our
> spam-limiting rules at least once. [2]
>
>
> Picking several offending channel ids, and digging further, the majority
> of these appear to be flapping due to Tor or otherwise intermittent
> connections. Well connected nodes may be more susceptible to this due to more
> frequent routing attempts, and failures resulting in a returned channel
> update (which otherwise might not have been broadcast.) A slight
> relaxation of the rate limit resolves the majority of these cases.
>
>
> A smaller subset of channels broadcast frequent channel updates with minor
> adjustments to htlc_maximum_msat and fee_proportional_millionths
> parameters. These nodes appear to be