Re: [Lightning-dev] Quick analysis of channel_update data
I'll start collecting and checking data again, but from what I see now, using our checksum extension still significantly reduces gossip traffic. I'm not saying that heuristics to reduce the number of updates cannot help, but I just don't think they should be our primary way of handling such traffic. If you've opened channels to nodes that are unreliable then you should eventually close these channels, but delaying how you publish updates that disable/enable them has an impact on everyone, especially on nodes that mostly send payments (as opposed to relaying or receiving them). Cheers, Fabrice On Mon, 18 Feb 2019 at 13:10, Rusty Russell wrote: > > BTW, I took a snapshot of our gossip store from two weeks back, which > simply stores all gossip in order (compacting every week or so). > > channel_updates which updated existing channels: 17766 > ... which changed *only* the timestamps: 12644 > ... which were a week since the last: 7233 > ... which only changed the disable/enable: 4839 > > So there are about 5100 timestamp-only updates less than a week apart > (about 2000 are 1036 seconds apart, who is this?). > > 1. I'll look at getting even more conservative with flapping (120-second > delay if we've just sent an update) but that doesn't seem to be the > majority of traffic. > 2. I'll also slow down refreshes to every 12 days, rather than 7, but > again it's only a marginal change. > > But basically, the majority of updates I saw two weeks ago are actually > refreshes, not spam. > > Hope that adds something? > Rusty. > > Fabrice Drouin writes: > > Additional info on channel_update traffic: > > > > Comparing daily backups of routing tables over the last 2 weeks shows > > that nearly all channels get at least a new update every day. This > > means that channel_update traffic is not primarily caused by nodes > > publishing new updates when channels are about to become stale: > > otherwise we would see 1/14th of our channels getting a new update on > > the first day, then another 1/14th on the second day and so on. This is > > confirmed by comparing routing table backups over a single day: nearly > > all channels were updated, on average once, with an update that > > almost always does not include new information. > > > > It could be caused by "flapping" channels, probably because the hosts > > that are hosting them are not reliable (as in, often offline). > > > > Heuristics can be used to improve traffic but it's orthogonal to the > > problem of improving our current sync protocol. > > Also, these heuristics would probably be used to close channels to > > unreliable nodes instead of filtering/delaying publishing updates for > > them. > > > > Finally, this is not just obsessing over bandwidth (though bandwidth > > is a real issue for most mobile users). I'm also obsessing over > > startup time and payment UX :), because they do matter a lot for > > mobile users, and I would like to push the current gossip design as far > > as it can go. I also think that we'll face the same issue when > > designing inventory messages for channel_update messages. > > > > Cheers, > > > > Fabrice > > > > > > > > On Wed, 9 Jan 2019 at 00:44, Rusty Russell wrote: > >> > >> Fabrice Drouin writes: > >> > I think there may even be a simpler case where not replacing updates > >> > will result in nodes not knowing that a channel has been re-enabled: > >> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > >> > it, U3 enables it again and is the same as U1. If you discard it and > >> > just keep U1, and your peer has U2, how will you tell them that the > >> > channel has been enabled again? Unless "discard" here means keep the > >> > update but don't broadcast it? > >> > >> This can only happen if you happen to lose connection to the peer(s) > >> which sent U2 before it sends U3. > >> > >> Again, this corner case penalizes flapping channels. If we also > >> ratelimit our own enables to 1 per 120 seconds, you won't hit this case? > >> > >> > But then there's a risk that nodes would discard channels as stale > >> > because they don't get new updates when they reconnect. > >> > >> You need to accept redundant updates after 1 week, I think. > >> > >> Cheers, > >> Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
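For readers less familiar with the checksum extension mentioned above: a minimal sketch (Python, with a hypothetical message layout) of how a syncing node can use a per-channel checksum to skip querying updates whose routing-relevant fields it already has. The offsets and names below are assumptions for illustration, not any implementation's wire format.

    import zlib

    # Assumed layout: signature(64) || chain_hash(32) || short_channel_id(8)
    # || timestamp(4) || remaining fields. The point is only that signature
    # and timestamp are excluded, so a pure refresh keeps the same checksum.
    def routing_checksum(update):
        without_sig = update[64:]
        without_ts = without_sig[:40] + without_sig[44:]
        return zlib.crc32(without_ts)

    def needs_query(local_update, remote_checksum):
        # Query the peer only if we have nothing, or the meaningful fields differ.
        if local_update is None:
            return True
        return routing_checksum(local_update) != remote_checksum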
Re: [Lightning-dev] Quick analysis of channel_update data
BTW, I took a snapshot of our gossip store from two weeks back, which simply stores all gossip in order (compacting every week or so). channel_updates which updated existing channels: 17766 ... which changed *only* the timestamps: 12644 ... which were a week since the last: 7233 ... which only changed the disable/enable: 4839 So there are about 5100 timestamp-only updates less than a week apart (about 2000 are 1036 seconds apart, who is this?). 1. I'll look at getting even more conservative with flapping (120-second delay if we've just sent an update) but that doesn't seem to be the majority of traffic. 2. I'll also slow down refreshes to every 12 days, rather than 7, but again it's only a marginal change. But basically, the majority of updates I saw two weeks ago are actually refreshes, not spam. Hope that adds something? Rusty. Fabrice Drouin writes: > Additional info on channel_update traffic: > > Comparing daily backups of routing tables over the last 2 weeks shows > that nearly all channels get at least a new update every day. This > means that channel_update traffic is not primarily caused by nodes > publishing new updates when channels are about to become stale: > otherwise we would see 1/14th of our channels getting a new update on > the first day, then another 1/14th on the second day and so on. This is > confirmed by comparing routing table backups over a single day: nearly > all channels were updated, on average once, with an update that > almost always does not include new information. > > It could be caused by "flapping" channels, probably because the hosts > that are hosting them are not reliable (as in, often offline). > > Heuristics can be used to improve traffic but it's orthogonal to the > problem of improving our current sync protocol. > Also, these heuristics would probably be used to close channels to > unreliable nodes instead of filtering/delaying publishing updates for > them. > > Finally, this is not just obsessing over bandwidth (though bandwidth > is a real issue for most mobile users). I'm also obsessing over > startup time and payment UX :), because they do matter a lot for > mobile users, and I would like to push the current gossip design as far > as it can go. I also think that we'll face the same issue when > designing inventory messages for channel_update messages. > > Cheers, > > Fabrice > > > > On Wed, 9 Jan 2019 at 00:44, Rusty Russell wrote: >> >> Fabrice Drouin writes: >> > I think there may even be a simpler case where not replacing updates >> > will result in nodes not knowing that a channel has been re-enabled: >> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables >> > it, U3 enables it again and is the same as U1. If you discard it and >> > just keep U1, and your peer has U2, how will you tell them that the >> > channel has been enabled again? Unless "discard" here means keep the >> > update but don't broadcast it? >> >> This can only happen if you happen to lose connection to the peer(s) >> which sent U2 before it sends U3. >> >> Again, this corner case penalizes flapping channels. If we also >> ratelimit our own enables to 1 per 120 seconds, you won't hit this case? >> >> > But then there's a risk that nodes would discard channels as stale >> > because they don't get new updates when they reconnect. >> >> You need to accept redundant updates after 1 week, I think. >> >> Cheers, >> Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
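A sketch of the kind of bucketing behind the counts above, assuming updates arrive in order and that prev is the last accepted update for the same short_channel_id. The Update type and its fields are illustrative, not c-lightning's internals.

    from dataclasses import dataclass

    WEEK = 7 * 24 * 3600

    @dataclass
    class Update:
        timestamp: int
        disabled: bool
        fees: tuple         # (base_msat, proportional_millionths)
        htlc_limits: tuple  # (htlc_minimum_msat, htlc_maximum_msat)

    def classify(prev, new):
        if prev is None:
            return "new-channel"
        if (new.fees, new.htlc_limits, new.disabled) == (prev.fees, prev.htlc_limits, prev.disabled):
            if new.timestamp - prev.timestamp >= WEEK:
                return "weekly-refresh"  # keep-alive, expected traffic
            return "timestamp-only"      # redundant within a week
        if (new.fees, new.htlc_limits) == (prev.fees, prev.htlc_limits):
            return "disable-toggle"      # only the disable bit flipped
        return "real-change"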
Re: [Lightning-dev] Quick analysis of channel_update data
Additional info on channel_update traffic: Comparing daily backups of routing tables over the last 2 weeks shows that nearly all channels get at least a new update every day. This means that channel_update traffic is not primarily caused by nodes publishing new updates when channels are about to become stale: otherwise we would see 1/14th of our channels getting a new update on the first day, then another 1/14th on the second day and so on. This is confirmed by comparing routing table backups over a single day: nearly all channels were updated, on average once, with an update that almost always does not include new information. It could be caused by "flapping" channels, probably because the hosts that are hosting them are not reliable (as in, often offline). Heuristics can be used to improve traffic but it's orthogonal to the problem of improving our current sync protocol. Also, these heuristics would probably be used to close channels to unreliable nodes instead of filtering/delaying publishing updates for them. Finally, this is not just obsessing over bandwidth (though bandwidth is a real issue for most mobile users). I'm also obsessing over startup time and payment UX :), because they do matter a lot for mobile users, and I would like to push the current gossip design as far as it can go. I also think that we'll face the same issue when designing inventory messages for channel_update messages. Cheers, Fabrice On Wed, 9 Jan 2019 at 00:44, Rusty Russell wrote: > > Fabrice Drouin writes: > > I think there may even be a simpler case where not replacing updates > > will result in nodes not knowing that a channel has been re-enabled: > > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > > it, U3 enables it again and is the same as U1. If you discard it and > > just keep U1, and your peer has U2, how will you tell them that the > > channel has been enabled again? Unless "discard" here means keep the > > update but don't broadcast it? > > This can only happen if you happen to lose connection to the peer(s) > which sent U2 before it sends U3. > > Again, this corner case penalizes flapping channels. If we also > ratelimit our own enables to 1 per 120 seconds, you won't hit this case? > > > But then there's a risk that nodes would discard channels as stale > > because they don't get new updates when they reconnect. > > You need to accept redundant updates after 1 week, I think. > > Cheers, > Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Christian Decker writes: > Assume that we have a network in which a node D receives the updates > from a node A through two or more separate paths: > > A --- B --- D > \--- C ---/ > > And let's assume that some channel of A (c_A) is flapping (not the ones > to B and C). A will send out two updates, one disables and the other one > re-enables c_A, otherwise they are identical (timestamp and signature > are different as well of course). > The flush interval in B is sufficient > to see both updates before flushing, hence both updates get dropped and > nothing apparently changed (D doesn't get told about anything from > B). The flush interval of C triggers after getting the re-enable, and D > gets the disabling update, followed by the enabling update once C's > flush interval triggers again. Yes, we save gossip from B->D, but not C->D. That's OK. In general we won't get coalescing if the DOWN/UP combo spans a gossip flush. If everyone has the same 60-second timers, this will continue to happen across the network AFAICT? We should probably change our gossip timer to 90 +/- 30 seconds, which would (I think?) give more chance of flap suppression. > Worse, if the connection A-C gets severed > between the updates, now C and D learned that the channel is disabled > and will not get the re-enabling update since B has dropped that one > altogether. If B now gets told by D about the disable, it'll also go > "ok, I'll disable it as well", leaving the entire network believing that > the channel is disabled. You're right; B needs to remember the last timestamp of the update it discarded, and ignore ones prior. So, in this (fairly obscure) scenario, the flapping channel gets penalized. But the network is happier, and this suppression is a nice local policy. > If the routing > protocol is too chatty, we should make efforts towards local policies at > the senders of the update to reduce the number of flapping updates, not > build in-network deduplications. Maybe something like "eager-disable" > and "lazy-enable" is what we should go for, in which disables are sent > right away, and enables are put on an exponential backoff timeout (after > all what use are flappy nodes for routing?). Well, we lazy-disable because we assume it's still advertised as available. We eager-enable (iff we sent a disable) because we assume it's advertised as unavailable so we won't get traffic through it. Though we could set a delay of 30 seconds on the enable, I think we're already at current best practice? Cheers, Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
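A minimal sketch of the jittered flush timer suggested above: 90 +/- 30 seconds instead of a fixed 60, so peers' flush windows drift apart and a disable/enable pair is more likely to land inside a single window and coalesce. Function name and defaults are illustrative.

    import random

    def next_flush_delay(base=90.0, jitter=30.0):
        # Draw a fresh delay before each gossip flush.
        return base + random.uniform(-jitter, jitter)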
Re: [Lightning-dev] Quick analysis of channel_update data
Fabrice Drouin writes: > I think there may even be a simpler case where not replacing updates > will result in nodes not knowing that a channel has been re-enabled: > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > it, U3 enables it again and is the same as U1. If you discard it and > just keep U1, and your peer has U2, how will you tell them that the > channel has been enabled again? Unless "discard" here means keep the > update but don't broadcast it? This can only happen if you happen to lose connection to the peer(s) which sent U2 before it sends U3. Again, this corner case penalizes flapping channels. If we also ratelimit our own enables to 1 per 120 seconds, you won't hit this case? > But then there's a risk that nodes would discard channels as stale > because they don't get new updates when they reconnect. You need to accept redundant updates after 1 week, I think. Cheers, Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
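A sketch of the "ratelimit our own enables to 1 per 120 seconds" idea. Names are illustrative, and a real node would queue the enabling update rather than drop it.

    import time

    class EnableRateLimiter:
        def __init__(self, min_interval=120.0):
            self.min_interval = min_interval
            self.last_enable = {}  # short_channel_id -> last enable time

        def may_send_enable(self, scid, now=None):
            now = time.monotonic() if now is None else now
            last = self.last_enable.get(scid)
            if last is not None and now - last < self.min_interval:
                return False  # hold the enabling update back for now
            self.last_enable[scid] = now
            return True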
Re: [Lightning-dev] Quick analysis of channel_update data
On Tue, 8 Jan 2019 at 17:11, Christian Decker wrote: > > Rusty Russell writes: > > Fortunately, this seems fairly easy to handle: discard the newer > > duplicate (unless > 1 week old). For future more advanced > > reconstruction schemes (e.g. INV or minisketch), we could remember the > > latest timestamp of the duplicate, so we can avoid requesting it again. > > Unfortunately this assumes that you have a single update partner, and > still results in flaps, and might even result in a stuck state for some > channels. > > Assume that we have a network in which a node D receives the updates > from a node A through two or more separate paths: > > A --- B --- D > \--- C ---/ > > And let's assume that some channel of A (c_A) is flapping (not the ones > to B and C). A will send out two updates, one disables and the other one > re-enables c_A, otherwise they are identical (timestamp and signature > are different as well of course). The flush interval in B is sufficient > to see both updates before flushing, hence both updates get dropped and > nothing apparently changed (D doesn't get told about anything from > B). The flush interval of C triggers after getting the re-enable, and D > gets the disabling update, followed by the enabling update once C's > flush interval triggers again. Worse, if the connection A-C gets severed > between the updates, now C and D learned that the channel is disabled > and will not get the re-enabling update since B has dropped that one > altogether. If B now gets told by D about the disable, it'll also go > "ok, I'll disable it as well", leaving the entire network believing that > the channel is disabled. > > This is really hard to debug, since A has sent a re-enabling > channel_update, but everybody is stuck in the old state. I think there may even be a simpler case where not replacing updates will result in nodes not knowing that a channel has been re-enabled: suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables it, U3 enables it again and is the same as U1. If you discard it and just keep U1, and your peer has U2, how will you tell them that the channel has been enabled again? Unless "discard" here means keep the update but don't broadcast it? > At least locally updating timestamp and signature for identical updates > and then not broadcasting if they were the only changes would at least > prevent the last issue of overriding a dropped state with an earlier > one, but it'd still leave C and D in an inconsistent state until we have > some sort of passive sync that compares routing tables and fixes these > issues. But then there's a risk that nodes would discard channels as stale because they don't get new updates when they reconnect. > I think all the bolted-on things are pretty much overkill at this point, > it is unlikely that we will get any consistency in our views of the > routing table, but that's actually not needed to route, and we should > consider this a best effort gossip protocol anyway. If the routing > protocol is too chatty, we should make efforts towards local policies at > the senders of the update to reduce the number of flapping updates, not > build in-network deduplications. Maybe something like "eager-disable" > and "lazy-enable" is what we should go for, in which disables are sent > right away, and enables are put on an exponential backoff timeout (after > all what use are flappy nodes for routing?). Yes, there are probably heuristics that would help reduce gossip traffic, and I see your point, but I was thinking about doing the opposite: "eager-enable" and "lazy-disable", because from a sender's p.o.v., trying to use a disabled channel is better than ignoring an enabled channel. Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
I mentioned this on IRC, but note that the flapping is not just useless information to be discarded without consideration. An important use of routing data is providing a "good" subset to nodes like mobile clients that don't want all the bandwidth to stay fully in sync. A pretty good indicator of a useless channel would be flapping, given it's probably not very reliable for routing. I'm somewhat unconvinced that we should be optimizing for as little bandwidth use as possible here, though wins that don't lose information are nice. Matt > On Jan 8, 2019, at 16:28, Christian Decker wrote: > > Fabrice Drouin writes: > >> I think there may even be a simpler case where not replacing updates >> will result in nodes not knowing that a channel has been re-enabled: >> suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables >> it, U3 enables it again and is the same as U1. If you discard it and >> just keep U1, and your peer has U2, how will you tell them that the >> channel has been enabled again? Unless "discard" here means keep the >> update but don't broadcast it? > > Excellent point, that's a simpler example of how it could break down. > >>> I think all the bolted-on things are pretty much overkill at this point, >>> it is unlikely that we will get any consistency in our views of the >>> routing table, but that's actually not needed to route, and we should >>> consider this a best effort gossip protocol anyway. If the routing >>> protocol is too chatty, we should make efforts towards local policies at >>> the senders of the update to reduce the number of flapping updates, not >>> build in-network deduplications. Maybe something like "eager-disable" >>> and "lazy-enable" is what we should go for, in which disables are sent >>> right away, and enables are put on an exponential backoff timeout (after >>> all what use are flappy nodes for routing?). >> >> Yes, there are probably heuristics that would help reduce gossip >> traffic, and I see your point, but I was thinking about doing the >> opposite: "eager-enable" and "lazy-disable", because from a sender's >> p.o.v., trying to use a disabled channel is better than ignoring an >> enabled channel. > > That depends on what you are trying to optimize. Your solution keeps > more channels in enabled mode, potentially increasing failures due to > channels being unavailable. I was approaching it from the other side, > since failures are on the critical path in the payment flow, they'd > result in longer delays and many more retries, which I think is annoying > too. It probably depends on the network structure, i.e., if the fanout > from the endpoints is large, missing some channels shouldn't be a > problem, in which case the many failures delaying your payment weigh > more than not finding a route (eager-disable & lazy-enable). If on the > other hand we are really relying on a huge number of flaky connections > then eager-enable & lazy-disable might get lucky and get the payment > through. I'm hoping the network will have the latter structure, because > we'd have really unpredictable behavior anyway. > > We'll probably gain more insight once we start probing the network. My > expectation is that today's network is a baseline, whose resiliency and > redundancy will improve over time, hopefully swinging in favor of > trading off the speed gains over bare routability. > > Cheers, > Christian ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Fabrice Drouin writes: > I think there may even be a simpler case where not replacing updates > will result in nodes not knowing that a channel has been re-enabled: > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > it, U3 enables it again and is the same as U1. If you discard it and > just keep U1, and your peer has U2, how will you tell them that the > channel has been enabled again? Unless "discard" here means keep the > update but don't broadcast it? Excellent point, that's a simpler example of how it could break down. >> I think all the bolted-on things are pretty much overkill at this point, >> it is unlikely that we will get any consistency in our views of the >> routing table, but that's actually not needed to route, and we should >> consider this a best effort gossip protocol anyway. If the routing >> protocol is too chatty, we should make efforts towards local policies at >> the senders of the update to reduce the number of flapping updates, not >> build in-network deduplications. Maybe something like "eager-disable" >> and "lazy-enable" is what we should go for, in which disables are sent >> right away, and enables are put on an exponential backoff timeout (after >> all what use are flappy nodes for routing?). > > Yes, there are probably heuristics that would help reduce gossip > traffic, and I see your point, but I was thinking about doing the > opposite: "eager-enable" and "lazy-disable", because from a sender's > p.o.v., trying to use a disabled channel is better than ignoring an > enabled channel. That depends on what you are trying to optimize. Your solution keeps more channels in enabled mode, potentially increasing failures due to channels being unavailable. I was approaching it from the other side, since failures are on the critical path in the payment flow, they'd result in longer delays and many more retries, which I think is annoying too. It probably depends on the network structure, i.e., if the fanout from the endpoints is large, missing some channels shouldn't be a problem, in which case the many failures delaying your payment weigh more than not finding a route (eager-disable & lazy-enable). If on the other hand we are really relying on a huge number of flaky connections then eager-enable & lazy-disable might get lucky and get the payment through. I'm hoping the network will have the latter structure, because we'd have really unpredictable behavior anyway. We'll probably gain more insight once we start probing the network. My expectation is that today's network is a baseline, whose resiliency and redundancy will improve over time, hopefully swinging in favor of trading off the speed gains over bare routability. Cheers, Christian ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Rusty Russell writes: >> But only 18 000 pairs of channel updates carry actual fee and/or HTLC >> value change. 85% of the time, we just queried information that we >> already had! > > Note that this can happen in two legitimate cases: > 1. The weekly refresh of channel_update. > 2. A node updated too fast (A->B->A) and the ->A update caught up with the > ->B update. > > Fortunately, this seems fairly easy to handle: discard the newer > duplicate (unless > 1 week old). For future more advanced > reconstruction schemes (e.g. INV or minisketch), we could remember the > latest timestamp of the duplicate, so we can avoid requesting it again. Unfortunately this assumes that you have a single update partner, and still results in flaps, and might even result in a stuck state for some channels. Assume that we have a network in which a node D receives the updates from a node A through two or more separate paths: A --- B --- D \--- C ---/ And let's assume that some channel of A (c_A) is flapping (not the ones to B and C). A will send out two updates, one disables and the other one re-enables c_A, otherwise they are identical (timestamp and signature are different as well of course). The flush interval in B is sufficient to see both updates before flushing, hence both updates get dropped and nothing apparently changed (D doesn't get told about anything from B). The flush interval of C triggers after getting the re-enable, and D gets the disabling update, followed by the enabling update once C's flush interval triggers again. Worse, if the connection A-C gets severed between the updates, now C and D learned that the channel is disabled and will not get the re-enabling update since B has dropped that one altogether. If B now gets told by D about the disable, it'll also go "ok, I'll disable it as well", leaving the entire network believing that the channel is disabled. This is really hard to debug, since A has sent a re-enabling channel_update, but everybody is stuck in the old state. At least locally updating timestamp and signature for identical updates and then not broadcasting if they were the only changes would at least prevent the last issue of overriding a dropped state with an earlier one, but it'd still leave C and D in an inconsistent state until we have some sort of passive sync that compares routing tables and fixes these issues. >> Adding a basic checksum (4 bytes for example) that covers fees and >> HTLC min/max value to our channel range queries would be a significant >> improvement and I will add this to the open BOLT 1.1 proposal to extend >> queries with timestamps. >> >> I also think that such a checksum could be used >> - in "inventory" based gossip messages >> - in set reconciliation schemes: we could reconcile [channel id | >> timestamp | checksum] first > I think this is overkill? I think all the bolted-on things are pretty much overkill at this point, it is unlikely that we will get any consistency in our views of the routing table, but that's actually not needed to route, and we should consider this a best effort gossip protocol anyway. If the routing protocol is too chatty, we should make efforts towards local policies at the senders of the update to reduce the number of flapping updates, not build in-network deduplications. Maybe something like "eager-disable" and "lazy-enable" is what we should go for, in which disables are sent right away, and enables are put on an exponential backoff timeout (after all what use are flappy nodes for routing?). Cheers, Christian ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
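A sketch of the "eager-disable, lazy-enable" policy Christian describes: disabling updates go out immediately, enabling updates wait behind an exponential backoff that grows with each recent flap. This is purely local policy; class and callback names are illustrative.

    class FlapDamper:
        def __init__(self, base_delay=60.0, max_delay=3600.0):
            self.base_delay = base_delay
            self.max_delay = max_delay
            self.flaps = {}  # short_channel_id -> recent disable count

        def on_disable(self, scid, broadcast):
            # Eager: tell the network right away, and record the flap.
            self.flaps[scid] = self.flaps.get(scid, 0) + 1
            broadcast(scid, disabled=True)

        def enable_delay(self, scid):
            # Lazy: the caller schedules the enabling update this many
            # seconds out; repeated flaps push it back exponentially.
            n = self.flaps.get(scid, 0)
            return min(self.base_delay * (2 ** n), self.max_delay)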
Re: [Lightning-dev] Quick analysis of channel_update data
Fabrice Drouin writes: > Follow-up: here's more detailed info on the data I collected and > potential savings we could achieve: > > I made hourly routing table backups for 12 days, and collected routing > information for 17 000 channel ids. > > There are 130 000 different channel updates: on average, each channel > has been updated 8 times. Here, "different" means that at least the > timestamp has changed, and a node would have queried this channel > update during its syncing process. Side note: some implementations are also sending out updates with the *same* timestamp. This is not allowed... > But only 18 000 pairs of channel updates carry actual fee and/or HTLC > value change. 85% of the time, we just queried information that we > already had! Note that this can happen in two legitimate cases: 1. The weekly refresh of channel_update. 2. A node updated too fast (A->B->A) and the ->A update caught up with the ->B update. Fortunately, this seems fairly easy to handle: discard the newer duplicate (unless > 1 week old). For future more advanced reconstruction schemes (e.g. INV or minisketch), we could remember the latest timestamp of the duplicate, so we can avoid requesting it again. > Adding a basic checksum (4 bytes for example) that covers fees and > HTLC min/max value to our channel range queries would be a significant > improvement and I will add this to the open BOLT 1.1 proposal to extend > queries with timestamps. > > I also think that such a checksum could be used > - in "inventory" based gossip messages > - in set reconciliation schemes: we could reconcile [channel id | > timestamp | checksum] first I think this is overkill? Thanks, Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
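A sketch of the duplicate rule above: drop an incoming update that differs from what we already have only by timestamp and signature, unless our copy is over a week old (so weekly refreshes still propagate), while remembering the discarded timestamp so a future INV- or minisketch-style scheme doesn't re-request it. The store interface and field names are assumptions for illustration.

    WEEK = 7 * 24 * 3600

    def handle_update(store, scid, new):
        prev = store.get(scid)
        if (prev is not None and same_routing_fields(prev, new)
                and new.timestamp - prev.timestamp < WEEK):
            store.remember_seen_timestamp(scid, new.timestamp)  # avoid re-requesting
            return None                                         # discard, don't relay
        store.put(scid, new)
        return new                                              # accept and relay

    def same_routing_fields(a, b):
        # Everything except timestamp and signature.
        return (a.disabled, a.fees, a.htlc_limits) == (b.disabled, b.fees, b.htlc_limits)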
Re: [Lightning-dev] Quick analysis of channel_update data
On Fri, 4 Jan 2019 at 04:43, ZmnSCPxj wrote: > > - in set reconciliation schemes: we could reconcile [channel id | > > timestamp | checksum] first > > Perhaps I misunderstand how set reconciliation works, but --- if timestamp is > changed while checksum is not, then it would still be seen as a set > difference and still require further communication rounds to discover that > the channel parameters have not actually changed. > > Perhaps it is better to reconcile [channel_id | checksum] instead, and if > there is a different set of channel parameters, share the set difference and > sort out which timestamp is later at that point. Ah yes of course, the `timestamp` should not be included. Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Good morning, > - in set reconciliation schemes: we could reconcile [channel id | > timestamp | checksum] first Perhaps I misunderstand how set reconciliation works, but --- if timestamp is changed while checksum is not, then it would still be seen as a set difference and still require further communication rounds to discover that the channel parameters have not actually changed. Perhaps it is better to reconcile [channel_id | checksum] instead, and if there is a different set of channel parameters, share the set difference and sort out which timestamp is later at that point. Regards, ZmnSCPxj ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
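A sketch of ZmnSCPxj's suggestion: build reconciliation items as [channel_id | checksum] with the timestamp left out, so a refresh with unchanged parameters produces an identical item on both sides and never shows up as a set difference. The 12-byte layout is illustrative.

    import struct
    import zlib

    def recon_item(short_channel_id, update_without_ts_and_sig):
        # update_without_ts_and_sig: the channel_update bytes minus
        # timestamp and signature (assumed already stripped by the caller).
        checksum = zlib.crc32(update_without_ts_and_sig)
        return struct.pack(">QI", short_channel_id, checksum)  # 8 + 4 bytes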
Re: [Lightning-dev] Quick analysis of channel_update data
Follow-up: here's more detailed info on the data I collected and potential savings we could achieve: I made hourly routing table backups for 12 days, and collected routing information for 17 000 channel ids. There are 130 000 different channel updates: on average, each channel has been updated 8 times. Here, "different" means that at least the timestamp has changed, and a node would have queried this channel update during its syncing process. But only 18 000 pairs of channel updates carry actual fee and/or HTLC value change. 85% of the time, we just queried information that we already had! Adding a basic checksum (4 bytes for example) that covers fees and HTLC min/max value to our channel range queries would be a significant improvement and I will add this to the open BOLT 1.1 proposal to extend queries with timestamps. I also think that such a checksum could be used - in "inventory" based gossip messages - in set reconciliation schemes: we could reconcile [channel id | timestamp | checksum] first Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
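A sketch of the proposed 4-byte checksum, computed over just the fields a router cares about: fee parameters and HTLC limits. The field list and serialization are illustrative, not a wire format.

    import struct
    import zlib

    def update_checksum(fee_base_msat, fee_proportional_millionths,
                        htlc_minimum_msat, htlc_maximum_msat):
        payload = struct.pack(">IIQQ",
                              fee_base_msat, fee_proportional_millionths,
                              htlc_minimum_msat, htlc_maximum_msat)
        return zlib.crc32(payload)  # 4 bytes, cheap to compare during sync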
Re: [Lightning-dev] Quick analysis of channel_update data
On Wed, 2 Jan 2019 at 18:26, Christian Decker wrote: > > For the ones that flap with a period that is long enough for the > disabling and enabling updates to be flushed, we are presented with a > tradeoff. IIRC we (c-lightning) currently hold back disabling > `channel_update`s until someone actually attempts to use the channel, at > which point we fail the HTLC and send out the stashed `channel_update`, > thus reducing the publicly visible flapping. For the enabling we can't > do that, but we could think about a local policy on how much to delay a > `channel_update` depending on the past stability of that peer. Again > this is local policy and doesn't warrant a spec change. > > I think we should probably try out some policies related to when to send > `channel_update`s and how to hide redundant updates, and then we can see > which ones work best :-) > Yes, I haven't looked at how to handle this with local policies. My hypothesis is that when you're syncing a routing table that is, say, one day old, you end up querying and downloading a lot of information that you already have, and that adding a basic checksum to our channel queries may greatly improve this. Of course this would be much more actionable with stats and hard numbers, which I'll provide ASAP. Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Hi Fabrice, happy new year to you too :-) Thanks for taking the time to collect that information. It's very much in line with what we were expecting in that most of the updates come from flapping channels. Your second observation that some updates only change the timestamp is likely due to the staggered broadcast merging multiple updates, e.g., one disabling and one enabling the channel, that are sent very close to each other. This is the very reason we introduced the staggering back in the day, as it limits the maximum rate of updates a single node may produce for each of its channels. In the second case we can probably get away with not forwarding the update, but updating the timestamp and signature for the old `channel_update` locally, so that we don't then flap back to an older one should we get that in a roundabout way. That's purely a local decision and does not warrant a spec change imho. For the ones that flap with a period that is long enough for the disabling and enabling updates to be flushed, we are presented with a tradeoff. IIRC we (c-lightning) currently hold back disabling `channel_update`s until someone actually attempts to use the channel, at which point we fail the HTLC and send out the stashed `channel_update`, thus reducing the publicly visible flapping. For the enabling we can't do that, but we could think about a local policy on how much to delay a `channel_update` depending on the past stability of that peer. Again this is local policy and doesn't warrant a spec change. I think we should probably try out some policies related to when to send `channel_update`s and how to hide redundant updates, and then we can see which ones work best :-) Cheers, Christian Fabrice Drouin writes: > Hello All, and Happy New Year! > > To understand why there is a steady stream of channel updates, even > when fee parameters don't seem to actually change, I made hourly > backups of the routing table of one of our nodes, and compared these > routing tables to see what exactly was being modified. > > It turns out that: > - there are a lot of disable/enable/disable etc. updates which are > just sent when a channel is disabled then enabled again (this can > happen when nodes go offline, for example) > - there are also a lot of updates that don't change anything (just a new > timestamp and signatures but otherwise the same info), up to several times > a day for the same channel id > > In both cases we end up syncing info that we already have. > I don't know yet how best to use this when syncing routing tables, but > I thought it was worth sharing anyway. A basic checksum that does not > cover all fields, but only fees and HTLC min/max values, could probably > be used to improve routing table sync? > > Cheers, > > Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
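A sketch of the hold-back Christian describes: a disabling channel_update is stashed instead of gossiped, and only broadcast when someone actually tries to route through the channel and the HTLC is failed. Callback names are illustrative, not c-lightning's API.

    class StashedDisables:
        def __init__(self):
            self.stash = {}  # short_channel_id -> disabling channel_update

        def on_local_disable(self, scid, update):
            self.stash[scid] = update  # don't gossip it yet

        def on_htlc_attempt(self, scid, fail_htlc, broadcast):
            update = self.stash.pop(scid, None)
            if update is None:
                return False           # channel usable, let the HTLC through
            fail_htlc(update)          # the failure message carries the update
            broadcast(update)          # now the network hears about it
            return True

        def on_local_enable(self, scid):
            self.stash.pop(scid, None)  # flap ended before anyone noticed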
[Lightning-dev] Quick analysis of channel_update data
Hello All, and Happy New Year! To understand why there is a steady stream of channel updates, even when fee parameters don't seem to actually change, I made hourly backups of the routing table of one of our nodes, and compared these routing tables to see what exactly was being modified. It turns out that: - there are a lot of disable/enable/disable etc. updates which are just sent when a channel is disabled then enabled again (this can happen when nodes go offline, for example) - there are also a lot of updates that don't change anything (just a new timestamp and signatures but otherwise the same info), up to several times a day for the same channel id In both cases we end up syncing info that we already have. I don't know yet how best to use this when syncing routing tables, but I thought it was worth sharing anyway. A basic checksum that does not cover all fields, but only fees and HTLC min/max values, could probably be used to improve routing table sync? Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev