Re: [Lightning-dev] Quick analysis of channel_update data
I'll start collecting and checking data again, but from what I see now, using our checksum extension still significantly reduces gossip traffic. I'm not saying that heuristics to reduce the number of updates cannot help, but I just don't think they should be our primary way of handling such traffic. If you've opened channels to nodes that are unreliable then you should eventually close these channels, but delaying how you publish updates that disable/enable them has an impact on everyone, especially on nodes that mostly send payments (as opposed to relaying or receiving them). Cheers, Fabrice On Mon, 18 Feb 2019 at 13:10, Rusty Russell wrote: > > BTW, I took a snapshot of our gossip store from two weeks back, which > simply stores all gossip in order (compacting every week or so). > > channel_updates which updated existing channels: 17766 > ... which changed *only* the timestamps: 12644 > ... which were a week since the last: 7233 > ... which only changed the disable/enable: 4839 > > So there are about 5100 timestamp-only updates less than a week apart > (about 2000 are 1036 seconds apart, who is this?). > > 1. I'll look at getting even more conservative with flapping (120-second > delay if we've just sent an update) but that doesn't seem to be the > majority of traffic. > 2. I'll also slow down refreshes to every 12 days, rather than 7, but > again it's only a marginal change. > > But basically, the majority of updates I saw two weeks ago are actually > refreshes, not spam. > > Hope that adds something? > Rusty. > > Fabrice Drouin writes: > > Additional info on channel_update traffic: > > > > Comparing daily backups of routing tables over the last 2 weeks shows > > that nearly all channels get at least a new update every day. This > > means that channel_update traffic is not primarily caused by nodes > > publishing new updates when channels are about to become stale: > > otherwise we would see 1/14th of our channels getting a new update on > > the first day, then another 1/14th on the second day and so on. This is > > confirmed by comparing routing table backups over a single day: nearly > > all channels were updated, on average once, with an update that > > almost always does not include new information. > > > > It could be caused by "flapping" channels, probably because the hosts > > that are hosting them are not reliable (as in, often offline). > > > > Heuristics can be used to improve traffic but it's orthogonal to the > > problem of improving our current sync protocol. > > Also, these heuristics would probably be used to close channels to > > unreliable nodes instead of filtering/delaying publishing updates for > > them. > > > > Finally, this is not just obsessing over bandwidth (though bandwidth > > is a real issue for most mobile users). I'm also obsessing over > > startup time and payment UX :), because they do matter a lot for > > mobile users, and I would like to push the current gossip design as far > > as it can go. I also think that we'll face the same issue when > > designing inventory messages for channel_update messages. > > > > Cheers, > > > > Fabrice > > > > > > > > On Wed, 9 Jan 2019 at 00:44, Rusty Russell wrote: > >> > >> Fabrice Drouin writes: > >> > I think there may even be a simpler case where not replacing updates > >> > will result in nodes not knowing that a channel has been re-enabled: > >> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > >> > it, U3 enables it again and is the same as U1. If you discard it and > >> > just keep U1, and your peer has U2, how will you tell them that the > >> > channel has been enabled again? Unless "discard" here means keep the > >> > update but don't broadcast it? > >> > >> This can only happen if you happen to lose connection to the peer(s) > >> which sent U2 before it sends U3. > >> > >> Again, this corner case penalizes flapping channels. If we also > >> ratelimit our own enables to 1 per 120 seconds, you won't hit this case? > >> > >> > But then there's a risk that nodes would discard channels as stale > >> > because they don't get new updates when they reconnect. > >> > >> You need to accept redundant updates after 1 week, I think. > >> > >> Cheers, > >> Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
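For readers less familiar with the checksum extension mentioned above: a minimal sketch (Python, with a hypothetical message layout) of how a syncing node can use a per-channel checksum to skip querying updates whose routing-relevant fields it already has. The offsets and names below are assumptions for illustration, not any implementation's wire format.

    import zlib

    # Assumed layout: signature(64) || chain_hash(32) || short_channel_id(8)
    # || timestamp(4) || remaining fields. The point is only that signature
    # and timestamp are excluded, so a pure refresh keeps the same checksum.
    def routing_checksum(update):
        without_sig = update[64:]
        without_ts = without_sig[:40] + without_sig[44:]
        return zlib.crc32(without_ts)

    def needs_query(local_update, remote_checksum):
        # Query the peer only if we have nothing, or the meaningful fields differ.
        if local_update is None:
            return True
        return routing_checksum(local_update) != remote_checksum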
Re: [Lightning-dev] Quick analysis of channel_update data
BTW, I took a snapshot of our gossip store from two weeks back, which simply stores all gossip in order (compacting every week or so). channel_updates which updated existing channels: 17766 ... which changed *only* the timestamps: 12644 ... which were a week since the last: 7233 ... which only changed the disable/enable: 4839 So there are about 5100 timestamp-only updates less than a week apart (about 2000 are 1036 seconds apart, who is this?). 1. I'll look at getting even more conservative with flapping (120-second delay if we've just sent an update) but that doesn't seem to be the majority of traffic. 2. I'll also slow down refreshes to every 12 days, rather than 7, but again it's only a marginal change. But basically, the majority of updates I saw two weeks ago are actually refreshes, not spam. Hope that adds something? Rusty. Fabrice Drouin writes: > Additional info on channel_update traffic: > > Comparing daily backups of routing tables over the last 2 weeks shows > that nearly all channels get at least a new update every day. This > means that channel_update traffic is not primarily caused by nodes > publishing new updates when channels are about to become stale: > otherwise we would see 1/14th of our channels getting a new update on > the first day, then another 1/14th on the second day and so on. This is > confirmed by comparing routing table backups over a single day: nearly > all channels were updated, on average once, with an update that > almost always does not include new information. > > It could be caused by "flapping" channels, probably because the hosts > that are hosting them are not reliable (as in, often offline). > > Heuristics can be used to improve traffic but it's orthogonal to the > problem of improving our current sync protocol. > Also, these heuristics would probably be used to close channels to > unreliable nodes instead of filtering/delaying publishing updates for > them. > > Finally, this is not just obsessing over bandwidth (though bandwidth > is a real issue for most mobile users). I'm also obsessing over > startup time and payment UX :), because they do matter a lot for > mobile users, and I would like to push the current gossip design as far > as it can go. I also think that we'll face the same issue when > designing inventory messages for channel_update messages. > > Cheers, > > Fabrice > > > > On Wed, 9 Jan 2019 at 00:44, Rusty Russell wrote: >> >> Fabrice Drouin writes: >> > I think there may even be a simpler case where not replacing updates >> > will result in nodes not knowing that a channel has been re-enabled: >> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables >> > it, U3 enables it again and is the same as U1. If you discard it and >> > just keep U1, and your peer has U2, how will you tell them that the >> > channel has been enabled again? Unless "discard" here means keep the >> > update but don't broadcast it? >> >> This can only happen if you happen to lose connection to the peer(s) >> which sent U2 before it sends U3. >> >> Again, this corner case penalizes flapping channels. If we also >> ratelimit our own enables to 1 per 120 seconds, you won't hit this case? >> >> > But then there's a risk that nodes would discard channels as stale >> > because they don't get new updates when they reconnect. >> >> You need to accept redundant updates after 1 week, I think. >> >> Cheers, >> Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
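A sketch of the kind of bucketing behind the counts above, assuming updates arrive in order and that prev is the last accepted update for the same short_channel_id. The Update type and its fields are illustrative, not c-lightning's internals.

    from dataclasses import dataclass

    WEEK = 7 * 24 * 3600

    @dataclass
    class Update:
        timestamp: int
        disabled: bool
        fees: tuple         # (base_msat, proportional_millionths)
        htlc_limits: tuple  # (htlc_minimum_msat, htlc_maximum_msat)

    def classify(prev, new):
        if prev is None:
            return "new-channel"
        if (new.fees, new.htlc_limits, new.disabled) == (prev.fees, prev.htlc_limits, prev.disabled):
            if new.timestamp - prev.timestamp >= WEEK:
                return "weekly-refresh"  # keep-alive, expected traffic
            return "timestamp-only"      # redundant within a week
        if (new.fees, new.htlc_limits) == (prev.fees, prev.htlc_limits):
            return "disable-toggle"      # only the disable bit flipped
        return "real-change"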
Re: [Lightning-dev] Quick analysis of channel_update data
Additional info on channel_update traffic: Comparing daily backups of routing tables over the last 2 weeks shows that nearly all channels get at least a new update every day. This means that channel_update traffic is not primarily caused by nodes publishing new updates when channels are about to become stale: otherwise we would see 1/14th of our channels getting a new update on the first day, then another 1/14th on the second day and so on. This is confirmed by comparing routing table backups over a single day: nearly all channels were updated, on average once, with an update that almost always does not include new information. It could be caused by "flapping" channels, probably because the hosts that are hosting them are not reliable (as in, often offline). Heuristics can be used to improve traffic but it's orthogonal to the problem of improving our current sync protocol. Also, these heuristics would probably be used to close channels to unreliable nodes instead of filtering/delaying publishing updates for them. Finally, this is not just obsessing over bandwidth (though bandwidth is a real issue for most mobile users). I'm also obsessing over startup time and payment UX :), because they do matter a lot for mobile users, and I would like to push the current gossip design as far as it can go. I also think that we'll face the same issue when designing inventory messages for channel_update messages. Cheers, Fabrice On Wed, 9 Jan 2019 at 00:44, Rusty Russell wrote: > > Fabrice Drouin writes: > > I think there may even be a simpler case where not replacing updates > > will result in nodes not knowing that a channel has been re-enabled: > > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > > it, U3 enables it again and is the same as U1. If you discard it and > > just keep U1, and your peer has U2, how will you tell them that the > > channel has been enabled again? Unless "discard" here means keep the > > update but don't broadcast it? > > This can only happen if you happen to lose connection to the peer(s) > which sent U2 before it sends U3. > > Again, this corner case penalizes flapping channels. If we also > ratelimit our own enables to 1 per 120 seconds, you won't hit this case? > > > But then there's a risk that nodes would discard channels as stale > > because they don't get new updates when they reconnect. > > You need to accept redundant updates after 1 week, I think. > > Cheers, > Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Christian Decker writes: > Assume that we have a network in which a node D receives the updates > from a node A through two or more separate paths: > > A --- B --- D > \--- C ---/ > > And let's assume that some channel of A (c_A) is flapping (not the ones > to B and C). A will send out two updates, one disables and the other one > re-enables c_A, otherwise they are identical (timestamp and signature > are different as well of course). > The flush interval in B is sufficient > to see both updates before flushing, hence both updates get dropped and > nothing apparently changed (D doesn't get told about anything from > B). The flush interval of C triggers after getting the re-enable, and D > gets the disabling update, followed by the enabling update once C's > flush interval triggers again. Yes, we save gossip from B->D, but not C->D. That's OK. In general we won't get coalescing if the DOWN/UP combo spans a gossip flush. If everyone has the same 60-second timers, this will continue to happen across the network AFAICT? We should probably change our gossip timer to 90 +/- 30 seconds, which would (I think?) give more chance of flap suppression. > Worse, if the connection A-C gets severed > between the updates, now C and D learned that the channel is disabled > and will not get the re-enabling update since B has dropped that one > altogether. If B now gets told by D about the disable, it'll also go > "ok, I'll disable it as well", leaving the entire network believing that > the channel is disabled. You're right; B needs to remember the last timestamp of the update it discarded, and ignore ones prior. So, in this (fairly obscure) scenario, the flapping channel gets penalized. But the network is happier, and this suppression is a nice local policy. > If the routing > protocol is too chatty, we should make efforts towards local policies at > the senders of the update to reduce the number of flapping updates, not > build in-network deduplications. Maybe something like "eager-disable" > and "lazy-enable" is what we should go for, in which disables are sent > right away, and enables are put on an exponential backoff timeout (after > all what use are flappy nodes for routing?). Well, we lazy-disable because we assume it's still advertised as available. We eager-enable (iff we sent a disable) because we assume it's advertised as unavailable so we won't get traffic through it. Though we could set a delay of 30 seconds on the enable, I think we're already at current best practice? Cheers, Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
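A minimal sketch of the jittered flush timer suggested above: 90 +/- 30 seconds instead of a fixed 60, so peers' flush windows drift apart and a disable/enable pair is more likely to land inside a single window and coalesce. Function name and defaults are illustrative.

    import random

    def next_flush_delay(base=90.0, jitter=30.0):
        # Draw a fresh delay before each gossip flush.
        return base + random.uniform(-jitter, jitter)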
Re: [Lightning-dev] Quick analysis of channel_update data
Fabrice Drouin writes: > I think there may even be a simpler case where not replacing updates > will result in nodes not knowing that a channel has been re-enabled: > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > it, U3 enables it again and is the same as U1. If you discard it and > just keep U1, and your peer has U2, how will you tell them that the > channel has been enabled again? Unless "discard" here means keep the > update but don't broadcast it? This can only happen if you happen to lose connection to the peer(s) which sent U2 before it sends U3. Again, this corner case penalizes flapping channels. If we also ratelimit our own enables to 1 per 120 seconds, you won't hit this case? > But then there's a risk that nodes would discard channels as stale > because they don't get new updates when they reconnect. You need to accept redundant updates after 1 week, I think. Cheers, Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
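A sketch of the "ratelimit our own enables to 1 per 120 seconds" idea. Names are illustrative, and a real node would queue the enabling update rather than drop it.

    import time

    class EnableRateLimiter:
        def __init__(self, min_interval=120.0):
            self.min_interval = min_interval
            self.last_enable = {}  # short_channel_id -> last enable time

        def may_send_enable(self, scid, now=None):
            now = time.monotonic() if now is None else now
            last = self.last_enable.get(scid)
            if last is not None and now - last < self.min_interval:
                return False  # hold the enabling update back for now
            self.last_enable[scid] = now
            return True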
Re: [Lightning-dev] Quick analysis of channel_update data
On Tue, 8 Jan 2019 at 17:11, Christian Decker wrote: > > Rusty Russell writes: > > Fortunately, this seems fairly easy to handle: discard the newer > > duplicate (unless > 1 week old). For future more advanced > > reconstruction schemes (e.g. INV or minisketch), we could remember the > > latest timestamp of the duplicate, so we can avoid requesting it again. > > Unfortunately this assumes that you have a single update partner, and > still results in flaps, and might even result in a stuck state for some > channels. > > Assume that we have a network in which a node D receives the updates > from a node A through two or more separate paths: > > A --- B --- D > \--- C ---/ > > And let's assume that some channel of A (c_A) is flapping (not the ones > to B and C). A will send out two updates, one disables and the other one > re-enables c_A, otherwise they are identical (timestamp and signature > are different as well of course). The flush interval in B is sufficient > to see both updates before flushing, hence both updates get dropped and > nothing apparently changed (D doesn't get told about anything from > B). The flush interval of C triggers after getting the re-enable, and D > gets the disabling update, followed by the enabling update once C's > flush interval triggers again. Worse, if the connection A-C gets severed > between the updates, now C and D learned that the channel is disabled > and will not get the re-enabling update since B has dropped that one > altogether. If B now gets told by D about the disable, it'll also go > "ok, I'll disable it as well", leaving the entire network believing that > the channel is disabled. > > This is really hard to debug, since A has sent a re-enabling > channel_update, but everybody is stuck in the old state. I think there may even be a simpler case where not replacing updates will result in nodes not knowing that a channel has been re-enabled: suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables it, U3 enables it again and is the same as U1. If you discard it and just keep U1, and your peer has U2, how will you tell them that the channel has been enabled again? Unless "discard" here means keep the update but don't broadcast it? > At least locally updating timestamp and signature for identical updates > and then not broadcasting if they were the only changes would at least > prevent the last issue of overriding a dropped state with an earlier > one, but it'd still leave C and D in an inconsistent state until we have > some sort of passive sync that compares routing tables and fixes these > issues. But then there's a risk that nodes would discard channels as stale because they don't get new updates when they reconnect. > I think all the bolted-on things are pretty much overkill at this point, > it is unlikely that we will get any consistency in our views of the > routing table, but that's actually not needed to route, and we should > consider this a best effort gossip protocol anyway. If the routing > protocol is too chatty, we should make efforts towards local policies at > the senders of the update to reduce the number of flapping updates, not > build in-network deduplications. Maybe something like "eager-disable" > and "lazy-enable" is what we should go for, in which disables are sent > right away, and enables are put on an exponential backoff timeout (after > all what use are flappy nodes for routing?). Yes, there are probably heuristics that would help reduce gossip traffic, and I see your point, but I was thinking about doing the opposite: "eager-enable" and "lazy-disable", because from a sender's p.o.v., trying to use a disabled channel is better than ignoring an enabled channel. Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
I mentioned this on IRC, but note that the flapping is not just useless information to be discarded without consideration. An important use of routing data is providing a "good" subset to nodes like mobile clients that don't want all the bandwidth to stay fully in sync. A pretty good indicator of a useless channel would be flapping, given it's probably not very reliable for routing. I'm somewhat unconvinced that we should be optimizing for as little bandwidth use as possible here, though wins that don't lose information are nice. Matt > On Jan 8, 2019, at 16:28, Christian Decker wrote: > > Fabrice Drouin writes: > >> I think there may even be a simpler case where not replacing updates >> will result in nodes not knowing that a channel has been re-enabled: >> suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables >> it, U3 enables it again and is the same as U1. If you discard it and >> just keep U1, and your peer has U2, how will you tell them that the >> channel has been enabled again? Unless "discard" here means keep the >> update but don't broadcast it? > > Excellent point, that's a simpler example of how it could break down. > >>> I think all the bolted-on things are pretty much overkill at this point, >>> it is unlikely that we will get any consistency in our views of the >>> routing table, but that's actually not needed to route, and we should >>> consider this a best effort gossip protocol anyway. If the routing >>> protocol is too chatty, we should make efforts towards local policies at >>> the senders of the update to reduce the number of flapping updates, not >>> build in-network deduplications. Maybe something like "eager-disable" >>> and "lazy-enable" is what we should go for, in which disables are sent >>> right away, and enables are put on an exponential backoff timeout (after >>> all what use are flappy nodes for routing?). >> >> Yes, there are probably heuristics that would help reduce gossip >> traffic, and I see your point, but I was thinking about doing the >> opposite: "eager-enable" and "lazy-disable", because from a sender's >> p.o.v., trying to use a disabled channel is better than ignoring an >> enabled channel. > > That depends on what you are trying to optimize. Your solution keeps > more channels in enabled mode, potentially increasing failures due to > channels being unavailable. I was approaching it from the other side, > since failures are on the critical path in the payment flow, they'd > result in longer delays and many more retries, which I think is annoying > too. It probably depends on the network structure, i.e., if the fanout > from the endpoints is large, missing some channels shouldn't be a > problem, in which case the many failures delaying your payment weigh > more than not finding a route (eager-disable & lazy-enable). If on the > other hand we are really relying on a huge number of flaky connections > then eager-enable & lazy-disable might get lucky and get the payment > through. I'm hoping the network will have the latter structure, because > we'd have really unpredictable behavior anyway. > > We'll probably gain more insight once we start probing the network. My > expectation is that today's network is a baseline, whose resiliency and > redundancy will improve over time, hopefully swinging in favor of > trading off the speed gains over bare routability. > > Cheers, > Christian ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Fabrice Drouin writes: > I think there may even be a simpler case where not replacing updates > will result in nodes not knowing that a channel has been re-enabled: > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables > it, U3 enables it again and is the same as U1. If you discard it and > just keep U1, and your peer has U2, how will you tell them that the > channel has been enabled again? Unless "discard" here means keep the > update but don't broadcast it? Excellent point, that's a simpler example of how it could break down. >> I think all the bolted-on things are pretty much overkill at this point, >> it is unlikely that we will get any consistency in our views of the >> routing table, but that's actually not needed to route, and we should >> consider this a best effort gossip protocol anyway. If the routing >> protocol is too chatty, we should make efforts towards local policies at >> the senders of the update to reduce the number of flapping updates, not >> build in-network deduplications. Maybe something like "eager-disable" >> and "lazy-enable" is what we should go for, in which disables are sent >> right away, and enables are put on an exponential backoff timeout (after >> all what use are flappy nodes for routing?). > > Yes, there are probably heuristics that would help reduce gossip > traffic, and I see your point, but I was thinking about doing the > opposite: "eager-enable" and "lazy-disable", because from a sender's > p.o.v., trying to use a disabled channel is better than ignoring an > enabled channel. That depends on what you are trying to optimize. Your solution keeps more channels in enabled mode, potentially increasing failures due to channels being unavailable. I was approaching it from the other side, since failures are on the critical path in the payment flow, they'd result in longer delays and many more retries, which I think is annoying too. It probably depends on the network structure, i.e., if the fanout from the endpoints is large, missing some channels shouldn't be a problem, in which case the many failures delaying your payment weigh more than not finding a route (eager-disable & lazy-enable). If on the other hand we are really relying on a huge number of flaky connections then eager-enable & lazy-disable might get lucky and get the payment through. I'm hoping the network will have the latter structure, because we'd have really unpredictable behavior anyway. We'll probably gain more insight once we start probing the network. My expectation is that today's network is a baseline, whose resiliency and redundancy will improve over time, hopefully swinging in favor of trading off the speed gains over bare routability. Cheers, Christian ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Rusty Russell writes: >> But only 18 000 pairs of channel updates carry actual fee and/or HTLC >> value change. 85% of the time, we just queried information that we >> already had! > > Note that this can happen in two legitimate cases: > 1. The weekly refresh of channel_update. > 2. A node updated too fast (A->B->A) and the ->A update caught up with the > ->B update. > > Fortunately, this seems fairly easy to handle: discard the newer > duplicate (unless > 1 week old). For future more advanced > reconstruction schemes (e.g. INV or minisketch), we could remember the > latest timestamp of the duplicate, so we can avoid requesting it again. Unfortunately this assumes that you have a single update partner, and still results in flaps, and might even result in a stuck state for some channels. Assume that we have a network in which a node D receives the updates from a node A through two or more separate paths: A --- B --- D \--- C ---/ And let's assume that some channel of A (c_A) is flapping (not the ones to B and C). A will send out two updates, one disables and the other one re-enables c_A, otherwise they are identical (timestamp and signature are different as well of course). The flush interval in B is sufficient to see both updates before flushing, hence both updates get dropped and nothing apparently changed (D doesn't get told about anything from B). The flush interval of C triggers after getting the re-enable, and D gets the disabling update, followed by the enabling update once C's flush interval triggers again. Worse, if the connection A-C gets severed between the updates, now C and D learned that the channel is disabled and will not get the re-enabling update since B has dropped that one altogether. If B now gets told by D about the disable, it'll also go "ok, I'll disable it as well", leaving the entire network believing that the channel is disabled. This is really hard to debug, since A has sent a re-enabling channel_update, but everybody is stuck in the old state. At least locally updating timestamp and signature for identical updates and then not broadcasting if they were the only changes would at least prevent the last issue of overriding a dropped state with an earlier one, but it'd still leave C and D in an inconsistent state until we have some sort of passive sync that compares routing tables and fixes these issues. >> Adding a basic checksum (4 bytes for example) that covers fees and >> HTLC min/max value to our channel range queries would be a significant >> improvement and I will add this to the open BOLT 1.1 proposal to extend >> queries with timestamps. >> >> I also think that such a checksum could be used >> - in "inventory" based gossip messages >> - in set reconciliation schemes: we could reconcile [channel id | >> timestamp | checksum] first > I think this is overkill? I think all the bolted-on things are pretty much overkill at this point, it is unlikely that we will get any consistency in our views of the routing table, but that's actually not needed to route, and we should consider this a best effort gossip protocol anyway. If the routing protocol is too chatty, we should make efforts towards local policies at the senders of the update to reduce the number of flapping updates, not build in-network deduplications. Maybe something like "eager-disable" and "lazy-enable" is what we should go for, in which disables are sent right away, and enables are put on an exponential backoff timeout (after all what use are flappy nodes for routing?). Cheers, Christian ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
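A sketch of the "eager-disable, lazy-enable" policy Christian describes: disabling updates go out immediately, enabling updates wait behind an exponential backoff that grows with each recent flap. This is purely local policy; class and callback names are illustrative.

    class FlapDamper:
        def __init__(self, base_delay=60.0, max_delay=3600.0):
            self.base_delay = base_delay
            self.max_delay = max_delay
            self.flaps = {}  # short_channel_id -> recent disable count

        def on_disable(self, scid, broadcast):
            # Eager: tell the network right away, and record the flap.
            self.flaps[scid] = self.flaps.get(scid, 0) + 1
            broadcast(scid, disabled=True)

        def enable_delay(self, scid):
            # Lazy: the caller schedules the enabling update this many
            # seconds out; repeated flaps push it back exponentially.
            n = self.flaps.get(scid, 0)
            return min(self.base_delay * (2 ** n), self.max_delay)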
Re: [Lightning-dev] Quick analysis of channel_update data
Fabrice Drouin writes: > Follow-up: here's more detailed info on the data I collected and > potential savings we could achieve: > > I made hourly routing table backups for 12 days, and collected routing > information for 17 000 channel ids. > > There are 130 000 different channel updates: on average, each channel > has been updated 8 times. Here, "different" means that at least the > timestamp has changed, and a node would have queried this channel > update during its syncing process. Side note: some implementations are also sending out updates with the *same* timestamp. This is not allowed... > But only 18 000 pairs of channel updates carry actual fee and/or HTLC > value change. 85% of the time, we just queried information that we > already had! Note that this can happen in two legitimate cases: 1. The weekly refresh of channel_update. 2. A node updated too fast (A->B->A) and the ->A update caught up with the ->B update. Fortunately, this seems fairly easy to handle: discard the newer duplicate (unless > 1 week old). For future more advanced reconstruction schemes (e.g. INV or minisketch), we could remember the latest timestamp of the duplicate, so we can avoid requesting it again. > Adding a basic checksum (4 bytes for example) that covers fees and > HTLC min/max value to our channel range queries would be a significant > improvement and I will add this to the open BOLT 1.1 proposal to extend > queries with timestamps. > > I also think that such a checksum could be used > - in "inventory" based gossip messages > - in set reconciliation schemes: we could reconcile [channel id | > timestamp | checksum] first I think this is overkill? Thanks, Rusty. ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
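A sketch of the duplicate rule above: drop an incoming update that differs from what we already have only by timestamp and signature, unless our copy is over a week old (so weekly refreshes still propagate), while remembering the discarded timestamp so a future INV- or minisketch-style scheme doesn't re-request it. The store interface and field names are assumptions for illustration.

    WEEK = 7 * 24 * 3600

    def handle_update(store, scid, new):
        prev = store.get(scid)
        if (prev is not None and same_routing_fields(prev, new)
                and new.timestamp - prev.timestamp < WEEK):
            store.remember_seen_timestamp(scid, new.timestamp)  # avoid re-requesting
            return None                                         # discard, don't relay
        store.put(scid, new)
        return new                                              # accept and relay

    def same_routing_fields(a, b):
        # Everything except timestamp and signature.
        return (a.disabled, a.fees, a.htlc_limits) == (b.disabled, b.fees, b.htlc_limits)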
Re: [Lightning-dev] Quick analysis of channel_update data
On Fri, 4 Jan 2019 at 04:43, ZmnSCPxj wrote: > > - in set reconciliation schemes: we could reconcile [channel id | > > timestamp | checksum] first > > Perhaps I misunderstand how set reconciliation works, but --- if timestamp is > changed while checksum is not, then it would still be seen as a set > difference and still require further communication rounds to discover that > the channel parameters have not actually changed. > > Perhaps it is better to reconcile [channel_id | checksum] instead, and if > there is a different set of channel parameters, share the set difference and > sort out which timestamp is later at that point. Ah yes of course, the `timestamp` should not be included. Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Good morning, > - in set reconciliation schemes: we could reconcile [channel id | > timestamp | checksum] first Perhaps I misunderstand how set reconciliation works, but --- if timestamp is changed while checksum is not, then it would still be seen as a set difference and still require further communication rounds to discover that the channel parameters have not actually changed. Perhaps it is better to reconcile [channel_id | checksum] instead, and if there is a different set of channel parameters, share the set difference and sort out which timestamp is later at that point. Regards, ZmnSCPxj ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
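A sketch of ZmnSCPxj's suggestion: build reconciliation items as [channel_id | checksum] with the timestamp left out, so a refresh with unchanged parameters produces an identical item on both sides and never shows up as a set difference. The 12-byte layout is illustrative.

    import struct
    import zlib

    def recon_item(short_channel_id, update_without_ts_and_sig):
        # update_without_ts_and_sig: the channel_update bytes minus
        # timestamp and signature (assumed already stripped by the caller).
        checksum = zlib.crc32(update_without_ts_and_sig)
        return struct.pack(">QI", short_channel_id, checksum)  # 8 + 4 bytes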
Re: [Lightning-dev] Quick analysis of channel_update data
Follow-up: here's more detailed info on the data I collected and potential savings we could achieve: I made hourly routing table backups for 12 days, and collected routing information for 17 000 channel ids. There are 130 000 different channel updates: on average, each channel has been updated 8 times. Here, "different" means that at least the timestamp has changed, and a node would have queried this channel update during its syncing process. But only 18 000 pairs of channel updates carry actual fee and/or HTLC value change. 85% of the time, we just queried information that we already had! Adding a basic checksum (4 bytes for example) that covers fees and HTLC min/max value to our channel range queries would be a significant improvement and I will add this to the open BOLT 1.1 proposal to extend queries with timestamps. I also think that such a checksum could be used - in "inventory" based gossip messages - in set reconciliation schemes: we could reconcile [channel id | timestamp | checksum] first Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
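A sketch of the proposed 4-byte checksum, computed over just the fields a router cares about: fee parameters and HTLC limits. The field list and serialization are illustrative, not a wire format.

    import struct
    import zlib

    def update_checksum(fee_base_msat, fee_proportional_millionths,
                        htlc_minimum_msat, htlc_maximum_msat):
        payload = struct.pack(">IIQQ",
                              fee_base_msat, fee_proportional_millionths,
                              htlc_minimum_msat, htlc_maximum_msat)
        return zlib.crc32(payload)  # 4 bytes, cheap to compare during sync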
Re: [Lightning-dev] Quick analysis of channel_update data
On Wed, 2 Jan 2019 at 18:26, Christian Decker wrote: > > For the ones that flap with a period that is long enough for the > disabling and enabling updates to be flushed, we are presented with a > tradeoff. IIRC we (c-lightning) currently hold back disabling > `channel_update`s until someone actually attempts to use the channel, at > which point we fail the HTLC and send out the stashed `channel_update`, > thus reducing the publicly visible flapping. For the enabling we can't > do that, but we could think about a local policy on how much to delay a > `channel_update` depending on the past stability of that peer. Again > this is local policy and doesn't warrant a spec change. > > I think we should probably try out some policies related to when to send > `channel_update`s and how to hide redundant updates, and then we can see > which ones work best :-) > Yes, I haven't looked at how to handle this with local policies. My hypothesis is that when you're syncing a routing table that is, say, one day old, you end up querying and downloading a lot of information that you already have, and that adding a basic checksum to our channel queries may greatly improve this. Of course this would be much more actionable with stats and hard numbers, which I'll provide ASAP. Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
Re: [Lightning-dev] Quick analysis of channel_update data
Hi Fabrice, happy new year to you too :-) Thanks for taking the time to collect that information. It's very much in line with what we were expecting in that most of the updates come from flapping channels. Your second observation that some updates only change the timestamp is likely due to the staggered broadcast merging multiple updates, e.g., one disabling and one enabling the channel, that are sent very close to each other. This is the very reason we introduced the staggering back in the day, as it limits the maximum rate of updates a single node may produce for each of its channels. In the second case we can probably get away with not forwarding the update, but updating the timestamp and signature for the old `channel_update` locally, so that we don't then flap back to an older one should we get that in a roundabout way. That's purely a local decision and does not warrant a spec change imho. For the ones that flap with a period that is long enough for the disabling and enabling updates to be flushed, we are presented with a tradeoff. IIRC we (c-lightning) currently hold back disabling `channel_update`s until someone actually attempts to use the channel, at which point we fail the HTLC and send out the stashed `channel_update`, thus reducing the publicly visible flapping. For the enabling we can't do that, but we could think about a local policy on how much to delay a `channel_update` depending on the past stability of that peer. Again this is local policy and doesn't warrant a spec change. I think we should probably try out some policies related to when to send `channel_update`s and how to hide redundant updates, and then we can see which ones work best :-) Cheers, Christian Fabrice Drouin writes: > Hello All, and Happy New Year! > > To understand why there is a steady stream of channel updates, even > when fee parameters don't seem to actually change, I made hourly > backups of the routing table of one of our nodes, and compared these > routing tables to see what exactly was being modified. > > It turns out that: > - there are a lot of disable/enable/disable etc. updates which are > just sent when a channel is disabled then enabled again (this can > happen when nodes go offline, for example) > - there are also a lot of updates that don't change anything (just a new > timestamp and signatures but otherwise the same info), up to several times > a day for the same channel id > > In both cases we end up syncing info that we already have. > I don't know yet how best to use this when syncing routing tables, but > I thought it was worth sharing anyway. A basic checksum that does not > cover all fields, but only fees and HTLC min/max values, could probably > be used to improve routing table sync? > > Cheers, > > Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
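A sketch of the hold-back Christian describes: a disabling channel_update is stashed instead of gossiped, and only broadcast when someone actually tries to route through the channel and the HTLC is failed. Callback names are illustrative, not c-lightning's API.

    class StashedDisables:
        def __init__(self):
            self.stash = {}  # short_channel_id -> disabling channel_update

        def on_local_disable(self, scid, update):
            self.stash[scid] = update  # don't gossip it yet

        def on_htlc_attempt(self, scid, fail_htlc, broadcast):
            update = self.stash.pop(scid, None)
            if update is None:
                return False           # channel usable, let the HTLC through
            fail_htlc(update)          # the failure message carries the update
            broadcast(update)          # now the network hears about it
            return True

        def on_local_enable(self, scid):
            self.stash.pop(scid, None)  # flap ended before anyone noticed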
[Lightning-dev] Quick analysis of channel_update data
Hello All, and Happy New Year! To understand why there is a steady stream of channel updates, even when fee parameters don't seem to actually change, I made hourly backups of the routing table of one of our nodes, and compared these routing tables to see what exactly was being modified. It turns out that: - there are a lot of disable/enable/disable etc. updates which are just sent when a channel is disabled then enabled again (this can happen when nodes go offline, for example) - there are also a lot of updates that don't change anything (just a new timestamp and signatures but otherwise the same info), up to several times a day for the same channel id In both cases we end up syncing info that we already have. I don't know yet how best to use this when syncing routing tables, but I thought it was worth sharing anyway. A basic checksum that does not cover all fields, but only fees and HTLC min/max values, could probably be used to improve routing table sync? Cheers, Fabrice ___ Lightning-dev mailing list Lightning-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev