Re: [Lightning-dev] SURBs as a Solution for Protocol-Level Payment ACKs

2019-02-18 Thread Rusty Russell
Olaoluwa Osuntokun  writes:
> Hi y'all,
>
> Recently we've started to do more design work related to the Sphinx packet
> (EOB format, rendezvous protocol). This prompted me to re-visit the original
> Sphinx paper to refresh my memory w.r.t some of the finer details of the
> protocol.  While I was re-reading the paper, I realized that we may be able
> to use SURBs (single-use-reply-blocks) to implement a "payment ACK" for
> each sent HTLC.
>
> (it's worth mentioning that switching to HORNET down the line would solve
> this problem as well since the receiver already gets a multi-use backwards
> route that they can use to send information back to the sender)

I think HORNET is a better way forward for soft errors, since using the
same circuit is *way* more reliable (Christian indicated most probe
failures are due to disconnected nodes, not capacity).

I'd like to see us work towards that instead, at least in baby steps.

> Right now HTLC routing is mainly a game of "send and hope it arrives", as
> you have no clear indication of the _arrival_ of an HTLC at the destination.
> Instead, you only receive a protocol level message if the HTLC failed for
> w/e reason, or if it was successfully redeemed.  As part of BOLT 1.1, it was
> agreed upon that we should implement some sort of "payment ACK" feature. A
> payment ACK scheme is strongly desired as it:
>
>   * Allows the sender to actually know when a payment has reached the
> receiver which is useful for many higher level protocols. Atm, the
> sender is unable to distinguish an HTLC being "black holed" from one
> that's actually reached the receiver, and they're just holding on to it.

Agreed, though in the long run we'll have to do something about that.

>   * AMP implementations would be aided by being able to receive feedback on
> successfully routed splits. If we're able to have the receiver ACK each
> partial payment, then implementations can more aggressively split
> payments as they're able to gain assurance that the first 2 BTC of 5
> total have actually reached the receiver, and weren't black holed.

Yes, I suspect this will quickly get messy.  Sender wants longer
timeouts for AMP, network definitely doesn't.  In my current draft I
chose 60 seconds for the timeout, but that's a compromise.

>   * Enforcing and relying on ACKs may also thwart silly games receivers
> might play, claiming that the HTLC "didn't actually arrive".

And general debugging and diag as the network gets larger.

> Some also call this feature a "soft error" as a possible implementation
> might be to re-use the existing onion error protocol we've deployed today.  For
> reference, in order to send errors back along the route in a way that
> doesn't reveal the sender of the HTLC to the receiver (or any of the
> intermediate nodes) we re-use the shared secret each hop has derived, and
> onion wrap a MAC'd error to the sender. Each hop can't actually check that
> they've received a well formed error, but the sender is able to attribute an
> error to a node in the route based on which shared secret they're able to
> check the MAC with.
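
For concreteness, the sender-side attribution loop looks roughly like
the Go sketch below.  The key derivation and the obfuscation keystream
are simplified stand-ins (the real labelled derivations, stream cipher
and fixed-size error packet live in BOLT 4), so read it as the shape of
the thing rather than an implementation:

    package main

    import (
        "crypto/hmac"
        "crypto/sha256"
        "fmt"
    )

    // deriveKey is a simplified stand-in for the labelled key derivation
    // from the per-hop shared secret.
    func deriveKey(label string, sharedSecret []byte) []byte {
        h := hmac.New(sha256.New, []byte(label))
        h.Write(sharedSecret)
        return h.Sum(nil)
    }

    // peelLayer removes one hop's obfuscation.  Each hop XORs the error
    // blob with a keystream derived from its shared secret, so the sender
    // just re-applies the same keystream (XOR is its own inverse).  The
    // keystream here is a stand-in, not the real derivation.
    func peelLayer(blob, sharedSecret []byte) []byte {
        stream := deriveKey("stream", sharedSecret)
        out := make([]byte, len(blob))
        for i := range blob {
            out[i] = blob[i] ^ stream[i%len(stream)]
        }
        return out
    }

    // attributeError peels the error one layer at a time, in route order,
    // and returns the index of the first hop whose MAC verifies.  Only the
    // sender can do this, since only it knows every hop's shared secret;
    // intermediate hops can't tell whether the error they forward is well
    // formed.
    func attributeError(errBlob []byte, hopSecrets [][]byte) (int, []byte, bool) {
        blob := errBlob
        for i, secret := range hopSecrets {
            blob = peelLayer(blob, secret)
            mac := hmac.New(sha256.New, deriveKey("um", secret))
            mac.Write(blob[32:]) // pretend the first 32 bytes carry the MAC
            if hmac.Equal(mac.Sum(nil), blob[:32]) {
                return i, blob[32:], true // hop i generated this error
            }
        }
        return -1, nil, false // nothing matched: malformed (or forged) error
    }

    func main() {
        // Toy call: with garbage input no hop's MAC will verify.
        _, _, ok := attributeError(make([]byte, 64), [][]byte{make([]byte, 32)})
        fmt.Println("attributed:", ok)
    }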

Either way, someone should spec that :)

Cheers,
Rusty.


[Lightning-dev] Multi-frame sphinx onion format

2019-02-18 Thread Christian Decker
Heya everybody,

during the spec meeting in Adelaide we decided that we'd like to extend
our current onion-routing capabilities with a couple of new features,
such as rendez-vous routing, spontaneous payments, multi-part payments,
etc. These features rely on two changes to the current onion format:
bigger per-hop payloads (in the form of multi-frame payloads) and a more
modern encoding (given by the TLV encoding).

In the following I will explain my proposal on how to extend the per-hop
payload from the current 65 bytes (which include realm and HMAC) to
multiples of that size.

Until now we had a 1-to-1 relationship between a 65-byte segment of
payload and a hop in the route. Since this is no longer the case, I
propose we call the 65-byte segment a frame, to differentiate it from a
hop in the route, hence the name multi-frame onion. The creation and
decoding process doesn't really change at all, only some of the
parameters.

When constructing the onion, the sender currently always right-shifts by
a single 65-byte frame, serializes the payload, and encrypts using the
ChaCha20 stream. In parallel it also generates the fillers (basically 0s
that get appended and encrypted by the processing nodes, in order to get
matching HMACs); these are also shifted by a single 65-byte frame on
each hop. The change in the generation comes in the form of variable
shifts for both the payload serialization and the filler generation,
depending on the payload size. So if the payload fits into 32 bytes
nothing changes; if the payload is bigger, we just use additional frames
until it fits. The payload is padded with 0s, the HMAC remains as the
last 32 bytes of the payload, and the realm stays at the first
byte. This gives us

> payload_size = num_frames * 65 bytes - 1 byte (realm) - 32 bytes (hmac)
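
In code the frame count is just a ceiling division of that; a quick
sketch (function and constant names are mine, not taken from either
implementation):

    package main

    import "fmt"

    const (
        frameSize = 65     // 1 realm byte + 32 payload bytes + 32 HMAC bytes in the single-frame case
        overhead  = 1 + 32 // realm byte plus HMAC, paid once per hop regardless of frame count
    )

    // framesNeeded returns how many 65-byte frames a per-hop payload of the
    // given size occupies, i.e. the smallest n with n*65 - 33 >= payloadSize.
    func framesNeeded(payloadSize int) int {
        return (payloadSize + overhead + frameSize - 1) / frameSize
    }

    func main() {
        for _, size := range []int{32, 33, 97, 200} {
            n := framesNeeded(size)
            fmt.Printf("payload %3d bytes -> %d frame(s), shift %d bytes\n", size, n, n*frameSize)
        }
    }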

The realm byte encodes both the payload format and the number of
additional frames used to encode the payload. The 4 most significant
bits encode the number of additional frames, while the 4 least
significant bits encode the realm/payload format.
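
Assuming that nibble split (high nibble = additional frames beyond the
first, low nibble = realm proper), packing and unpacking the byte is
trivial; a sketch:

    package main

    import "fmt"

    // packRealm combines the payload format (low 4 bits) with the number of
    // additional frames beyond the first (high 4 bits) into the realm byte.
    func packRealm(realm, additionalFrames byte) byte {
        return (additionalFrames&0x0f)<<4 | realm&0x0f
    }

    // unpackRealm reverses packRealm: it returns the realm/payload format and
    // the total number of frames the processing node must read.
    func unpackRealm(b byte) (realm byte, totalFrames int) {
        return b & 0x0f, int(b>>4) + 1
    }

    func main() {
        b := packRealm(0x01, 2) // hypothetical realm 0x01 spread over 3 frames
        realm, frames := unpackRealm(b)
        fmt.Printf("realm byte %#02x -> realm %d, %d frame(s)\n", b, realm, frames)
    }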

The decoding of an onion packet pretty much stays the same: the
receiving node generates the shared secret, then generates the ChaCha20
stream, and decrypts the packet (plus the additional padding that
matches the filler the sender generated for the HMACs). It can then read
the realm byte, which tells it how many frames to read and how many
frames it needs to left-shift in order to derive the next onion.
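
On the receiving side the multi-frame handling then reduces to a bit of
slicing once the padded packet has been decrypted; a sketch of just that
step (packet size, HMAC checks and re-padding are unchanged and omitted):

    package main

    import "fmt"

    const frameSize = 65

    // splitHopData reads the realm byte of the already-decrypted routing
    // info, determines how many frames belong to this hop, and returns this
    // hop's payload plus the left-shifted remainder that seeds the next
    // onion.
    func splitHopData(routingInfo []byte) (payload, next []byte) {
        totalFrames := int(routingInfo[0]>>4) + 1 // high nibble = additional frames
        used := totalFrames * frameSize
        return routingInfo[:used], routingInfo[used:]
    }

    func main() {
        // Toy data: the realm byte announces one additional frame, so this
        // hop consumes 130 bytes and hands the rest to the next hop.
        info := make([]byte, 4*frameSize)
        info[0] = 0x01 | 1<<4 // hypothetical realm 0x01, 1 additional frame
        payload, next := splitHopData(info)
        fmt.Printf("this hop: %d bytes, remainder: %d bytes\n", len(payload), len(next))
    }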

This competes with the proposal by roasbeef on the lightning-onion
repo [1], but I think it is superior in a number of ways. The major
advantage of this proposal is that the payload is in one
contiguous memory region after the decryption, avoiding re-assembly of
multiple parts and allowing zero-copy processing of the data. It also
avoids multiple decryption steps, and does not waste space on multiple,
useless, HMACs. I also believe that this proposal is simpler than [1],
since it doesn't require re-assembly, and creates a clear distinction
between payload units and hops.

To show that this proposal actually works, and is rather simple, I went
ahead and implemented it for c-lightning [2] and lnd [3] (sorry ACINQ,
my Scala is not sufficient to implement it for eclair). Most of the code
changes are preparation for variable size payloads alongside the legacy
v0 payloads we used so far; the relevant commits that actually change
the generation of the onion are [4] and [5] for c-lightning and lnd
respectively.

I'm hoping that this proposal proves to be useful, and that you agree
about the advantages I outlined above. I'd also like to mention that,
while this is working, I'm open to suggestions :-)

Cheers,
Christian

[1] https://github.com/lightningnetwork/lightning-onion/pull/31
[2] https://github.com/ElementsProject/lightning/pull/2363
[3] https://github.com/lightningnetwork/lightning-onion/pull/33
[4] 
https://github.com/ElementsProject/lightning/pull/2363/commits/aac29daeeb5965ae407b9588cd599f38291c0c1f
[5] 
https://github.com/lightningnetwork/lightning-onion/pull/33/commits/216c09c257d1a342c27c1e85ef6653559ef39314


Re: [Lightning-dev] Quick analysis of channel_update data

2019-02-18 Thread Rusty Russell
BTW, I took a snapshot of our gossip store from two weeks back, which
simply stores all gossip in order (compacting every week or so).

channel_updates which updated existing channels: 17766
... which changed *only* the timestamps: 12644
... which were a week since the last: 7233
... which only changed the disable/enable: 4839

So there are about 5100 timestamp-only updates less than a week apart
(about 2000 are 1036 seconds apart; who is this?).

1. I'll look at getting even more conservative with flapping (120-second
   delay if we've just sent an update) but that doesn't seem to be the
   majority of traffic.
2. I'll also slow down refreshes to every 12 days, rather than 7, but
   again it's only a marginal change.

But basically, the majority of updates I saw two weeks ago are actually
refreshes, not spam.
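
For what it's worth, points 1 and 2 above amount to a policy roughly
like this (a Go sketch; the timings are the ones from this mail, not
spec values, and a real implementation would queue a damped update
rather than drop it):

    package main

    import (
        "fmt"
        "time"
    )

    const (
        flapDamping     = 120 * time.Second   // point 1: don't re-announce within 2 minutes of our last update
        refreshInterval = 12 * 24 * time.Hour // point 2: refresh an otherwise-unchanged update every 12 days
    )

    // shouldSendUpdate decides whether to broadcast a new channel_update
    // now, given when we last sent one and whether anything other than the
    // timestamp actually changed.
    func shouldSendUpdate(lastSent time.Time, contentChanged bool, now time.Time) bool {
        if contentChanged {
            // Real change (fees, disable/enable, ...): send, but damp flapping.
            return now.Sub(lastSent) >= flapDamping
        }
        // Nothing changed: only send a keep-alive refresh.
        return now.Sub(lastSent) >= refreshInterval
    }

    func main() {
        now := time.Now()
        fmt.Println(shouldSendUpdate(now.Add(-30*time.Second), true, now))   // false: still damping
        fmt.Println(shouldSendUpdate(now.Add(-13*24*time.Hour), false, now)) // true: refresh is due
    }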

Hope that adds something?
Rusty.

Fabrice Drouin  writes:
> Additional info on channel_update traffic:
>
> Comparing daily backups of routing tables over the last 2 weeks shows
> that nearly all channels get at least a new update every day. This
> means that channel_update traffic is not primarily caused by nodes
> publishing new updates when channels are about to become stale:
> otherwise we would see 1/14th of our channels getting a new update on
> the first day, then another 1/14th on the second day and so on. This is
> confirmed by comparing routing table backups over a single day: nearly
> all channels were updated, on average once, with an update that
> almost always does not include new information.
>
> It could be caused by "flapping" channels, probably because the hosts
> that are hosting them are not reliable (as in, often offline).
>
> Heuristics can be used to improve traffic, but that's orthogonal to the
> problem of improving our current sync protocol.
> Also, these heuristics would probably be used to close channels to
> unreliable nodes instead of filtering/delaying publishing updates for
> them.
>
> Finally, this is not just obsessing over bandwidth (though bandwidth
> is a real issue for most mobile users). I'm also over-obsessing over
> startup time and payment UX :), because they do matter a lot for
> mobile users, and I would like to push the current gossip design as far
> as it can go. I also think that we'll face the same issue when
> designing inventory messages for channel_update messages.
>
> Cheers,
>
> Fabrice
>
>
>
> On Wed, 9 Jan 2019 at 00:44, Rusty Russell  wrote:
>>
>> Fabrice Drouin  writes:
>> > I think there may even be a simpler case where not replacing updates
>> > will result in nodes not knowing that a channel has been re-enabled:
>> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables
>> > it, U3 enables it again and is the same as U1. If you discard U3 and
>> > just keep U1, and your peer has U2, how will you tell them that the
>> > channel has been enabled again? Unless "discard" here means keep the
>> > update but don't broadcast it?
>>
>> This can only happen if you happen to lose connection to the peer(s)
>> which sent U2 before it sends U3.
>>
>> Again, this corner case penalizes flapping channels.  If we also
>> ratelimit our own enables to 1 per 120 seconds, you won't hit this case?
>>
>> > But then there's a risk that nodes would discard channels as stale
>> > because they don't get new updates when they reconnect.
>>
>> You need to accept redundant updates after 1 week, I think.
>>
>> Cheers,
>> Rusty.