Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
> That's not 100% reliable at all. How long do you want to wait for the new gossip?

So you know it's a new channel, with a new capacity (look at the on-chain output), between the same parties (assuming people use that multi-sig signal). If you attempt to route over it and have a stale policy, you'll get the latest policy. Therefore, it doesn't really matter how long you wait, as you aren't removing the channel from your graph, since you know it didn't really close. If you don't see a message after 2 weeks or w/e, then you mark it as a zombie just like any other channel.

-- Laolu

On Wed, Jun 29, 2022 at 5:35 PM Rusty Russell wrote:

> Olaoluwa Osuntokun writes:
> > Hi Rusty,
> >
> > Thanks for the feedback!
> >
> >> This is over-design: if you fail to get reliable gossip, your routing will suffer anyway. Nothing new here.
> >
> > Idk, it's pretty simple: you're already watching for closes, so if a close looks a certain way, it's a splice. When you see that, you can even take note of the _new_ channel size (funds added/removed) and update your pathfinding/blindedpaths/hophints accordingly.
>
> Why spam the chain?
>
> > If this is an over-designed solution, then I'd categorize _only_ waiting N blocks as wishful thinking, given we have effectively no guarantees w.r.t how long it'll take a message to propagate.
>
> Sure, it's a simplification on "wait 6 blocks plus 30 minutes".
>
> > If by routing you mean a sender, then imo still no: you don't necessarily need _all_ gossip, just the latest policies of the nodes you route most frequently to. On top of that, since you can get the latest policy each time you incur a routing failure, as you make payments, you'll get the latest policies of the nodes you care about over time. Also consider that you might fail to get "reliable" gossip, simply due to your peer neighborhood aggressively rate limiting gossip (they only allow 1 update a day for a node, you updated your fee, oops, no splice msg for you).
>
> There's no rate limiting on new channel announcements?
>
> > So it appears you don't agree that the "wait N blocks before you close your channels" isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?
>
> Because it's simple.
>
> > From my PoV, the whole point of even signalling that a splice is ongoing is for the senders/receivers: they can continue to send/recv payments over the channel while the splice is in process. It isn't that a node isn't getting any gossip, it's that if the node fails to obtain the gossip message within the N block period of time, then the channel has effectively closed from their PoV, and it may be an hour+ until it's seen as a usable (new) channel again.
>
> Sure. If you want to not forget channels at all on close, that works too.
>
> > If there isn't a 100% reliable way to signal that a splice is in progress, then this disincentivizes its usage, as routers can lose out on potential fee revenue, and senders/receivers may grow to favor only very long lived channels. IMO _only_ having a gossip message simply isn't enough: there're no real guarantees w.r.t _when_ all relevant parties will get your gossip message. So why not give them a 100% reliable on-chain signal that: something is in progress here, stay tuned for the gossip message, whenever you receive that.
>
> That's not 100% reliable at all. How long do you want to wait for the new gossip?
>
> Just treat every close as signalling "stay tuned for the gossip message". That's reliable. And simple.
>
> Cheers,
> Rusty.

_______________________________________________
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev
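[Editorial sketch] The "don't remove the channel on close, mark it a zombie after 2 weeks or w/e" behavior above can be modeled with a few lines of Python. Everything here (class and method names, the 2016-block grace window) is illustrative, not any implementation's actual API:

```python
ZOMBIE_GRACE_BLOCKS = 2016  # ~2 weeks at ~10 min/block; "or w/e"

class ChannelGraph:
    """Toy channel graph that defers forgetting closed channels."""

    def __init__(self):
        self.channels = {}       # chan_id -> policy
        self.pending_close = {}  # chan_id -> block height the close was seen

    def on_chain_close(self, chan_id, height):
        # Don't remove: this close may be the first half of a splice.
        self.pending_close[chan_id] = height

    def on_splice_gossip(self, old_chan_id, new_chan_id, new_policy):
        # Splice gossip arrived: carry state over to the new channel.
        self.pending_close.pop(old_chan_id, None)
        self.channels.pop(old_chan_id, None)
        self.channels[new_chan_id] = new_policy

    def prune_zombies(self, tip_height):
        # Only now do we actually forget channels: no splice gossip
        # showed up within the grace window after the on-chain close.
        for chan_id, close_height in list(self.pending_close.items()):
            if tip_height - close_height >= ZOMBIE_GRACE_BLOCKS:
                del self.pending_close[chan_id]
                self.channels.pop(chan_id, None)
```

Under this model the exact wait is no longer load-bearing: a stale policy just gets corrected on the first routing failure, as described above.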
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Hi Lisa,

> Adding a noticeable on-chain signal runs counter to the goal of the move to taproot / gossip v2, which is to make lightning's onchain footprint indistinguishable from any other onchain usage

My model of gossip v2 is something like:

* there's no longer a 1:1 mapping of channels and UTXOs
* verifiers don't actually care if the advertised UTXO is actually a channel or not
* verifiers aren't watching the chain for spends, as channel advertisements expire after 2 weeks or w/e
* there might be a degree of "leverage" allowing someone to advertise a 1 BTC UTXO as having 10 BTC capacity (or w/e)

So in this model, splicing on the gossip network wouldn't really be an explicit event. Since I'm free to advertise a series of channels that might not actually exist, I can just say: ok, this set of 5 channels is now actually 2 channels, and you can route a bit more over them. In this world, re-organizing a little corner of the channel graph isn't necessarily tied to making a series of on-chain transactions.

In the realm of the gossip network as it's defined today, the act of splicing is already itself a noticeable chain signal: I see a channel close, then another one advertised that uses that old channel as inputs, and the closing and opening transactions are the same. As a result, for _public_ channels any of the chain signals I listed above don't actually give away any additional information: splices are already identifiable (in theory).

I don't disagree that waiting N blocks is probably "good enough" for most cases (ignoring block storms, rare long intervals between blocks, etc, etc). Instead this is suggested in the spirit of a belt-and-suspenders approach: if I can do something to make the signal 100% reliable, that doesn't add extra bytes to the chain, and doesn't leak additional information for public channels (the only case where the message even matters), then why not?

-- Laolu

On Wed, Jun 29, 2022 at 5:43 PM lisa neigut wrote:

> Adding a noticeable on-chain signal runs counter to the goal of the move to taproot / gossip v2, which is to make lightning's onchain footprint indistinguishable from any other onchain usage.
>
> I'm admittedly a bit confused as to why onchain signals are even being seriously proposed. Aside from "infallibility", is there another reason for suggesting we add an onchain detectable signal for this? Seems heavy handed imo, given that the severity of a comms failure is pretty minimal (*potential* for lost routing fees).
>
> > So it appears you don't agree that the "wait N blocks before you close your channels" isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?
>
> fwiw I seem to remember seeing that it takes ~an hour for gossip to propagate (no link sorry). Given that, 2x an hour or 12 blocks is a reasonable first estimate. I trust we'll have time to tune this after we've had some real-world experience with them.
>
> Further, we can always add more robust signaling later, if lost routing fees turns out to be a huge issue.
>
> Finally, worth noting that Alex Myers' minisketch project may well help/improve gossip reconciliation efficiency to the point where gossip reliability is less of an issue.
>
> ~nifty
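[Editorial sketch] The "splices are already identifiable" observation above comes down to outpoint matching: the transaction that closes the old channel is the very transaction that funds the newly announced one. A small sketch with hypothetical data shapes (tuples and dicts here stand in for real transaction types; nothing is a real implementation's API):

```python
def is_public_splice(closing_tx, new_funding_outpoint, known_funding_outpoints):
    """Heuristic splice check for public channels.

    closing_tx: {"txid": str, "inputs": [(txid, vout), ...], "n_outputs": int}
    new_funding_outpoint: (txid, vout) from the new channel_announcement
    known_funding_outpoints: set of (txid, vout) for channels in our graph
    """
    # The close spends an existing channel's funding output...
    spends_old_channel = any(
        inp in known_funding_outpoints for inp in closing_tx["inputs"]
    )
    # ...and the same transaction creates the new channel's funding output.
    new_txid, new_vout = new_funding_outpoint
    funds_new_channel = (
        new_txid == closing_tx["txid"] and new_vout < closing_tx["n_outputs"]
    )
    return spends_old_channel and funds_new_channel
```

This is exactly the information a verifier already has today, which is why the explicit chain signals add no privacy leak for public channels.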
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Had another thought: if you've seen a chain close but also have a gossip message that indicates this is a splice, you SHOULD propagate that gossip more urgently/widely than any other gossip you've got.

Adding an urgency metric to gossip is fuzzy to enforce... *handwaves*. You *do* get the onchain signal, we just change the behavior of the secondary information system instead of embedding the info into the chain.

"Spamming" gossip with splices is expensive -- there's a real-world cost (onchain fees) to closing a channel (the signal to promote/prioritize a gossip msg), which cuts down on the ability to send out these 'urgent' messages with any frequency.

~nifty

On Wed, Jun 29, 2022 at 7:43 PM lisa neigut wrote:

> Adding a noticeable on-chain signal runs counter to the goal of the move to taproot / gossip v2, which is to make lightning's onchain footprint indistinguishable from any other onchain usage.
>
> I'm admittedly a bit confused as to why onchain signals are even being seriously proposed. Aside from "infallibility", is there another reason for suggesting we add an onchain detectable signal for this? Seems heavy handed imo, given that the severity of a comms failure is pretty minimal (*potential* for lost routing fees).
>
> > So it appears you don't agree that the "wait N blocks before you close your channels" isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?
>
> fwiw I seem to remember seeing that it takes ~an hour for gossip to propagate (no link sorry). Given that, 2x an hour or 12 blocks is a reasonable first estimate. I trust we'll have time to tune this after we've had some real-world experience with them.
>
> Further, we can always add more robust signaling later, if lost routing fees turns out to be a huge issue.
>
> Finally, worth noting that Alex Myers' minisketch project may well help/improve gossip reconciliation efficiency to the point where gossip reliability is less of an issue.
>
> ~nifty
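[Editorial sketch] The "propagate splice gossip more urgently" idea above amounts to a two-level outbound queue: messages tied to an observed on-chain close jump ahead of ordinary gossip, while ordering within each level stays FIFO. A hypothetical sketch (not any node's actual forwarding logic):

```python
import heapq

URGENT, NORMAL = 0, 1

class GossipOutbox:
    """Toy outbound gossip queue: splice-related msgs jump the line."""

    def __init__(self):
        self._heap = []
        self._seq = 0  # monotonically increasing FIFO tiebreaker

    def push(self, msg, splice_related=False):
        # The priority flag would be set when the corresponding channel's
        # funding output was just seen spent on chain.
        prio = URGENT if splice_related else NORMAL
        heapq.heappush(self._heap, (prio, self._seq, msg))
        self._seq += 1

    def pop(self):
        return heapq.heappop(self._heap)[2]
```

The anti-spam argument from the mail holds here too: promotion to the URGENT level is gated on an on-chain event, which costs real fees to produce.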
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Adding a noticeable on-chain signal runs counter to the goal of the move to taproot / gossip v2, which is to make lightning's onchain footprint indistinguishable from any other onchain usage.

I'm admittedly a bit confused as to why onchain signals are even being seriously proposed. Aside from "infallibility", is there another reason for suggesting we add an onchain detectable signal for this? Seems heavy handed imo, given that the severity of a comms failure is pretty minimal (*potential* for lost routing fees).

> So it appears you don't agree that the "wait N blocks before you close your channels" isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?

fwiw I seem to remember seeing that it takes ~an hour for gossip to propagate (no link sorry). Given that, 2x an hour or 12 blocks is a reasonable first estimate. I trust we'll have time to tune this after we've had some real-world experience with them.

Further, we can always add more robust signaling later, if lost routing fees turns out to be a huge issue.

Finally, worth noting that Alex Myers' minisketch project may well help/improve gossip reconciliation efficiency to the point where gossip reliability is less of an issue.

~nifty
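[Editorial sketch] The "2x an hour or 12 blocks" figure above is just the observed propagation time with a 2x safety margin, converted to blocks at the ~10-minute average block interval:

```python
GOSSIP_PROPAGATION_MIN = 60   # rough ~1 hour figure cited in the thread
AVG_BLOCK_INTERVAL_MIN = 10   # average Bitcoin block interval
SAFETY_FACTOR = 2             # "2x an hour"

# Delay (in blocks) before removing a closed channel from the graph.
wait_blocks = SAFETY_FACTOR * GOSSIP_PROPAGATION_MIN // AVG_BLOCK_INTERVAL_MIN
# wait_blocks == 12
```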
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Olaoluwa Osuntokun writes:
> Hi Rusty,
>
> Thanks for the feedback!
>
>> This is over-design: if you fail to get reliable gossip, your routing will suffer anyway. Nothing new here.
>
> Idk, it's pretty simple: you're already watching for closes, so if a close looks a certain way, it's a splice. When you see that, you can even take note of the _new_ channel size (funds added/removed) and update your pathfinding/blindedpaths/hophints accordingly.

Why spam the chain?

> If this is an over-designed solution, then I'd categorize _only_ waiting N blocks as wishful thinking, given we have effectively no guarantees w.r.t how long it'll take a message to propagate.

Sure, it's a simplification on "wait 6 blocks plus 30 minutes".

> If by routing you mean a sender, then imo still no: you don't necessarily need _all_ gossip, just the latest policies of the nodes you route most frequently to. On top of that, since you can get the latest policy each time you incur a routing failure, as you make payments, you'll get the latest policies of the nodes you care about over time. Also consider that you might fail to get "reliable" gossip, simply due to your peer neighborhood aggressively rate limiting gossip (they only allow 1 update a day for a node, you updated your fee, oops, no splice msg for you).

There's no rate limiting on new channel announcements?

> So it appears you don't agree that the "wait N blocks before you close your channels" isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?

Because it's simple.

> From my PoV, the whole point of even signalling that a splice is ongoing is for the senders/receivers: they can continue to send/recv payments over the channel while the splice is in process. It isn't that a node isn't getting any gossip, it's that if the node fails to obtain the gossip message within the N block period of time, then the channel has effectively closed from their PoV, and it may be an hour+ until it's seen as a usable (new) channel again.

Sure. If you want to not forget channels at all on close, that works too.

> If there isn't a 100% reliable way to signal that a splice is in progress, then this disincentivizes its usage, as routers can lose out on potential fee revenue, and senders/receivers may grow to favor only very long lived channels. IMO _only_ having a gossip message simply isn't enough: there're no real guarantees w.r.t _when_ all relevant parties will get your gossip message. So why not give them a 100% reliable on-chain signal that: something is in progress here, stay tuned for the gossip message, whenever you receive that.

That's not 100% reliable at all. How long do you want to wait for the new gossip?

Just treat every close as signalling "stay tuned for the gossip message". That's reliable. And simple.

Cheers,
Rusty.
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Hi Rusty,

Thanks for the feedback!

> This is over-design: if you fail to get reliable gossip, your routing will suffer anyway. Nothing new here.

Idk, it's pretty simple: you're already watching for closes, so if a close looks a certain way, it's a splice. When you see that, you can even take note of the _new_ channel size (funds added/removed) and update your pathfinding/blindedpaths/hophints accordingly.

If this is an over-designed solution, then I'd categorize _only_ waiting N blocks as wishful thinking, given we have effectively no guarantees w.r.t how long it'll take a message to propagate.

If by routing you mean a routing node then: no, a routing node doesn't even really need the graph at all to do their job. If by routing you mean a sender, then imo still no: you don't necessarily need _all_ gossip, just the latest policies of the nodes you route most frequently to. On top of that, since you can get the latest policy each time you incur a routing failure, as you make payments, you'll get the latest policies of the nodes you care about over time. Also consider that you might fail to get "reliable" gossip, simply due to your peer neighborhood aggressively rate limiting gossip (they only allow 1 update a day for a node, you updated your fee, oops, no splice msg for you).

So it appears you don't agree that the "wait N blocks before you close your channels" isn't a foolproof solution? Why 12 blocks, why not 15? Or 144?

From my PoV, the whole point of even signalling that a splice is ongoing is for the senders/receivers: they can continue to send/recv payments over the channel while the splice is in process. It isn't that a node isn't getting any gossip, it's that if the node fails to obtain the gossip message within the N block period of time, then the channel has effectively closed from their PoV, and it may be an hour+ until it's seen as a usable (new) channel again.

If there isn't a 100% reliable way to signal that a splice is in progress, then this disincentivizes its usage, as routers can lose out on potential fee revenue, and senders/receivers may grow to favor only very long lived channels. IMO _only_ having a gossip message simply isn't enough: there're no real guarantees w.r.t _when_ all relevant parties will get your gossip message. So why not give them a 100% reliable on-chain signal that: something is in progress here, stay tuned for the gossip message, whenever you receive that.

-- Laolu

On Tue, Jun 28, 2022 at 6:40 PM Rusty Russell wrote:

> Hi Roasbeef,
>
> This is over-design: if you fail to get reliable gossip, your routing will suffer anyway. Nothing new here.
>
> And if you *know* you're missing gossip, you can simply delay onchain closures for longer: since nodes should respect the old channel ids for a while anyway.
>
> Matt's proposal to simply defer acting on onchain closes is elegant and minimal. We could go further and relax requirements to detect onchain closes at all, and optionally add a perm close message.
>
> Cheers,
> Rusty.
>
> Olaoluwa Osuntokun writes:
> > Hi y'all,
> >
> > This mail was inspired by this [1] spec PR from Lisa. At a high level, it proposes that nodes add a delay between the time they see a channel closed on chain and when they remove it from their local channel graph. The motive here is to give the gossip message that indicates a splice is in process "enough" time to propagate through the network. If a node can see this message before/during the splicing operation, then they'll be able to relate the old and the new channels, meaning it's usable again by senders/receivers _before_ the entire chain of transactions confirms on chain.
> >
> > IMO, this sort of arbitrary delay (expressed in blocks) won't actually address the issue in practice. The proposal suffers from the following issues:
> >
> > 1. 12 blocks is chosen arbitrarily. If for w/e reason an announcement takes longer than 2 hours to reach the "economic majority" of senders/receivers, then the channel won't be able to mask the splicing downtime.
> >
> > 2. Gossip propagation delay and offline peers. These days most nodes throttle gossip pretty aggressively. As a result, a pair of nodes doing several in-flight splices (inputs become double spent or something, so they need to try a bunch) might end up being rate limited within the network, causing the splice update msg to be lost or delayed significantly (IIRC CLN resets these values after 24 hours). On top of that, if a peer is offline for too long (think mobile senders), then they may miss the update altogether as most nodes don't do a full historical _channel_update_ dump anymore.
> >
> > In order to resolve these issues, I think instead we need to rely on the primary splicing signal being sourced from the chain itself. In other words, if I see a channel close, and a closing transaction "looks" a certain way, then I know it's a splice.
Re: [Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Hi Roasbeef,

This is over-design: if you fail to get reliable gossip, your routing will suffer anyway. Nothing new here.

And if you *know* you're missing gossip, you can simply delay onchain closures for longer: since nodes should respect the old channel ids for a while anyway.

Matt's proposal to simply defer acting on onchain closes is elegant and minimal. We could go further and relax requirements to detect onchain closes at all, and optionally add a perm close message.

Cheers,
Rusty.

Olaoluwa Osuntokun writes:
> Hi y'all,
>
> This mail was inspired by this [1] spec PR from Lisa. At a high level, it proposes that nodes add a delay between the time they see a channel closed on chain and when they remove it from their local channel graph. The motive here is to give the gossip message that indicates a splice is in process "enough" time to propagate through the network. If a node can see this message before/during the splicing operation, then they'll be able to relate the old and the new channels, meaning it's usable again by senders/receivers _before_ the entire chain of transactions confirms on chain.
>
> IMO, this sort of arbitrary delay (expressed in blocks) won't actually address the issue in practice. The proposal suffers from the following issues:
>
> 1. 12 blocks is chosen arbitrarily. If for w/e reason an announcement takes longer than 2 hours to reach the "economic majority" of senders/receivers, then the channel won't be able to mask the splicing downtime.
>
> 2. Gossip propagation delay and offline peers. These days most nodes throttle gossip pretty aggressively. As a result, a pair of nodes doing several in-flight splices (inputs become double spent or something, so they need to try a bunch) might end up being rate limited within the network, causing the splice update msg to be lost or delayed significantly (IIRC CLN resets these values after 24 hours). On top of that, if a peer is offline for too long (think mobile senders), then they may miss the update altogether as most nodes don't do a full historical _channel_update_ dump anymore.
>
> In order to resolve these issues, I think instead we need to rely on the primary splicing signal being sourced from the chain itself. In other words, if I see a channel close, and a closing transaction "looks" a certain way, then I know it's a splice. This would be used in concert w/ any new gossip messages, as the chain signal is a 100% foolproof way of letting an aware peer know that a splice is actually happening (not a normal close). A chain signal doesn't suffer from any of the gossip/time related issues above, as the signal is revealed at the same time a peer learns of a channel close/splice.
>
> Assuming we agree that a chain signal has some sort of role in the ultimate plans for splicing, we'd need to decide on exactly _what_ such a signal looks like. Off the top, a few options are:
>
> 1. Stuff something in the annex. Works in theory, but not in practice, as bitcoind (being the dominant full node implementation on the p2p network, as well as what all the miners use) treats annexes as non-standard. Also the annex itself might have some fundamental issues that get in the way of its use altogether [2].
>
> 2. Re-use the anchors for this purpose. Anchors are nice as they allow for 1st/2nd/3rd party CPFP. As a splice might have several inputs and outputs, both sides will want to make sure it gets confirmed in a timely manner. Ofc, RBF can be used here, but that requires both sides to be online to make adjustments. Pre-signing can work too, but the effectiveness (minimizing chain cost while expediting confirmation) would be dependent on the fee step size.
>
> In this case, we'd use a different multi-sig output (both sides can rotate keys if they want to), and then roll the anchors into this splicing transaction. Given that all nodes on the network know what the anchor size is (assuming feature bit understanding), they're able to realize that it's actually a splice, and they don't need to remove it from the channel graph (yet).
>
> 3. Related to the above: just re-use the same multi-sig output. If nodes don't care all that much about rotating these keys, then they can just use the same output. This is trivially recognizable by nodes, as they already know the funding keys used, as they're in the channel_announcement.
>
> 4. OP_RETURN (yeh, I had to list it). Self explanatory, push some bytes in an OP_RETURN and use that as the marker.
>
> 5. Fiddle w/ the locktime+sequence somehow to make it identifiable to verifiers. This might run into some unintended interactions if the inputs provided have either relative or absolute lock times. There might also be some interaction w/ the main construction for eltoo (which uses the locktime).
>
> Of all the options, I think #2 makes the most sense.
[Lightning-dev] Achieving Zero Downtime Splicing in Practice via Chain Signals
Hi y'all,

This mail was inspired by this [1] spec PR from Lisa. At a high level, it proposes that nodes add a delay between the time they see a channel closed on chain and when they remove it from their local channel graph. The motive here is to give the gossip message that indicates a splice is in process "enough" time to propagate through the network. If a node can see this message before/during the splicing operation, then they'll be able to relate the old and the new channels, meaning it's usable again by senders/receivers _before_ the entire chain of transactions confirms on chain.

IMO, this sort of arbitrary delay (expressed in blocks) won't actually address the issue in practice. The proposal suffers from the following issues:

1. 12 blocks is chosen arbitrarily. If for w/e reason an announcement takes longer than 2 hours to reach the "economic majority" of senders/receivers, then the channel won't be able to mask the splicing downtime.

2. Gossip propagation delay and offline peers. These days most nodes throttle gossip pretty aggressively. As a result, a pair of nodes doing several in-flight splices (inputs become double spent or something, so they need to try a bunch) might end up being rate limited within the network, causing the splice update msg to be lost or delayed significantly (IIRC CLN resets these values after 24 hours). On top of that, if a peer is offline for too long (think mobile senders), then they may miss the update altogether as most nodes don't do a full historical _channel_update_ dump anymore.

In order to resolve these issues, I think instead we need to rely on the primary splicing signal being sourced from the chain itself. In other words, if I see a channel close, and a closing transaction "looks" a certain way, then I know it's a splice. This would be used in concert w/ any new gossip messages, as the chain signal is a 100% foolproof way of letting an aware peer know that a splice is actually happening (not a normal close). A chain signal doesn't suffer from any of the gossip/time related issues above, as the signal is revealed at the same time a peer learns of a channel close/splice.

Assuming we agree that a chain signal has some sort of role in the ultimate plans for splicing, we'd need to decide on exactly _what_ such a signal looks like. Off the top, a few options are:

1. Stuff something in the annex. Works in theory, but not in practice, as bitcoind (being the dominant full node implementation on the p2p network, as well as what all the miners use) treats annexes as non-standard. Also the annex itself might have some fundamental issues that get in the way of its use altogether [2].

2. Re-use the anchors for this purpose. Anchors are nice as they allow for 1st/2nd/3rd party CPFP. As a splice might have several inputs and outputs, both sides will want to make sure it gets confirmed in a timely manner. Ofc, RBF can be used here, but that requires both sides to be online to make adjustments. Pre-signing can work too, but the effectiveness (minimizing chain cost while expediting confirmation) would be dependent on the fee step size.

   In this case, we'd use a different multi-sig output (both sides can rotate keys if they want to), and then roll the anchors into this splicing transaction. Given that all nodes on the network know what the anchor size is (assuming feature bit understanding), they're able to realize that it's actually a splice, and they don't need to remove it from the channel graph (yet).

3. Related to the above: just re-use the same multi-sig output. If nodes don't care all that much about rotating these keys, then they can just use the same output. This is trivially recognizable by nodes, as they already know the funding keys used, as they're in the channel_announcement.

4. OP_RETURN (yeh, I had to list it). Self explanatory, push some bytes in an OP_RETURN and use that as the marker.

5. Fiddle w/ the locktime+sequence somehow to make it identifiable to verifiers. This might run into some unintended interactions if the inputs provided have either relative or absolute lock times. There might also be some interaction w/ the main construction for eltoo (which uses the locktime).

Of all the options, I think #2 makes the most sense: we already use anchors to be able to do fee bumping after-the-fact for closing transactions, so why not inherit them here. They make the splicing transaction slightly larger, so maybe #3 (or something else) is a better choice.

The design space for splicing is preeetty large, so I figure the most productive route might be discussing isolated aspects of it at a time. Personally, I'm not suuuper caught up w/ what the latest design drafts are (aside from convos at the recent LN Dev Summit), but from my PoV, how to communicate the splice to other peers has been an outstanding design question.

[1]: https://github.com/lightning/bolts/pull/1004
[2]:
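[Editorial sketch] Option #3 in the list above is the easiest to make concrete: since verifiers already know the funding keys from the channel_announcement, they can recompute the standard P2WSH 2-of-2 funding script (keys sorted lexicographically, per BOLT 3) and look for that script among the closing transaction's outputs. A hypothetical sketch; the data shapes are illustrative:

```python
import hashlib

def multisig_2of2_script(key1: bytes, key2: bytes) -> bytes:
    # OP_2 <key1> <key2> OP_2 OP_CHECKMULTISIG, with the 33-byte
    # compressed pubkeys in lexicographic order (BOLT 3 convention).
    k1, k2 = sorted((key1, key2))
    return b"\x52" + bytes([len(k1)]) + k1 + bytes([len(k2)]) + k2 + b"\x52\xae"

def p2wsh_script_pubkey(witness_script: bytes) -> bytes:
    # OP_0 <32-byte sha256(witness_script)>
    return b"\x00\x20" + hashlib.sha256(witness_script).digest()

def is_same_key_splice(closing_tx_output_scripts, funding_keys) -> bool:
    """True if one of the closing tx outputs pays to the same 2-of-2
    funding script we already know from the channel_announcement."""
    expected = p2wsh_script_pubkey(multisig_2of2_script(*funding_keys))
    return any(spk == expected for spk in closing_tx_output_scripts)
```

The trade-off discussed in the mail is visible here: detection is trivial precisely because the keys are not rotated.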