Re: [Lightning-dev] Improving the initial gossip sync

2018-02-05 Thread Fabrice Drouin
Hi,

On 5 February 2018 at 14:02, Christian Decker
 wrote:
> Hi everyone
>
> The feature bit is even, meaning that it is required from the peer,
> since we extend the `init` message itself, and a peer that does not
> support this feature would be unable to parse any future extensions to
> the `init` message. Alternatively we could create a new
> `set_gossip_timestamp` message that is only sent if both endpoints
> support this proposal, but that could result in duplicate messages being
> delivered between the `init` and the `set_gossip_timestamp` message and
> it'd require additional messages.

We chose the other approach and propose to use an optional feature bit instead.

> The reason I'm using timestamp and not the blockheight in the short
> channel ID is that we already use the timestamp for pruning. In the
> blockheight based timestamp we might ignore channels that were created,
> then not announced or forgotten, and then later came back and are now
> stable.

Just to be clear, you propose to use the timestamp of the most recent
channel updates to filter the associated channel announcements?

> I hope this rather simple proposal is sufficient to fix the short-term
> issues we are facing with the initial sync, while we wait for a real
> sync protocol. It is definitely not meant to allow perfect
> synchronization of the topology between peers, but then again I don't
> believe that is strictly necessary to make the routing successful.
>
> Please let me know what you think, and I'd love to discuss Pierre's
> proposal as well.
>
> Cheers,
> Christian

Our idea is to group channel announcements into "buckets", create a
filter for each bucket, exchange these filters, and use them to filter
out channel announcements.

We would add a new `use_channel_announcement_filters` optional feature
bit (7 for example), and a new `channel_announcement_filters` message.

When a node that supports channel announcement filters receives an
`init` message with the `use_channel_announcement_filters` bit set, it
sends back its channel filters.

When a node that supports channel announcement filters receives a
`channel_announcement_filters` message, it uses it to filter channel
announcements (and, implicitly, channel updates) before sending them.

The filters we have in mind are simple:
- Sort announcements by short channel id
- Compute a marker height, which is `144 * ((now - 7 * 144) / 144)`,
where `now` is the current block height (we round to multiples of 144
to make sync easier)
- Group channel announcements that were created before this marker
into groups of 144 blocks
- Group channel announcements that were created after this marker into
groups of 1 block
- For each group, sort and concatenate all channel announcements'
short channel ids and hash the result (we could use sha256, or the
first 16 bytes of the sha256 hash)

The new `channel_announcement_filters` would then be a list of
(height, hash) pairs ordered by increasing heights.
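
To make this concrete, here is a rough Scala sketch of how these
filters could be computed (the names are mine, not actual eclair
code); it assumes short channel ids are packed as 8-byte integers with
the funding block height in the top 3 bytes:

```scala
import java.nio.ByteBuffer
import java.security.MessageDigest

object AnnouncementFilters {
  // block height is assumed to sit in the top 3 bytes of the 8-byte short channel id
  def blockHeight(shortChannelId: Long): Long = (shortChannelId >> 40) & 0xffffff

  // returns (bucket height, hash) pairs ordered by increasing height
  def bucketFilters(shortChannelIds: Seq[Long], currentHeight: Long): Seq[(Long, Seq[Byte])] = {
    // marker: roughly one week (7 * 144 blocks) ago, rounded down to a multiple of 144
    val marker = 144 * ((currentHeight - 7 * 144) / 144)
    // older announcements are grouped in buckets of 144 blocks, recent ones block by block
    val buckets = shortChannelIds.sorted.groupBy { scid =>
      val height = blockHeight(scid)
      if (height < marker) 144 * (height / 144) else height
    }
    buckets.toSeq.sortBy(_._1).map { case (height, ids) =>
      // hash the concatenation of the sorted short channel ids in this bucket
      val md = MessageDigest.getInstance("SHA-256")
      ids.foreach(id => md.update(ByteBuffer.allocate(8).putLong(id).array()))
      (height, md.digest().take(16).toSeq)
    }
  }
}
```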

This implies that implementations can easily sort announcements by
short channel id, which should not be very difficult.
An additional step could be to send all short channel ids for all
groups for which the group hash did not match. Alternatively we could
use smarter filters.

The use case we have in mind is mobile nodes, or more generally nodes
which are often offline and need to resync very often.

Cheers,
Fabrice


Re: [Lightning-dev] Improving the initial gossip sync

2018-02-07 Thread Fabrice Drouin
Hi,

Suppose you partition nodes into 3 generic roles:
- payers: they mostly send payments, are typically small and operated
by end users, and are offline quite a lot
- relayers: they mostly relay payments, and would be online most of
the time (if they're too unreliable other nodes will eventually close
their channels with them)
- payees: they mostly receive payments; how often they can be online
is directly linked to their particular mode of operation (since you
need to be online to receive payments)

Of course most nodes would play more or less all roles. However,
mobile nodes would probably be mostly "payers", and they have specific
properties:
- if they don't relay payments they don't have to be announced. There
could be millions of mobile nodes that would have no impact on the
size of the routing table
- it does not impact the network when they're offline
- but they need an accurate routing table. This is very different from
nodes which mostly relay or accept payments
- they would be connected to a very small number of nodes
- they would typically be online for just a few hours every day, but
could be stopped/paused/restarted many times a day

Laolu wrote:
> So I think the primary distinction between y'alls proposals is that
> cdecker's proposal focuses on eventually synchronizing all the set of
> _updates_, while Fabrice's proposal cares *only* about the newly created
> channels. It only cares about new channels as the rationale is that if once
> tries to route over a channel with a state channel update for it, then
> you'll get an error with the latest update encapsulated.

If you have one filter per day and they don't match (because your peer
has channels that you missed, or channels have been closed and you
were not aware of it) then you will receive all channel announcements
for this particular day, and the associated updates.

Laolu wrote:
> I think he's actually proposing just a general update horizon in which
> vertexes+edges with a lower time stamp just shouldn't be set at all. In the
> case of an old zombie channel which was resurrected, it would eventually be
> re-propagated as the node on either end of the channel should broadcast a
> fresh update along with the original chan ann.

Yes but it could take a long time. It may be worse on testnet since it
seems that nodes
don't change their fees very often. "Payer nodes" need a good routing
table (as opposed
to "relayers" which could work without one if they never initiate payments)

Laolu wrote:
> This seems to assume that both nodes have a strongly synchronized view of
> the network. Otherwise, they'll fall back to sending everything that went on
> during the entire epoch regularly. It also doesn't address the zombie churn
> issue as they may eventually send you very old channels you'll have to deal
> with (or discard).

Yes, I agree that for nodes which have connections to a lot of peers,
strongly synchronized routing tables are harder to achieve since a
small change may invalidate an entire bucket. Real queryable filters
would be much better, but the worst case scenario is that we've sent
an additional 30 Kb or so of sync messages.
(A very naive filter would be sort + pack all short ids, for example)

But we focus on nodes which are connected to a very small number of
peers, and in this particular
case it is not an unrealistic expectation.
We have built a prototype and on testnet it works fairly well. I also
found nodes which have no direct channel between them but produce the
same filters for 75% of the buckets ("produce" here means that I
opened a simple gossip connection to them, got their routing table and
used it to generate filters).


Laolu wrote:
> How far back would this go? Weeks, months, years?
Since forever :)
One filter per day for all announcements that are older than now - 1
week (modulo 144)
One filter per block for recent announcements

>
> FWIW this approach optimizes for just learning of new channels instead of
> learning of the freshest state you haven't yet seen.

I'd say it optimizes the case where you are connected to very few
peers, and are online a few times every day (?)

>
> -- Laolu

Re: [Lightning-dev] Improving the initial gossip sync

2018-02-13 Thread Fabrice Drouin
On 12 February 2018 at 02:45, Rusty Russell  wrote:
> Christian Decker  writes:
>> Rusty Russell  writes:
>>> Finally catching up.  I prefer the simplicity of the timestamp
>>> mechanism, with a more ambitious mechanism TBA.
>>
>> Fabrice and I had a short chat a few days ago and decided that we'll
>> simulate both approaches and see what consumes less bandwidth. With
>> zombie channels and the chances for missing channels during a weak form
>> of synchronization, it's not that clear to us which one has the better
>> tradeoff. With some numbers behind it it may become easier to decide :-)
>
> Maybe; I think we'd be best off with an IBLT-approach similar to
> Fabrice's proposal.  An IBLT is better than a simple hash, since if your
> results are similar you can just extract the differences, and they're
> easier to maintain.  Even easier if we make the boundaries static rather
> than now-relative.  For node_announce and channel_update you'd probably
> want separate IBLTs (perhaps, though not necessarily, as a separate
> RTT).

Yes, real filters would be better, but the 'bucket hash' idea works
(from what I've seen on testnet) for our specific target (nodes which
are connected to a very small number of peers and go offline very
often).


> Note that this approach fits really well as a complement to the
> timestamp approach: you'd use this for older pre-timestamp, where you're
> likely to have a similar idea of channels.

Both approaches may be needed because they may be solutions to
different problems (nodes which get disconnected from a small set of
peers vs nodes connected to many peers, which remain online while some
of their peers do not).

>>> Now, as to the proposal specifics.
>>>
>>> I dislike the re-transmission of all old channel_announcement and
>>> node_announcement messages, just because there's been a recent
>>> channel_update.  Simpler to just say 'send anything >=
>>> routing_sync_timestamp`.
>>
>> I'm afraid we can't really omit the `channel_announcement` since a
>> `channel_update` that isn't preceded by a `channel_announcement` is
>> invalid and will be dropped by peers (especially because the
>> `channel_update` doesn't contain the necessary information for
>> validation).
>
> OTOH this is a rare corner case which will eventually be fixed by weekly
> channel_announce retransmission.  In particular, the receiver should
> have already seen the channel_announce, since it preceeded the timestamp
> they asked for.
>
> Presumably IRL you'd ask for a timestamp sometime before you were last
> disconnected, say 30 minutes.
>
> "The perfect is the enemy of the good".

This is precisely what I think would not work very well with the
timestamp approach: when you're missing an 'old' channel announcement,
and only have a few sources for it. It can have a huge impact on
terminal nodes which won't be able to find routes, and waiting for a
new channel update would take too long.
Yes, using just a few peers means that you will be limited to the
routing table they will give you, but having some kind of filter would
let nodes connect to other peers just to retrieve their filters and
check how far off they are from the rest of the network. This would
not be possible with a timestamp (you would need to download the
entire routing table again, which is what we're trying to avoid).

>>> Background: c-lightning internally keeps an tree of gossip in the order
>>> we received them, keeping a 'current' pointer for each peer.  This is
>>> very efficient (though we don't remember if a peer sent us a gossip msg
>>> already, so uses twice the bandwidth it could).

OK, so a peer would receive an announcement it has sent, but would
immediately dismiss it?

>>
>> We can solve that by keeping a filter of the messages we received from
>> the peer, it's more of an optimization than anything, other than the
>> bandwidth cost, it doesn't hurt.
>
> Yes, it's on the TODO somewhere... we should do this!
>
>>> But this isn't *quite* the same as timestamp order, so we can't just set
>>> the 'current' pointer based on the first entry >=
>>> `routing_sync_timestamp`; we need to actively filter.  This is still a
>>> simple traverse, however, skipping over any entry less than
>>> routing_sync_timestamp.
>>>
>>> OTOH, if we need to retransmit announcements, when do we stop
>>> retransmitting them?  If a new channel_update comes in during this time,
>>> are we still to dump the announcements?  Do we have to remember which
>>> ones we've sent to each peer?

>>
>> That's more of an implementation detail. In c-lightning we can just
>> remember the index at which the initial sync started, and send
>> announcements along until the index is larger than the initial sync
>> index.
>
> True.  It is an implementation detail which is critical to saving
> bandwidth though.
>
>> A more general approach would be to have 2 timestamps, one highwater and
>> one lowwater mark. Anything inbetween these marks will be forwarded
>> togeth

Re: [Lightning-dev] Improving the initial gossip sync

2018-02-19 Thread Fabrice Drouin
 server-side vs
client-side filtering for SPV clients)



On 16 February 2018 at 13:34, Fabrice Drouin  wrote:
> I like the IBLT idea very much but my understanding of how they work
> is that that the tricky part would be first to estimate the number of
> differences between "our" and "their" routing tables.
> So when we open a connection we would first exchange messages to
> estimate how far off we are (by sending a sample of shortids and
> extrapolate ?) then send filters which would be specific to each peer.
> This may become an issue if a node is connected to many peers, and is
> similar to the server-side vs client-side filtering issue for SPV
> clients.
> Basically, I fear that it would take some time before it is agreed
> upon and available, hindering the development of mobile nodes.
>
> The bucket hash idea is naive but is very simple to implement and
> could buy us enough time to implement IBLT filters properly. Imho the
> timestamp idea does not work for the mobile phone use case (but is
> probably simpler and better that bucket hashes for "centre" nodes
> which are never completely off the grid)
>
>
> On 14 February 2018 at 01:24, Rusty Russell  wrote:
>> Fabrice Drouin  writes:
>>> Yes, real filters would be better, but the 'bucket hash' idea works
>>> (from what I've seen on testnet) for our specific target (nodes which
>>> are connected to very small number of peers and go offline very
>>> often)
>>
>> What if we also add an 'announce_query' message: if you see a
>> 'channel_update' which you discard because you don't know the channel,
>> 'announce_query' asks them to send the 'channel_announce' for that
>> 'short_channel_id' followed by re-sending the 'channel_update'(s)?
>> (Immediately, rather than waiting for next gossip batch).
>>
>> I think we would want this for IBLT, too, since you'd want this to query
>> any short-channel-id you extract from that which you don't know about.
>
> Yes, unless it is part of the initial sync (compare filters. then send
> what they're missing)
>
>> I see.  (BTW, your formatting makes your post sounds very Zen!).
> Sorry about that, I've disabled the haiku mode :)
>
>> Yes, we can probably use difference encoding and use 1 bit for output
>> index (with anything which is greater appended) and get down to 1 byte
>> per channel_id at scale.
>>
>> But my rule-of-thumb for scaling today is 1M - 10M channels, and that
>> starts to get a little excessive.  Hence my interest in IBLTs, which are
>> still pretty trivial to implement.
>
> Yes, sending all shortids would also have been a temporary measure
> while a more sophisticated approach is being designed.
>>
>> Cheers,
>> Rusty.


Re: [Lightning-dev] Improving the initial gossip sync

2018-02-21 Thread Fabrice Drouin
On 20 February 2018 at 02:08, Rusty Russell  wrote:
> Hi all,
>
> This consumed much of our lightning dev interop call today!  But
> I think we have a way forward, which is in three parts, gated by a new
> feature bitpair:

We've built a prototype with a new feature bit `channel_range_queries`
and the following logic:
When you receive their init message, check their local features:
- if they set `initial_routing_sync` and `channel_range_queries` then
do nothing (they will send you a `query_channel_range`)
- if they set `initial_routing_sync` and not `channel_range_queries`
then send your routing table (as before)
- if you support `channel_range_queries` then send a
`query_channel_range` message

This way, new and old nodes should be able to understand each other.
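
For clarity, this is roughly the decision logic, as a sketch (the bit
positions and helper names are placeholders, not the actual eclair
code):

```scala
object SyncNegotiation {
  // placeholder bit positions, for illustration only
  val InitialRoutingSync = 3
  val ChannelRangeQueries = 7

  // rules 1 and 2 above: how we react to the peer's initial_routing_sync flag
  def sendThemOurFullRoutingTable(theirFeatures: Set[Int]): Boolean =
    theirFeatures(InitialRoutingSync) && !theirFeatures(ChannelRangeQueries)

  // rule 3 above: whether we start our own sync by sending query_channel_range
  def sendThemQueryChannelRange(ourFeatures: Set[Int]): Boolean =
    ourFeatures(ChannelRangeQueries)
}
```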

> 1. query_short_channel_id
> =
>
> 1. type: 260 (`query_short_channel_id`)
> 2. data:
>* [`32`:`chain_hash`]
>* [`8`:`short_channel_id`]

We could add a `data` field which contains zipped ids like in
`reply_channel_range` so we can query several items with a single
message?

> 1. type: 262 (`reply_channel_range`)
> 2. data:
>* [`32`:`chain_hash`]
>* [`4`:`first_blocknum`]
>* [`4`:`number_of_blocks`]
>* [`2`:`len`]
>* [`len`:`data`]

We could add an additional `encoding_type` field before `data` (or it
could be the first byte of `data`).
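
For illustration, this is one way it could look: the first byte of
`data` selects the encoding, e.g. raw 8-byte ids or a zlib-compressed
stream (the encoding type values below are placeholders, not a spec):

```scala
import java.io.ByteArrayOutputStream
import java.nio.ByteBuffer
import java.util.zip.Deflater

object EncodedShortIds {
  // placeholder encoding type values, for illustration only
  val Uncompressed: Byte = 0
  val Zlib: Byte = 1

  // pack sorted short channel ids as 8-byte big-endian integers
  def pack(ids: Seq[Long]): Array[Byte] = {
    val buf = ByteBuffer.allocate(8 * ids.size)
    ids.sorted.foreach(id => buf.putLong(id))
    buf.array()
  }

  // prefix the payload with a one-byte encoding type, optionally zlib-compressing it
  def encode(ids: Seq[Long], compress: Boolean): Array[Byte] =
    if (!compress) Uncompressed +: pack(ids)
    else {
      val deflater = new Deflater(Deflater.BEST_COMPRESSION)
      deflater.setInput(pack(ids))
      deflater.finish()
      val out = new ByteArrayOutputStream()
      val chunk = new Array[Byte](1024)
      while (!deflater.finished()) out.write(chunk, 0, deflater.deflate(chunk))
      Zlib +: out.toByteArray
    }
}
```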

> Appendix A: Encoding Sizes
> ==
>
> I tried various obvious compression schemes, in increasing complexity
> order (see source below, which takes stdin and spits out stdout):
>
> Raw = raw 8-byte stream of ordered channels.
> gzip -9: gzip -9 of raw.
> splitgz: all blocknums first, then all txnums, then all outnums, then 
> gzip -9
> delta: CVarInt encoding: 
> blocknum_delta,num,num*txnum_delta,num*outnum.
> deltagz: delta, with gzip -9
>
> Corpus 1: LN mainnet dump, 1830 channels.[1]
>
> Raw: 14640 bytes
> gzip -9: 6717 bytes
> splitgz: 6464 bytes
> delta: 6624 bytes
> deltagz: 4171 bytes
>
> Corpus 2: All P2SH outputs between blocks 508000-508999 incl, 790844 
> channels.[2]
>
> Raw: 6326752 bytes
> gzip -9: 1861710 bytes
> splitgz: 964332 bytes
> delta: 1655255 bytes
> deltagz: 595469 bytes
>
> [1] http://ozlabs.org/~rusty/short_channels-mainnet.xz
> [2] http://ozlabs.org/~rusty/short_channels-all-p2sh-508000-509000.xz
>

Impressive!


[Lightning-dev] Proposal for syncing channel updates

2018-10-04 Thread Fabrice Drouin
Hi,

This is a proposal for an extension of our current “channel queries”
that should allow nodes to properly sync their outdated channel
updates. I already opened an issue on the RFC’s github repo
(https://github.com/lightningnetwork/lightning-rfc/issues/480) but
decided to post here too, to have a less “constrained” discussion.
And it looks like a fairly standard synchronisation problem, so maybe
someone will think of other similar schemes that have been used in a
different context.

Thanks,

Fabrice

Background: Routing Table Sync

(If you’re familiar with LN you can just skip this section)

LN is a p2p network of nodes, which can be represented as a graph
where nodes are vertices and channels are edges, and where you can pay
any node you can find a route to:
- each node maintains a routing table, i.e. a full view of the LN graph
- to send a payment, nodes use their local routing table to compute a
route to the destination, and send an onion-like message to the first
node on that route, which will forward it to the next node and so on
until it reaches its destination

The routing table includes:
- “static” information: channel announcements
- “dynamic” information: channel updates (relay fees)
(It also includes node announcements, which are not needed for route
computation)
Using our graph analogy, channel updates would be edge parameters
(cost, max capacity, min payment amount, …). They can change often,
usually when nodes decide to change their relay fee policy, but also
to signify that a channel is temporarily unusable. A new channel
update will replace the previous one.

Channels are identified with an 8-byte "short channel id": we use the
blockchain coordinates of the funding tx: block height (3 bytes) + tx
index (3 bytes) + output index (2 bytes).
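
As a quick illustration (a sketch, not any implementation's actual
code), these coordinates can be packed into and out of a 64-bit
integer:

```scala
object ShortChannelId {
  // 3 bytes block height | 3 bytes tx index | 2 bytes output index
  def encode(blockHeight: Int, txIndex: Int, outputIndex: Int): Long =
    ((blockHeight & 0xffffffL) << 40) | ((txIndex & 0xffffffL) << 16) | (outputIndex & 0xffffL)

  def blockHeight(scid: Long): Int = ((scid >> 40) & 0xffffff).toInt
  def txIndex(scid: Long): Int = ((scid >> 16) & 0xffffff).toInt
  def outputIndex(scid: Long): Int = (scid & 0xffff).toInt
}
```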

Channel updates include a channel id, a direction (for a channel
between Alice and Bob there are 2 channel updates: one for Alice->Bob
and one for Bob->Alice), fee parameters, and a 4-byte timestamp.

To compute routes, nodes need a way to keep their routing table
up-to-date: we call it "routing table sync" or "routing sync".

There is something else to consider: route finding is only needed when
you're *sending* payments, not when you're relaying them or
receiving them. A node that sits in the "middle" of the LN network and
just keeps relaying payments would work even if it has no routing
information at all.

Likewise, a node that just creates payment requests and receives
payments does not need a routing table.

On the other end of the spectrum, a LN "wallet" that is mostly used to
send payments will not work very well if its routing table is missing
info or contains outdated info, so routing sync is a very important
issue for LN wallets, which are also typically offline more often than
other nodes.

If your wallet is missing channel announcements it may not be able to
find a route, and if its channel updates are outdated it may compute a
route that includes channels that are temporarily disabled, or use fee
rates that are too old and will be refused by relaying nodes. In this
case nodes can return errors that include their most recent channel
update, so that the sender can try again, but this will only work well
if just a few channel updates are outdated.

So far, these are the “routing table sync” schemes that have been
specified and implemented:

Step #1: just send everything

The first routing sync scheme was very simple: nodes would request
that peers they connect to send them a complete "dump" of their entire
routing table. It worked well at the beginning but was expensive for
both peers and quickly became impractical.

Step #2: synchronise channel announcements

New query messages were added to the LN protocol to improve routing
table sync: nodes can ask their peers for all their channel ids in a
given block range, compare that list to their own channel ids and
query the ones they're missing (as well as related channel updates).

Nodes can also send a timestamp-based filter to their peers ("only
send me channel updates that match this timestamp filter").
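
A rough sketch of that logic (the names are illustrative, not the
actual message fields):

```scala
object ChannelQueries {
  // "step #2" sync: given the short channel ids a peer advertises for a block
  // range and the ids we already know, only query what we are missing
  def idsToQuery(theirIds: Set[Long], ourIds: Set[Long]): Set[Long] =
    theirIds -- ourIds

  // timestamp filter: only accept gossip whose timestamp falls in [first, first + range)
  def passesFilter(timestamp: Long, firstTimestamp: Long, timestampRange: Long): Boolean =
    timestamp >= firstTimestamp && timestamp < firstTimestamp + timestampRange
}
```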

It's a nice improvement but there are still issues with nodes that are
offline very often: they will be able to sync their channel
announcements, but not their channel updates.

Suppose that at T0 a node has 1000 channel updates that are outdated.
It comes back online, starts syncing its routing table, and goes
offline after a few minutes. It now has 900 channel updates that are
outdated.
At T1 = T0 + 8 hours it comes back online again. If it uses T0 to
filter out channel updates, it will never receive the info it is
missing for its 900 outdated channel updates. Using our "last time I
was online at" timestamp as a gossip filter does not work here.

=> Proposed solution: timestamp-based channel updates sync

We need a better method for syncing channel updates. And it is not
really a set reconciliation problem (like syncing channel
announcements for example): we’re not missing items, 

Re: [Lightning-dev] Proposal for syncing channel updates

2018-10-12 Thread Fabrice Drouin
Hi Zmn,

> It may be reduced to a set reconciliation problem if we consider the 
> timestamp and enable/disable state of channel updates as part of an item, 
> i.e. a channel update of 111:1:1 at 2018-10-04 state=enabled is not the same 
> as a channel update of 111:1:1 at 2018-10-05 state=disabled.
>
> Then both sides can use standard set reconciliation algorithms, and for 
> channel updates of the same short channel ID, we simply drop all items except 
> the one with latest timestamp.
>
> The above idea might be less efficient than your proposed extension.

Yes, there may be some way to use the structure of channel_update
messages to transpose this into a set reconciliation problem, and use
smarter tools like IBLTs. But we would need to have a good estimate
for the number of differences between the local and remote sets. This
would be the really hard part I think, and probably even harder to get
right with channel_updates than with channel_announcements. I had a
quick look at how this type of sync issue is handled in different
contexts and my impression is that exchanging and comparing timestamps
would be the most natural solution?

But mostly my point is that I think we missed something with the
current channel queries, so first I would like to know if other people
agree with that :) and propose something that is close to what we have
today, should be easy to implement if you already support channel
queries, and should fix the issue that I think we have.

Thanks,
Fabrice


Re: [Lightning-dev] Commitment Transaction Format Update Proposals?

2018-10-18 Thread Fabrice Drouin
Hello,

> 1.  Rather than trying to agree on what fees will be in the future, we
 > should use an OP_TRUE-style output to allow CPFP (Roasbeef)

We could also use SIGHASH_ANYONECANPAY|SIGHASH_SINGLE for HTLC txs, without
adding the "OP_TRUE" output to the commitment transaction. We would still
need the update_fee message to manage onchain fees for the commit tx (but
not the HTLC txs) but there would be no reason anymore to refuse fee rates
that are too high and channels would not get closed anymore when there's a
spike in onchain fees.

Cheers,

Fabrice


Re: [Lightning-dev] Wireshark plug-in for Lightning Network(BOLT) protocol

2018-10-30 Thread Fabrice Drouin
Nice work, thank you!

On Fri, 26 Oct 2018 at 17:37,  wrote:
>
> Hello lightning network developers.
> Nayuta team is developing Wireshark plug-in for Lightning Network(BOLT) 
> protocol.
> https://github.com/nayutaco/lightning-dissector
>
> It’s alpha version, but it can decode some BOLT message.
> Currently, this software works for Nayuta’s implementation(ptarmigan) and 
> Éclair.
> When ptarmigan is compiled with some option, it write out key information 
> file. This Wireshark plug-in decode packet using that file.
> When you use Éclair, this software parse log file.
>
> Through our development experience, interoperability test is time consuming 
> task.
> If people can see communication log of BOLT message on same format (.pcap), 
> it will be useful for interoperability test.
>
> Our proposal:
> Every implementation has compile option which enable output key information 
> file.
>
> We are glad if this project is useful for lightning network eco-system.


[Lightning-dev] Improving payment UX with low-latency route probing

2018-10-31 Thread Fabrice Drouin
Context
==

Sent payments that remain pending, i.e. payments which have not yet
been failed or fulfilled, are currently a major UX challenge for LN
and a common source of complaints from end-users.
Why payments are not fulfilled quickly is not always easy to
investigate, but we've seen problems caused by intermediate nodes
which were stuck waiting for a revocation, and recipients who could
take a very long time to reply with a payment preimage.
It is already possible to partially mitigate this by disconnecting
from a node that is taking too long to send a revocation (after 30
seconds for example) and reconnecting immediately to the same node.
This way pending downstream HTLCs can be forgotten and the
corresponding upstream HTLCs failed.

Proposed changes
===

It should be possible to provide a faster "proceed/try another route"
answer to the sending node using probing with short timeout
requirements: before sending the actual payment it would first send a
"blank" probe request, along the same route. This request would be
similar to a payment request, with the same onion packet formatting
and processing, with the additional requirement that if the next node
in the route has not replied within the timeout period (typically a
few hundred milliseconds) then the current node will immediately send
back an error message.

There could be several options for the probe request:
- include the same amounts and fee constraints as the actual payment request.
- include no amount information, in which case we're just trying to
"ping" every node on the route.

Implementation
==

I would like to discuss the possibility of implementing this with a "0
satoshi" payment request that the receiving node would generate along
with the real one. The sender would first try to "pay" the "0 satoshi"
request using the route it computed with the actual payment
parameters. I think that it would not require many changes to the
existing protocol and implementations.
Not using the actual amount and fees means that the actual payment
could fail because of capacity issues but as long as this happens
quickly, and it should since we checked first that all nodes on the
route are alive and responsive, it still is much better than “stuck”
payments.
And it would not help if a node decides to misbehave, but would not
make things worse than they are now (?)
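
To make the sender-side behaviour concrete, here is a deliberately
simplified (and blocking) Scala sketch; `sendProbe` and `sendPayment`
are hypothetical callbacks, not an existing API:

```scala
import java.util.concurrent.TimeoutException
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._

object RouteProbe {
  // fire a probe along the candidate route, and only attempt the real payment
  // if every hop answered within a short timeout
  def probeThenPay(sendProbe: () => Future[Unit],
                   sendPayment: () => Future[Unit],
                   timeout: FiniteDuration = 500.millis): Future[Unit] =
    try {
      Await.result(sendProbe(), timeout) // blocking here only to keep the sketch short
      sendPayment()
    } catch {
      case _: TimeoutException =>
        Future.failed(new RuntimeException("route did not answer in time, try another one"))
    }
}
```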

Cheers,
Fabrice


[Lightning-dev] Quick analysis of channel_update data

2019-01-02 Thread Fabrice Drouin
Hello All, and Happy New Year!

To understand why there is a steady stream of channel updates, even
when fee parameters don't seem to actually change, I made hourly
backups of the routing table of one of our nodes, and compared these
routing tables to see what exactly was being modified.

It turns out that:
- there are a lot of disable/enable/disable etc. updates which are
just sent when a channel is disabled then enabled again (when nodes go
offline, for example?)
- there are also a lot of updates that don’t change anything (just a
new timestamp and signatures but otherwise the same info), up to
several times a day for the same channel id

In both cases we end up syncing info that we already have.
I don’t know yet how best to use this when syncing routing tables, but
I thought it was worth sharing anyway. A basic checksum that does not
cover all fields, but only fees and HTLC min/max values, could
probably be used to improve routing table sync?

Cheers,

Fabrice


Re: [Lightning-dev] Quick analysis of channel_update data

2019-01-02 Thread Fabrice Drouin
On Wed, 2 Jan 2019 at 18:26, Christian Decker
 wrote:
>
> For the ones that flap with a period that is long enough for the
> disabling and enabling updates being flushed, we are presented with a
> tradeoff. IIRC we (c-lightning) currently hold back disabling
> `channel_update`s until someone actually attempts to use the channel at
> which point we fail the HTLC and send out the stashed `channel_update`
> thus reducing the publicly visible flapping. For the enabling we can't
> do that, but we could think about a local policy on how much to delay a
> `channel_update` depending on the past stability of that peer. Again
> this is local policy and doesn't warrant a spec change.
>
> I think we should probably try out some policies related to when to send
> `channel_update`s and how to hide redundant updates, and then we can see
> which ones work best :-)
>
Yes, I haven't looked at how to handle this with local policies. My
hypothesis is that when you're syncing a routing table that is, say, one
day old, you end up querying and downloading a lot of information that
you already have, and that adding a basic checksum to our channel
queries may greatly improve this. Of course this would be much more
actionable with stats and hard numbers which I'll provide ASAP.

Cheers,

Fabrice


Re: [Lightning-dev] Quick analysis of channel_update data

2019-01-03 Thread Fabrice Drouin
Follow-up: here's more detailed info on the data I collected and
potential savings we could achieve:

I made hourly routing table backups for 12 days, and collected routing
information for 17 000 channel ids.

There are 130 000 different channel updates: on average each channel
has been updated 8 times. Here, “different” means that at least the
timestamp has changed, and a node would have queried this channel
update during its syncing process.

But only 18 000 pairs of channel updates carry an actual fee and/or
HTLC value change. 85% of the time, we just queried information that we
already had!

Adding a basic checksum (4 bytes for example) that covers fees and
HTLC min/max value to our channel range queries would be a significant
improvement, and I will add this to the open BOLT 1.1 proposal to extend
queries with timestamps.
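
As an illustration, a minimal sketch of such a checksum using CRC32
(the exact algorithm and set of covered fields are open questions;
the field names below are illustrative):

```scala
import java.nio.ByteBuffer
import java.util.zip.CRC32

object ChannelUpdateChecksum {
  // covers only the fields that matter for routing, not the timestamp or signature,
  // so a refreshed but otherwise identical channel_update keeps the same checksum
  def checksum(feeBaseMsat: Long, feeProportionalMillionths: Long,
               htlcMinimumMsat: Long, htlcMaximumMsat: Long): Int = {
    val buf = ByteBuffer.allocate(4 * 8) // four 8-byte fields
    buf.putLong(feeBaseMsat).putLong(feeProportionalMillionths)
      .putLong(htlcMinimumMsat).putLong(htlcMaximumMsat)
    val crc = new CRC32()
    crc.update(buf.array())
    crc.getValue.toInt // keep the low 32 bits: a 4-byte checksum
  }
}
```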

I also think that such a checksum could be used:
- in “inventory” based gossip messages
- in set reconciliation schemes: we could reconcile [channel id |
timestamp | checksum] first

Cheers,

Fabrice


Re: [Lightning-dev] Quick analysis of channel_update data

2019-01-04 Thread Fabrice Drouin
On Fri, 4 Jan 2019 at 04:43, ZmnSCPxj  wrote:
> > -   in set reconciliation schemes: we could reconcile [channel id |
> > timestamp | checksum] first
>
> Perhaps I misunderstand how set reconciliation works, but --- if timestamp is 
> changed while checksum is not, then it would still be seen as a set 
> difference and still require further communication rounds to discover that 
> the channel parameters have not actually changed.
>
> Perhaps it is better to reconcile [channel_id | checksum] instead, and if 
> there is a different set of channel parameters, share the set difference and 
> sort out which timestamp is later at that point.

Ah yes of course, the `timestamp` should not be included.

Cheers,
Fabrice


Re: [Lightning-dev] Lite client considerations for Lightning Implementations

2019-01-07 Thread Fabrice Drouin
Hi Chris,

What we've learned building a lite bitcoin/LN wallet is that there are
different things it must implement:
- a bitcoin wallet. We started with bitcoinj, but there are known
issues with Bloom Filters, which is one of the reasons why we ended up
building our own wallet that connect to Electrum Servers (and it seems
we're not the only ones). I'm not sure that a "better" implementation
of BIP37 is actually needed, if that's what you mean by "traditional
SPV". Client-side filters is a nice improvement, and we have a basic
Neutrino prototype that is up to date with the BIPs but not used in
our mobile app. We could collaborate on this ?
- the "monitoring your channels" part: detect that your peer is trying
to cheat and has published an old commit tx, and publish a penalty tx.
This is fairly easy (the "detecting" part at least :))
- validating channels: you receive gossip messages, and check that
channels actually exist, detect when they've been closed and remove
them from your routing table. This is much harder. Electrum servers
now have a method for retrieving a tx from its coordinates (height +
position), but as the number of channels grows it may become
impractical to watch every channel. With Bloom Filters and client-side
filters you probably end up having to download all blocks (but not
necessarily store them all).

I also think that it's very important that the lite wallet supports
mobile platforms, Android in your case, and since Android is basically
stuck at Java 7 you may want to consider using plain Java (or Kotlin)
instead of Scala as much as possible.

Cheers,

Fabrice



On Sun, 6 Jan 2019 at 15:58, Chris Stewart  wrote:
>
> Hi all,
>
> Hope your 2019 is off to a fantastic start. I'm really excited for Lightning 
> in 2019.
>
> We are currently reviving a lite client project in bitcoin-s 
> (https://github.com/bitcoin-s/bitcoin-s-core/pull/280). The goal is to have a 
> modern replacement for bitcoinj that also can be used for L2 applications 
> like lightning. We also are planning on supporting multiple coins, hsms etc.
>
> The current plan is to implement traditional SPV, and then implement neutrino 
> when development is picking back up on that in bitcoin core. If that takes 
> too long, we will consider implementing neutrino against btcd.
>
> What I wanted to ask of the mailing list is to give us "things to consider" 
> when developing this lite client from a usability perspective for lightning 
> devs. How can we make your lives easier?
>
> One thing that seems logical is to adhere to the bitcoin core api when 
> possible, this means you can use bitcoin-s as a drop in lite client 
> replacement for bitcoin core.
>
> Thoughts?
>
> -Chris
>
>
>
>


Re: [Lightning-dev] Quick analysis of channel_update data

2019-01-08 Thread Fabrice Drouin
On Tue, 8 Jan 2019 at 17:11, Christian Decker
 wrote:
>
> Rusty Russell  writes:
> > Fortunately, this seems fairly easy to handle: discard the newer
> > duplicate (unless > 1 week old).  For future more advanced
> > reconstruction schemes (eg. INV or minisketch), we could remember the
> > latest timestamp of the duplicate, so we can avoid requesting it again.
>
> Unfortunately this assumes that you have a single update partner, and
> still results in flaps, and might even result in a stuck state for some
> channels.
>
> Assume that we have a network in which a node D receives the updates
> from a node A through two or more separate paths:
>
> A --- B --- D
>  \--- C ---/
>
> And let's assume that some channel of A (c_A) is flapping (not the ones
> to B and C). A will send out two updates, one disables and the other one
> re-enables c_A, otherwise they are identical (timestamp and signature
> are different as well of course). The flush interval in B is sufficient
> to see both updates before flushing, hence both updates get dropped and
> nothing apparently changed (D doesn't get told about anything from
> B). The flush interval of C triggers after getting the re-enable, and D
> gets the disabling update, followed by the enabling update once C's
> flush interval triggers again. Worse if the connection A-C gets severed
> between the updates, now C and D learned that the channel is disabled
> and will not get the re-enabling update since B has dropped that one
> altogether. If B now gets told by D about the disable, it'll also go
> "ok, I'll disable it as well", leaving the entire network believing that
> the channel is disabled.
>
> This is really hard to debug, since A has sent a re-enabling
> channel_update, but everybody is stuck in the old state.

I think there may even be a simpler case where not replacing updates
will result in nodes not knowing that a channel has been re-enabled:
suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables
it, U3 enables it again and is the same as U1. If you discard it and
just keep U1, and your peer has U2, how will you tell them that the
channel has been enabled again? Unless "discard" here means keep the
update but don't broadcast it?


> At least locally updating timestamp and signature for identical updates
> and then not broadcasting if they were the only changes would at least
> prevent the last issue of overriding a dropped state with an earlier
> one, but it'd still leave C and D in an inconsistent state until we have
> some sort of passive sync that compares routing tables and fixes these
> issues.

But then there's a risk that nodes would discard channels as stale
because they don't get new updates when they reconnect.

> I think all the bolted on things are pretty much overkill at this point,
> it is unlikely that we will get any consistency in our views of the
> routing table, but that's actually not needed to route, and we should
> consider this a best effort gossip protocol anyway. If the routing
> protocol is too chatty, we should make efforts towards local policies at
> the senders of the update to reduce the number of flapping updates, not
> build in-network deduplications. Maybe something like "eager-disable"
> and "lazy-enable" is what we should go for, in which disables are sent
> right away, and enables are put on an exponential backoff timeout (after
> all what use are flappy nodes for routing?).

Yes, there are probably heuristics that would help reduce gossip
traffic, and I see your point, but I was thinking about doing the
opposite: "eager-enable" and "lazy-disable", because from a sender's
p.o.v. trying to use a disabled channel is better than ignoring an
enabled channel.

Cheers,
Fabrice


Re: [Lightning-dev] Quick analysis of channel_update data

2019-01-20 Thread Fabrice Drouin
Additional info on channel_update traffic:

Comparing daily backups of routing tables over the last 2 weeks shows
that nearly all channels get at least a new update every day. This
means that channel_update traffic is not primarily caused by nodes
publishing new updates when channels are about to become stale:
otherwise we would see 1/14th of our channels getting a new update on
the first day, then another 1/14th on the second day and so on. This
is confirmed by comparing routing table backups over a single day:
nearly all channels were updated, on average once, with an update that
almost always does not include new information.

It could be caused by "flapping" channels, probably because the hosts
that are hosting them are not reliable (as in, often offline).

Heuristics can be used to reduce this traffic but that's orthogonal to
the problem of improving our current sync protocol.
Also, these heuristics would probably be used to close channels to
unreliable nodes instead of filtering/delaying publishing updates for
them.

Finally, this is not just obsessing over bandwidth (though bandwidth
is a real issue for most mobile users). I'm also obsessing over
startup time and payment UX :), because they do matter a lot for
mobile users, and would like to push the current gossip design as far
as it can go. I also think that we'll face the same issue when
designing inventory messages for channel_update messages.

Cheers,

Fabrice



On Wed, 9 Jan 2019 at 00:44, Rusty Russell  wrote:
>
> Fabrice Drouin  writes:
> > I think there may even be a simpler case where not replacing updates
> > will result in nodes not knowing that a channel has been re-enabled:
> > suppose you got 3 updates U1, U2, U3 for the same channel, U2 disables
> > it, U3 enables it again and is the same as U1. If you discard it and
> > just keep U1, and your peer has U2, how will you tell them that the
> > channel has been enabled again ? Unless "discard" here means keep the
> > update but don't broadcast it ?
>
> This can only happen if you happen to lose connection to the peer(s)
> which sent U2 before it sends U3.
>
> Again, this corner case penalizes flapping channels.  If we also
> ratelimit our own enables to 1 per 120 seconds, you won't hit this case?
>
> > But then there's a risk that nodes would discard channels as stale
> > because they don't get new updates when they reconnect.
>
> You need to accept redundant updates after 1 week, I think.
>
> Cheers,
> Rusty.


Re: [Lightning-dev] Unification of feature bits?

2019-01-25 Thread Fabrice Drouin
On Mon, 21 Jan 2019 at 08:05, Rusty Russell  wrote:
>
> Hi all,
>
> I have a concrete proposal for feature bits.
>
> 1. Rename 'local features' to 'peer features'.
> 2. Rename 'global features' to 'routing features'.
> 3. Have them share a number space (ie. peer and routing features don't
>overlap).
> 4. Put both in `features` in node announcements, but never use even bits
>for peer features.
>
> This means we can both use node_announcement as "connect to a peer which
> supports feature X" and "can I route through this node?".

Unification of feature bits makes sense but I don't really understand
the concept of `routing features` as opposed to `node features`. What
would prevent us from routing payments through a node? (AMP? Changes
to the onion packet?)
I find it easier to reason in terms of `node features`, which are
advertised in node announcements, and `peer/connection features`,
which are a subset of `node features` applied to a specific
connection.
Node features would be all the features that we have today
(option_data_loss_protect, initial_routing_sync,
option_upfront_shutdown_script, gossip_queries), since it makes sense
to advertise them except maybe for initial_routing_sync, with the
addition of wumbo which could only be optional.

What is the rationale for not allowing even bits in peer features? It
makes sense for node features, but there are cases where you may
require specific features for a specific connection
(option_data_loss_protect for example, or
option_upfront_shutdown_script).

Cheers,

Fabrice




> Similarly, (future) DNS seed filtering might support filtering only by
> pairs of bits (ie. give me peers which support X, even or odd).
>
> Cheers,
> Rusty.


Re: [Lightning-dev] Quick analysis of channel_update data

2019-02-18 Thread Fabrice Drouin
I'll start collecting and checking data again, but from what I see now
using our checksum extension still significantly reduces gossip
traffic.

I'm not saying that heuristics to reduce the number of updates cannot
help, but I just don't think they should be our primary way of
handling such traffic. If you've opened channels to nodes that are
unreliable then you should eventually close these channels, but
delaying how you publish updates that disable/enable them has an
impact on everyone, especially on nodes that mostly send payments (as
opposed to relaying or receiving them).

Cheers,

Fabrice

On Mon, 18 Feb 2019 at 13:10, Rusty Russell  wrote:
>
> BTW, I took a snapshot of our gossip store from two weeks back, which
> simply stores all gossip in order (compacting every week or so).
>
> channel_updates which updated existing channels: 17766
> ... which changed *only* the timestamps: 12644
> ... which were a week since the last: 7233
> ... which only changed the disable/enable: 4839
>
> So there are about 5100 timestamp-only updates less than a week apart
> (about 2000 are 1036 seconds apart, who is this?).
>
> 1. I'll look at getting even more conservative with flapping (120second
>delay if we've just sent an update) but that doesn't seem to be the
>majority of traffic.
> 2. I'll also slow down refreshes to every 12 days, rather than 7, but
>again it's only a marginal change.
>
> But basically, the majority of updates I saw two weeks ago are actually
> refreshes, not spam.
>
> Hope that adds something?
> Rusty.


[Lightning-dev] Removing lnd's source code from the Lightning specs repository

2021-10-08 Thread Fabrice Drouin
Hello,

When you navigate to https://github.com/lightningnetwork/ you find
- the Lightning Network white paper
- the Lightning Network specifications
- and ... the source code for lnd!

This has been an anomaly for years, which has created some confusion
between Lightning the open-source protocol and Lightning Labs, one of
the companies specifying and implementing this protocol, but we didn't
do anything about it.

I believe that was a mistake: a few days ago, Arcane Research
published a fairly detailed report on the state of the Lightning
Network: https://twitter.com/ArcaneResearch/status/1445442967582302213.
They obviously did some real work there, and seem to imply that their
report was vetted by Open Node and Lightning Labs.

Yet in the first version that they published you’ll find this:

"Lightning Labs, founded in 2016, has developed the reference client
for the Lightning Network called Lightning Network Daemon (LND)
They also maintain the network standards documents (BOLTs)
repository."

They changed it because we told them that it was wrong, but the fact
that in 2021 people who took the time to do proper research,
interviews, ... can still misunderstand that badly how the Lightning
developers community works means that we ourselves badly
underestimated how confusing mixing the open-source specs for
Lightning and the source code for one of its implementations can be.

To be clear, I'm not blaming Arcane Research that much for thinking
that an implementation of an open-source protocol that is hosted with
the white paper and specs for that protocol is a "reference"
implementation, and thinking that since Lightning Labs maintains lnd
then they probably maintain the other stuff too. The problem is how
that information is published.

So I'm proposing that lnd's source code be removed from
https://github.com/lightningnetwork/ (and moved to
https://github.com/lightninglabs for example, with the rest of their
Lightning tools, but it's up to Lightning Labs).

Thanks,

Fabrice


Re: [Lightning-dev] Removing lnd's source code from the Lightning specs repository

2021-10-12 Thread Fabrice Drouin
On Tue, 12 Oct 2021 at 01:14, Martin Habovštiak
 wrote:
>
> I can confirm I moved a repository few months ago and all links kept working 
> fine.
>

Yes, github makes it really easy, and you keep your issues, PRs,
stars... Depending on your dev/packaging you may need to rename
packages (something Java/Scala/... devs have to do from time to time)
but it's also very simple.

The issue here is not technical.

Fabrice


Re: [Lightning-dev] Removing lnd's source code from the Lightning specs repository

2021-10-15 Thread Fabrice Drouin
On Tue, 12 Oct 2021 at 21:57, Olaoluwa Osuntokun  wrote:
> Also note that lnd has _never_ referred to itself as the "reference"
> implementation.  A few years ago some other implementations adopted that
> title themselves, but have since adopted softer language.

I don't remember that, but if you're referring to c-lightning it was
the first lightning implementation, and the only one for a while, so
in a way it was a "reference" at the time?
Or it could have been a reference to their policy of "implementing the
spec, all the spec and nothing but the spec"?

> I think it's worth briefly revisiting a bit of history here w.r.t the github
> org in question. In the beginning, the lightningnetwork github org was
> created by Joseph, and the lightningnetwork/paper repo was added, the
> manuscript that kicked off this entire thing. Later lightningnetwork/lnd was
> created where we started to work on an initial implementation (before the
> BOLTs in their current form existed), and we were added as owners.
> Eventually we (devs of current impls) all met up in Milan and decided to
> converge on a single specification, thus we added the BOLTs to the same
> repo, despite it being used for lnd and knowingly so.

Yes, work on c-lightning, then eclair, then lnd all began a long time
before the BOLTs process was implemented, and we all set up repos,
accounts...
I agree that we all inherited things from the "pre-BOLTs" era and
changing them will create some friction, but I still believe it should
be done. You also mentioned potential admin rights issues on the
current specs repos which would be solved by moving them to a new
clean repo.

> As it seems the primary grievance here is collocating an implementation of
> Lightning along with the _specification_ of the protocol, and given that the
> spec was added last, how about we move the spec to an independent repo owned
> by the community? I currently have github.com/lightning, and would be happy
> to donate it to the community, or we could create a new org like
> "lightning-specs" or something similar.

Sounds great! github.com/lightning is nice (and I like Damian's idea
of using github.com/lightning/bolts) and seems to please everyone, so
it looks like we have a plan!

Fabrice