Re: [Lightning-dev] Decoy node_ids and short_channel_ids

2020-02-02 Thread m.a.holden via Lightning-dev
> (I'm seeking a clever way that Bob can assign them and trivially tell
> which ID is assigned to which peer, but I can't figure it out, so I
> guess Bob keeps a mapping and restricts each peer to 256 live scids?).

Hi Rusty.

Here's a potential way for Alice and Bob to agree a set of 256 scids without 
any additional messages or changes to existing messages beyond a feature flag 
and a flag in open_channel, but comes with a computational cost.

Alice and Bob agree on a random integer `r`. This could be negotiated on 
`open_channel`, but we shouldn't need to send additional information because we 
already have a random integer we can use: the `temporary_channel_id`. This is 
not known to anybody besides Alice and Bob.

When a channel is locked, Bob computes n=256 scids, using something 
approximating `concat(n, trunc_bytes(sha256(ec_mult(2^n*r, Q)), 7))`, where `Q` 
is Alice's public key for the channel funding transaction.

The chance of scid collisions between channels is 2^56, which is probably no 
cause for concern.

Instead of keeping a map of 256 scids for each channel, Bob can use a cuckoo 
filter for efficiency. The filter can be used for a quick membership test and 
also as an associative map from scids to channels. It can also support scid 
deletion in the event of channel closure (at the cost of recomputing 256 
ec_mults again).

So when Bob receives a new HTLC to forward, he tests it against his cuckoo 
filter and retreives a candidate set of possible channels to which it may 
refer. For each channel, he takes the most significant byte of the scid as `m` 
and performs `trunc_bytes(sha256(ec_mult(2^m*r, Q)), 7)` and tests the 
least-significant 7 bytes of the result against the scid.

Alice does not need to keep all of the scids she may use for invoices because 
they can be computed on the fly, but she will need to keep a copy of the 
`temporary_channel_id`.

In the reverse direction of Alice forwarding HTLCs to Bob, Bob's public key for 
the funding transaction is used instead.

Regards,
Mark Holden
___
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev


Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-08 Thread m.a.holden via Lightning-dev
Hi ZmnSCPxj,

> Let me clarify: When you say "node" here, do you mean Lightning Network node?
> Or do you mean instead an in-memory node?

Neither. I meant a node in a tree. I tried to use the term bucket to make the 
distinction between this and Lightning node.
The tree is not strictly in-memory. There is a purely conceptual global 
quadtree.

Each bucket is essentially a view over the union of its 4 children. The bucket 
00b is (b U 0001b U 0010b U 0011b). The bucket 0011b is (001100b U 001101b 
U 001110b 00b), etc. Each client choses a suitable max_depth <= HASHLEN/2, 
and the leaves of their tree contain a set of all PKH whose prefix matches the 
bucket index. A client will keep, at minimum, a subtree of the global quadtree 
(and both nodes for all channels where at least one of the nodes is in the same 
subtree).

As you state, you can use any data structure for storing this in memory, but 
there are obvious benefits to using a quadtree-index to mirror the conceptual 
global one in terms of efficient querying, filtering, spilling, aggregating 
branch sizes.

If many clients follow the convention then the optimisation opportunities 
arise, because the majority of routes will be discoverable with a single 
(possibly parallel) query to the network. Gossip size can be reduced and 
unwanted gossip can be eliminated, which alleviates the most constrained 
resource, bandwidth. If the conventions are widely followed, the benefits are 
maximized for everybody. Not following convention does not break things, it 
just limits the potential wins.

---

> As I understand your proposal, taking over or taking down a node near the 
> root of the tree will make it difficult to look up several nodes at once.
> This is because the tree nature makes nodes near the root of the tree (top of 
> the hierarchy) much more important than leaf nodes, which only worry about 
> themselves.

If a node advertises itself near the top of the tree then it will be a better 
information provider than others, but this is never at the exclusion of others 
below it. All of the information a node in the parent bucket holds is held in 
part by the nodes in the child buckets. The parent does not know anything that 
other people don't know too. No nodes are "more important", but they might 
potentially be "more optimal".

If you know about some nodes in bucket 0011b, but none in 0001b where a payment 
destination is, then you could query any node in 0011b and ask about any nodes 
in bucket 0001b. Since the buckets are nearby, there is a greater probability 
that they'll know about more nodes than somebody further away. This would be 
similar to how your proposal operates. If somebody did advertise being in 
bucket 00b, then they're able to find a potentially better (shorter) path to 
the destination because they know more information and you don't need to find a 
path through multiple buckets. If they are under DDoS, it doesn't bring down 
the network or limit access to the child buckets - it just makes it trivially 
less optimal to route because it *might* (low probability) require more queries 
to reach the destination.

When querying, it is expected that if you know about any nodes in the same 
bucket as the payment destination, then there's a high probability that they 
will know a route to the destination. A parent bucket of that bucket is not any 
more likely to know about the destination, they have the same information. I've 
shown that at small depth, there's a high probability that you will have 
knowledge about a reasonable quantity of paths to every other bucket in the 
global network - so the majority of payments will only need to visit one bucket 
outside your own. The reason to specifically query parent buckets is limited, 
and not ultimately necessary.

The only way I can see the problem you raise being a concern is if 
misconfigured clients have an unreasonably large depth for the amount of 
information in the network, such that there are few, or no channels from their 
own bucket to other buckets. In that case, they might become over-reliant on 
the parent's buckets to receive payments, but there are likely to be numerous 
parents at every depth of the tree (correctly configured clients), meaning 
there isn't a clear-cut target that an attacker could DDoS to try and disrupt a 
bucket.

Nodes do not need to present the real depth at which they keep gossip, but 
there are potentially greater fee-earning opportunities if publicly advertising 
their maximum information capability. Nodes could counterbalance potential DDoS 
risk with higher routing fees, where people might pay more to have a greater 
chance of finding shorter routes in fewer queries, but a frugal client would 
simply chose the cheapest routes, and the more expensive routes via parent 
buckets would be less desirable to them - meaning DDoS on nodes in parent 
buckets may be wasted effort because it would drive people towards saving money 
on fees. Also, since 

Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-05 Thread m.a.holden via Lightning-dev
Good morning ZmnSCPxj,

I'll try to clarify my proposal further, but also have some questions about 
yours.

---

> Now, it seems to me what you propose, is to have octrees contain octrees, and 
> so on.

There's one global tree, which is the same for all users. Every node in the 
tree has a bucket and exactly 4 child nodes, except leaves have no children. 
The tree has a max depth, which each client sets itself. The theoretical 
maximum being HASHLEN/2 (In practice, we're likely to be concerned about <8). 
Note that every parent's bucket contains all of the information of all of its 
children's buckets - meaning the root node of the global quadtree is equivalent 
to the unfiltered global network. Nodes pick a depth and concern themselves 
with the bucket at that depth, unless it overflows, in which case they increase 
the depth by 1.

> Now let us return, to my proposal of a distance measurement.
> This effectively maps the LN universe onto a circle.
> Each node commits itself to storing an arc of this circle, and possibly 
> various other nodes it happens to be directly connected to that may be far 
> from the arc near it.

The quadtree can also be looked at from the perspective of a circle, assuming 
there is a reference point P on it. Each bucket is represented by an arc of 
size 2pi/(4^depth), and the PKH-prefix represents a displacement of the arc 
from P (Eg, the bucket 00b would represent an arc from P to pi/2+P). Bucket 00b 
includes all gossip information for the buckets b, 0001b, 0010b and 0011b, 
which are sub-arcs of it, and so forth. Spilling a bucket is the same as 
narrowing the arc to 1/4 its previous size.

Although different from your arbitrary distance k (arc size 2*k?), they're not 
too dissimilar in the kind of distance covered - with the primary distinction 
being that mine proposes using set interval ranges (based on powers of 4), and 
the node picks the range which it's PKH fits into, rather than the range being 
centred on the node.

---

> I can overload those N nodes by generating node addresses that also lie in 
> that octant.
> By just iterating over scalars, about 1/8 of the generated node addresses 
> will lie in the target octant.

If you target the first layer of the tree only, then once nodes spill to the 
second layer, they will filter out any gossip which does not concern them. It 
isn't sufficient to just brute force the first 2 bits, but you need to do it 
for 2^depth, where depth is the target's maximum they're willing to spill to 
(which is not shared).

However, commodity hardware can attempt millions of ec_mult/hash per second, so 
getting a node into the bucket you want is trivial anyway for small depth.

> In order to disrupt a particular node, I must place fake nodes near that 
> node, in an attempt to force its neighbors to reduce the arc of the circle 
> that they can map.
> However, I need to generate fake nodes that are nearer to that node than 
> genuine honest nodes.
> This means the probability of me generating such node addresses are much 
> lower than 1/8, thus requiring that I devote more energy to generating the 
> falsified attack nodes.

I would argue that the above is also true in your approach. It would still be 
trivial to brute force the prefixes which minimize the distance between 
themselves and the target, even with a network of tens of millions of nodes. 
The amount of energy which would need devoting does not seem like it would a 
deciding factor for this attack.

> Further, in executing this attack, while I disrupt one node very well, and 
> nearby nodes somewhat, my effect will be far less than disrupting 1/8 of the 
> network.

Since the attack needs to target the maximum depth that nodes might spill to, 
then the amount of the network which could be affected by the attack would be 
1/4^depth. I can't imagine it being very different in your distance based 
approach, since we're considering the same kind of distances, from a different 
perspective.

---

> Once I get a node address in the targeted octant, I can commit a nominal 
> amount of Bitcoin into some 2-of-2 with an existing actual node I control and 
> the generated node address, then synthesize some node gossip containing those 
> node addresses.

The way I see it, this potential attack affects the global network generally. 
Somebody could try this regardless of our proposals, and try to flood the whole 
network with spam.

BOLT#7 only states that a node MAY forward information about nodes for which it 
does not know any channels. One could reconsider this approach if constrainted 
on resources, such as to say if depth>X, MUST NOT forward any information about 
nodes with no known channels, or perhaps have a variation of gossip which 
combines node information into channel broadcasts to ensure that such node spam 
can't occur. I think the vagueness of the spec on this rule is inidicative that 
it needs addressing.

To try and mount an attack with greater chance of success, t

Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-04 Thread m.a.holden via Lightning-dev
Good morning ZmnSCPxj, thanks for the response.


> I would be hesitant to divide the world in such manner.
> I understand that in typical computer science, splitting your objects up into 
> smaller parts is a long-accepted method of doing things.

> However, when it comes to finances and political power (the power to censor 
> or disrupt), such splitting is highly undesirable.

> Thus I am strongly allergic to things that would "hierarchical"ize or 
> "bucket"ize or even just have a separation between "endpoint" and "core" 
> network.

> I would rather propose for acceptance into BOLT, such proposals that would 
> keep the network homogeneous.

Firstly, I completely agree with you that we should not be splitting up the 
network in any way which nodes or channels can be politically targeted, or in 
which any node is designated special privelege or status. I have the same 
allergy as you and took this into considerations when proposing this, and some 
of your suggestions are along a simlar  line of thinking as mine.

I think you might have misunderstood what I was proposing, but it's probably my 
fault for not expressing it well. With my suggestion, all nodes continue to be 
equal participants and there are no special nodes. I used the term "endpoint" 
previously to mean one of the two nodes which a regular channel belongs to, and 
not to mean some kind of special node. Any node can open channels to any other 
node - the "buckets" are merely a strategy for locally organizing gossip so 
that it can be made more efficient. The term "buckets" is used in the 
descriptions of other DHTs such as Kademlia too, and they don't refer to 
splitting up the global network, they merely provide a perspective for looking 
at subsections of it.

I'm certainly interested in any solutions which keep the network permissionless 
because we really don't want to end up with DNS 2.0 or similar.

> Various nodes may have different resources (BTC, CPU, memory, bandwidth, 
> latency).
> Such inequalities are inevitable in this universe.
> But these nodes should still, as much as we can, remain peers on the network.

My suggestion accounts for the difference in computational requirements, as 
each node can determine its own depth in  the tree based on the approximate 
quantity of information it wishes to gossip. A node could filter information to 
whichever depth of the tree they wished, by setting the bucket size at which 
they spill. This also allows for the size of gossip to dynamically shrink as 
the network grows, and is similar to garbage collection, in which anything 
which isn't part of the destination bucket on spilling is purged.

Nodes could also pick (multiple) specific quadtree buckets to communicate all 
gossip about through filters they negotiate (the filter being the hash prefix 
of the desired bucket). It might be desirable to broadcast their filter as part 
of the gossip itself, so that other nodes can learn who are the better 
information providers, but again this would be completely optional.

---

> This rule can be probabilistically fulfilled by having each node N know *at 
> least* routes to some nodes Ls..., where for all L <- Ls, dist(N, L) < some 
> constant.
> The constant here can be selected by each node independently, depending on 
> its memory and CPU capacity.
(the node knowing more routes than that is perfectly fine)

The quadtree proposes a similar idea, but instead of using some constant, N 
knows about routes to Ls, where each L is within the same bucket. Essentially, 
i < dist(N, L) <= j, where i=BUCKET_MIN and j=BUCKET_MAX. For example, if using 
8-bit ids and the first two bits identify the bucket, then i=, 
j=0011. I guess the equivalent distance using your constant would be 
k=0010, to cover the same range but without discriminating the results 
based on any boundaries like my suggestion does.

> Then, we can have the following global rule:
> * Given three nodes X, Y, and Z, and dist(X, Z) < dist(Y, Z), then X SHOULD 
> be more likely to know a route from itself to Z, than Y to know a route from 
> itself to Z, is also met for nodes which are in different buckets.

This rule is essentially the same as what I was thinking for the distance 
between buckets. With the autopilot suggestion of decreasing the number of 
channels opened as distance increases, the probability of knowing about a 
neighbouring bucket is increased compared with a distant one.

---

> It looks for any node L <- Ls such that dist(L, D) < dist(N, D).
> In fact, it can just sort those nodes according to dist(L, D) and start with 
> the lowest distance, and fail once it reaches a dist(L, D) that exceeds its 
> own dist(N, D).
> The above algorithm converges since dist(N, D) for each N that is delegated 
> to will progressively get smaller until we reach some N that knows the 
> destination D.

This proposal also seems very similar to how existing DHTs work, unless I'm 
mistaken. My problem with the approach of e

Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-01 Thread m.a.holden via Lightning-dev
Hi ZmnSCPxj & René.

One way you could have both determinism and encourage a diverse distribution of 
network maps is to treat it as a spatial indexing problem, where the space we 
use is the lexicographical space of the node ids (or hashes of), borrowing some 
similarities from DHTs.

If for example, we take a quadtree, you can take the 2 most-significant bits of 
the public key hash, which would put your node into one of 4 buckets. Nodes 
could advertise a feature bit indicating that they are committed to keeping the 
entire routemap of the bucket which matches their own public key hash, which 
would be all of the nodes in the same bucket - and all of the channels with one 
or both of their endpoints in the bucket.

I'd estimate that with a quadtree, information held by each node could be 
reduced to about 40% of that of the parent node in the quadtree, although the 
real amounts would depend on such things as autopilot defaults (ie, how many 
channels you open to nodes within your bucket versus channels to nodes in other 
buckets). Nodes could decide their own bucket capacities on which they wish to 
spill and reduce the amount of gossip by taking the 2 next most significant 
bits of the PKH, and could go several layers deep.

A node which needs to make a payment to another node within its bucket should 
be able to do so without querying (unless there are no routes with the required 
capacity). If making a payment to another bucket, then there would still exist 
a decent number of channels in the local routemap to nodes in those other 
buckets, and these nodes could be queried to find the second half of a route to 
the destination, or could use JIT routing for the second half, assuming the 
first half of the route can be selected from the local routemap.

In terms of relating this to "locality" in the geographical sense, one could 
create a convention where each bucket represents an approximate physical 
location. The globe can be spatially-indexed as a quadtree by taking a 
tetrahedral map projection (eg, Lee conformal projection[1]). The 4 equalateral 
triangles of the tetrahedron can be infinitely split again into 4 smaller 
equal-sized equalateral triangles for however many layers deep the quadtree 
might be. With this, it might be possible to have a convention where there is a 
relation between the lexicographical space and the geographical space, and 
wallet software would essentially brute force a private key to put you into the 
corresponding bucket to your physical location (trivial for the small number of 
bits we're talking about). Routing would be improved for local trade because 
you would have the entire local topology stored, and would only need to query 
when making payment at distance. (This may raise some privacy concerns which 
would need discussing.)

One issue is that it would result in a very unbalanced tree given that 
population is dense in some areas and sparse in others. To overcome this, 
instead of using a conformal or equal-area projection, we might be able to use 
an equal-population-per-area projection, which I had never heard of such 
projection before but have found some research in regards to producing them[2]. 
Nodes would need to agree on the projection in order for this to work, but the 
work could be done once and the results open sourced and shared between the 
implementations.

Autopilot implementations might also need adjusting to consider distance too. 
As a starting point I would suggest a geometric distribution, where half of 
opened channels should be within the same bucket, a quarter should be to 
sibling buckets, and an eight to cousin buckets, etc. This would result in 
increased probability of routing and reduced querying for local payments - 
paying your local coffee shop should be query-free - and payments to the other 
side of the world might require increased querying.

There are also privacy considerations if nodes indicate their approximate 
locations which would need discussing. What do you think?

Also, this method does not need the be the exclusive way in which gossip is 
communicated between nodes, and one might also combine with something like 
ZmnSCPxj has suggested, for gossiping about the highest capacity nodes. It 
might be also possible to share information about the highest capacity channels 
in a bucket too.

[1]:https://en.wikipedia.org/wiki/Lee_conformal_world_in_a_tetrahedron
[2]:https://www.pnas.org/content/101/20/7499.full

(PS, sorry for the separate thread, LML will not let me subscribe to the list)___
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev