Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-04 Thread m.a.holden via Lightning-dev
Good morning ZmnSCPxj, thanks for the response.


> I would be hesitant to divide the world in such manner.
> I understand that in typical computer science, splitting your objects up into 
> smaller parts is a long-accepted method of doing things.

> However, when it comes to finances and political power (the power to censor 
> or disrupt), such splitting is highly undesirable.

> Thus I am strongly allergic to things that would "hierarchical"ize or 
> "bucket"ize or even just have a separation between "endpoint" and "core" 
> network.

> I would rather propose for acceptance into BOLT, such proposals that would 
> keep the network homogeneous.

Firstly, I completely agree with you that we should not be splitting up the 
network in any way which nodes or channels can be politically targeted, or in 
which any node is designated special privilege or status. I have the same 
allergy as you and took this into consideration when proposing this, and some 
of your suggestions are along a similar line of thinking as mine.

I think you might have misunderstood what I was proposing, but it's probably my 
fault for not expressing it well. With my suggestion, all nodes continue to be 
equal participants and there are no special nodes. I used the term "endpoint" 
previously to mean one of the two nodes which a regular channel belongs to, and 
not to mean some kind of special node. Any node can open channels to any other 
node - the "buckets" are merely a strategy for locally organizing gossip so 
that it can be made more efficient. The term "buckets" is used in the 
descriptions of other DHTs such as Kademlia too, and they don't refer to 
splitting up the global network, they merely provide a perspective for looking 
at subsections of it.
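To make the bucketing concrete, here is a minimal sketch of how a node might derive its bucket locally from a hash of its id, assuming 2 bits of the hash per quadtree level (the function name and the use of SHA-256 over the raw id are illustrative assumptions, not part of the proposal's wire format):

```python
import hashlib

def bucket_for_node(node_id: bytes, depth: int) -> int:
    """Return the quadtree bucket index for a node at the given depth.

    Each level of the quadtree consumes 2 bits of the hash of the
    node id, so a bucket at `depth` is identified by the first
    2*depth bits of that hash.
    """
    h = hashlib.sha256(node_id).digest()
    prefix = int.from_bytes(h[:8], "big")
    return prefix >> (64 - 2 * depth)

# At depth 1 there are 4 top-level buckets; at depth 3 there are 64.
b = bucket_for_node(b"\x02" * 33, depth=1)
assert 0 <= b < 4
```

Two nodes gossip about the same region exactly when this function returns the same index for both at the depth they have chosen.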

I'm certainly interested in any solutions which keep the network permissionless 
because we really don't want to end up with DNS 2.0 or similar.

> Various nodes may have different resources (BTC, CPU, memory, bandwidth, 
> latency).
> Such inequalities are inevitable in this universe.
> But these nodes should still, as much as we can, remain peers on the network.

My suggestion accounts for the difference in computational requirements, as 
each node can determine its own depth in the tree based on the approximate 
quantity of information it wishes to gossip. A node could filter information to 
whichever depth of the tree they wished, by setting the bucket size at which 
they spill. This also allows for the size of gossip to dynamically shrink as 
the network grows, and is similar to garbage collection, in which anything 
which isn't part of the destination bucket on spilling is purged.
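The spill-and-purge step described above can be sketched as follows, assuming gossip is keyed by a bit-string prefix of each node's hash (all names here are hypothetical; this only illustrates the garbage-collection idea, not a proposed data structure):

```python
def spill(routemap: dict, my_bits: str, depth: int):
    """Increase depth by one and purge gossip outside the new bucket.

    `routemap` maps hash-prefix bit strings to gossip entries, and
    `my_bits` is this node's own hash as a bit string.  On spilling,
    anything not in the destination (narrower) bucket is dropped.
    """
    depth += 1
    keep = my_bits[: 2 * depth]  # 2 bits per quadtree level
    kept = {k: v for k, v in routemap.items() if k.startswith(keep)}
    return kept, depth

rm = {"0000": "n1", "0011": "n2", "0100": "n3"}
rm2, d = spill(rm, "00000000", depth=1)
# Only the quarter of the old bucket containing our own prefix survives.
assert rm2 == {"0000": "n1"} and d == 2
```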

Nodes could also pick (multiple) specific quadtree buckets to communicate all 
gossip about through filters they negotiate (the filter being the hash prefix 
of the desired bucket). It might be desirable to broadcast their filter as part 
of the gossip itself, so that other nodes can learn who are the better 
information providers, but again this would be completely optional.

---

> This rule can be probabilistically fulfilled by having each node N know *at 
> least* routes to some nodes Ls..., where for all L <- Ls, dist(N, L) < some 
> constant.
> The constant here can be selected by each node independently, depending on 
> its memory and CPU capacity.
> (the node knowing more routes than that is perfectly fine)

The quadtree proposes a similar idea, but instead of using some constant, N 
knows about routes to Ls, where each L is within the same bucket. Essentially, 
i < dist(N, L) <= j, where i=BUCKET_MIN and j=BUCKET_MAX. For example, if using 
8-bit ids and the first two bits identify the bucket, then for bucket 00, 
i=00000000 and j=00111111. I guess the equivalent distance using your constant 
would be k=01000000, to cover the same range but without discriminating the 
results based on any boundaries like my suggestion does.
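A small worked example of the bucket range, assuming the distance metric is XOR over the ids (an assumption on my part; the thread does not pin the metric down, but XOR is what Kademlia-style DHTs use):

```python
def dist(a: int, b: int) -> int:
    """XOR metric over 8-bit ids, as in Kademlia-style DHTs."""
    return a ^ b

# With 8-bit ids and 2 bucket bits, any two nodes sharing the prefix
# 00 are within XOR distance 0b00111111 of one another.
bucket_max = 0b00111111
n, l = 0b00010110, 0b00101001
assert (n >> 6) == (l >> 6)      # same top-level bucket
assert dist(n, l) <= bucket_max  # so the bucket bound holds
```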

> Then, we can have the following global rule:
> * Given three nodes X, Y, and Z, and dist(X, Z) < dist(Y, Z), then X SHOULD 
> be more likely to know a route from itself to Z, than Y to know a route from 
> itself to Z.

This rule is also met for nodes which are in different buckets, and is 
essentially the same as what I was thinking for the distance 
between buckets. With the autopilot suggestion of decreasing the number of 
channels opened as distance increases, the probability of knowing about a 
neighbouring bucket is increased compared with a distant one.

---

> It looks for any node L <- Ls such that dist(L, D) < dist(N, D).
> In fact, it can just sort those nodes according to dist(L, D) and start with 
> the lowest distance, and fail once it reaches a dist(L, D) that exceeds its 
> own dist(N, D).
> The above algorithm converges since dist(N, D) for each N that is delegated 
> to will progressively get smaller until we reach some N that knows the 
> destination D.

This proposal also seems very similar to how existing DHTs work, unless I'm 
mistaken. My problem with the approach of 

Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-05 Thread m.a.holden via Lightning-dev
Good morning ZmnSCPxj,

I'll try to clarify my proposal further, but also have some questions about 
yours.

---

> Now, it seems to me what you propose, is to have octrees contain octrees, and 
> so on.

There's one global tree, which is the same for all users. Every node in the 
tree has a bucket and exactly 4 child nodes, except leaves have no children. 
The tree has a max depth, which each client sets itself; the theoretical 
maximum is HASHLEN/2, though in practice we're likely to be concerned with 
depths below 8. 
Note that every parent's bucket contains all of the information of all of its 
children's buckets - meaning the root node of the global quadtree is equivalent 
to the unfiltered global network. Nodes pick a depth and concern themselves 
with the bucket at that depth, unless it overflows, in which case they increase 
the depth by 1.

> Now let us return, to my proposal of a distance measurement.
> This effectively maps the LN universe onto a circle.
> Each node commits itself to storing an arc of this circle, and possibly 
> various other nodes it happens to be directly connected to that may be far 
> from the arc near it.

The quadtree can also be looked at from the perspective of a circle, assuming 
there is a reference point P on it. Each bucket is represented by an arc of 
size 2pi/(4^depth), and the PKH-prefix represents a displacement of the arc 
from P (e.g. the bucket 00b would represent an arc from P to P+pi/2). Bucket 00b 
includes all gossip information for the buckets 0000b, 0001b, 0010b and 0011b, 
which are sub-arcs of it, and so forth. Spilling a bucket is the same as 
narrowing the arc to 1/4 its previous size.
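The arc arithmetic above is simple enough to state directly (a sketch only, to make the "spilling narrows the arc 4x" claim checkable):

```python
import math

def arc_size(depth: int) -> float:
    """Arc subtended by one quadtree bucket at the given depth."""
    return 2 * math.pi / (4 ** depth)

assert arc_size(0) == 2 * math.pi       # root bucket = whole circle
assert arc_size(2) == arc_size(1) / 4   # spilling narrows the arc 4x
```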

Although different from your arbitrary distance k (arc size 2*k?), they're not 
too dissimilar in the kind of distance covered - with the primary distinction 
being that mine proposes using set interval ranges (based on powers of 4), and 
the node picks the range which its PKH fits into, rather than the range being 
centred on the node.

---

> I can overload those N nodes by generating node addresses that also lie in 
> that octant.
> By just iterating over scalars, about 1/8 of the generated node addresses 
> will lie in the target octant.

If you target the first layer of the tree only, then once nodes spill to the 
second layer, they will filter out any gossip which does not concern them. It 
isn't sufficient to just brute-force the first 2 bits: you need to match 
2*depth bits, where depth is the target's maximum they're willing to spill to 
(which is not shared).

However, commodity hardware can attempt millions of ec_mult/hash per second, so 
getting a node into the bucket you want is trivial anyway for small depth.
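The grinding attack is cheap enough to sketch in a few lines. For illustration the expensive EC multiplication is replaced here by hashing a random seed (an assumption: a real attacker would iterate scalars and hash the derived public keys, but the per-trial cost is similarly small):

```python
import hashlib
import itertools
import os

def grind_key_into_bucket(target_prefix_bits: str) -> bytes:
    """Brute-force a 'key' whose hash lands in the target bucket.

    Expected number of trials is 2**len(target_prefix_bits), which is
    trivial for small depths.
    """
    for _ in itertools.count():
        seed = os.urandom(32)
        h = hashlib.sha256(seed).digest()
        bits = bin(int.from_bytes(h[:4], "big"))[2:].zfill(32)
        if bits.startswith(target_prefix_bits):
            return seed

# Landing in a depth-2 bucket (4 prefix bits) takes ~16 trials.
key = grind_key_into_bucket("0011")
```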

> In order to disrupt a particular node, I must place fake nodes near that 
> node, in an attempt to force its neighbors to reduce the arc of the circle 
> that they can map.
> However, I need to generate fake nodes that are nearer to that node than 
> genuine honest nodes.
> This means the probability of me generating such node addresses are much 
> lower than 1/8, thus requiring that I devote more energy to generating the 
> falsified attack nodes.

I would argue that the above is also true in your approach. It would still be 
trivial to brute force the prefixes which minimize the distance between 
themselves and the target, even with a network of tens of millions of nodes. 
The amount of energy which would need devoting does not seem like it would be 
a deciding factor for this attack.

> Further, in executing this attack, while I disrupt one node very well, and 
> nearby nodes somewhat, my effect will be far less than disrupting 1/8 of the 
> network.

Since the attack needs to target the maximum depth that nodes might spill to, 
then the amount of the network which could be affected by the attack would be 
1/4^depth. I can't imagine it being very different in your distance based 
approach, since we're considering the same kind of distances, from a different 
perspective.

---

> Once I get a node address in the targeted octant, I can commit a nominal 
> amount of Bitcoin into some 2-of-2 with an existing actual node I control and 
> the generated node address, then synthesize some node gossip containing those 
> node addresses.

The way I see it, this potential attack affects the global network generally. 
Somebody could try this regardless of our proposals, and try to flood the whole 
network with spam.

BOLT#7 only states that a node MAY forward information about nodes for which it 
does not know any channels. One could reconsider this approach if constrained 
on resources, such as to say if depth>X, MUST NOT forward any information about 
nodes with no known channels, or perhaps have a variation of gossip which 
combines node information into channel broadcasts to ensure that such node spam 
can't occur. I think the vagueness of the spec on this rule is indicative that 
it needs addressing.

To try and mount an attack with greater chance of success, 

Re: [Lightning-dev] Routemap scaling (was: Just in Time Routing (JIT-Routing) and a channel rebalancing heuristic as an add on for improved routing success in BOLT 1.0)

2019-04-01 Thread m.a.holden via Lightning-dev
Hi ZmnSCPxj & René.

One way you could have both determinism and encourage a diverse distribution of 
network maps is to treat it as a spatial indexing problem, where the space we 
use is the lexicographical space of the node ids (or hashes of), borrowing some 
similarities from DHTs.

If for example, we take a quadtree, you can take the 2 most-significant bits of 
the public key hash, which would put your node into one of 4 buckets. Nodes 
could advertise a feature bit indicating that they are committed to keeping the 
entire routemap of the bucket which matches their own public key hash, which 
would be all of the nodes in the same bucket - and all of the channels with one 
or both of their endpoints in the bucket.

I'd estimate that with a quadtree, information held by each node could be 
reduced to about 40% of that of the parent node in the quadtree, although the 
real amounts would depend on such things as autopilot defaults (ie, how many 
channels you open to nodes within your bucket versus channels to nodes in other 
buckets). Nodes could decide their own bucket capacities on which they wish to 
spill and reduce the amount of gossip by taking the 2 next most significant 
bits of the PKH, and could go several layers deep.

A node which needs to make a payment to another node within its bucket should 
be able to do so without querying (unless there are no routes with the required 
capacity). If making a payment to another bucket, then there would still exist 
a decent number of channels in the local routemap to nodes in those other 
buckets, and these nodes could be queried to find the second half of a route to 
the destination, or could use JIT routing for the second half, assuming the 
first half of the route can be selected from the local routemap.

In terms of relating this to "locality" in the geographical sense, one could 
create a convention where each bucket represents an approximate physical 
location. The globe can be spatially-indexed as a quadtree by taking a 
tetrahedral map projection (eg, Lee conformal projection[1]). The 4 equilateral 
triangles of the tetrahedron can be infinitely split again into 4 smaller 
equal-sized equilateral triangles for however many layers deep the quadtree 
might be. With this, it might be possible to have a convention where there is a 
relation between the lexicographical space and the geographical space, and 
wallet software would essentially brute force a private key to put you into the 
corresponding bucket to your physical location (trivial for the small number of 
bits we're talking about). Routing would be improved for local trade because 
you would have the entire local topology stored, and would only need to query 
when making payment at distance. (This may raise some privacy concerns which 
would need discussing.)

One issue is that it would result in a very unbalanced tree given that 
population is dense in some areas and sparse in others. To overcome this, 
instead of using a conformal or equal-area projection, we might be able to use 
an equal-population-per-area projection - a kind of projection I had not heard 
of before, but for which there is some research on how to produce them[2]. 
Nodes would need to agree on the projection in order for this to work, but the 
work could be done once and the results open sourced and shared between the 
implementations.

Autopilot implementations might also need adjusting to consider distance too. 
As a starting point I would suggest a geometric distribution, where half of 
opened channels should be within the same bucket, a quarter should be to 
sibling buckets, and an eighth to cousin buckets, etc. This would result in 
increased probability of routing and reduced querying for local payments - 
paying your local coffee shop should be query-free - and payments to the other 
side of the world might require increased querying.
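The geometric distribution above can be sketched directly; here depth 0 means "my own bucket", 1 a sibling bucket, 2 a cousin bucket, and so on (function name and coin-flip formulation are my own illustration, not a proposed autopilot API):

```python
import random

def pick_target_depth(max_depth: int, rng: random.Random) -> int:
    """Pick how far a new channel should reach, geometrically:
    P(0) = 1/2, P(1) = 1/4, P(2) = 1/8, ... capped at max_depth."""
    d = 0
    while d < max_depth and rng.random() < 0.5:
        d += 1
    return d

# Roughly half of the channels an autopilot opens stay in-bucket.
rng = random.Random(1)
depths = [pick_target_depth(8, rng) for _ in range(10000)]
```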

There are also privacy considerations if nodes indicate their approximate 
locations which would need discussing. What do you think?

Also, this method does not need the be the exclusive way in which gossip is 
communicated between nodes, and one might also combine with something like 
ZmnSCPxj has suggested, for gossiping about the highest capacity nodes. It 
might be also possible to share information about the highest capacity channels 
in a bucket too.

[1]:https://en.wikipedia.org/wiki/Lee_conformal_world_in_a_tetrahedron
[2]:https://www.pnas.org/content/101/20/7499.full

(PS, sorry for the separate thread, LML will not let me subscribe to the list)
___
Lightning-dev mailing list
Lightning-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/lightning-dev


Re: [Lightning-dev] Decoy node_ids and short_channel_ids

2020-02-02 Thread m.a.holden via Lightning-dev
> (I'm seeking a clever way that Bob can assign them and trivially tell
> which ID is assigned to which peer, but I can't figure it out, so I
> guess Bob keeps a mapping and restricts each peer to 256 live scids?).

Hi Rusty.

Here's a potential way for Alice and Bob to agree a set of 256 scids without 
any additional messages or changes to existing messages beyond a feature flag 
and a flag in open_channel, but comes with a computational cost.

Alice and Bob agree on a random integer `r`. This could be negotiated on 
`open_channel`, but we shouldn't need to send additional information because we 
already have a random integer we can use: the `temporary_channel_id`. This is 
not known to anybody besides Alice and Bob.

When a channel is locked, Bob computes n=256 scids, using something 
approximating `concat(n, trunc_bytes(sha256(ec_mult(2^n*r, Q)), 7))`, where `Q` 
is Alice's public key for the channel funding transaction.

The chance of an scid collision between channels is about 1 in 2^56, which is 
probably no cause for concern.
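A sketch of the derivation, with one caveat: the scheme's `ec_mult(2^n*r, Q)` needs an EC library, so a hash of the scalar and key stands in for it here (a stated assumption; only the concat/truncate shape of the scid is taken from the proposal):

```python
import hashlib

def derive_scids(r: int, Q: bytes, count: int = 256) -> list:
    """Derive `count` 8-byte scids from shared randomness r and a
    funding pubkey Q: concat(n, trunc_bytes(sha256(point), 7))."""
    scids = []
    for n in range(count):
        # Stand-in for ec_mult(2^n * r, Q).
        point = hashlib.sha256(((2 ** n) * r).to_bytes(64, "big") + Q).digest()
        scids.append(bytes([n]) + hashlib.sha256(point).digest()[:7])
    return scids

scids = derive_scids(12345, b"\x02" * 33)
assert len(scids) == 256 and all(len(s) == 8 for s in scids)
```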

Instead of keeping a map of 256 scids for each channel, Bob can use a cuckoo 
filter for efficiency. The filter can be used for a quick membership test and 
also as an associative map from scids to channels. It can also support scid 
deletion in the event of channel closure (at the cost of recomputing 256 
ec_mults again).

So when Bob receives a new HTLC to forward, he tests it against his cuckoo 
filter and retrieves a candidate set of possible channels to which it may 
refer. For each channel, he takes the most significant byte of the scid as `m` 
and performs `trunc_bytes(sha256(ec_mult(2^m*r, Q)), 7)` and tests the 
least-significant 7 bytes of the result against the scid.
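Bob's verification step can be sketched the same way (again substituting a hash for `ec_mult`, as an assumption for the sake of a self-contained example):

```python
import hashlib

def scid_matches(scid: bytes, r: int, Q: bytes) -> bool:
    """Check a candidate 8-byte scid against a channel's (r, Q):
    take the most significant byte as m, recompute the truncated
    hash, and compare the remaining 7 bytes."""
    m = scid[0]
    # Stand-in for ec_mult(2^m * r, Q).
    point = hashlib.sha256(((2 ** m) * r).to_bytes(64, "big") + Q).digest()
    return hashlib.sha256(point).digest()[:7] == scid[1:]
```

A forwarded HTLC thus costs Bob one recomputation per candidate channel the cuckoo filter suggests, rather than a 256-entry map per channel.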

Alice does not need to keep all of the scids she may use for invoices because 
they can be computed on the fly, but she will need to keep a copy of the 
`temporary_channel_id`.

In the reverse direction of Alice forwarding HTLCs to Bob, Bob's public key for 
the funding transaction is used instead.

Regards,
Mark Holden