[bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-16 Thread Cory Fields via bitcoin-dev
Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrusted
peers can supply the full outputs when needed, with very little overhead.
Any attempt to spoof those outputs would be apparent, as their hashes would not
be present in the hash set. There are many advantages to this, most notably
in disk and memory savings, as well as a validation speedup. The primary
disadvantage is a small increase in network traffic. I believe that the
advantages outweigh the disadvantages.

--

Bitcoin’s unspent transaction output set (usually referred to as “The UTXO
set”) has two primary roles: providing proof that previous outputs exist to be
spent, and providing the actual previous output data for verification when new
transactions attempt to spend them. These roles are not usually discussed
independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
compelling reasons to consider them this way.

To see why, consider running a node with the following changes:

- For each new output, gather all extra data that will be needed for
  verification when spending it later as an input: the amount, scriptPubKey,
  creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.).
  Call this the Dereferenced Prevout data.
- Create a hash from the concatenation of the new outpoint and the dereferenced
  prevout data. Call this an Unspent Transaction Output Hash.
- Rather than storing the full dereferenced prevout entries in a UTXO set as is
  currently done, instead store their hashes in an Unspent Transaction Output
  Hash Set, or UHS.
- When relaying a transaction, append the dereferenced prevout for each input.

Now when a transaction is received, it contains everything needed for
verification, including the input amount, height, and coinbaseness, which would
have otherwise required a lookup in the UTXO set.

To verify an input's unspentness, again create a hash from the concatenation of
the referenced outpoint and the provided dereferenced prevout, and check for
its presence in the UHS. The hash will only be present if a hash of the exact
same data was previously added to (and not since removed from) the UHS. As
such, we are protected from a peer attempting to lie about the dereferenced
prevout data.
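The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the proposal's implementation: the field encoding, the choice of SHA-256, the numeric `output_type` tag, and the names `Outpoint`, `DereferencedPrevout`, `uhs_hash`, and `UnspentHashSet` are all assumptions made for the example. It shows new outputs being reduced to a 32-byte hash, and a spend being accepted only if the peer-supplied prevout data hashes to an entry already in the set.

```python
import hashlib
import struct
from typing import NamedTuple


class Outpoint(NamedTuple):
    txid: bytes  # assumed to be the 32-byte id of the creating transaction
    vout: int    # index of the output within that transaction


class DereferencedPrevout(NamedTuple):
    amount: int          # value in satoshis
    script_pubkey: bytes
    height: int          # block height at which the output was created
    coinbase: bool
    output_type: int     # hypothetical tag, e.g. 0 = p2pkh, 1 = p2sh


def uhs_hash(op: Outpoint, dp: DereferencedPrevout) -> bytes:
    # Hypothetical encoding: fixed-width fields plus a length prefix for the
    # script, so (given 32-byte txids) distinct inputs cannot serialize alike.
    data = (op.txid
            + struct.pack("<IqI", op.vout, dp.amount, len(dp.script_pubkey))
            + dp.script_pubkey
            + struct.pack("<I?B", dp.height, dp.coinbase, dp.output_type))
    return hashlib.sha256(data).digest()


class UnspentHashSet:
    def __init__(self) -> None:
        self._entries: set = set()

    def add(self, op: Outpoint, dp: DereferencedPrevout) -> None:
        # Called for each new output; only the 32-byte hash is stored.
        self._entries.add(uhs_hash(op, dp))

    def spend(self, op: Outpoint, dp: DereferencedPrevout) -> bool:
        # dp comes from an untrusted peer. Lying about any field changes the
        # hash, which is then absent from the set, so the spend is rejected.
        h = uhs_hash(op, dp)
        if h not in self._entries:
            return False
        self._entries.remove(h)
        return True


# Usage: a forged amount is rejected, the honest data is accepted once.
uhs = UnspentHashSet()
op = Outpoint(bytes(32), 0)
real = DereferencedPrevout(50_000, bytes.fromhex("76a914") + bytes(20) + bytes.fromhex("88ac"),
                           500_000, False, 0)
uhs.add(op, real)
print(uhs.spend(op, real._replace(amount=5_000_000)))  # False
print(uhs.spend(op, real))                             # True
print(uhs.spend(op, real))                             # False: already spent
```

Storing only the digest is what makes every entry the same size regardless of how large the scriptPubKey is.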

### Some benefits of the UHS model

- Requires no consensus changes, purely a p2p/implementation change.

- UHS is substantially smaller than a full UTXO set (just over half for the
  main chain, see below). In-memory caching can be much more effective as a
  result.

- A block’s transactions can be fully verified before doing a potentially
  expensive database lookup for the previous output data. The UHS can be
  queried afterwards (or in parallel) to verify previous output inclusion.

- Entire blocks could potentially be verified out-of-order because all input
  data is provided; only the inclusion checks have to be in-order. Admittedly
  this is likely too complicated to be realistic.

- Pay-to-pubkey outputs are less burdensome on full nodes, since they use no
  more space on disk than pay-to-pubkey-hash or pay-to-script-hash outputs.
  Taproot and Graftroot outputs may share the same benefits.

- The burden of holding UTXO data is technically shifted from the verifiers to
  the spender. In reality, full nodes will likely always have a copy as well,
  but conceptually it's a slight improvement to the incentive model.

- Block data from peers can also be used to roll backwards during a reorg. This
  potentially enables an even more aggressive pruning mode.

- UTXO storage size grows exactly linearly with UTXO count, as opposed to
  growing linearly with UTXO data size. This may be relevant when considering
  new larger output types which would otherwise cause the UTXO set size to
  increase more quickly.

- The UHS is a simple set, so no key-value database is needed. LevelDB could
  potentially be dropped as a dependency in some distant future.

- Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashes
  [1]. Unspent Transaction Output Hashes would simply be mapped to points on a
  curve before adding them to the set.

- With the help of inclusion proofs and rolling hashes, libbitcoinconsensus
  could potentially safely verify entire blocks. The size of the required
  proofs would be largely irrelevant as they would be consumed locally.

- Others?
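Pieter Wuille's rolling UTXO set hashes [1] are only named in the list above. To illustrate the property the rolling-hash items rely on (entries can be added and removed in any order while the set digest stays consistent), here is a toy sketch of a multiplicative multiset hash modulo a prime, in the style of MuHash. It swaps in modular multiplication for the elliptic-curve mapping mentioned above. It also uses the Mersenne prime 2^127 - 1 purely as a small, insecure modulus. `to_element` and `RollingSetHash` are hypothetical names.

```python
import hashlib

# Toy modulus: the Mersenne prime 2**127 - 1. Far too small for real use;
# it only demonstrates the algebra of a rolling multiset hash.
P = 2**127 - 1


def to_element(uhs_entry: bytes) -> int:
    # Map an Unspent Transaction Output Hash to a nonzero element mod P.
    x = int.from_bytes(hashlib.sha256(b"rolling-toy" + uhs_entry).digest(), "big") % P
    return x or 1  # avoid zero, which has no inverse


class RollingSetHash:
    def __init__(self) -> None:
        self.acc = 1  # digest of the empty set

    def add(self, entry: bytes) -> None:
        self.acc = self.acc * to_element(entry) % P

    def remove(self, entry: bytes) -> None:
        # Multiply by the modular inverse (pow with exponent -1 needs Python 3.8+).
        self.acc = self.acc * pow(to_element(entry), -1, P) % P

    def digest(self) -> bytes:
        return self.acc.to_bytes(16, "big")


# Usage: insertion order does not matter, and removal undoes addition.
a, b = RollingSetHash(), RollingSetHash()
for e in (b"utxo-1", b"utxo-2"):
    a.add(e)
for e in (b"utxo-2", b"utxo-1"):
    b.add(e)
print(a.digest() == b.digest())  # True
```

Because multiplication mod a prime is commutative and invertible, a node could update such a digest incrementally as each block adds and removes UHS entries, without rescanning the set.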

### TxIn De-duplication

Setting aside the potential benefits, the obvious drawback of using a UHS is a
significant network traffic increase. Fortunately, some properties of
transactions can be exploited to offset most of the difference.

For quick reference:

p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG
p2pkh scriptSig:    [signature] [pubkey]

p2sh scriptPubKey:  HASH160 [script hash] EQUAL
p2sh scriptSig:     [signature(s)] [script]
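For these two templates, every byte of the scriptPubKey is determined by data the scriptSig already carries (the pubkey, or the serialized script pushed last), given that the dereferenced prevout records the output type. Below is a minimal sketch of rebuilding the scriptPubKey on the receiving side. It assumes standard templates only, and the function names are hypothetical. The HASH160 function is injectable because Python's `hashlib` exposes RIPEMD-160 only when the linked OpenSSL build provides it.

```python
import hashlib
from typing import Callable

# Opcode byte values from the Bitcoin script language.
OP_DUP = 0x76
OP_HASH160 = 0xA9
OP_EQUALVERIFY = 0x88
OP_CHECKSIG = 0xAC
OP_EQUAL = 0x87
PUSH_20 = 0x14  # push the next 20 bytes onto the stack

Hash160 = Callable[[bytes], bytes]


def hash160(data: bytes) -> bytes:
    # HASH160 = RIPEMD-160(SHA-256(data)). Raises ValueError if this Python's
    # OpenSSL build does not provide RIPEMD-160.
    return hashlib.new("ripemd160", hashlib.sha256(data).digest()).digest()


def p2pkh_from_scriptsig(pubkey: bytes, h160: Hash160 = hash160) -> bytes:
    # The pubkey pushed in the scriptSig fully determines the scriptPubKey.
    return (bytes([OP_DUP, OP_HASH160, PUSH_20]) + h160(pubkey)
            + bytes([OP_EQUALVERIFY, OP_CHECKSIG]))


def p2sh_from_scriptsig(redeem_script: bytes, h160: Hash160 = hash160) -> bytes:
    # The serialized script pushed last in the scriptSig determines the scriptPubKey.
    return bytes([OP_HASH160, PUSH_20]) + h160(redeem_script) + bytes([OP_EQUAL])
```

The rebuilt scripts are 25 bytes (p2pkh) and 23 bytes (p2sh). In this sketch, that is the per-input amount a peer would not need to transmit for outputs of those types.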

Notice that if a peer is sending a scriptPubKey and a scriptSig together, as
they would when using a UHS, there would likely be some redundancy. Using a
p2sh output for example, 

[bitcoin-dev] Moving away from BIP37, unsetting NODE_BLOOM

2018-05-16 Thread Caius Cosades via bitcoin-dev
As previously discussed[0][1][2] on the mailing list, in GitHub issue commentary,
and on IRC, there are substantial reasons to disable BIP37 on network nodes,
and they grow stronger as the size of the chain increases. BIP37 has
significant denial-of-service issues which are unsolvable in its design, it
introduces undue load on the bitcoin network by default, and it doesn't provide
an acceptable amount of security and reliability to "lightweight wallets" as
originally intended.

BIP37 allows "lightweight wallets" to connect to nodes in the network and
request that they load, deserialize, and expensively apply an arbitrary bloom
filter to their block files and mempool. This should never have been the role
of nodes in the network; rather, it should have been opt-in, or performed by a
different piece of software entirely. Because nodes cannot cache the responses
or meaningfully rate-limit them, serving these requests is detrimental.
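To make the node-side cost concrete, here is a minimal sketch of a generic Bloom filter and of the per-peer matching step a serving node performs. It is not BIP37's exact scheme: BIP37 derives bit indices with MurmurHash3 using per-function seeds and a tweak, whereas this sketch substitutes SHA-256 for simplicity. `BloomFilter`, `filter_block`, and the `(txid, elements)` block representation are hypothetical.

```python
import hashlib


class BloomFilter:
    """Generic Bloom filter; a stand-in, NOT BIP37's MurmurHash3 scheme."""

    def __init__(self, n_bits: int, n_hashes: int, tweak: int = 0) -> None:
        self.n_bits = n_bits
        self.n_hashes = n_hashes
        self.tweak = tweak  # assumed to fit in 32 bits
        self.bits = bytearray((n_bits + 7) // 8)

    def _indices(self, item: bytes):
        # Derive n_hashes bit positions from SHA-256 of (hash number, tweak, item).
        for i in range(self.n_hashes):
            seed = i.to_bytes(4, "little") + self.tweak.to_bytes(4, "little")
            digest = hashlib.sha256(seed + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.n_bits

    def add(self, item: bytes) -> None:
        for idx in self._indices(item):
            self.bits[idx // 8] |= 1 << (idx % 8)

    def might_contain(self, item: bytes) -> bool:
        # No false negatives; false positives are possible (the "noise").
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indices(item))


def filter_block(block_txs, bloom: BloomFilter) -> list:
    # Node-side work: every element of every transaction is tested against
    # this particular peer's filter.
    return [txid for txid, elements in block_txs
            if any(bloom.might_contain(e) for e in elements)]
```

Because every wallet supplies its own filter, the output of `filter_block` for one peer cannot be reused for another. That is the caching problem described above.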

BIP37 was intended to provide stronger privacy than it does in reality[3][4]:
effectively any node that can capture `filterload` and `filteradd`
messages can trivially de-anonymize an entire wallet that has connected,
irrespective of the amount of noise the wallet adds to its filters. A connected
node lying by omission is undetectable by any wallet software, which will
be led to believe that there are no matching results. This can only be countered
by further destroying privacy and loading down the network: having multiple
peers simultaneously return filter results and hoping that at least one isn't
lying.

NODE_BLOOM has already been implemented, allowing nodes to signal in the
services field of their version message whether or not they support filtering.
I suggest that in the next major release this flag default to 0 (off), and that
any software relying on BIP37 move to other filtering options, or to a dedicated
piece of software that serves these requests. Future releases of the reference
software should remove the BIP37 commands entirely.
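As a rough sketch of what the proposed default change means at the protocol level: service flags are a bitfield advertised in the `version` message. BIP111 assigns NODE_BLOOM to bit 2 (value 4) and specifies that a node which does not offer it should disconnect peers sending `filterload`, `filteradd`, or `filterclear`. The helper names and the peer dictionary below are hypothetical.

```python
# Service flag values as assigned by the relevant BIPs.
NODE_NETWORK = 1 << 0  # serves full blocks
NODE_BLOOM = 1 << 2    # BIP111: serves BIP37 bloom filtering
NODE_WITNESS = 1 << 3  # BIP144: serves witness data

FILTER_COMMANDS = {"filterload", "filteradd", "filterclear"}


def local_services(serve_bloom: bool) -> int:
    # Hypothetical helper; the proposal is for serve_bloom to default to False.
    services = NODE_NETWORK | NODE_WITNESS
    if serve_bloom:
        services |= NODE_BLOOM
    return services


def peers_offering_bloom(peers: dict) -> list:
    # A BIP37 wallet would have to restrict itself to peers setting this bit.
    return [addr for addr, services in peers.items() if services & NODE_BLOOM]


def should_disconnect(command: str, our_services: int) -> bool:
    # Per BIP111: without NODE_BLOOM, a peer sending filter commands is dropped.
    return command in FILTER_COMMANDS and not (our_services & NODE_BLOOM)
```

With the bit cleared by default, lightweight wallets would find only the nodes that opted in through `peers_offering_bloom`, which matches the opt-in role argued for above.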


[0]: https://www.reddit.com/r/Bitcoin/comments/3hjak7/the_hard_work_of_core_devs_not_xt_makes_bitcoin/cu9xntf/?context=3
[1]: https://github.com/bitcoin/bitcoin/issues/6578
[2]: https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-August/010535.html
[3]: https://jonasnick.github.io/slides/2016-zurich-meetup.pdf
[4]: https://eprint.iacr.org/2014/763.pdf
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev