Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrusted
peers can supply the full outputs when needed, with very little overhead.
Any attempt to spoof those outputs would be apparent, as their hashes would not
be present in the hash set. There are many advantages to this, most obviously
disk and memory savings, as well as a validation speedup. The primary
disadvantage is a small increase in network traffic. I believe that the
advantages outweigh the disadvantages.
--
Bitcoin’s unspent transaction output set (usually referred to as “The UTXO
set”) has two primary roles: providing proof that previous outputs exist to be
spent, and providing the actual previous output data for verification when new
transactions attempt to spend them. These roles are not usually discussed
independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
compelling reasons to consider them this way.
To see why, consider running a node with the following changes:
- For each new output, gather all extra data that will be needed for
verification when spending it later as an input: the amount, scriptPubKey,
creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.).
Call this the Dereferenced Prevout data.
- Create a hash from the concatenation of the new outpoint and the dereferenced
prevout data. Call this an Unspent Transaction Output Hash.
- Rather than storing the full dereferenced prevout entries in a UTXO set as is
currently done, instead store their hashes to an Unspent Transaction Output
Hash Set, or UHS.
- When relaying a transaction, append the dereferenced prevout for each input.
Now when a transaction is received, it contains everything needed for
verification, including the input amount, height, and coinbaseness, which would
have otherwise required a lookup in the UTXO set.
To verify an input's unspentness, again create a hash from the concatenation of
the referenced outpoint and the provided dereferenced prevout, and check for
its presence in the UHS. The hash will only be present if a hash of the exact
same data was previously added to (and not since removed from) the UHS. As
such, we are protected from a peer attempting to lie about the dereferenced
prevout data.
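The add and check steps described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the names (`Outpoint`, `DereferencedPrevout`, `utxo_hash`, `UHS`), the byte serialization, and the choice of single SHA-256 are all hypothetical and not part of the proposal. The output type is omitted for brevity, since it can be derived from the scriptPubKey.

```python
import hashlib
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Outpoint:
    txid: bytes   # 32-byte transaction id
    index: int    # output index within that transaction


@dataclass(frozen=True)
class DereferencedPrevout:
    amount: int           # value in satoshis
    script_pubkey: bytes
    height: int           # creation height
    is_coinbase: bool


def utxo_hash(outpoint: Outpoint, prevout: DereferencedPrevout) -> bytes:
    """Hash the concatenation of the outpoint and its dereferenced prevout.

    The serialization and hash function here are assumptions made for
    illustration only.
    """
    data = (
        outpoint.txid
        + struct.pack("<I", outpoint.index)
        + struct.pack("<q", prevout.amount)
        # Length prefix keeps the concatenation unambiguous.
        + struct.pack("<I", len(prevout.script_pubkey))
        + prevout.script_pubkey
        + struct.pack("<I", prevout.height)
        + struct.pack("<?", prevout.is_coinbase)
    )
    return hashlib.sha256(data).digest()


class UHS:
    """A plain set of unspent transaction output hashes."""

    def __init__(self) -> None:
        self._hashes: set[bytes] = set()

    def add(self, outpoint: Outpoint, prevout: DereferencedPrevout) -> None:
        # Called for each newly created output.
        self._hashes.add(utxo_hash(outpoint, prevout))

    def spend(self, outpoint: Outpoint, claimed: DereferencedPrevout) -> bool:
        # Recompute the hash from peer-supplied data. It is present only if
        # exactly the same data was added earlier and not yet removed.
        h = utxo_hash(outpoint, claimed)
        if h not in self._hashes:
            return False
        self._hashes.remove(h)
        return True
```

In this sketch, a peer that alters any field of the prevout (for example, inflating the amount) produces a different hash, so `spend` returns `False`; a second attempt to spend the same output fails for the same reason, because the hash has already been removed.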
### Some benefits of the UHS model
- Requires no consensus changes, purely a p2p/implementation change.
- UHS is substantially smaller than a full UTXO set (just over half for the
main chain, see below). In-memory caching can be much more effective as a
result.
- A block’s transactions can be fully verified before doing a potentially
expensive database lookup for the previous output data. The UHS can be
queried afterwards (or in parallel) to verify previous output inclusion.
- Entire blocks could potentially be verified out-of-order because all input
data is provided; only the inclusion checks have to be in-order. Admittedly
this is likely too complicated to be realistic.
- pay-to-pubkey outputs are less burdensome on full nodes, since they use no
more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Taproot and
Graftroot outputs may share the same benefits.
- The burden of holding UTXO data is technically shifted from the verifiers to
the spender. In reality, full nodes will likely always have a copy as well,
but conceptually it's a slight improvement to the incentive model.
- Block data from peers can also be used to roll backwards during a reorg. This
potentially enables an even more aggressive pruning mode.
- UTXO storage size grows exactly linearly with UTXO count, as opposed to
growing linearly with UTXO data size. This may be relevant when considering
new larger output types which would otherwise cause the UTXO Set size to
increase more quickly.
- The UHS is a simple set, no need for a key-value database. LevelDB could
potentially be dropped as a dependency in some distant future.
- Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashes
[1]. Unspent Transaction Output Hashes would simply be mapped to points on a
curve before adding them to the set.
- With the help of inclusion proofs and rolling hashes, libbitcoinconsensus
could potentially safely verify entire blocks. The size of the required
proofs would be largely irrelevant as they would be consumed locally.
- Others?
### TxIn De-duplication
Setting aside the potential benefits, the obvious drawback of using a UHS is a
significant network traffic increase. Fortunately, some properties of
transactions can be exploited to offset most of the difference.
For quick reference:
p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG
p2pkh scriptSig: [signature] [pubkey]
p2sh scriptPubKey: HASH160 [script hash] EQUAL
p2sh scriptSig: [signature(s)] [script]
Notice that if a peer is sending a scriptPubKey and a scriptSig together, as
they would when using a UHS, there would likely be some redundancy. Using a
p2sh output for example,