Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
I generally like the direction of this proposal in terms of allowing full nodes to run with a different storage/bandwidth tradeoff. Cory, were this implemented, would you expect Core to support both operating modes (full UTXO set and UHS) depending on user configuration, or would UHS be mandatory? Also, given that Bram Cohen's TXO bitfield proposal was an inspiration for this, could you comment on why the UHS is preferable to that approach? An alternative that goes even further in the direction of more bandwidth, less storage, would be for nodes to simply maintain a Merkle Mountain Range over all TXOs in order of creation and a spentness bitfield. Blocks could be requested with the prev outputs and a Merkle proof linking them into the MMR root. Since the Merkle proof is deterministic, it could be computed by archive nodes and miners and saved alongside the block data for relay. Another benefit of this is the TXO MMR root may be independently useful if committed into the coinbase transaction. On Thu, Jun 7, 2018 at 7:02 AM Sjors Provoost via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > eMMC storage, which low end devices often use, come in 2x increments. > Running a pruned full node on 8 GB is difficult if not impossible (the UTXO > set peaked at 3.5 GB in January, but a full node stores additional stuff). > > However, 16 GB is only €10 more expensive and presumably standard by the > time this would be rolled out. > > On AWS every GB of SSD storage avoided saves $1 per year, not end of the > world stuff, but not negligible either. Outbound traffic costs $0.10 / GB > (ignoring free allowance), so when uploading 200 GB per year, the 5% would > offset $1 of storage cost savings. > > The above seems marginal, probably not worth it unless there’s really no > downside. > > What I find attractive about this proposal is the ability to squeeze more > out of limited RAM (typically only 1 or 2 GB on these low end devices). I’d > have to test Cory’s branch to see if that actually matters in practice. > > It’s also useful to distinguish benefits during initial sync from ongoing > operation. The former I’ve almost given up on for low end devices (can > take weeks), in favor of doing it on a faster computer and copying the > result. The latter needs far less RAM, so perhaps this proposal doesn’t > help much there, but that would be useful to measure. > > Did you try the recent SHA256 optimizations on your branch? > > Sjors > > > Op 17 mei 2018, om 18:56 heeft Gregory Maxwell via bitcoin-dev < > bitcoin-dev@lists.linuxfoundation.org> het volgende geschreven: > > > > On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev > > wrote: > >> Tl;dr: Rather than storing all unspent outputs, store their hashes. > > > > My initial thoughts are it's not _completely_ obvious to me that a 5% > > ongoing bandwidth increase is actually a win to get something like a > > 40% reduction in the size of a pruned node (and less than a 1% > > reduction in an archive node) primarily because I've not seen size of > > a pruned node cited as a usage limiting factor basically anywhere. I > > would assume it is a win but wouldn't be shocked to see a careful > > analysis that concluded it wasn't. > > > > But perhaps more interestingly, I think the overhead is not really 5%, > > but it's 5% measured in the context of the phenomenally inefficient tx > > mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ). > > Napkin math on the size of a txn alone tells me it's more like a 25% > > increase if you just consider size of tx vs size of > > tx+scriptpubkeys,amounts. If I'm not missing something there, I think > > that would get in into a very clear not-win range. > > > > On the positive side is that it doesn't change the blockchain > > datastructure, so it's something implementations could do without > > marrying the network to it forever. > > > > ___ > bitcoin-dev mailing list > bitcoin-dev@lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev > ___ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
eMMC storage, which low end devices often use, come in 2x increments. Running a pruned full node on 8 GB is difficult if not impossible (the UTXO set peaked at 3.5 GB in January, but a full node stores additional stuff). However, 16 GB is only €10 more expensive and presumably standard by the time this would be rolled out. On AWS every GB of SSD storage avoided saves $1 per year, not end of the world stuff, but not negligible either. Outbound traffic costs $0.10 / GB (ignoring free allowance), so when uploading 200 GB per year, the 5% would offset $1 of storage cost savings. The above seems marginal, probably not worth it unless there’s really no downside. What I find attractive about this proposal is the ability to squeeze more out of limited RAM (typically only 1 or 2 GB on these low end devices). I’d have to test Cory’s branch to see if that actually matters in practice. It’s also useful to distinguish benefits during initial sync from ongoing operation. The former I’ve almost given up on for low end devices (can take weeks), in favor of doing it on a faster computer and copying the result. The latter needs far less RAM, so perhaps this proposal doesn’t help much there, but that would be useful to measure. Did you try the recent SHA256 optimizations on your branch? Sjors > Op 17 mei 2018, om 18:56 heeft Gregory Maxwell via bitcoin-dev > het volgende geschreven: > > On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev > wrote: >> Tl;dr: Rather than storing all unspent outputs, store their hashes. > > My initial thoughts are it's not _completely_ obvious to me that a 5% > ongoing bandwidth increase is actually a win to get something like a > 40% reduction in the size of a pruned node (and less than a 1% > reduction in an archive node) primarily because I've not seen size of > a pruned node cited as a usage limiting factor basically anywhere. I > would assume it is a win but wouldn't be shocked to see a careful > analysis that concluded it wasn't. > > But perhaps more interestingly, I think the overhead is not really 5%, > but it's 5% measured in the context of the phenomenally inefficient tx > mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ). > Napkin math on the size of a txn alone tells me it's more like a 25% > increase if you just consider size of tx vs size of > tx+scriptpubkeys,amounts. If I'm not missing something there, I think > that would get in into a very clear not-win range. > > On the positive side is that it doesn't change the blockchain > datastructure, so it's something implementations could do without > marrying the network to it forever. > ___ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
You should read this: https://bitcointalk.org/index.php?topic=153662.10 On Wed, May 16, 2018 at 7:36 PM, Cory Fields via bitcoin-dev < bitcoin-dev@lists.linuxfoundation.org> wrote: > Tl;dr: Rather than storing all unspent outputs, store their hashes. > Untrusted > peers can supply the full outputs when needed, with very little overhead. > Any attempt to spoof those outputs would be apparent, as their hashes > would not > be present in the hash set. There are many advantages to this, most > apparently > in disk and memory savings, as well as a validation speedup. The primary > disadvantage is a small increase in network traffic. I believe that the > advantages outweigh the disadvantages. > > -- > > Bitcoin’s unspent transaction output set (usually referred to as “The UTXO > set”) has two primary roles: providing proof that previous outputs exist > to be > spent, and providing the actual previous output data for verification when > new > transactions attempts to spend them. These roles are not usually discussed > independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are > compelling reasons to consider them this way. > > To see why, consider running a node with the following changes: > > - For each new output, gather all extra data that will be needed for > verification when spending it later as an input: the amount, > scriptPubKey, > creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, > etc.). > Call this the Dereferenced Prevout data. > - Create a hash from the concatenation of the new outpoint and the > dereferenced > prevout data. Call this a Unspent Transaction Output Hash. > - Rather than storing the full dereferenced prevout entries in a UTXO set > as is > currently done, instead store their hashes to an Unspent Transaction > Output > Hash Set, or UHS. > - When relaying a transaction, append the dereferenced prevout for each > input. > > Now when a transaction is received, it contains everything needed for > verification, including the input amount, height, and coinbaseness, which > would > have otherwise required a lookup the UTXO set. > > To verify an input's unspentness, again create a hash from the > concatenation of > the referenced outpoint and the provided dereferenced prevout, and check > for > its presence in the UHS. The hash will only be present if a hash of the > exact > same data was previously added to (and not since removed from) the UHS. As > such, we are protected from a peer attempting to lie about the dereferenced > prevout data. > > ### Some benefits of the UHS model > > - Requires no consensus changes, purely a p2p/implementation change. > > - UHS is substantially smaller than a full UTXO set (just over half for the > main chain, see below). In-memory caching can be much more effective as a > result. > > - A block’s transactions can be fully verified before doing a potentially > expensive database lookup for the previous output data. The UHS can be > queried afterwards (or in parallel) to verify previous output inclusion. > > - Entire blocks could potentially be verified out-of-order because all > input > data is provided; only the inclusion checks have to be in-order. > Admittedly > this is likely too complicated to be realistic. > > - pay-to-pubkey outputs are less burdensome on full nodes, since they use > no > more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. > Taproot and > Graftroot outputs may share the same benefits. > > - The burden of holding UTXO data is technically shifted from the > verifiers to > the spender. In reality, full nodes will likely always have a copy as > well, > but conceptually it's a slight improvement to the incentive model. > > - Block data from peers can also be used to roll backwards during a reorg. > This > potentially enables an even more aggressive pruning mode. > > - UTXO storage size grows exactly linearly with UTXO count, as opposed to > growing linearly with UTXO data size. This may be relevant when > considering > new larger output types which would otherwise cause the UTXO Set size to > increase more quickly. > > - The UHS is a simple set, no need for a key-value database. LevelDB could > potentially be dropped as a dependency in some distant future. > > - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set > hashes > [1]. Unspent Transaction Output Hashes would simply be mapped to points > on a > curve before adding them to the set. > > - With the help of inclusion proofs and rolling hashes, libbitcoinconsensus > could potentially safely verify entire blocks. The size of the required > proofs would be largely irrelevant as they would be consumed locally. > > - Others? > > ### TxIn De-duplication > > Setting aside the potential benefits, the obvious drawback of using a UHS > is a > significant network traffic increase. Fortunately, some properties of > transactions can be exploited to offset most of the difference. > > For q
Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
Matt: That's a good point. I'll do up a chart comparing utxo size at each block, as well as comparing over-the-wire size for each block. I think the period of coalescing earlier this year should be a good example of what you're describing. Greg: heh, I was wondering how long it would take for someone to point out that I'm cheating. I avoided using the word "compression", mostly to side-step having the discussion turning into reworking the wire serialization. But you're absolutely right, the de-duping benefits are independent of the UHS use-case. Cory On Thu, May 17, 2018, 12:56 PM Gregory Maxwell wrote: > On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev > wrote: > > Tl;dr: Rather than storing all unspent outputs, store their hashes. > > My initial thoughts are it's not _completely_ obvious to me that a 5% > ongoing bandwidth increase is actually a win to get something like a > 40% reduction in the size of a pruned node (and less than a 1% > reduction in an archive node) primarily because I've not seen size of > a pruned node cited as a usage limiting factor basically anywhere. I > would assume it is a win but wouldn't be shocked to see a careful > analysis that concluded it wasn't. > > But perhaps more interestingly, I think the overhead is not really 5%, > but it's 5% measured in the context of the phenomenally inefficient tx > mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ). > Napkin math on the size of a txn alone tells me it's more like a 25% > increase if you just consider size of tx vs size of > tx+scriptpubkeys,amounts. If I'm not missing something there, I think > that would get in into a very clear not-win range. > > On the positive side is that it doesn't change the blockchain > datastructure, so it's something implementations could do without > marrying the network to it forever. > ___ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
On Wed, May 16, 2018 at 4:36 PM, Cory Fields via bitcoin-dev wrote: > Tl;dr: Rather than storing all unspent outputs, store their hashes. My initial thoughts are it's not _completely_ obvious to me that a 5% ongoing bandwidth increase is actually a win to get something like a 40% reduction in the size of a pruned node (and less than a 1% reduction in an archive node) primarily because I've not seen size of a pruned node cited as a usage limiting factor basically anywhere. I would assume it is a win but wouldn't be shocked to see a careful analysis that concluded it wasn't. But perhaps more interestingly, I think the overhead is not really 5%, but it's 5% measured in the context of the phenomenally inefficient tx mechanisms ( https://bitcointalk.org/index.php?topic=1377345.0 ). Napkin math on the size of a txn alone tells me it's more like a 25% increase if you just consider size of tx vs size of tx+scriptpubkeys,amounts. If I'm not missing something there, I think that would get in into a very clear not-win range. On the positive side is that it doesn't change the blockchain datastructure, so it's something implementations could do without marrying the network to it forever. ___ bitcoin-dev mailing list bitcoin-dev@lists.linuxfoundation.org https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
Hey Cory, I'm generally a fan of having an option to "prove a block is valid when relaying it" instead of "just relay it", but I am concerned that this proposal is overfitting the current UTXO set. Specifically, because UTXO entries are (roughly) 32 bytes per output plus 32 bytes per transaction on disk today, a material increase in batching and many-output transactions may significantly reduce the UTXO-set-size gain in this proposal while adding complexity to block relay as well as increase the size of block data relayed, which can have adverse effects on propagation. I'd love to see your tests re-run on simulated transaction data with more batching of sends. Matt On 05/16/18 12:36, Cory Fields via bitcoin-dev wrote: > Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrusted > peers can supply the full outputs when needed, with very little overhead. > Any attempt to spoof those outputs would be apparent, as their hashes would > not > be present in the hash set. There are many advantages to this, most apparently > in disk and memory savings, as well as a validation speedup. The primary > disadvantage is a small increase in network traffic. I believe that the > advantages outweigh the disadvantages. > > -- > > Bitcoin’s unspent transaction output set (usually referred to as “The UTXO > set”) has two primary roles: providing proof that previous outputs exist to be > spent, and providing the actual previous output data for verification when new > transactions attempts to spend them. These roles are not usually discussed > independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are > compelling reasons to consider them this way. > > To see why, consider running a node with the following changes: > > - For each new output, gather all extra data that will be needed for > verification when spending it later as an input: the amount, scriptPubKey, > creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.). > Call this the Dereferenced Prevout data. > - Create a hash from the concatenation of the new outpoint and the > dereferenced > prevout data. Call this a Unspent Transaction Output Hash. > - Rather than storing the full dereferenced prevout entries in a UTXO set as > is > currently done, instead store their hashes to an Unspent Transaction Output > Hash Set, or UHS. > - When relaying a transaction, append the dereferenced prevout for each input. > > Now when a transaction is received, it contains everything needed for > verification, including the input amount, height, and coinbaseness, which > would > have otherwise required a lookup the UTXO set. > > To verify an input's unspentness, again create a hash from the concatenation > of > the referenced outpoint and the provided dereferenced prevout, and check for > its presence in the UHS. The hash will only be present if a hash of the exact > same data was previously added to (and not since removed from) the UHS. As > such, we are protected from a peer attempting to lie about the dereferenced > prevout data. > > ### Some benefits of the UHS model > > - Requires no consensus changes, purely a p2p/implementation change. > > - UHS is substantially smaller than a full UTXO set (just over half for the > main chain, see below). In-memory caching can be much more effective as a > result. > > - A block’s transactions can be fully verified before doing a potentially > expensive database lookup for the previous output data. The UHS can be > queried afterwards (or in parallel) to verify previous output inclusion. > > - Entire blocks could potentially be verified out-of-order because all input > data is provided; only the inclusion checks have to be in-order. Admittedly > this is likely too complicated to be realistic. > > - pay-to-pubkey outputs are less burdensome on full nodes, since they use no > more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Taproot > and > Graftroot outputs may share the same benefits. > > - The burden of holding UTXO data is technically shifted from the verifiers to > the spender. In reality, full nodes will likely always have a copy as well, > but conceptually it's a slight improvement to the incentive model. > > - Block data from peers can also be used to roll backwards during a reorg. > This > potentially enables an even more aggressive pruning mode. > > - UTXO storage size grows exactly linearly with UTXO count, as opposed to > growing linearly with UTXO data size. This may be relevant when considering > new larger output types which would otherwise cause the UTXO Set size to > increase more quickly. > > - The UHS is a simple set, no need for a key-value database. LevelDB could > potentially be dropped as a dependency in some distant future. > > - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashes > [1]. Unspent Transaction Output Hashes would simply be mapped to points on a > curve
[bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set
Tl;dr: Rather than storing all unspent outputs, store their hashes. Untrusted peers can supply the full outputs when needed, with very little overhead. Any attempt to spoof those outputs would be apparent, as their hashes would not be present in the hash set. There are many advantages to this, most apparently in disk and memory savings, as well as a validation speedup. The primary disadvantage is a small increase in network traffic. I believe that the advantages outweigh the disadvantages. -- Bitcoin’s unspent transaction output set (usually referred to as “The UTXO set”) has two primary roles: providing proof that previous outputs exist to be spent, and providing the actual previous output data for verification when new transactions attempts to spend them. These roles are not usually discussed independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are compelling reasons to consider them this way. To see why, consider running a node with the following changes: - For each new output, gather all extra data that will be needed for verification when spending it later as an input: the amount, scriptPubKey, creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh, etc.). Call this the Dereferenced Prevout data. - Create a hash from the concatenation of the new outpoint and the dereferenced prevout data. Call this a Unspent Transaction Output Hash. - Rather than storing the full dereferenced prevout entries in a UTXO set as is currently done, instead store their hashes to an Unspent Transaction Output Hash Set, or UHS. - When relaying a transaction, append the dereferenced prevout for each input. Now when a transaction is received, it contains everything needed for verification, including the input amount, height, and coinbaseness, which would have otherwise required a lookup the UTXO set. To verify an input's unspentness, again create a hash from the concatenation of the referenced outpoint and the provided dereferenced prevout, and check for its presence in the UHS. The hash will only be present if a hash of the exact same data was previously added to (and not since removed from) the UHS. As such, we are protected from a peer attempting to lie about the dereferenced prevout data. ### Some benefits of the UHS model - Requires no consensus changes, purely a p2p/implementation change. - UHS is substantially smaller than a full UTXO set (just over half for the main chain, see below). In-memory caching can be much more effective as a result. - A block’s transactions can be fully verified before doing a potentially expensive database lookup for the previous output data. The UHS can be queried afterwards (or in parallel) to verify previous output inclusion. - Entire blocks could potentially be verified out-of-order because all input data is provided; only the inclusion checks have to be in-order. Admittedly this is likely too complicated to be realistic. - pay-to-pubkey outputs are less burdensome on full nodes, since they use no more space on-disk than pay-to-pubkey-hash or pay-to-script-hash. Taproot and Graftroot outputs may share the same benefits. - The burden of holding UTXO data is technically shifted from the verifiers to the spender. In reality, full nodes will likely always have a copy as well, but conceptually it's a slight improvement to the incentive model. - Block data from peers can also be used to roll backwards during a reorg. This potentially enables an even more aggressive pruning mode. - UTXO storage size grows exactly linearly with UTXO count, as opposed to growing linearly with UTXO data size. This may be relevant when considering new larger output types which would otherwise cause the UTXO Set size to increase more quickly. - The UHS is a simple set, no need for a key-value database. LevelDB could potentially be dropped as a dependency in some distant future. - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set hashes [1]. Unspent Transaction Output Hashes would simply be mapped to points on a curve before adding them to the set. - With the help of inclusion proofs and rolling hashes, libbitcoinconsensus could potentially safely verify entire blocks. The size of the required proofs would be largely irrelevant as they would be consumed locally. - Others? ### TxIn De-duplication Setting aside the potential benefits, the obvious drawback of using a UHS is a significant network traffic increase. Fortunately, some properties of transactions can be exploited to offset most of the difference. For quick reference: p2pkh scriptPubKey: DUP HASH160 [pubkey hash] EQUALVERIFY CHECKSIG p2pkh scriptSig:[signature] [pubkey] p2sh scriptPubKey: HASH160 [script hash] EQUAL p2sh scriptSig: [signature(s)] [script] Notice that if a peer is sending a scriptPubKey and a scriptSig together, as they would when using a UHS, there would likely be some redundancy. Using a p2sh output for example, th