Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-11 Thread Tomas via bitcoin-dev


On Tue, Apr 11, 2017, at 11:41, Eric Voskuil wrote:
> It's not the headers/tx-hashes of the blocks that I'm referring to, it
> is the confirmation and spend information relative to all txs and all
> outputs for each branch. This reverse navigation (i.e. utxo
> information) is essential, must be persistent and is branch-relative.

That is exactly what is stored in the spend-tree. 

>> As a simpler example, if two miners both mine a block at
>> approximately the same time and send it to each other, then surely
>> they would want to continue mining on their own block. Otherwise
>> they would be throwing away their own reward.

> That's not your concurrent validation scenario. In the scenario you
> described, the person chooses the weaker block of two that require
> validation because it's better somehow, not because it's his own
> (which does not require validation).

> Consistency is reached, despite seeing things at different times,
> because people use the same rules. If the economy ran on arbitrary
> block preference consistency would be elusive.

No but my example shows  that it is up to the miner to choose which tip
to work on. This is not using different rules, it is just optimizing its
income. This means that the economy *does* run on arbitrary "block
preference", even if it is not running on arbitrary rules.

If two blocks are competing, a miner could optimize its decision which
to mine on, not just on whether one of the blocks is his own, but also
on fees, or on excessive validation costs.

> I read this as encoding the height at which a fork historically
> activated. If you intend to track activation for each branch that will
> not be "height-based" it will be history based.

I understand "height-based" was not the right wording, as it is of
course branch-specific. Per tip ruleset metadata, must be matched with
per-transaction ruleset metadata.

Tomas
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-11 Thread Eric Voskuil via bitcoin-dev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 04/11/2017 01:43 AM, Tomas wrote:
> Splitting transactions only happens *on storage* and is just a
> minor optimization compared to storing them in full.

Ok

> Sure, we can still call switching tips a "reorg". And it is indeed
> a trade off as orphan blocks are stored, but a block in the spend
> tree takes only ~12kb and contains the required state information.
> 
It's not the headers/tx-hashes of the blocks that I'm referring to, it
is the confirmation and spend information relative to all txs and all
outputs for each branch. This reverse navigation (i.e. utxo
information) is essential, must be persistent and is branch-relative.

> The blockchain is - by design - only eventually consistent across
> nodes. Even if nodes would use the same "tip-selection" rules, you
> cannot rely on all blocks being propagated and hence each
> transaction having the same number of confirmations across all
> nodes.
> 
> As a simpler example, if two miners both mine a block at
> approximately the same time and send it to each other, then surely
> they would want to continue mining on their own block. Otherwise
> they would be throwing away their own reward.

That's not your concurrent validation scenario. In the scenario you
described, the person chooses the weaker block of two that require
validation because it's better somehow, not because it's his own
(which does not require validation).

> And yes, this can also happen over multiple blocks, but the chances
> of consistency are vastly increased with each confirmation.

Consistency is reached, despite seeing things at different times,
because people use the same rules. If the economy ran on arbitrary
block preference consistency would be elusive.

> I am not talking about rejecting blocks, I am only talking choosing
> on which tip to mine.

This line of reasoning has me a bit baffled. Yet as I said, it's not
important to the question at hand. It is not likely to be optimal to
validate concurrently even if you consider selection of a weaker block
advantageous.

>> If you intend this to be useful it has to help build the chain,
>> not just rely on hardwiring checkpoints once rule changes are
>> presumed to be buried deeply enough to do so (as the result of
>> other implementations ).
>> 
>> I understand this approach, it was ours at one time. There is a 
>> significant difference, and your design is to some degree based
>> on a failure to fully consider this. I encourage you to not
>> assume any consensus-related detail is too small.
> 
> I am not failing to consider this, and I don't consider this too
> small . But ensuring contextual transaction validity by "validate
> =>  valid with rules X,Y,Z" and then checking the active rules
> (softfork activation) on order validation, will give logically the
> same results as "validate with X,Y,Z => valid". This is not
> "hardwiring checkpoints" at all.

Storing the validation flags with each tx is exactly what libbitcoin
does (otherwise pre-validation would be infeasible). But that was not
the full point. You said on this in response previously:

>>> ...height-based compliance as meta data of validation seems to
>>> be
adequate and safe.

I read this as encoding the height at which a fork historically
activated. If you intend to track activation for each branch that will
not be "height-based" it will be history based.

e
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCAAGBQJY7KTHAAoJEDzYwH8LXOFOI+QH/RzX++1TNLC9DEMWioE7SmMj
yKOrP8WEkOnnrZdFKxVmwV9oZBekEvDABMnJmFiW5TMjsmPz7XwKAYzV0Y5L5oGU
fZYo3IOPyr0dA9TcpP15gNziR6pFUBq/QTYB6BcbUvvlkJv6xjgIdedgDMEyREWU
Hm/JU5g7gQUQd6MIDWbQ9FbYjtPuNSRQi851YfIn5mDivT4HuidaqQYMd9t5yS2Z
FuoQBI6L5GTJIqml1bTwJ0wsA7+ZseBEgMn1TT1ehy2v1FFJTojTpzIwG+m3eiXg
TxN3U/+fNAj+sKBb8Hq+nb7DvgjvKHyHuyRryBju7yq5d5rsb6meXcoiOtAznP8=
=fRXf
-END PGP SIGNATURE-
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-11 Thread Tomas via bitcoin-dev


On Tue, Apr 11, 2017, at 03:44, Eric Voskuil wrote:

> As I understand it you would split tx inputs and outputs and send them
> independently, and that you intend this to be a P2P network
> optimization - not a consensus rule change. So my comments are based
> on those inferences. If we are talking about consensus changes this
> conversation will end up in an entirely different place.

> I don't agree with the input/output relevance statements above. When a
> tx is announced the entire tx is relevant. It cannot be validated as
> outputs only. If it cannot be validated it cannot be stored by the
> node. Validating the outputs only would require the node store invalid
> transactions.

Splitting transactions only happens *on storage* and is just a minor
optimization compared to storing them in full. (actually a very recent
change with only marginally better results). This is simply because the
output scripts are read on script validation, and storing the outputs of
the transaction separately ensures better spatial locality of reference
(the inputs are just "in the way"). This is not relevant when using a
UTXO-index, because the outputs are then directly stored in the index,
where bitcrust has to read them from the transaction data.

It is not my intention to send them independently.
 
> I do accept that a double-spend detection is not an optimal criteria
> by which to discard a tx. One also needs fee information. But without
> double-spend knowledge the node has no rational way to defend itself
> against an infinity of transactions that spend the minimal fee but
> also have conflicting inputs (i.e. risking the fee only once). So tx
> (pool) validation requires double-spend knowledge and at least a
> summary from outputs.

Double spent information is still available to the network node and
could still be used for DoS protection, although I do believe
alternatives may exist.
 
> 
> A reorg is conceptual and cannot be engineered out. What you are
> referring to is a restructuring of stored information as a consequence
> of a reorg. I don't see this as related to the above. The ability to
> perform reorganization via a branch pointer swap is based not on the
> order or factoring of validation but instead on the amount of
> information stored. It requires more information to maintain multiple
> branches.
> 
> Transactions have confirmation states, validation contexts and spender
> heights for potentially each branch of an unbounded number of
> branches. It is this requirement to maintain that state for each
> branch that makes this design goal a very costly trade-off of space
> and complexity for reorg speed. As I mentioned earlier, it's the
> optimization for this scenario that I find questionable.

Sure, we can still call switching tips a "reorg". And it is indeed a
trade off as orphan blocks are stored, but a block in the spend tree
takes only ~12kb and contains  the required state information. 

I believe this trade off  reduced complexity. For the earlier tree this
could be pruned.

> Because choosing the lesser amount of work is non-consensus behavior.
> Under the same circumstances (i.e. having seen the same set of blocks)
> two nodes will disagree on whether there is one confirmation or no
> confirmations for a given tx. This disagreement will persist (i.e. why
> take the weaker block only to turn around and replace it with the
> stronger block that arrives a few seconds or minutes later). It stands
> to reason that if one rejects a stronger block under a race condition,
> one would reorg out a stronger block when a weaker block arrives a
> little after the stronger block. Does this "optimization" then apply
> to chains of blocks too?

The blockchain is - by design - only eventually consistent across nodes.
Even if nodes would use the same "tip-selection" rules, you cannot rely
on all blocks being propagated and hence each transaction having the
same number of confirmations across all nodes.

As a simpler example, if two miners both mine a block at approximately
the same time and send it to each other, then surely they would want to
continue mining on their own block. Otherwise they would be throwing
away their own reward.  

And yes, this can also happen over multiple blocks, but the chances of
consistency are vastly increased with each confirmation.

> Accepting a block that all previous implementations would have
> rejected under the same circumstance could be considered a hard fork,
> but you may be right.

I am not talking about rejecting blocks, I am only talking choosing on
which tip to mine.

> > Frankly, I think this is a bit of an exaggeration. Soft forks are 
> > counted on a hand, and I don't think there are many - if any - 
> > transactions in the current chain that have changed compliance 
> > based on height.
> 
> Hope is a bug.
> 
> If you intend this to be useful it has to help build the chain, not
> just rely on hardwiring checkpoints once rule changes are presumed to
> be buried 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-10 Thread Eric Voskuil via bitcoin-dev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 04/08/2017 04:58 PM, Tomas wrote:
> You seem to ignore here the difference between base load and peak 
> load. If Compact blocks/XThin with further optimizations can 
> presync nearly 100% of the transactions, and nodes can do as much 
> as possible when a transaction comes in, the time spent when a 
> block comes in can be minimized and a lot more transactions can be 
> handled with the same resources.

Maybe it's an issue of terminology. I have never used the terms
base/peak load. However I've been trying to get across, poorly I
suppose, that this is actually implemented in libbitcoin. I generally
refer to it as tx pre-validation. I've also tried to relate that you
are unnecessarily relating pre-validation to compactness. These are
unrelated ideas and better considered independently. One can get
nearly all of the benefit of pre-validation while still receiving
blocks (vs. compact blocks). The advantage of compactness is reduced
latency of the block announcement. The reason for pre-validation is
amortization of the validation and/or storage cost of a block.

> The reason for "splitting" is that for an incoming transaction the
>  spent-state of the outputs being spent isn't particularly
> relevant as you seem to acknowledge. When the block comes in, the
> actual output data isn't relevant.

As I understand it you would split tx inputs and outputs and send them
independently, and that you intend this to be a P2P network
optimization - not a consensus rule change. So my comments are based
on those inferences. If we are talking about consensus changes this
conversation will end up in an entirely different place.

I don't agree with the input/output relevance statements above. When a
tx is announced the entire tx is relevant. It cannot be validated as
outputs only. If it cannot be validated it cannot be stored by the
node. Validating the outputs only would require the node store invalid
transactions.

I do accept that a double-spend detection is not an optimal criteria
by which to discard a tx. One also needs fee information. But without
double-spend knowledge the node has no rational way to defend itself
against an infinity of transactions that spend the minimal fee but
also have conflicting inputs (i.e. risking the fee only once). So tx
(pool) validation requires double-spend knowledge and at least a
summary from outputs.

> The *only* thing that needs to be checked when a block comes in is 
> the order, and the spend-tree approach absolves the need to access 
> outputs here.

Inputs that are already valid against prevouts remain valid assuming
consensus rules have not changed. But any input that spends a coinbase
must be validated for prevout height once there is a block context for
validation. Additionally the set of txs must be validated for total
size, sigops, and fee claim. So it's not true that conflict detection
alone is sufficient. Yet one can cache a tx's size, sigops, fee and
minimum height in a graph so that when a block appears that contains
that tx the input validation can be skipped.

Ignoring the (actual) requirement for the full tx on the pool
validation, the required "order" validation at (compact or other)
block arrival basically consists of traversing each tx, ensuring none
are confirmed in a block below the fork point; traversing each each of
its confirmed inputs, ensuring that none are spent in a block below
the fork point; and ensuring the block's set of transactions do not
contain missing inputs and do not double spend internal to the block.

This and the above-mentioned other required per-transaction block
validation data can be cached to an in-memory structure as a potential
optimization over navigating the store, and as you say, does not
therefore require the actual outputs (script/value). But the original
issue of needing full transactions for independent transaction
validation remains.

> As it also absolves the need for reorgs this greatly simplifies the
> design.

A reorg is conceptual and cannot be engineered out. What you are
referring to is a restructuring of stored information as a consequence
of a reorg. I don't see this as related to the above. The ability to
perform reorganization via a branch pointer swap is based not on the
order or factoring of validation but instead on the amount of
information stored. It requires more information to maintain multiple
branches.

Transactions have confirmation states, validation contexts and spender
heights for potentially each branch of an unbounded number of
branches. It is this requirement to maintain that state for each
branch that makes this design goal a very costly trade-off of space
and complexity for reorg speed. As I mentioned earlier, it's the
optimization for this scenario that I find questionable.

> I am not sure why you say that a one-step approach is more 
> "test-friendly" as this seems to be unrelated.

Full separation of concerns allows all validation to be 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev
Thank you for your elaborate response Eric,

On Sun, Apr 9, 2017, at 00:37, Eric Voskuil wrote:
> My point was that "Using a storage engine without UTXO-index" has been
> done, and may be a useful reference, not that implementation details
> are the same.

I haven't dived into libbitcoin V2/V3 enough to  fully grasp it and
though your comments help, I still not fully do.  I will answer below
what is related to bitcrust itself.

My post wasn't posted to claim innovation; I merely try to explain how
Bitcrust works and why   it performs well. 


> First, I remain confused on your comments pertaining to UTXO growth
> and network protocol. I followed your conversation with Greg and it
> remains unclear to me. From what I understand you have isolated order
> (double spend) from script validation. I think we all understand that
> script validation requires inputs and outputs while double spend
> detection requires correlation of inputs. What I do not understand is
> your choice of optimization axis.
> 
> Detection of double spend is not useful in isolation. One must also
> validate scripts, which requires outputs. I can see that there is an
> opportunity to reject blocks (within the same branch) faster by
> validating for double spends before validating script. But unconfirmed
> transactions do not exist in a branch, and are therefore not truly
> conflicting, until they are mined. And even after they are mined
> conflicting txs remain potentially valid in other branches. So
> rejecting txs due to conflict comes down to a denial of service
> policy, which ultimately must be based on fee increment (e.g. RBF).
> But fees are based on the amount of the output value that remains
> unspent in the transaction. So this in turn requires the retrieval of
> outputs.
> 
> And yet the remaining scenario of fast rejection of invalid blocks is
> not a meaningful optimization. Optimizing for the case where a block
> has valid and sufficient PoW and yet is invalid (for double spend) is
> counterproductive. And even so, the txs within the invalid block may
> be entirely valid independent of the block, so you are back to looking
> up their outputs to obtain fees in the case of a double spend or to
> validate script otherwise. In all cases you need to get the outputs.
> 
> > Bitcrust simply scans the tree. Although earlier designs used a 
> > skip-list, it turns out that accompanied by a spent-index lagging a
> > few blocks behind, raw scanning is faster then anything even though
> > it needs to scan ~5 blocks times ~4000 inputs before reaching the
> > first spent-index,  the actual scan is highly cache efficient and
> > little more then a "REP SCASQ", reaching sub-microsecond per input
> > on each core *including* the lookup in the spend index.
> 
> I realize that you see the implementation of the ordering validation
> as interesting detail, but I find it hard to justify contemplating the
> implementation in isolation from the output lookup requirement. And if
> one must looking up both outputs and spends for each validation, it
> makes more sense to co-locate that data.
> 
> Recovering in one step all data necessary to validate a tx has real
> advantages over either interleaving queries and validation or
> splitting input vs. output validation queries into two steps. It is a
> significantly more test-friendly approach, has better performance
> characteristics, and simplifies code. I cannot see any reason to
> perform the data read for double spend validation in isolation of that
> for script validation.


You seem to ignore here the difference between base load and peak load.
If Compact blocks/XThin with further optimizations can presync nearly
100% of the transactions, and nodes can do as much as possible when a
transaction comes in, the time spent when a block comes in can be
minimized and a lot more transactions can be handled with the same
resources.

The reason for "splitting" is that for an incoming transaction the
spent-state of the outputs being spent isn't particularly relevant as
you seem to acknowledge. When the block comes in, the actual output data
isn't relevant.

The *only* thing that needs to be checked when a block comes in is the
order, and the spend-tree approach absolves the need to access outputs
here.

As it also absolves the need for reorgs this greatly simplifies the
design. I am not sure why you say that a one-step approach is more
"test-friendly" as this seems to be unrelated.

> 
> If by results you are referring to performance numbers, it's very hard
> to draw any conclusions without a full benchmark. It's great that if
> you are able to boost Core, but from my perspective the numbers aren't
> especially compelling.
>

I fully agree and hopefully do not pretend to hide that my numbers are
premature without a full implementation. I just think they are promising
enough to  convince at least myself to move on with this model.
 
> Despite the site's explanation I cannot think of any reason to ever
> 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Eric Voskuil via bitcoin-dev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 04/06/2017 05:17 PM, Tomas wrote:
> Thanks, but I get the impression that the similarity is rather 
> superficial.

My point was that "Using a storage engine without UTXO-index" has been
done, and may be a useful reference, not that implementation details
are the same.

> To address your points:

Below you addressed two points I made regarding the downside of the
original libbitcoin implementation. These were initial learnings that
informed future implementations (also without a UTXO index). These
were not comparisons to your implementation.

>> (1) higher than necessary storage space requirement due to
>> storing the indexing data required for correlate the spends, and
> 
> Hmm. No. Spends are simply scanned in the spend-tree (full tree, 
> prunable, fully 5.6gb), or caught by the spend-index (bit index, 
> non-prunable, fully 180mb). Neither impose significant storage 
> requirements.
> 
>> 2) higher than necessary validation complexity and cost in terms
>> of computing the spent-ness (including spender height) of an
>> output.
>> 
>> With the exception of de-linking (not deleted) in the case of
>> reorgs, the entire store is append only, implemented in a small
>> set of memory mapped file
> 
> I guess this is the key difference. As the spend-tree stores the
> spend information in a tree structure, no reorgs are required, and
> the resulting code is actually much less complex.

The references to "higher than necessary storage" and "higher than
necessary validation cost" are explicitly relative statements,
comparing earlier and later libbitcoin implementations.

It is not clear to me how you are relating both the storage cost
("Hmm. No. ... Neither impose significant storage requirements.") and
code complexity ("... resulting code is actually much less complex")
of your tx ordering software to my statements. Do you think I am wrong
and libbitcoin v3 is not actually more space and code efficient than
libbitcoin v2?

But given that you have thrown some numbers and ideas out in a request
for feedback, I'm happy to give you some based on several years of
experience working closely with these issues.

First, I remain confused on your comments pertaining to UTXO growth
and network protocol. I followed your conversation with Greg and it
remains unclear to me. From what I understand you have isolated order
(double spend) from script validation. I think we all understand that
script validation requires inputs and outputs while double spend
detection requires correlation of inputs. What I do not understand is
your choice of optimization axis.

Detection of double spend is not useful in isolation. One must also
validate scripts, which requires outputs. I can see that there is an
opportunity to reject blocks (within the same branch) faster by
validating for double spends before validating script. But unconfirmed
transactions do not exist in a branch, and are therefore not truly
conflicting, until they are mined. And even after they are mined
conflicting txs remain potentially valid in other branches. So
rejecting txs due to conflict comes down to a denial of service
policy, which ultimately must be based on fee increment (e.g. RBF).
But fees are based on the amount of the output value that remains
unspent in the transaction. So this in turn requires the retrieval of
outputs.

And yet the remaining scenario of fast rejection of invalid blocks is
not a meaningful optimization. Optimizing for the case where a block
has valid and sufficient PoW and yet is invalid (for double spend) is
counterproductive. And even so, the txs within the invalid block may
be entirely valid independent of the block, so you are back to looking
up their outputs to obtain fees in the case of a double spend or to
validate script otherwise. In all cases you need to get the outputs.

> Bitcrust simply scans the tree. Although earlier designs used a 
> skip-list, it turns out that accompanied by a spent-index lagging a
> few blocks behind, raw scanning is faster then anything even though
> it needs to scan ~5 blocks times ~4000 inputs before reaching the
> first spent-index,  the actual scan is highly cache efficient and
> little more then a "REP SCASQ", reaching sub-microsecond per input
> on each core *including* the lookup in the spend index.

I realize that you see the implementation of the ordering validation
as interesting detail, but I find it hard to justify contemplating the
implementation in isolation from the output lookup requirement. And if
one must looking up both outputs and spends for each validation, it
makes more sense to co-locate that data.

Recovering in one step all data necessary to validate a tx has real
advantages over either interleaving queries and validation or
splitting input vs. output validation queries into two steps. It is a
significantly more test-friendly approach, has better performance
characteristics, and simplifies code. I cannot see any reason to
perform 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev


On Sun, Apr 9, 2017, at 00:12, Gregory Maxwell wrote:
> In Bitcoin Core the software _explicitly_ and intentionally does not
> exploit mempool pre-validation because doing that very easily leads to
> hard to detect consensus faults and makes all mempool code consensus
> critical when it otherwise is not. There have been bugs in the past
> which would have split the network if this optimization had been used.
> 
> (in particular, I believe I recall one related to correctly removing
> coinbase spends from the mempool during reorganization that made them
> immature; and with the optimization and without the CNB post-test
> would have resulted in nodes that saw the reorg creating and accepting
> an invalid block, while nodes that didn't rejecting it; but because of
> prudent design it was largely harmless).

Although I don't quite follow the details (CNB post-test? Connect block
I assume?), the risks you are describing seem to be rather specific to
Core's implementation. For one, bitcrust does not or use need reorgs at
all.

Do you argue (or can you further explain) that the idea of splitting
script validation (or what you call mempool pre-validation), and order
validation is introducing risks  inherent to the protocol? 

Thanks,
Tomas
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Gregory Maxwell via bitcoin-dev
On Sat, Apr 8, 2017 at 8:21 PM, Johnson Lau  wrote:
> pre-synced means already in mempool and verified? Then it sounds like we just 
> need some mempool optimisation? The tx order in a block is not important, 
> unless they are dependent

In Bitcoin Core the software _explicitly_ and intentionally does not
exploit mempool pre-validation because doing that very easily leads to
hard to detect consensus faults and makes all mempool code consensus
critical when it otherwise is not. There have been bugs in the past
which would have split the network if this optimization had been used.

(in particular, I believe I recall one related to correctly removing
coinbase spends from the mempool during reorganization that made them
immature; and with the optimization and without the CNB post-test
would have resulted in nodes that saw the reorg creating and accepting
an invalid block, while nodes that didn't rejecting it; but because of
prudent design it was largely harmless).

Because signature validation is cached, and takes the majority of the
block validation time the speed up from the risky optimization isn't
that considerable, and there are other lower hanging fruity with
bigger payouts like Pieter's change to the per-txout management model
and the new non-atomic flushing logic and these things don't make
more of the system consensus critical.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Troy Benjegerdes via bitcoin-dev
I would advise anyone worried about 'hard drive access' to order a
512GB NVME (pci-express interface) flash drive (or a laptop), and
I expect the performance will make you wonder why you ever bothered
with cloud.

My (very brief) analysis of the performance of a full chain download
on a new laptop was that there was more overhead in lock contention and
database writes and it barely loaded the machine. Now maybe this is
because the flash **write** speed is slow (but still several orders
of magnitude above spinning disk), but random reads are sure blazing
fast.

Flash storage sizes also appear to be growing at similiar rates as the
total blockchain size.

Which begs another question: In a distributed byzantine fault-tolerant
system, why do we even need to bother with persistant storage, except
for long-term archival and chain of custody issues, which we could 
serialize the in-memory structures out as a stream to things like tape
drives or write-once optical media.


On Fri, Apr 07, 2017 at 11:39:18AM -0700, Bram Cohen via bitcoin-dev wrote:
> Expanding on this question a bit, it's optimized for parallel access, but
> hard drive access isn't parallel and memory accesses are very fast, so
> shouldn't the target of optimization be about cramming as much as possible
> in memory and minimizing disk accesses?
> 
> On Fri, Apr 7, 2017 at 11:18 AM, Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org> wrote:
> 
> > On Thu, Apr 6, 2017 at 10:12 PM, Tomas via bitcoin-dev
> >  wrote:
> > >As this
> > > solution, reversing the costs of outputs and inputs, seems to have
> > > excellent performance characteristics (as shown in the test results),
> > > updates to the protocol addressing the UTXO growth, might not be worth
> > > considering *protocol improvements*
> >
> > I'm still lost on this-- AFAICT your proposals long term resource
> > requirements are directly proportional to the amount of unspent output
> > data, which grows over time at some fraction of the total transaction
> > volume (plus the rate of spending which is more or less a constant).
> >
> > Can you help out my understanding here?
> > ___
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> >

> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev

> Please no conspiracy theory like stepping on someone’s toes. I believe
> it’s always nice to challenge the established model. However, as I’m
> trying to make some hardfork design, I intend to have a stricter UTXO
> growth limit. As you said "protocol addressing the UTXO growth, might not
> be worth considering protocol improvements*, it sounds like UTXO growth
> limit wouldn’t be very helpful for your model, which I doubt. 

Thank you. I realize that  this particular phrase implies that in my
design, outputs are less costly then inputs, *in total resource costs*,
which I can not defend without completely ignoring base load script
verification. I rephrased it.

Tomas
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Johnson Lau via bitcoin-dev

> On 9 Apr 2017, at 03:56, Tomas  wrote:
> 
> 
>> I don’t fully understand your storage engine. So the following deduction
>> is just based on common sense.
>> 
>> a) It is possible to make unlimited number of 1-in-100-out txs
>> 
>> b) The maximum number of 100-in-1-out txs is limited by the number of
>> previous 1-in-100-out txs
>> 
>> c) Since bitcrust performs not good with 100-in-1-out txs, for anti-DoS
>> purpose you should limit the number of previous 1-in-100-out txs. 
>> 
>> d) Limit 1-in-100-out txs == Limit UTXO growth
>> 
>> I’m not surprised that you find an model more efficient than Core. But I
>> don’t believe one could find a model that doesn’t become more efficient
>> with UTXO growth limitation.
> 
> My efficiency claims are *only* with regards to order validation. If we
> assume all transactions are already pre-synced and verified, bitcrust's
> order validation is very fast, and (only slightly) negatively effected
> by input-counts.

pre-synced means already in mempool and verified? Then it sounds like we just 
need some mempool optimisation? The tx order in a block is not important, 
unless they are dependent

> 
>> One more question: what is the absolute minimum disk and memory usage in
>> bitcrust, compared with the pruning mode in Core?
> 
> As bitcrust doesn't support this yet, I cannot give accurate numbers,
> but I've provided some numbers estimates earlier in the thread.
> 
> 
> Rereading my post and these comments, I may have stepped on some toes
> with regards to SegWit's model. I like SegWit (though I may have a
> slight preference for BIP140), and I understand the reasons for the
> "discount", so this was not my intention. I just think that the reversal
> of costs during peak load order validation is a rather interesting
> feature of using spend-tree  based validation. 
> 
> Tomas

Please no conspiracy theory like stepping on someone’s toes. I believe it’s 
always nice to challenge the established model. However, as I’m trying to make 
some hardfork design, I intend to have a stricter UTXO growth limit. As you 
said "protocol addressing the UTXO growth, might not be worth considering 
protocol improvements*, it sounds like UTXO growth limit wouldn’t be very 
helpful for your model, which I doubt. 
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev

> I don’t fully understand your storage engine. So the following deduction
> is just based on common sense.
> 
> a) It is possible to make unlimited number of 1-in-100-out txs
> 
> b) The maximum number of 100-in-1-out txs is limited by the number of
> previous 1-in-100-out txs
> 
> c) Since bitcrust performs not good with 100-in-1-out txs, for anti-DoS
> purpose you should limit the number of previous 1-in-100-out txs. 
> 
> d) Limit 1-in-100-out txs == Limit UTXO growth
> 
> I’m not surprised that you find an model more efficient than Core. But I
> don’t believe one could find a model that doesn’t become more efficient
> with UTXO growth limitation.

My efficiency claims are *only* with regards to order validation. If we
assume all transactions are already pre-synced and verified, bitcrust's
order validation is very fast, and (only slightly) negatively effected
by input-counts.

Most total time is spend during base load script validation, and UTXO
growth is the definitely the limiting factor there, as the model here
isn't all that different from Core's.


> Maybe you could try an experiment with regtest? Make a lot 1-in-100-out
> txs with many blocks, then spend all the UTXOs with 100-in-1-out txs.
> Compare the performance of bitcrust with core. Then repeat with
> 1-in-1-out chained txs (so the UTXO set is always almost empty)
> 

Again, this really depends on whether we focus on full block validation,
in which case the 100-1, 1-100 distinction will be the similar to Core,
or only regard order validation, in which case Bitcrust will have this
odd reversal. 


> One more question: what is the absolute minimum disk and memory usage in
> bitcrust, compared with the pruning mode in Core?

As bitcrust doesn't support this yet, I cannot give accurate numbers,
but I've provided some numbers estimates earlier in the thread.


Rereading my post and these comments, I may have stepped on some toes
with regards to SegWit's model. I like SegWit (though I may have a
slight preference for BIP140), and I understand the reasons for the
"discount", so this was not my intention. I just think that the reversal
of costs during peak load order validation is a rather interesting
feature of using spend-tree  based validation. 

Tomas
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev


On Sat, Apr 8, 2017, at 20:27, Tom Harding via bitcoin-dev wrote:

> 

> 

> On Apr 7, 2017 12:42, "Gregory Maxwell"  wrote:

>> On Fri, Apr 7, 2017 at 6:52 PM, Tom Harding via bitcoin-dev

>>   wrote:

>>  > A network in which many nodes maintain a transaction index also
>>  > enables a
>>  > class of light node applications that ask peers to prove
>>  > existence and
>>  > spentness of TXO's.

>> 

>> Only with the additional commitment structure such as those proposed
>>  by Peter Todd in his stxo/txo commitment designs, e.g.

>> https://petertodd.org/2016/delayed-txo-commitments

> Light nodes are improved by detecting invalid transactions, even
> before they are mined.
> _



I am not quite sure why you think this approach would help in this
regard. I may be missing part of how Core works here, but Bitcrust's
txindex is merely used to lookup transactions from hashes and currently,
and seems to  fulfil the same role  as Core's -txindex  mode.


This can be pruned, and in the future auto-pruned as the "flat files"
used as base for all data allow for concurrent pruning. But unlike Core,
it is always needed as without UTXO index, it is needed to find outputs
during base load validation.

___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Johnson Lau via bitcoin-dev

> On 8 Apr 2017, at 15:28, Tomas via bitcoin-dev 
>  wrote:
>> 
> 
> I think you are being a bit harsh here . I am also clearly explaining
> the difference only applies to peak load, and just making a suggestion.
> I simply want to stress the importance of protocol / implementation
> separation as even though you are correct UTXO data is always a resource
> cost for script validation (as I also state), the ratio of different
> costs are  not necessarily *identical* across implementation. 
> 
> Note that the converse also holds: In bitcrust, if the last few blocks
> contain many inputs, the peak load verification for this block is
> slower. This is not the case in Core.
> 
> Tomas
> 

I don’t fully understand your storage engine. So the following deduction is 
just based on common sense.

a) It is possible to make unlimited number of 1-in-100-out txs

b) The maximum number of 100-in-1-out txs is limited by the number of previous 
1-in-100-out txs

c) Since bitcrust performs not good with 100-in-1-out txs, for anti-DoS purpose 
you should limit the number of previous 1-in-100-out txs. 

d) Limit 1-in-100-out txs == Limit UTXO growth

I’m not surprised that you find an model more efficient than Core. But I don’t 
believe one could find a model that doesn’t become more efficient with UTXO 
growth limitation.

Maybe you could try an experiment with regtest? Make a lot 1-in-100-out txs 
with many blocks, then spend all the UTXOs with 100-in-1-out txs. Compare the 
performance of bitcrust with core. Then repeat with 1-in-1-out chained txs (so 
the UTXO set is always almost empty)

One more question: what is the absolute minimum disk and memory usage in 
bitcrust, compared with the pruning mode in Core?
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tom Harding via bitcoin-dev
On Apr 7, 2017 12:42, "Gregory Maxwell"  wrote:

On Fri, Apr 7, 2017 at 6:52 PM, Tom Harding via bitcoin-dev
 wrote:
> A network in which many nodes maintain a transaction index also enables a
> class of light node applications that ask peers to prove existence and
> spentness of TXO's.

Only with the additional commitment structure such as those proposed
by Peter Todd in his stxo/txo commitment designs, e.g.
https://petertodd.org/2016/delayed-txo-commitments


Light nodes are improved by detecting invalid transactions, even before
they are mined.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev
On Sat, Apr 8, 2017, at 02:44, Gregory Maxwell wrote:
> As you note that the output costs still bound the resource
> requirements. 

Resource cost is not just a measure of storage requirement; data that
needs to be accessed during peak load induce more cost then data only
used during base load or only rarely used.

> Latency related costs in Bitcoin Core also do not depend on the number
> of outputs in transactions in a block. When a transaction is handled
> it goes into an in-memory buffer and only gets flushed later if isn't
> spent before the buffer fills.  A block will take more time to
> validate with more inputs, same as you observer, but the aggregate
> resource usage for users depends significantly on outputs (so, in fact
> there is even further misaligned incentives than just the fact that
> small outputs have a outsized long term cost).

In Core, when a block comes the inputs are checked against the UTXO set
(which grows with outputs)  even if pre-synced, to verify order. Am I
wrong there? This is not in the case in bitcrust; it is instead checked
against the spend-tree (which grows with inputs).

How "significant" this is, I neither know nor claim,  but it is an
interesting difference. 

> Then I think you may want to retract the claim that "As this solution,
> reversing the costs of outputs and inputs, [...] updates to the
> protocol addressing the UTXO growth, might not be worth considering
> *protocol improvements* "

I think you are being a bit harsh here . I am also clearly explaining
the difference only applies to peak load, and just making a suggestion.
I simply want to stress the importance of protocol / implementation
separation as even though you are correct UTXO data is always a resource
cost for script validation (as I also state), the ratio of different
costs are  not necessarily *identical* across implementation. 

Note that the converse also holds: In bitcrust, if the last few blocks
contain many inputs, the peak load verification for this block is
slower. This is not the case in Core.

Tomas
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Gregory Maxwell via bitcoin-dev
On Fri, Apr 7, 2017 at 9:14 PM, Tomas  wrote:
> The long term *minimal disk storage* requirement, can obviously not be less
> then all the unspent outputs.

Then I think you may want to retract the claim that "As this solution,
reversing the costs of outputs and inputs, [...] updates to the
protocol addressing the UTXO growth, might not be worth considering
*protocol improvements* "

As you note that the output costs still bound the resource
requirements. Short of radical protocol changes like TXO-proofs the
UTXO data remains a driving unavoidable long term resource cost, not
an implementation detail.  Implementation optimizations like improving
locality further or keeping spentness in memory do not change this
fact.

> The storage that is accessed during peak load (block validation with
> pre-synced transactions), is minimized as this only needs the transaction
> index (to lookup ptrs from hashes), the tip of the spend-tree and the tip of

Latency related costs in Bitcoin Core also do not depend on the number
of outputs in transactions in a block. When a transaction is handled
it goes into an in-memory buffer and only gets flushed later if isn't
spent before the buffer fills.  A block will take more time to
validate with more inputs, same as you observer, but the aggregate
resource usage for users depends significantly on outputs (so, in fact
there is even further misaligned incentives than just the fact that
small outputs have a outsized long term cost).
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Eric Voskuil via bitcoin-dev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 04/07/2017 02:44 PM, Tomas via bitcoin-dev wrote:
> Hi Eric,
> 
> On Fri, Apr 7, 2017, at 21:55, Eric Voskuil via bitcoin-dev wrote:
>> Optimization for lower memory platforms then becomes a process
>> of reducing the need for paging. This is the purpose of a cache.
>> The seam between disk and memory can be filled quite nicely by a
>> small amount of cache. On high RAM systems any cache is actually
>> a de-optimization but on low RAM systems it can prevent excessive
>> paging. This is directly analogous to a CPU cache.
> 
> 
> I am not entirely sure I agree with that, or understand it
> correctly.
> 
> If -for example - the data of some application is a set  of
> records which can be sorted from least frequently used to most
> frequently used then doing just that sort will beat any
> application-layer cache. Regardless of size of data and size of
> RAM, you simply allow the OS to use disk caching or memory map
> caching to work its  magic .

It's a reasonable assumption, and given that the no-explicit-cache
implementation is a subset of the optionally-cached implementation,
was of course the initial implementation.

> In fact, I would argue that an application-layer cache *only*
> makes sense if the data model shows a *hard* distinction between
> often and not often used data. If usage-frequency is a continuous
> line, caching is best left to the OS by focussing on proper spatial
> and temporal locality of reference of your data, because the OS has
> much more information to make the right decision.

In practice this is not the case. The Bitcoin data model is neither
continuous nor strictly segregated by usage.

It is true that with sufficient RAM a cache is totally
counterproductive. It is also my experience that an independent UTXO
store is not a reasonable/necessary trade of disk space, memory
scalability, and/or code complexity in exchange for speed.

But on lower memory systems a explicit cache is beneficial. The
difference is clearly measurable in production code by simply changing
the cache limit and testing on various configurations.

e
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCAAGBQJY6CXnAAoJEDzYwH8LXOFOf0YH/2qk3hYC6iEDW/DWM2ffkdb9
QM7A29Pvbfw9Wjr5Xx+ugIQvlAr4T+nByOCT6AnrqNU5K3UUmbC0KIB1rEL94hsK
QYVlLs0cOrjg8qKJpck+wcgiWw3VbEa/Y44hK7NLUxoy2HsLYaxPhqFH3GGgowqR
syga626jf2YUyudZxj1gFuqn7grkwghnzdrEUJMcqQo8IvCqjftGXlKxBGyB/AIs
Dx+5EWO9Q9IxrNpg/fsKKB6xkMxkmSx2hbD7dmEBvi/afbVF66rDTinjInG/LCju
pV7kT/GAWqGQGku6sQyAOexsxVhWA8EA/QEjvbyyGb+3YnR0s6nPk+CxO+RkOgo=
=e+Pr
-END PGP SIGNATURE-
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Hi Eric,

On Fri, Apr 7, 2017, at 21:55, Eric Voskuil via bitcoin-dev wrote:
> Optimization for lower memory platforms then becomes a process of
> reducing the need for paging. This is the purpose of a cache. The seam
> between disk and memory can be filled quite nicely by a small amount
> of cache. On high RAM systems any cache is actually a de-optimization
> but on low RAM systems it can prevent excessive paging. This is
> directly analogous to a CPU cache. 


I am not entirely sure I agree with that, or understand it correctly.

If -for example - the data of some application is a set  of records
which can be sorted from least frequently used to most frequently used
then doing just that sort will beat any application-layer cache.
Regardless of size of data and size of RAM, you simply allow the OS to
use disk caching or memory map caching to work its  magic .

In fact, I would argue that an application-layer cache *only* makes
sense if the data model shows a *hard* distinction between often and not
often used data. If usage-frequency is a continuous line, caching is
best left to the OS by focussing on proper spatial and temporal locality
of reference of your data, because the OS has much more information to
make the right decision. 

___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Answering both,



On Fri, Apr 7, 2017 at 11:18 AM, Gregory Maxwell via bitcoin-dev 
 wrote:
>> 

>> I'm still lost on this-- AFAICT your proposals long term resource

>> requirements are directly proportional to the amount of
>> unspent output
>> data, which grows over time at some fraction of the total transaction
>> volume (plus the rate of spending which is more or less a constant).
>> 

>> Can you help out my understanding here?

>> 



On Fri, Apr 7, 2017, at 20:39, Bram Cohen wrote:

> Expanding on this question a bit, it's optimized for parallel access,
> but hard drive access isn't parallel and memory accesses are very
> fast, so shouldn't the target of optimization be about cramming as
> much as possible in memory and minimizing disk accesses?


The long term *minimal disk storage* requirement, can obviously not be
less then all the unspent outputs. Minimal disk requirements is not
something bitcrust attempts to address.


 The storage that is accessed during peak load (block validation with
 pre-synced transactions), is minimized as this only needs the
 transaction index (to lookup ptrs from hashes), the tip of the spend-
 tree and the tip of the spend-index (together to check double
 spents/spending non-existing outputs). These not only easily fit in
 RAM, but are accessed in a cache efficient way. *These* only grow with
 inputs as the spend tree contains one record per input referencing the
 output being spent.


Script validation is also not something bitcrust *directly* addresses;
it uses libbitcoinconsensus for the actual validation and lookups to
outputs are mostly similar. They are kept fast by trusting the OS on MRU
caching of transaction-outputs; I don't think that for this part the
UTXO index has much drawbacks,. Bitcrust seems to have a small advantage
due to the awesomeness of Rayon's parallelization and the lock-free data
structures, but a disadvantage in that keeping all spent outputs
decreases spatial locality of reference. Script validation is not the
innovative part.


Tomas
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Eric Voskuil via bitcoin-dev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 04/07/2017 11:39 AM, Bram Cohen via bitcoin-dev wrote:
> Expanding on this question a bit, it's optimized for parallel
> access, but hard drive access isn't parallel and memory accesses
> are very fast, so shouldn't the target of optimization be about
> cramming as much as possible in memory and minimizing disk
> accesses?

While this may seem to be the case it is not generally optimal. The
question is overly broad as one may or may not be optimizing for any
combination of:

startup time (first usability)
warm-up time (priming)
shutdown time (flush)
fault tolerance (hard shutdown survivability)
top block validation (read speed)
full chain validation (read/write speed)
RAM consumption
Disk consumption
Query response
Servers (big RAM)
Desktops (small RAM)
Mining (fast validation)
Wallets (background performance)
SSD vs. HDD

But even limiting the question to input validation, all of these
considerations (at least) are present.

Ideally one wants the simplest implementation that is optimal under
all considerations. While this may be a unicorn, it is possible to
achieve a simple implementation (relative to alternatives) that allows
for the trade-offs necessary to be managed through configuration (by
the user and/or implementation).

Shoving the entire data set into RAM has the obvious problem of
limited RAM. Eventually the OS will be paging more of the data back to
disk (as virtual RAM). In other words this does not scale, as a change
in hardware disproportionately impacts performance. Ideally one wants
the trade between "disk" and "memory" to be made by the underlying
platform, as that is its purpose. Creating one data structure for disk
and another for memory not only increases complexity, but denies the
platform visibility into this trade-off. As such the platform
eventually ends up working directly against the optimization.

An on-disk structure that is not mapped into memory by the application
allows the operating system to maintain as much or as little state in
memory as it considers optimal, given the other tasks that the user
has given it. In the case of memory mapped files (which are optimized
by all operating systems as central to their virtual memory systems)
it is possible for everything from zero to the full store to be memory
resident.

Optimization for lower memory platforms then becomes a process of
reducing the need for paging. This is the purpose of a cache. The seam
between disk and memory can be filled quite nicely by a small amount
of cache. On high RAM systems any cache is actually a de-optimization
but on low RAM systems it can prevent excessive paging. This is
directly analogous to a CPU cache. There are clear optimal points in
terms of cache size, and the implementation and management of such a
cache can and should be internal to a store. Of course a cache cannot
provide perfect scale all the way to zero RAM, but it scales quite
well for actual systems.

While a particular drive may not support parallel operations one
should not assume that a disk-based store does not benefit from
parallelism. Simply refer to the model described above and you will
see that with enough memory the entire blockchain can be
memory-resident, and for high performance operations a fraction of
that is sufficient for a high degree of parallelism.

In practice a cache of about 10k transactions worth of outputs is
optimal for 8GB RAM. This requires just a few blocks for warm-up,
which can be primed in inconsequential time at startup. Fault
tolerance can be managed by flushing after all writes, which also
reduces shutdown time to zero. For higher performance systems,
flushing can be disabled entirely, increasing shutdown time but also
dramatically increasing write performance. Given that the blockchain
is a cache, this is a very reasonable trade-off in some scenarios. The
model works just as well with HDD as SSD, although certainly SSD
performs better overall.

e
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCAAGBQJY5+7GAAoJEDzYwH8LXOFOsAsH/3QK55aWH6sAi6OsTwV1FLZV
Y/2SSjwn1vUh55MDkPpCxDwV99JqVwpk0vGM8mGg5s4ZS8sxOPqwGiBz/SZWbF9v
oStJS0DjUPnbYtI/mrC30GuAYVcKnc5DFDHvjX6f0xrLIzViFR7eiW0npUH6Xipt
RI9Mockaf1CqqGExtbIqWal0YDEQGH0ekXRp7uEjh8nPUoKqTVvxDCgqVooQfvfx
EeKX9ruSv/r91EM1JQuH8HBBF7+R24tmMtwbpGx0zrDg5ytpIyrRzVH/ze1Mj2a3
ZxThvofGzhKcDiTPWiJI11DBYUvhSH4Kx0uWLzFUA0gxPfWkZQKJWNDl2CEwljk=
=C7rD
-END PGP SIGNATURE-
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Gregory Maxwell via bitcoin-dev
On Fri, Apr 7, 2017 at 6:52 PM, Tom Harding via bitcoin-dev
 wrote:
> A network in which many nodes maintain a transaction index also enables a
> class of light node applications that ask peers to prove existence and
> spentness of TXO's.

Only with the additional commitment structure such as those proposed
by Peter Todd in his stxo/txo commitment designs, e.g.
https://petertodd.org/2016/delayed-txo-commitments
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tom Harding via bitcoin-dev
On Apr 6, 2017 6:31 PM, "Tomas via bitcoin-dev" <
bitcoin-dev@lists.linuxfoundation.org> wrote:


Bitcrust just uses a *transaction-index*, where outputs can be looked up
regardless of being spent.



A network in which many nodes maintain a transaction index also enables a
class of light node applications that ask peers to prove existence and
spentness of TXO's.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Bram Cohen via bitcoin-dev
Expanding on this question a bit, it's optimized for parallel access, but
hard drive access isn't parallel and memory accesses are very fast, so
shouldn't the target of optimization be about cramming as much as possible
in memory and minimizing disk accesses?

On Fri, Apr 7, 2017 at 11:18 AM, Gregory Maxwell via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Thu, Apr 6, 2017 at 10:12 PM, Tomas via bitcoin-dev
>  wrote:
> >As this
> > solution, reversing the costs of outputs and inputs, seems to have
> > excellent performance characteristics (as shown in the test results),
> > updates to the protocol addressing the UTXO growth, might not be worth
> > considering *protocol improvements*
>
> I'm still lost on this-- AFAICT your proposals long term resource
> requirements are directly proportional to the amount of unspent output
> data, which grows over time at some fraction of the total transaction
> volume (plus the rate of spending which is more or less a constant).
>
> Can you help out my understanding here?
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Gregory Maxwell via bitcoin-dev
On Thu, Apr 6, 2017 at 10:12 PM, Tomas via bitcoin-dev
 wrote:
>As this
> solution, reversing the costs of outputs and inputs, seems to have
> excellent performance characteristics (as shown in the test results),
> updates to the protocol addressing the UTXO growth, might not be worth
> considering *protocol improvements*

I'm still lost on this-- AFAICT your proposals long term resource
requirements are directly proportional to the amount of unspent output
data, which grows over time at some fraction of the total transaction
volume (plus the rate of spending which is more or less a constant).

Can you help out my understanding here?
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Thank you,



The benches are running in Google Cloud Engine; currently on 8 vCPU
32gb, but I tend to switch hardware regularly.


Roughly, the results are better for Bitcrust with high end hardware and
the difference for total block validations is mostly diminished at 2
vCPU, 7,5 gb.


Note that the spend-tree optimization primarily aims to improve peak
load order validation; when a block with pre-synced transactions comes
in, but this is tricky to accurately bench with Core using this simple
method of comparison by logs.


I will upgrade to, and show the results against 0.14 in the next weeks.


Best,

Tomas





On Fri, Apr 7, 2017, at 16:14, Greg Sanders wrote:

> Interesting work.

> 

> I was wondering if you could tellank  us what specs for the machine
> being used as preliminary benchmark is here:
> https://bitcrust.org/results ?
> 

> I'd be interested to also see comparisons with 0.14 which has some
> improvements for script validation with more cores.
> 

> On Fri, Apr 7, 2017 at 4:47 AM, Tomas via bitcoin-dev  d...@lists.linuxfoundation.org> wrote:
>> Thank you Marcos,

>> 

>>  Though written in Rust, bitcrust-db is definitely usable as
>>  pluggable
>>  module as its interface will be roughly some queries, add_tx and

>>  add_block with blobs and flags. (Bitcrust internally uses a

>>  deserialize-only model, keeping references to the blobs with the
>>  parsed
>>  data).

>> 

>>  However, from Core's side I believe network and storage are
>>  currently
>>  rather tightly coupled, which will make this far from trivial.

>> 

>>  Regardless, I am also hoping (with funding & a team) to build a
>>  Bitcrust
>>  networking component as well to bring a strong competitor to the
>>  market.
>> 

>>  best,

>>  Tomas

>> 

>> 

>> 

>> 

>> On Fri, Apr 7, 2017, at 09:55, Marcos mayorga wrote:

>>  > Hi Tomas,

>>  >

>>  > I've read it and think it is an excellent work, I'd like to see it
>>  > integrated into bitcoin-core as a 'kernel module'.

>>  >

>>  > I see there are a lot of proof of concepts out there, IMO
>>  > every one
>>  > deserve a room in the bitcoin client as a selectable feature, to
>>  > make the
>>  > software more flexible and less dictatorial, an user could easily
>>  > select
>>  > which features she wants to run.

>>  >

>>  > Best regards,

>>  > Marcos

>>  >

>>  > > I have been working on a bitcoin implementation that uses a
>>  > > different
>>  > > approach to indexing for verifying the order of transactions.
>>  > > Instead of
>>  > > using an index of unspent outputs, double spends are verified by
>>  > > using a
>>  > > spend-tree where spends are scanned against spent outputs
>>  > > instead of
>>  > > unspent outputs.

>>  > >

>>  > > This allows for much better concurrency, as not only blocks, but
>>  > > also
>>  > > individual inputs can be verified fully in parallel.

>>  > >

>>  > > I explain the approach at https://bitcrust.org, source code is
>>  > > available
>>  > > at https://github.com/tomasvdw/bitcrust

>>  > >

>>  > > I am sharing this not only to ask for your feedback, but also to
>>  > > call
>>  > > for a clear separation of protocol and implementations: As this
>>  > > solution, reversing the costs of outputs and inputs, seems to
>>  > > have
>>  > > excellent performance characteristics (as shown in the test
>>  > > results),
>>  > > updates to the protocol addressing the UTXO growth, might not be
>>  > > worth
>>  > > considering *protocol improvements* and it might be best to
>>  > > address
>>  > > these concerns as implementation details.

>>  > >

>>  > > Kind regards,

>>  > > Tomas van der Wansem

>>  > > to...@bitcrust.org

>>  > > Bitcrust

>>  > > ___

>>  > > bitcoin-dev mailing list

>>  > > bitcoin-dev@lists.linuxfoundation.org

>>  > > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev

>>  > >

>>  >

>>  >

>>  ___

>>  bitcoin-dev mailing list

>> bitcoin-dev@lists.linuxfoundation.org

>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Greg Sanders via bitcoin-dev
Interesting work.

I was wondering if you could tell us what specs for the machine being used
as preliminary benchmark is here: https://bitcrust.org/results ?

I'd be interested to also see comparisons with 0.14 which has some
improvements for script validation with more cores.

On Fri, Apr 7, 2017 at 4:47 AM, Tomas via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Thank you Marcos,
>
> Though written in Rust, bitcrust-db is definitely usable as pluggable
> module as its interface will be roughly some queries, add_tx and
> add_block with blobs and flags. (Bitcrust internally uses a
> deserialize-only model, keeping references to the blobs with the parsed
> data).
>
> However, from Core's side I believe network and storage are currently
> rather tightly coupled, which will make this far from trivial.
>
> Regardless, I am also hoping (with funding & a team) to build a Bitcrust
> networking component as well to bring a strong competitor to the market.
>
> best,
> Tomas
>
>
>
> On Fri, Apr 7, 2017, at 09:55, Marcos mayorga wrote:
> > Hi Tomas,
> >
> > I've read it and think it is an excellent work, I'd like to see it
> > integrated into bitcoin-core as a 'kernel module'.
> >
> > I see there are a lot of proof of concepts out there, IMO every one
> > deserve a room in the bitcoin client as a selectable feature, to make the
> > software more flexible and less dictatorial, an user could easily select
> > which features she wants to run.
> >
> > Best regards,
> > Marcos
> >
> > > I have been working on a bitcoin implementation that uses a different
> > > approach to indexing for verifying the order of transactions. Instead
> of
> > > using an index of unspent outputs, double spends are verified by using
> a
> > > spend-tree where spends are scanned against spent outputs instead of
> > > unspent outputs.
> > >
> > > This allows for much better concurrency, as not only blocks, but also
> > > individual inputs can be verified fully in parallel.
> > >
> > > I explain the approach at https://bitcrust.org, source code is
> available
> > > at https://github.com/tomasvdw/bitcrust
> > >
> > > I am sharing this not only to ask for your feedback, but also to call
> > > for a clear separation of protocol and implementations: As this
> > > solution, reversing the costs of outputs and inputs, seems to have
> > > excellent performance characteristics (as shown in the test results),
> > > updates to the protocol addressing the UTXO growth, might not be worth
> > > considering *protocol improvements* and it might be best to address
> > > these concerns as implementation details.
> > >
> > > Kind regards,
> > > Tomas van der Wansem
> > > to...@bitcrust.org
> > > Bitcrust
> > > ___
> > > bitcoin-dev mailing list
> > > bitcoin-dev@lists.linuxfoundation.org
> > > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> > >
> >
> >
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Thank you Marcos,

Though written in Rust, bitcrust-db is definitely usable as pluggable
module as its interface will be roughly some queries, add_tx and
add_block with blobs and flags. (Bitcrust internally uses a
deserialize-only model, keeping references to the blobs with the parsed
data).  

However, from Core's side I believe network and storage are currently
rather tightly coupled, which will make this far from trivial.

Regardless, I am also hoping (with funding & a team) to build a Bitcrust
networking component as well to bring a strong competitor to the market.

best,
Tomas



On Fri, Apr 7, 2017, at 09:55, Marcos mayorga wrote:
> Hi Tomas,
> 
> I've read it and think it is an excellent work, I'd like to see it
> integrated into bitcoin-core as a 'kernel module'.
> 
> I see there are a lot of proof of concepts out there, IMO every one
> deserve a room in the bitcoin client as a selectable feature, to make the
> software more flexible and less dictatorial, an user could easily select
> which features she wants to run.
> 
> Best regards,
> Marcos
> 
> > I have been working on a bitcoin implementation that uses a different
> > approach to indexing for verifying the order of transactions. Instead of
> > using an index of unspent outputs, double spends are verified by using a
> > spend-tree where spends are scanned against spent outputs instead of
> > unspent outputs.
> >
> > This allows for much better concurrency, as not only blocks, but also
> > individual inputs can be verified fully in parallel.
> >
> > I explain the approach at https://bitcrust.org, source code is available
> > at https://github.com/tomasvdw/bitcrust
> >
> > I am sharing this not only to ask for your feedback, but also to call
> > for a clear separation of protocol and implementations: As this
> > solution, reversing the costs of outputs and inputs, seems to have
> > excellent performance characteristics (as shown in the test results),
> > updates to the protocol addressing the UTXO growth, might not be worth
> > considering *protocol improvements* and it might be best to address
> > these concerns as implementation details.
> >
> > Kind regards,
> > Tomas van der Wansem
> > to...@bitcrust.org
> > Bitcrust
> > ___
> > bitcoin-dev mailing list
> > bitcoin-dev@lists.linuxfoundation.org
> > https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
> >
> 
> 
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Marcos mayorga via bitcoin-dev
Hi Tomas,

I've read it and think it is an excellent work, I'd like to see it
integrated into bitcoin-core as a 'kernel module'.

I see there are a lot of proof of concepts out there, IMO every one
deserve a room in the bitcoin client as a selectable feature, to make the
software more flexible and less dictatorial, an user could easily select
which features she wants to run.

Best regards,
Marcos

> I have been working on a bitcoin implementation that uses a different
> approach to indexing for verifying the order of transactions. Instead of
> using an index of unspent outputs, double spends are verified by using a
> spend-tree where spends are scanned against spent outputs instead of
> unspent outputs.
>
> This allows for much better concurrency, as not only blocks, but also
> individual inputs can be verified fully in parallel.
>
> I explain the approach at https://bitcrust.org, source code is available
> at https://github.com/tomasvdw/bitcrust
>
> I am sharing this not only to ask for your feedback, but also to call
> for a clear separation of protocol and implementations: As this
> solution, reversing the costs of outputs and inputs, seems to have
> excellent performance characteristics (as shown in the test results),
> updates to the protocol addressing the UTXO growth, might not be worth
> considering *protocol improvements* and it might be best to address
> these concerns as implementation details.
>
> Kind regards,
> Tomas van der Wansem
> to...@bitcrust.org
> Bitcrust
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>


___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Tomas via bitcoin-dev


On Fri, Apr 7, 2017, at 03:09, Gregory Maxwell wrote:
> 
> How do you deal with validity rules changing based on block height?

I expected that one :). Just like the 100 blocks coinbase rule, changes
by softforks need to be added as metadata to the transaction-index, but
this is not yet in place.

As for the script validation itself using libbitcoinconsensus, this is a
bit hairy as this expects the rules to be known. Luckily, simply
gradually retrying using "lower" rules won't hurt performance, as
transaction that mismatch newer rules are rare.

Generally, bitcrust would appreciate a "is valid with X rules" instead 
of a "validate with X rules" approach.


> So it sounds like to work the software still needs an analog of a
> (U)TXO database? I am confused by the earlier comments about thinking
> the the resource consumption of the (U)TXO database is not a
> consideration in your design.

No, but transactional access is. Bitcrust just uses a
*transaction-index*, where outputs can be looked up regardless of being
spent. As the concept of being "spent" depends on the branch, script
validation ignores this and simply looks up the outputs.

Transactions are split in two parts for better locality of reference
when accessing outputs.

The transaction index only looks similar to an "UTXO-index" after full
pruning.

> If you get a transaction claiming to spend 0xDEADBEEFDEADBEEF, an
> output that never existed how does your spent index reject this spend

The spend-tree is scanned until either DEADBEAF is found (=ERR double
spent),  the transaction of DEADBEEF is found (=all ok!), or the start
of the chain is reached (=ERR spending unknown output!)

To prevent actually having to scan to genesis, the spent-index "catches"
the search after a few blocks and performs the same lookup (positive for
tx, negative for output) on a simple bit index.


___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Tomas via bitcoin-dev


On Fri, Apr 7, 2017, at 02:32, Gregory Maxwell wrote:

> Perhaps a simple question would help:
> 
> What is the minimal amount of space your system requires to take a new
> block received from the P2P network and verifying that all its spends
> were valid spends of existing coins unspent coins today?
> 
> For Bitcoin Core the answer is ~2GB (plus the configuration handling
> currently forces you to keep another 550MB of blocks for reorgs).

Bitcrust separates script validation (base load, when transaction come
in) from order validation (peak load, when blocks come in).

For script validation it would obviously need the ~2GB (or I think
~1.5GB) of outputs needed to validate these.  For order validation it
needs ~200mb or the spent-index (for bit-lookups) and I would guess
roughly ~500mb of the spent-tree (for scanning), though I don't think
the 5.7GB full spend tree isn't worth pruning anytime soon.

Then it is currently using a  ~1.5GB   index for transaction hash to
fileptr lookups, though this could be made more space efficient.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Gregory Maxwell via bitcoin-dev
On Fri, Apr 7, 2017 at 12:48 AM, Tomas  wrote:
> Bitcrust separates script validation (base load, when transaction come
> in) from order validation (peak load, when blocks come in).

How do you deal with validity rules changing based on block height?

> For script validation it would obviously need the ~2GB (or I think
> ~1.5GB) of outputs needed to validate these.

So it sounds like to work the software still needs an analog of a
(U)TXO database? I am confused by the earlier comments about thinking
the the resource consumption of the (U)TXO database is not a
consideration in your design.

> For order validation it
> needs ~200mb or the spent-index (for bit-lookups) and I would guess
> roughly ~500mb of the spent-tree (for scanning), though I don't think
> the 5.7GB full spend tree isn't worth pruning anytime soon.

If you get a transaction claiming to spend 0xDEADBEEFDEADBEEF, an
output that never existed how does your spent index reject this spend?
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Tomas via bitcoin-dev
Hi Eric,

Thanks, but I get the impression that the similarity is rather
superficial.  

To address your points:

> (1) higher than necessary storage space requirement due to storing the
> indexing data required for correlate the spends, and

Hmm. No. Spends are simply scanned in the spend-tree (full tree,
prunable, fully 5.6gb), or caught by the spend-index (bit index,
non-prunable, fully 180mb). Neither impose significant storage
requirements.

> 2) higher than necessary validation complexity and cost in terms of
> computing the spent-ness (including spender height) of an output.
>
> With the exception of de-linking (not deleted) in the case of reorgs, the
> entire store is append only, implemented in a small set of memory
> mapped file

I guess this is the key difference. As the spend-tree stores the spend
information in a tree structure, no reorgs are required, and the
resulting code is actually much less complex.

Bitcrust simply scans the tree. Although earlier designs used a
skip-list, it turns out that accompanied by a spent-index lagging a few
blocks behind, raw scanning is faster then anything even though it needs
to scan ~5 blocks times ~4000 inputs before reaching the first
spent-index,  the actual scan is highly cache efficient and little more
then a "REP SCASQ", reaching sub-microsecond per input on each core
*including* the lookup in the spend index.

 > I don't follow this part, maybe you could clarify. A spends index
> grows with the size of the spend set (forever) as it cannot be pruned,
> which certainly exceeds the size of the UTXO set (unless nothing is
> spent). The advantage is that you don't have to keep rewriting the
> store when you use a spends set (because the store can be append only).

My point is, that the spend tree grows per *input* of a transaction
instead of per *output* of a transaction, because this is what is
scanned on order validation.

The spend tree can be pruned because the spend index (~200mb) catches
early spends.

Disregarding the baseload script validation, the peak load order
validation of bitcrust is more negatively effected by a transaction with
many inputs than by a transaction of many outputs.

I encourage you to check out the results at https://bitcrust.org

Regards,
Tomas

On Fri, Apr 7, 2017, at 01:38, Eric Voskuil wrote:
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA256
> 
> On 04/06/2017 03:12 PM, Tomas via bitcoin-dev wrote:
> 
> Hi Tomas,
> 
> > I have been working on a bitcoin implementation that uses a
> > different approach to indexing for verifying the order of
> > transactions. Instead of using an index of unspent outputs, double
> > spends are verified by using a spend-tree where spends are scanned
> > against spent outputs instead of unspent outputs.
> 
> This is the approach that genjix used in libbitcoin version2. With the
> exception of de-linking (not deleted) in the case of reorgs, the
> entire store is append only, implemented in a small set of memory
> mapped files. The downsides to the approach are:
> 
> (1) higher than necessary storage space requirement due to storing the
> indexing data required for correlate the spends, and
> 
> (2) higher than necessary validation complexity and cost in terms of
> computing the spent-ness (including spender height) of an output.
> 
> His implementation used a hash table, so performance-wise it did quite
> well and would theoretically outperform a tree, O(1) vs. O(log2(N)).
> 
> > This allows for much better concurrency, as not only blocks, but
> > also individual inputs can be verified fully in parallel.
> 
> I was successful in parallelizing input validation (across the inputs
> of an unconfirmed tx and across the set of all inputs in a block)
> using the v2 store. However, it is not the case that the spends
> approach is necessary for concurrency.
> 
> To resolve the above two problems the version3 store does not use a
> spends table/index. Nor does it store any table of UTXOs. Yet
> validation is highly parallelized. Instead of additional indexes it
> uses the tx hash table, augmented with 32 bits per output for spender
> height. So there is a O(1) cost of finding the tx and a O(N) cost of
> finding the spender height where N is the number of outputs in the tx.
> But because the number of outputs in a tx is bounded (by block size)
> this is constant time in the number of transactions.
> 
> This works out much faster than the spends table, and without the
> storage cost or complexity disadvantages. It also scales with
> available hardware, as the memory mapped files become in-memory hash
> tables. For low memory machines we found it was important to implement
> an opaque UTXO cache to limit paging, but for higher end systems zero
> cache is optimal.
> 
> > I am sharing this not only to ask for your feedback, but also to
> > call for a clear separation of protocol and implementations: As
> > this solution, reversing the costs of outputs and inputs, seems to
> > have excellent performance 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Eric Voskuil via bitcoin-dev
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA256

On 04/06/2017 03:12 PM, Tomas via bitcoin-dev wrote:

Hi Tomas,

> I have been working on a bitcoin implementation that uses a
> different approach to indexing for verifying the order of
> transactions. Instead of using an index of unspent outputs, double
> spends are verified by using a spend-tree where spends are scanned
> against spent outputs instead of unspent outputs.

This is the approach that genjix used in libbitcoin version2. With the
exception of de-linking (not deleted) in the case of reorgs, the
entire store is append only, implemented in a small set of memory
mapped files. The downsides to the approach are:

(1) higher than necessary storage space requirement due to storing the
indexing data required for correlate the spends, and

(2) higher than necessary validation complexity and cost in terms of
computing the spent-ness (including spender height) of an output.

His implementation used a hash table, so performance-wise it did quite
well and would theoretically outperform a tree, O(1) vs. O(log2(N)).

> This allows for much better concurrency, as not only blocks, but
> also individual inputs can be verified fully in parallel.

I was successful in parallelizing input validation (across the inputs
of an unconfirmed tx and across the set of all inputs in a block)
using the v2 store. However, it is not the case that the spends
approach is necessary for concurrency.

To resolve the above two problems the version3 store does not use a
spends table/index. Nor does it store any table of UTXOs. Yet
validation is highly parallelized. Instead of additional indexes it
uses the tx hash table, augmented with 32 bits per output for spender
height. So there is a O(1) cost of finding the tx and a O(N) cost of
finding the spender height where N is the number of outputs in the tx.
But because the number of outputs in a tx is bounded (by block size)
this is constant time in the number of transactions.

This works out much faster than the spends table, and without the
storage cost or complexity disadvantages. It also scales with
available hardware, as the memory mapped files become in-memory hash
tables. For low memory machines we found it was important to implement
an opaque UTXO cache to limit paging, but for higher end systems zero
cache is optimal.

> I am sharing this not only to ask for your feedback, but also to
> call for a clear separation of protocol and implementations: As
> this solution, reversing the costs of outputs and inputs, seems to
> have excellent performance characteristics (as shown in the test
> results), updates to the protocol addressing the UTXO growth, might
> not be worth considering *protocol improvements* and it might be
> best to address these concerns as implementation details.

I don't follow this part, maybe you could clarify. A spends index
grows with the size of the spend set (forever) as it cannot be pruned,
which certainly exceeds the size of the UTXO set (unless nothing is
spent). The advantage is that you don't have to keep rewriting the
store when you use a spends set (because the store can be append only).

Feel free to message me if you'd like to discuss in more detail, or to
continue on the libbitcoin mailing list (copied).

e
-BEGIN PGP SIGNATURE-
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBCAAGBQJY5tFpAAoJEDzYwH8LXOFOcMgH/2mw5iOvUYNwvZ2z0KKTSUOA
Pd8d5mKoWvd94QxhQ+RyTbkEkMhHl75+zcBgRsfUTtZlBIe/Z0+OgVIN6ibEw+WD
w7k3HqgQi9gLgydEelxTAX+z3dJ24n4kCCdKAmZbBuK+Yr/7AViugbEqYemKepku
pRWZZS74MUvrYesc0xPn4Ao3DTzMjjY0K2mkuqV8jlwdfZjlAQX9pTx+iSCuMhkd
HJ8w7s8QnjVnUeOlLe29mZwaFJPyOTLJMqgDE6s2sXacAy5QQbVCatygvDQ8A/wC
ktBnKPFb2lGX3bGKu/KwABegBy/hyec+NP0wFR+0MVivCwTK1+SjeHu5MNOSVlM=
=tfVj
-END PGP SIGNATURE-
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev