Re: [bitcoin-dev] Why the BIP-72 Payment Protocol URI Standard is Insecure Against MITM Attacks

2017-09-29 Thread Tomas via bitcoin-dev

On Fri, Sep 29, 2017, at 04:55, Peter Todd via bitcoin-dev wrote:
> The BIP-70 payment protocol used via BIP-72 URI's is insecure, as payment qr
> codes don't cryptographically commit to the identity of the merchant, which
> means a MITM attacker can redirect the payment if they can obtain a SSL cert
> that the wallet accepts.

By that reasoning, we also shouldn't go to https://coinbase.com or
https://kraken.com to buy any bitcoins? A MITM can redirect those sites
too, _if_ they obtain the coinbase or kraken certificate.

Obviously, HTTPS is secured under the assumption that certificates are
secure.  

Using the payment protocol simply means paying to a secure endpoint (e.g.
https://tomasvdw.nl/pay) instead of an address.

>  That wallet is also likely using an off-the-shelf SSL library, with
> nothing other than an infrequently updated set of root certificates to use
> to verify the certificate; your browser has access to a whole host of better
> technologies, such as HSTS pinning, certificate transparency, and frequently
> updated root certificate lists with proper revocation (see Symantec).

So we should not use HTTPS for secure transfer because the
implementation may not be good enough? This incorrectly conflates
implementation with specification. There is nothing stopping a developer
from using a proper implementation.

>
> As an ad-hoc, unstandardized, extension Android Wallet for Bitcoin at least
> supports a h= parameter with a hash commitment to what the payment request
> should be, and will reject the MITM attacker if that hash doesn't match. But
> that's not actually in the standard itself, and as far as I can tell has
> never been made into a BIP.

Currently it is widely used by merchants, but not yet for light clients
_receiving_ money. If it becomes more widespread, it offers a range of
advantages, as the bitcoin address in the URI can and should be
deprecated (made impossible with "h="). A payment address just becomes a
secure endpoint.
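
For illustration, such a URI could then look roughly like this (r= is
standard BIP-72; the h= commitment is the non-standardized extension
mentioned above, with its exact encoding left to the wallet):

    bitcoin:?r=https://tomasvdw.nl/pay&h=<hash-of-payment-request>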

This means address reuse is no longer possible. It also drops the need
for mempool synchronization among non-miners, which serves solely as a
"notification" mechanism. In addition, it means light clients know
exactly when a transaction is coming in, so they can efficiently rely on
client-side filtering of a small set of blocks, improving their privacy.

In my opinion, the payment protocol is key to scaling.

> As-is BIP-72 is very dangerous and should be depreciated, with a new BIP
> made to replace it.

Sorry, but maybe you could explain better how secure communication over
HTTPS is "very dangerous"? I think some websites would like to know :)

Tomas van der Wansem
bitcrust


[bitcoin-dev] Partial UTXO tree as commitment

2017-09-05 Thread Tomas via bitcoin-dev
I would like to propose an efficient UTXO commitment scheme.

A UTXO commitment can be useful for:

1. Fast syncing a full node, by downloading the UTXO set
2. Proving (non-)existence of a UTXO.

Various schemes have been proposed:

* Merkle/radix trees and variants; all of which have the problem that
they significantly increase the burden of maintaining the UTXO set.
Furthermore, such schemes tend to practically prescribe the UTXO storage
format, severely limiting continuous per-implementation optimizations.
* A "flat" rolling hash, e.g. the ECMH proposed by Pieter Wuille, which
is cheap to calculate but only solves (1) and not (2).

I propose a hybrid approach, with very limited extra burden to maintain
and reasonably small proofs: 

We divide the UTXO set into buckets by the prefix of their TXID, then
maintain a rolling hash for each bucket. The commitment is then the root
of the tree constructed from the resulting bucket hashes. To construct
the tree, for each depth we group the hashes of the next depth per 64
hashes and calculate the rolling hash of each. (Effectively, this is a
prefix tree with a fixed branch size of 64.)

Bucketcount
---
txcount = number of TXIDs in the UTXO set
bucketcount = (smallest power of 2 larger than sqrt(txcount))  << 6
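
As a minimal sketch in Rust (rounding details are chosen by me where the
text leaves them open):

    /// Smallest power of 2 larger than sqrt(txcount), shifted left by 6.
    fn bucket_count(txcount: u64) -> u64 {
        let sqrt = (txcount as f64).sqrt() as u64;
        (sqrt + 1).next_power_of_two() << 6
    }

With the current txcount this indeed yields the 2^19 mentioned in the
rationale below.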

Rationale for bucketcount:

* This currently gives a bucketcount of 2^19, which is very cheap to
maintain with a 16mb array of rolling hashes.
* This currently gives an average bucket size of 4kb. With a rolling
hash, full nodes don't need to maintain the buckets themselves, but they
are used for proofs.
* The burden of future UTXO growth is divided among maintaining the
rolling hashes and size of the proof: 10,000x as large UTXO set (20TB),
gives ~400kb buckets and ~1.6gb in maintaining rolling hashes.
* This gives a tree depth of 5, which means the cost of every UTXO
update is increased by ~3 rolling hashes (and a double SHA), as the
lowest depths don't benefit from caching.
* A proof for (non-)existence of a UTXO is ~4*64*32 = 8kb (branch nodes)
+ 4kb (bucket) = ~12kb

Specification [WIP]
---
We define the "UTXO commitment" as the serialized byte array: "U" "T"
"X" "O" VARINT(version) VARINT(txcount) UINT256(UTXO-root)[todo
clarify]

A block that contains an output in the coinbase whose scriptPubKey
consists solely of OP_RETURN [UTXO commitment] must be rejected if the
version in the UTXO commitment equals 1 and either:
* after updating the UTXO state, the number of distinct TXIDs in the
UTXO set is not equal to the txcount value of the UTXO commitment, or
* after updating the UTXO state, the UTXO-root in the UTXO commitment is
not equal to the UTXO-root defined below.

The UTXO-root can be calculated as follows:

* Define _bucketcount_ as (smallest power of 2 larger than
sqrt(txcount))  << 6
* Given a TXID in the UTXO set, define UTXO(TXID) as the double SHA256
of (TXID + coins). (coins is the serialization of unspent outputs to be
spec'ed). 
* Let bucket N be the set of values UTXO(TXID) for each TXID in the
UTXO-set where (TXID mod _bucketcount_) equals N.
* Let rhash N be the rolling hash (TBD) of all values in bucket N
* Let the hash sequence be the ordered sequence rhash N for N in
[0, _bucketcount_).

1. If the hash sequence contains at most 64 entries, then the UTXO-root
is the rolling hash of all entries in the hash sequence, otherwise:
2. Group the hash sequence in ordered subsequences of 64 entries each.
3. Find the rolling hash of each subsequence
4. Continue with 1., with the hash sequence being the ordered sequence
of these rolling hashes.
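
A rough sketch of steps 1-4 in Rust, with the rolling hash left abstract
since it is still TBD:

    type Hash = [u8; 32];

    /// Fold the ordered sequence of bucket hashes into the UTXO-root:
    /// repeatedly group per 64 entries and take the rolling hash of each
    /// group, until at most 64 entries remain.
    fn utxo_root(mut seq: Vec<Hash>, roll: impl Fn(&[Hash]) -> Hash) -> Hash {
        while seq.len() > 64 {
            seq = seq.chunks(64).map(|group| roll(group)).collect();
        }
        roll(&seq)
    }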

Note: an implementation may want to maintain and update the set of
rolling hashes at higher depths on each UTXO set operation.

Note: the secure ECMH is a good candidate for the bucket hash. It could
also be used for the branch rolling hashes, but it might be worth
considering XOR for those, as there seem to be simply not enough
candidates to find a colliding set?

Note: two magic numbers are used: "<< 6" for the bucket count, and "64"
for the branch size. They work nicely but are pulled out of a dark place
and merit some experimentation.

Use cases for light clients
-
These UTXO proofs could be used as compact fraud proofs, although the
benefit of this is not generally agreed upon.

They can also be used to increase low-conf security for light clients, by
validating the signatures and order-validity of incoming transactions
against the right bucket of the current UTXO set.

An interesting use case may be another type of light client: one that
abandons bloom filters and instead uses the UTXO proofs to verify
whether an incoming or outgoing transaction is confirmed. This could be
beneficial for "rarely active" light clients such as smartphone apps, as
it removes the need to synchronize previous blocks with bloom filters
and allows syncing to the latest block with 12kb/output.

Summary 
--
* 

Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients

2017-06-09 Thread Tomas via bitcoin-dev

On Fri, Jun 9, 2017, at 05:50, Olaoluwa Osuntokun wrote:
> Tomas wrote:
> > I don't completely understand the benefit of making the outpoints and
> > pubkey hashes (weakly) verifiable. These only serve as notifications and
> > therefore do not seem to introduce an attack vector.
> 
> Not sure what you mean by this. Care to elaborate? 
> 

I will rephrase. The BIP reads:

> Additionally, Full nodes can nearly undetectably lie by omission causing a
> denial of service which can lead to undesirable failure modes in
> applications whose safety critically relies on responding to certain
> on-chain events.

I understand that the compact header chain is used to mitigate this, but
I am unsure about the use cases and trade-offs.

For a normal wallet, the only thing I can imagine an attacker could do
is pretend that a transaction has not confirmed yet, causing a nuisance.

An application critically depending on knowing what happens on-chain is
surely better off downloading the TXIDs, providing PoW security? Gaining
knowledge of incoming TXIDs is nicely solved by the payment protocol.

Are there enough use cases that critically depend on pub key hashes
being used on-chain, to make the compact header chain worth its costs? 

Regards,
Tomas


Re: [bitcoin-dev] BIP Proposal: Compact Client Side Filtering for Light Clients

2017-06-08 Thread Tomas via bitcoin-dev
On Thu, Jun 1, 2017, at 21:01, Olaoluwa Osuntokun via bitcoin-dev wrote:
> Hi y'all,
>
> Alex Akselrod and I would like to propose a new light client BIP for
> consideration:
> * https://github.com/Roasbeef/bips/blob/master/gcs_light_client.mediawiki
>

Very interesting. 

I would like to consider how this compares to another light client type
with rather different security characteristics, where each client would
receive, for each transaction in each block:
* The TXID (uncompressed)
* The spent outpoints (with TXIDs compressed)
* The pubkey hash (compressed to a reasonable amount of false positives)

A rough estimate would indicate this to be about 2-2.5x as big per block
as your proposal, but it comes with rather different security
characteristics, and would not require downloading since genesis.
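
For concreteness, a sketch of what a per-transaction entry of such a
per-block digest might look like (the exact truncation and compression
parameters here are assumptions, not part of the estimate above):

    /// Per-transaction entry of the per-block digest described above.
    struct TxDigest {
        txid: [u8; 32],                       // full TXID, checkable against the merkle root
        spent_outpoints: Vec<([u8; 8], u32)>, // truncated prev-TXIDs plus output index
        pubkey_hashes: Vec<[u8; 4]>,          // short prefixes, tuned for acceptable false positives
    }
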
The client could verify the TXIDs against the merkle root with a much
stronger (PoW) guarantee, compared to the guarantee based on the
assumption of peers being distinct, which your proposal seems to make.

Like your proposal this removes the privacy and processing issues from
server-side filtering, but unlike your proposal, retrieval of all txids
in each block can also serve as a basis for fraud proofs and
(disprovable) fraud hints, without resorting to full block downloads.

I don't completely understand the benefit of making the outpoints and
pubkey hashes (weakly) verifiable. These only serve as notifications and
therefore do not seem to introduce an attack vector. Omitting data is
always possible, so receiving data is a prerequisite for verification,
not an assumption that can be made. How could an attacker benefit from
"hiding notifications"?

I think client-side filtering is definitely an important route to take,
but is it worth compressing away the information needed to verify the
merkle root?
Regards,
Tomas van der Wansem
bitcrust



Re: [bitcoin-dev] Proposal to allow users to configure the maximum block weight based on a support threshold

2017-05-24 Thread Tomas via bitcoin-dev

On Wed, May 24, 2017, at 04:42, Erik Aronesty wrote:
> Instead of block thresholds, use utxo bits to coordinate size changes
> (larger and smaller should be allowed).
>
> There is no reason for miners to be involved in a decision to change
> this aspects of the protocol.   Plenty of other ways to coordinate.

Miners cannot change the block size or any other rule without support of
the users, because their blocks and coins would be rejected. This
mechanism, which Bitcoin brought us, has been working fine, and I see no
reason to replace it with utxo bits.

I *only* propose an optional way to synchronize changes without the need
for off-chain agreements, which seems like a simple improvement over the
current situation.

> 
> Otherwise someone can make it seem to a miner like 99pct of nodes are
> ready for a larger weight even though that's false.

I agree that the user agent signalling isn't very important, and I think
that most of us are aware that you cannot rely on counting them.
> 
> On May 23, 2017 8:03 AM, "Tomas via bitcoin-dev"
> <d...@lists.linuxfoundation.org> wrote:
>> I have a proposal that would allow each user to optionally configure
>> the maximum block weight at a support threshold.
>>
>> It recognizes that there is no need for off-chain bickering, by
>> providing a mechanism that lets each user freely choose their own
>> parameters while still maintaining full coordination of any changes.
>>
>> The BIP can be found here:
>>
>> https://github.com/tomasvdw/bips/blob/master/bip-changing-the-maximum-block%20weight-based-on-a-support-threshold.mediawiki
>>
>> It is worth noting that this proposal neither gives more power to
>> miners nor reduces decentralization. Miners still rely on their blocks
>> being accepted by economic nodes to sell their minted coins. This
>> proposal doesn't change that.
>>
>> Regards,
>> Tomas van der Wansem
>> bitcrust



[bitcoin-dev] Proposal to allow users to configure the maximum block weight based on a support threshold

2017-05-23 Thread Tomas via bitcoin-dev
I have a proposal that would allow each user to optionally configure the
maximum block weight at a support threshold.

It recognizes that there is no need for off-chain bickering, by
providing a mechanism that lets each user freely choose their own
parameters while still maintaining full coordination of any changes.

The BIP can be found here: 

https://github.com/tomasvdw/bips/blob/master/bip-changing-the-maximum-block%20weight-based-on-a-support-threshold.mediawiki

It is worth noting that this proposal neither gives more power to
miners nor reduces decentralization. Miners still rely on their blocks
being accepted by economic nodes to sell their minted coins. This
proposal doesn't change that.

Regards,
Tomas van der Wansem
bitcrust


Re: [bitcoin-dev] Non-confirming block signalling

2017-05-05 Thread Tomas via bitcoin-dev
Sorry, I wasn't aware. This is indeed the same proposal.



On Fri, May 5, 2017, at 15:01, Bryan Bishop wrote:
> On Fri, May 5, 2017 at 6:24 AM, Tomas via bitcoin-dev
> <d...@lists.linuxfoundation.org> wrote:
>> I propose a method to mark blocks to indicate that they were generated
>> without verifying the previous block. This can be done by using a bit of
>> the version field.
>
> see also:
> https://lists.linuxfoundation.org/pipermail/bitcoin-dev/2015-December/011853.html
>  
> - Bryan
> http://heybryan.org/
> 1 512 203 0507



[bitcoin-dev] Non-confirming block signalling

2017-05-05 Thread Tomas via bitcoin-dev
I propose a method to mark blocks to indicate that they were generated
without verifying the previous block. This can be done by using a bit of
the version field.

This would counter the reduction of security caused by what is known as
"SPV-mining".

The BIP is here:
https://github.com/tomasvdw/bips/blob/master/bip-non-confirming-block-signalling.mediawiki

Tomas
bitcrust


[bitcoin-dev] Fraud Proofs with semi SPV

2017-05-05 Thread Tomas via bitcoin-dev
I would like some feedback on the idea to use a node type a bit heavier
than SPV (dubbed FSPV) to solve Fraud Proofs.

An FSPV node not only downloads block headers, but also the "spend-tree
blocks", consisting of all TXIDs and all previous output indices and
TXIDs. The latter can be compacted using a scheme similar to Compact
Blocks, which will make the spend-tree block ~80kb in size.

This way the FSPV can track the full transaction graph at little cost.

The advantage is that Fraud Hint messages for absent/withheld
transactions become feasible. A normal SPV node is reduced to a full
node by such a (cheaply faked) hint, but for an FSPV the cost is almost
zero.

All it needs to do is add a taint-bit in the tree and propagate the
taint to the transaction graph. It then knows it needs to request the
Fraud Hinted transaction to consider any descendant transaction valid.
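
A sketch of that taint propagation (illustrative Rust; the graph
representation is an assumption, not bitcrust code):

    use std::collections::{HashMap, HashSet, VecDeque};

    type Txid = [u8; 32];

    /// Mark the hinted (withheld) transaction and everything that descends
    /// from it; descendants cannot be considered valid until the hinted
    /// transaction is actually fetched.
    fn propagate_taint(hinted: Txid, spenders: &HashMap<Txid, Vec<Txid>>) -> HashSet<Txid> {
        let mut tainted = HashSet::new();
        let mut queue = VecDeque::new();
        queue.push_back(hinted);
        while let Some(txid) = queue.pop_front() {
            if tainted.insert(txid) {
                if let Some(children) = spenders.get(&txid) {
                    queue.extend(children.iter().copied());
                }
            }
        }
        tainted
    }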

This makes it sufficient to punish fraudulent fraud hints or withheld
transactions by normal "banscore" procedures.

All other fraud can be proven by transaction-sets.

More information here:  https://bitcrust.org/blog-fraud-proofs

Tomas
Bitcrust


Re: [bitcoin-dev] Full node "tip" function

2017-05-04 Thread Tomas via bitcoin-dev
The ones that *could* pay non-mining full nodes are miners/pools, by
outsourcing transaction selection using a different PoW.  By doing so
they could buy proof-of-uncensored-selection and proof-of-goodwill for a
small fee.

We would allow full nodes to generate and broadcast a template block
which:
* Does not contain a valid header yet
* Contains the transaction selection
* Contains a coinbase output with a predetermined part of the block
  reward (say 0.5%) to themselves
* Contains a nonce for PoW of a predetermined, currently ASIC-resistant
  hash function behind an OP_RETURN.

The template with the highest PoW since the last block would be leading.
A miner/pool can then choose to use this instead of their own, adding
the rest of the reward and the SHA nonce themselves. That way they would
set up a competition among full nodes.

This would of course be voluntary but provable, so it might be in a
pool's interest to do this via naming and shaming.
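
For illustration, the template could be little more than this (a sketch;
the names and fields are mine, nothing here is specified anywhere):

    /// Proposed "template block" a full node would broadcast.
    struct BlockTemplate {
        prev_block: [u8; 32],        // tip this template extends
        txids: Vec<[u8; 32]>,        // the transaction selection
        node_payout_script: Vec<u8>, // coinbase output paying ~0.5% to the node
        template_nonce: u64,         // nonce for the ASIC-resistant PoW, committed behind OP_RETURN
    }
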
Tomas
bitcrust

On Wed, May 3, 2017, at 23:43, Ben Thompson via bitcoin-dev wrote:
> I feel like this would be pointless as the vast majority of users
> would likely download the blockchain from a node that was not
> enforcing a tip requirement as it would seem like unnecessary cost as
> in protocols such as BitTorrent there is no such tips in sharing files
> and the blockchain distribution is in eccense the same thing. However
> perhaps I am underestimating the generosity of node operators but I
> feel that adding a cost to the blockchain (assuming that all users add
> a tip requirement) would lead to centralisation.
>
> On Wed, 3 May 2017, 22:21 Erik Aronesty via bitcoin-dev,
> <d...@lists.linuxfoundation.org> wrote:
>> IDEA:
>> - Full nodes advertise a bitcoin address.   Users that need to
>>   download the block chain from that node can be encouraged to send a
>>   tip to the peers that served them (by % served).   Recommended tip
>>   of 10mbit should be fine.
>>
>> - A full nodes can *require* a tip to download the blockchain.  If
>>   they do, users that don't specify a tip cannot use them.
>>
>> CONS:
>>
>> For some people, this may represent a barrier to hosting their own
>> full node.   After all, if you have to pay $15 just to get a copy of
>> the blockchain, that just adds to the already expensive prospect of
>> hosting a full node.
>>
>> PROS:
>>
>> As long as you manage to stay online, you should get your money back
>> and more.   This is the an incentive for quality, long term hosting.
>>
>> In the long term, this should cause stable nodes to stick around
>> longer.   It also discourages "installation spam" attacks on the
>> network.
>>
>> Fees for other node operations can be considered if this is
>> successful.



Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-11 Thread Tomas via bitcoin-dev


On Tue, Apr 11, 2017, at 11:41, Eric Voskuil wrote:
> It's not the headers/tx-hashes of the blocks that I'm referring to, it
> is the confirmation and spend information relative to all txs and all
> outputs for each branch. This reverse navigation (i.e. utxo
> information) is essential, must be persistent and is branch-relative.

That is exactly what is stored in the spend-tree. 

>> As a simpler example, if two miners both mine a block at
>> approximately the same time and send it to each other, then surely
>> they would want to continue mining on their own block. Otherwise
>> they would be throwing away their own reward.

> That's not your concurrent validation scenario. In the scenario you
> described, the person chooses the weaker block of two that require
> validation because it's better somehow, not because it's his own
> (which does not require validation).

> Consistency is reached, despite seeing things at different times,
> because people use the same rules. If the economy ran on arbitrary
> block preference consistency would be elusive.

No, but my example shows that it is up to the miner to choose which tip
to work on. This is not using different rules; the miner is just
optimizing its income. This means that the economy *does* run on
arbitrary "block preference", even if it is not running on arbitrary
rules.

If two blocks are competing, a miner could optimize its decision which
to mine on, not just on whether one of the blocks is his own, but also
on fees, or on excessive validation costs.

> I read this as encoding the height at which a fork historically
> activated. If you intend to track activation for each branch that will
> not be "height-based" it will be history based.

I understand "height-based" was not the right wording, as it is of
course branch-specific. Per tip ruleset metadata, must be matched with
per-transaction ruleset metadata.

Tomas


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-11 Thread Tomas via bitcoin-dev


On Tue, Apr 11, 2017, at 03:44, Eric Voskuil wrote:

> As I understand it you would split tx inputs and outputs and send them
> independently, and that you intend this to be a P2P network
> optimization - not a consensus rule change. So my comments are based
> on those inferences. If we are talking about consensus changes this
> conversation will end up in an entirely different place.

> I don't agree with the input/output relevance statements above. When a
> tx is announced the entire tx is relevant. It cannot be validated as
> outputs only. If it cannot be validated it cannot be stored by the
> node. Validating the outputs only would require the node store invalid
> transactions.

Splitting transactions only happens *on storage* and is just a minor
optimization compared to storing them in full (actually a very recent
change with only marginally better results). This is simply because the
output scripts are read on script validation, and storing the outputs of
the transaction separately ensures better spatial locality of reference
(the inputs are just "in the way"). This is not relevant when using a
UTXO-index, because the outputs are then stored directly in the index,
whereas bitcrust has to read them from the transaction data.
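
Roughly, one can picture the split like this (illustrative only, not the
actual bitcrust layout):

    /// A stored transaction, split so that the outputs sit together on
    /// disk; script validation only needs the `outputs` part.
    struct StoredTx<'a> {
        outputs: &'a [u8], // serialized outputs, read when later spends are validated
        rest: &'a [u8],    // version, inputs and locktime, rarely touched afterwards
    }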

It is not my intention to send them independently.
 
> I do accept that a double-spend detection is not an optimal criteria
> by which to discard a tx. One also needs fee information. But without
> double-spend knowledge the node has no rational way to defend itself
> against an infinity of transactions that spend the minimal fee but
> also have conflicting inputs (i.e. risking the fee only once). So tx
> (pool) validation requires double-spend knowledge and at least a
> summary from outputs.

Double spend information is still available to the network node and
could still be used for DoS protection, although I do believe
alternatives may exist.
 
> 
> A reorg is conceptual and cannot be engineered out. What you are
> referring to is a restructuring of stored information as a consequence
> of a reorg. I don't see this as related to the above. The ability to
> perform reorganization via a branch pointer swap is based not on the
> order or factoring of validation but instead on the amount of
> information stored. It requires more information to maintain multiple
> branches.
> 
> Transactions have confirmation states, validation contexts and spender
> heights for potentially each branch of an unbounded number of
> branches. It is this requirement to maintain that state for each
> branch that makes this design goal a very costly trade-off of space
> and complexity for reorg speed. As I mentioned earlier, it's the
> optimization for this scenario that I find questionable.

Sure, we can still call switching tips a "reorg". And it is indeed a
trade-off, as orphan blocks are stored, but a block in the spend tree
takes only ~12kb and contains the required state information.

I believe this trade-off reduces complexity. For the earlier parts of
the tree, this could be pruned.

> Because choosing the lesser amount of work is non-consensus behavior.
> Under the same circumstances (i.e. having seen the same set of blocks)
> two nodes will disagree on whether there is one confirmation or no
> confirmations for a given tx. This disagreement will persist (i.e. why
> take the weaker block only to turn around and replace it with the
> stronger block that arrives a few seconds or minutes later). It stands
> to reason that if one rejects a stronger block under a race condition,
> one would reorg out a stronger block when a weaker block arrives a
> little after the stronger block. Does this "optimization" then apply
> to chains of blocks too?

The blockchain is - by design - only eventually consistent across nodes.
Even if nodes used the same "tip-selection" rules, you cannot rely on
all blocks being propagated, and hence on each transaction having the
same number of confirmations across all nodes.

As a simpler example, if two miners both mine a block at approximately
the same time and send it to each other, then surely they would want to
continue mining on their own block. Otherwise they would be throwing
away their own reward.  

And yes, this can also happen over multiple blocks, but the chances of
consistency are vastly increased with each confirmation.

> Accepting a block that all previous implementations would have
> rejected under the same circumstance could be considered a hard fork,
> but you may be right.

I am not talking about rejecting blocks, I am only talking choosing on
which tip to mine.

> > Frankly, I think this is a bit of an exaggeration. Soft forks are 
> > counted on a hand, and I don't think there are many - if any - 
> > transactions in the current chain that have changed compliance 
> > based on height.
> 
> Hope is a bug.
> 
> If you intend this to be useful it has to help build the chain, not
> just rely on hardwiring checkpoints once rule changes are presumed to
> be buried 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev
Thank you for your elaborate response Eric,

On Sun, Apr 9, 2017, at 00:37, Eric Voskuil wrote:
> My point was that "Using a storage engine without UTXO-index" has been
> done, and may be a useful reference, not that implementation details
> are the same.

I haven't dived into libbitcoin V2/V3 enough to fully grasp it, and
though your comments help, I still don't fully do so. I will answer
below what is related to bitcrust itself.

My post wasn't meant to claim innovation; I merely try to explain how
Bitcrust works and why it performs well.

> First, I remain confused on your comments pertaining to UTXO growth
> and network protocol. I followed your conversation with Greg and it
> remains unclear to me. From what I understand you have isolated order
> (double spend) from script validation. I think we all understand that
> script validation requires inputs and outputs while double spend
> detection requires correlation of inputs. What I do not understand is
> your choice of optimization axis.
> 
> Detection of double spend is not useful in isolation. One must also
> validate scripts, which requires outputs. I can see that there is an
> opportunity to reject blocks (within the same branch) faster by
> validating for double spends before validating script. But unconfirmed
> transactions do not exist in a branch, and are therefore not truly
> conflicting, until they are mined. And even after they are mined
> conflicting txs remain potentially valid in other branches. So
> rejecting txs due to conflict comes down to a denial of service
> policy, which ultimately must be based on fee increment (e.g. RBF).
> But fees are based on the amount of the output value that remains
> unspent in the transaction. So this in turn requires the retrieval of
> outputs.
> 
> And yet the remaining scenario of fast rejection of invalid blocks is
> not a meaningful optimization. Optimizing for the case where a block
> has valid and sufficient PoW and yet is invalid (for double spend) is
> counterproductive. And even so, the txs within the invalid block may
> be entirely valid independent of the block, so you are back to looking
> up their outputs to obtain fees in the case of a double spend or to
> validate script otherwise. In all cases you need to get the outputs.
> 
> > Bitcrust simply scans the tree. Although earlier designs used a 
> > skip-list, it turns out that accompanied by a spent-index lagging a
> > few blocks behind, raw scanning is faster then anything even though
> > it needs to scan ~5 blocks times ~4000 inputs before reaching the
> > first spent-index,  the actual scan is highly cache efficient and
> > little more then a "REP SCASQ", reaching sub-microsecond per input
> > on each core *including* the lookup in the spend index.
> 
> I realize that you see the implementation of the ordering validation
> as interesting detail, but I find it hard to justify contemplating the
> implementation in isolation from the output lookup requirement. And if
> one must looking up both outputs and spends for each validation, it
> makes more sense to co-locate that data.
> 
> Recovering in one step all data necessary to validate a tx has real
> advantages over either interleaving queries and validation or
> splitting input vs. output validation queries into two steps. It is a
> significantly more test-friendly approach, has better performance
> characteristics, and simplifies code. I cannot see any reason to
> perform the data read for double spend validation in isolation of that
> for script validation.


You seem to ignore here the difference between base load and peak load.
If Compact blocks/XThin with further optimizations can presync nearly
100% of the transactions, and nodes can do as much as possible when a
transaction comes in, the time spent when a block comes in can be
minimized and a lot more transactions can be handled with the same
resources.

The reason for "splitting" is that for an incoming transaction the
spent-state of the outputs being spent isn't particularly relevant as
you seem to acknowledge. When the block comes in, the actual output data
isn't relevant.

The *only* thing that needs to be checked when a block comes in is the
order, and the spend-tree approach absolves the need to access outputs
here.

As it also absolves the need for reorgs this greatly simplifies the
design. I am not sure why you say that a one-step approach is more
"test-friendly" as this seems to be unrelated.

> 
> If by results you are referring to performance numbers, it's very hard
> to draw any conclusions without a full benchmark. It's great that if
> you are able to boost Core, but from my perspective the numbers aren't
> especially compelling.
>

I fully agree, and hopefully do not try to hide that my numbers are
premature without a full implementation. I just think they are promising
enough to convince at least myself to move on with this model.
 
> Despite the site's explanation I cannot think of any reason to ever
> 

Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev


On Sun, Apr 9, 2017, at 00:12, Gregory Maxwell wrote:
> In Bitcoin Core the software _explicitly_ and intentionally does not
> exploit mempool pre-validation because doing that very easily leads to
> hard to detect consensus faults and makes all mempool code consensus
> critical when it otherwise is not. There have been bugs in the past
> which would have split the network if this optimization had been used.
> 
> (in particular, I believe I recall one related to correctly removing
> coinbase spends from the mempool during reorganization that made them
> immature; and with the optimization and without the CNB post-test
> would have resulted in nodes that saw the reorg creating and accepting
> an invalid block, while nodes that didn't rejecting it; but because of
> prudent design it was largely harmless).

Although I don't quite follow the details (CNB post-test? Connect block
I assume?), the risks you are describing seem to be rather specific to
Core's implementation. For one, bitcrust does not use or need reorgs at
all.

Do you argue (or can you further explain) that the idea of splitting
script validation (or what you call mempool pre-validation) and order
validation introduces risks inherent to the protocol?

Thanks,
Tomas


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev

> Please no conspiracy theory like stepping on someone’s toes. I believe
> it’s always nice to challenge the established model. However, as I’m
> trying to make some hardfork design, I intend to have a stricter UTXO
> growth limit. As you said "protocol addressing the UTXO growth, might not
> be worth considering protocol improvements*, it sounds like UTXO growth
> limit wouldn’t be very helpful for your model, which I doubt. 

Thank you. I realize that this particular phrase implies that in my
design, outputs are less costly than inputs, *in total resource costs*,
which I cannot defend without completely ignoring base load script
verification. I rephrased it.

Tomas


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev

> I don’t fully understand your storage engine. So the following deduction
> is just based on common sense.
> 
> a) It is possible to make unlimited number of 1-in-100-out txs
> 
> b) The maximum number of 100-in-1-out txs is limited by the number of
> previous 1-in-100-out txs
> 
> c) Since bitcrust performs not good with 100-in-1-out txs, for anti-DoS
> purpose you should limit the number of previous 1-in-100-out txs. 
> 
> d) Limit 1-in-100-out txs == Limit UTXO growth
> 
> I’m not surprised that you find an model more efficient than Core. But I
> don’t believe one could find a model that doesn’t become more efficient
> with UTXO growth limitation.

My efficiency claims are *only* with regards to order validation. If we
assume all transactions are already pre-synced and verified, bitcrust's
order validation is very fast, and (only slightly) negatively affected
by input counts.

Most total time is spent during base load script validation, and UTXO
growth is definitely the limiting factor there, as the model here isn't
all that different from Core's.


> Maybe you could try an experiment with regtest? Make a lot 1-in-100-out
> txs with many blocks, then spend all the UTXOs with 100-in-1-out txs.
> Compare the performance of bitcrust with core. Then repeat with
> 1-in-1-out chained txs (so the UTXO set is always almost empty)
> 

Again, this really depends on whether we focus on full block validation,
in which case the 100-1 / 1-100 distinction will be similar to Core's,
or only regard order validation, in which case Bitcrust will have this
odd reversal.


> One more question: what is the absolute minimum disk and memory usage in
> bitcrust, compared with the pruning mode in Core?

As bitcrust doesn't support this yet, I cannot give accurate numbers,
but I've provided some numbers estimates earlier in the thread.


Rereading my post and these comments, I may have stepped on some toes
with regards to SegWit's model. I like SegWit (though I may have a
slight preference for BIP140), and I understand the reasons for the
"discount", so this was not my intention. I just think that the reversal
of costs during peak load order validation is a rather interesting
feature of using spend-tree  based validation. 

Tomas


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev


On Sat, Apr 8, 2017, at 20:27, Tom Harding via bitcoin-dev wrote:
> On Apr 7, 2017 12:42, "Gregory Maxwell"  wrote:
>> On Fri, Apr 7, 2017 at 6:52 PM, Tom Harding via bitcoin-dev
>>   wrote:
>> > A network in which many nodes maintain a transaction index also
>> > enables a class of light node applications that ask peers to prove
>> > existence and spentness of TXO's.
>>
>> Only with the additional commitment structure such as those proposed
>> by Peter Todd in his stxo/txo commitment designs, e.g.
>> https://petertodd.org/2016/delayed-txo-commitments
>
> Light nodes are improved by detecting invalid transactions, even
> before they are mined.



I am not quite sure why you think this approach would help in this
regard. I may be missing part of how Core works here, but Bitcrust's
txindex is merely used to look up transactions from hashes, and
currently seems to fulfil the same role as Core's -txindex mode.


This can be pruned, and in the future auto-pruned, as the "flat files"
used as the base for all data allow for concurrent pruning. But unlike
in Core, it is always needed: without a UTXO index, it is required to
find outputs during base load validation.



Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-08 Thread Tomas via bitcoin-dev
On Sat, Apr 8, 2017, at 02:44, Gregory Maxwell wrote:
> As you note that the output costs still bound the resource
> requirements. 

Resource cost is not just a measure of storage requirement; data that
needs to be accessed during peak load induces more cost than data only
used during base load or only rarely used.

> Latency related costs in Bitcoin Core also do not depend on the number
> of outputs in transactions in a block. When a transaction is handled
> it goes into an in-memory buffer and only gets flushed later if isn't
> spent before the buffer fills.  A block will take more time to
> validate with more inputs, same as you observer, but the aggregate
> resource usage for users depends significantly on outputs (so, in fact
> there is even further misaligned incentives than just the fact that
> small outputs have a outsized long term cost).

In Core, when a block comes in, the inputs are checked against the UTXO
set (which grows with outputs), even if pre-synced, to verify order. Am
I wrong there? This is not the case in bitcrust; the check is instead
done against the spend-tree (which grows with inputs).

How "significant" this is, I neither know nor claim, but it is an
interesting difference.

> Then I think you may want to retract the claim that "As this solution,
> reversing the costs of outputs and inputs, [...] updates to the
> protocol addressing the UTXO growth, might not be worth considering
> *protocol improvements* "

I think you are being a bit harsh here. I am also clearly explaining
that the difference only applies to peak load, and I am just making a
suggestion. I simply want to stress the importance of protocol /
implementation separation: even though you are correct that UTXO data is
always a resource cost for script validation (as I also state), the
ratios of the different costs are not necessarily *identical* across
implementations.

Note that the converse also holds: In bitcrust, if the last few blocks
contain many inputs, the peak load verification for this block is
slower. This is not the case in Core.

Tomas


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Hi Eric,

On Fri, Apr 7, 2017, at 21:55, Eric Voskuil via bitcoin-dev wrote:
> Optimization for lower memory platforms then becomes a process of
> reducing the need for paging. This is the purpose of a cache. The seam
> between disk and memory can be filled quite nicely by a small amount
> of cache. On high RAM systems any cache is actually a de-optimization
> but on low RAM systems it can prevent excessive paging. This is
> directly analogous to a CPU cache. 


I am not entirely sure I agree with that, or understand it correctly.

If, for example, the data of some application is a set of records which
can be sorted from least frequently used to most frequently used, then
doing just that sort will beat any application-layer cache. Regardless
of the size of the data and the size of RAM, you simply allow the OS to
use disk caching or memory-map caching to work its magic.

In fact, I would argue that an application-layer cache *only* makes
sense if the data model shows a *hard* distinction between often and not
often used data. If usage-frequency is a continuous line, caching is
best left to the OS by focussing on proper spatial and temporal locality
of reference of your data, because the OS has much more information to
make the right decision. 



Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Answering both,



On Fri, Apr 7, 2017 at 11:18 AM, Gregory Maxwell via bitcoin-dev
 wrote:
>> I'm still lost on this-- AFAICT your proposals long term resource
>> requirements are directly proportional to the amount of unspent output
>> data, which grows over time at some fraction of the total transaction
>> volume (plus the rate of spending which is more or less a constant).
>>
>> Can you help out my understanding here?



On Fri, Apr 7, 2017, at 20:39, Bram Cohen wrote:

> Expanding on this question a bit, it's optimized for parallel access,
> but hard drive access isn't parallel and memory accesses are very
> fast, so shouldn't the target of optimization be about cramming as
> much as possible in memory and minimizing disk accesses?


The long term *minimal disk storage* requirement can obviously not be
less than all the unspent outputs. Minimal disk requirements are not
something bitcrust attempts to address.


The storage that is accessed during peak load (block validation with
pre-synced transactions) is minimized, as this only needs the
transaction index (to look up ptrs from hashes), the tip of the
spend-tree and the tip of the spend-index (together used to check double
spends / spending of non-existing outputs). These not only easily fit in
RAM, but are accessed in a cache-efficient way. *These* only grow with
inputs, as the spend-tree contains one record per input referencing the
output being spent.
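
Concretely, one can think of a spend-tree record as roughly the
following (illustrative; the real layout differs):

    /// One record per input: a reference to the output being spent,
    /// expressed as a file pointer into the transaction store.
    struct SpendRecord {
        spent_output: u64, // fileptr of the spent output (tx position + output index)
        previous: u32,     // previous record in this branch of the spend-tree
    }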


Script validation is also not something bitcrust *directly* addresses;
it uses libbitcoinconsensus for the actual validation, and lookups of
outputs are mostly similar. They are kept fast by trusting the OS on MRU
caching of transaction outputs; I don't think the UTXO index has many
drawbacks for this part. Bitcrust seems to have a small advantage due to
the awesomeness of Rayon's parallelization and the lock-free data
structures, but a disadvantage in that keeping all spent outputs
decreases spatial locality of reference. Script validation is not the
innovative part.
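
As a tiny sketch of that per-input parallelism with Rayon (where
verify_input stands in for the libbitcoinconsensus call):

    use rayon::prelude::*;

    struct Input { /* prevout reference, scripts, ... */ }

    fn verify_input(_input: &Input) -> bool {
        // stands in for the actual script check via libbitcoinconsensus
        true
    }

    /// Every input can be validated independently, so a parallel
    /// iterator over all inputs of a block is enough.
    fn validate_inputs(inputs: &[Input]) -> bool {
        inputs.par_iter().all(verify_input)
    }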


Tomas


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Thank you,



The benches are running in Google Cloud Engine; currently on 8 vCPU
32gb, but I tend to switch hardware regularly.


Roughly, the results are better for Bitcrust with high end hardware and
the difference for total block validations is mostly diminished at 2
vCPU, 7,5 gb.


Note that the spend-tree optimization primarily aims to improve peak
load order validation (when a block with pre-synced transactions comes
in), but this is tricky to accurately bench against Core using this
simple method of comparison by logs.


I will upgrade to, and show the results against 0.14 in the next weeks.


Best,

Tomas





On Fri, Apr 7, 2017, at 16:14, Greg Sanders wrote:
> Interesting work.
>
> I was wondering if you could tell us what specs for the machine being
> used as preliminary benchmark is here: https://bitcrust.org/results ?
>
> I'd be interested to also see comparisons with 0.14 which has some
> improvements for script validation with more cores.
>
> On Fri, Apr 7, 2017 at 4:47 AM, Tomas via bitcoin-dev
> <d...@lists.linuxfoundation.org> wrote:
>> Thank you Marcos,
>>
>> Though written in Rust, bitcrust-db is definitely usable as pluggable
>> module as its interface will be roughly some queries, add_tx and
>> add_block with blobs and flags. (Bitcrust internally uses a
>> deserialize-only model, keeping references to the blobs with the
>> parsed data).
>>
>> However, from Core's side I believe network and storage are currently
>> rather tightly coupled, which will make this far from trivial.
>>
>> Regardless, I am also hoping (with funding & a team) to build a
>> Bitcrust networking component as well to bring a strong competitor to
>> the market.
>>
>> best,
>> Tomas
>>
>> On Fri, Apr 7, 2017, at 09:55, Marcos mayorga wrote:
>> > Hi Tomas,
>> >
>> > I've read it and think it is an excellent work, I'd like to see it
>> > integrated into bitcoin-core as a 'kernel module'.
>> >
>> > I see there are a lot of proof of concepts out there, IMO every one
>> > deserve a room in the bitcoin client as a selectable feature, to
>> > make the software more flexible and less dictatorial, an user could
>> > easily select which features she wants to run.
>> >
>> > Best regards,
>> > Marcos
>> >
>> > > I have been working on a bitcoin implementation that uses a
>> > > different approach to indexing for verifying the order of
>> > > transactions. Instead of using an index of unspent outputs, double
>> > > spends are verified by using a spend-tree where spends are scanned
>> > > against spent outputs instead of unspent outputs.
>> > >
>> > > This allows for much better concurrency, as not only blocks, but
>> > > also individual inputs can be verified fully in parallel.
>> > >
>> > > I explain the approach at https://bitcrust.org, source code is
>> > > available at https://github.com/tomasvdw/bitcrust
>> > >
>> > > I am sharing this not only to ask for your feedback, but also to
>> > > call for a clear separation of protocol and implementations: As
>> > > this solution, reversing the costs of outputs and inputs, seems to
>> > > have excellent performance characteristics (as shown in the test
>> > > results), updates to the protocol addressing the UTXO growth,
>> > > might not be worth considering *protocol improvements* and it
>> > > might be best to address these concerns as implementation details.
>> > >
>> > > Kind regards,
>> > > Tomas van der Wansem
>> > > to...@bitcrust.org
>> > > Bitcrust




Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-07 Thread Tomas via bitcoin-dev
Thank you Marcos,

Though written in Rust, bitcrust-db is definitely usable as a pluggable
module, as its interface will be roughly some queries, add_tx and
add_block with blobs and flags. (Bitcrust internally uses a
deserialize-only model, keeping references to the blobs with the parsed
data.)

However, from Core's side I believe network and storage are currently
rather tightly coupled, which will make this far from trivial.

Regardless, I am also hoping (with funding & a team) to build a Bitcrust
networking component as well to bring a strong competitor to the market.

best,
Tomas



On Fri, Apr 7, 2017, at 09:55, Marcos mayorga wrote:
> Hi Tomas,
> 
> I've read it and think it is an excellent work, I'd like to see it
> integrated into bitcoin-core as a 'kernel module'.
> 
> I see there are a lot of proof of concepts out there, IMO every one
> deserve a room in the bitcoin client as a selectable feature, to make the
> software more flexible and less dictatorial, an user could easily select
> which features she wants to run.
> 
> Best regards,
> Marcos
> 
> > I have been working on a bitcoin implementation that uses a different
> > approach to indexing for verifying the order of transactions. Instead of
> > using an index of unspent outputs, double spends are verified by using a
> > spend-tree where spends are scanned against spent outputs instead of
> > unspent outputs.
> >
> > This allows for much better concurrency, as not only blocks, but also
> > individual inputs can be verified fully in parallel.
> >
> > I explain the approach at https://bitcrust.org, source code is available
> > at https://github.com/tomasvdw/bitcrust
> >
> > I am sharing this not only to ask for your feedback, but also to call
> > for a clear separation of protocol and implementations: As this
> > solution, reversing the costs of outputs and inputs, seems to have
> > excellent performance characteristics (as shown in the test results),
> > updates to the protocol addressing the UTXO growth, might not be worth
> > considering *protocol improvements* and it might be best to address
> > these concerns as implementation details.
> >
> > Kind regards,
> > Tomas van der Wansem
> > to...@bitcrust.org
> > Bitcrust


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Tomas via bitcoin-dev


On Fri, Apr 7, 2017, at 03:09, Gregory Maxwell wrote:
> 
> How do you deal with validity rules changing based on block height?

I expected that one :). Just like the 100 blocks coinbase rule, changes
by softforks need to be added as metadata to the transaction-index, but
this is not yet in place.

As for the script validation itself using libbitcoinconsensus, this is a
bit hairy, as it expects the rules to be known. Luckily, simply
gradually retrying using "lower" rules won't hurt performance, as
transactions that mismatch newer rules are rare.

Generally, bitcrust would appreciate an "is valid with X rules" approach
instead of a "validate with X rules" approach.


> So it sounds like to work the software still needs an analog of a
> (U)TXO database? I am confused by the earlier comments about thinking
> the the resource consumption of the (U)TXO database is not a
> consideration in your design.

No, but transactional access is. Bitcrust just uses a
*transaction-index*, where outputs can be looked up regardless of being
spent. As the concept of being "spent" depends on the branch, script
validation ignores this and simply looks up the outputs.

Transactions are split in two parts for better locality of reference
when accessing outputs.

The transaction index only looks similar to an "UTXO-index" after full
pruning.

> If you get a transaction claiming to spend 0xDEADBEEFDEADBEEF, an
> output that never existed how does your spent index reject this spend

The spend-tree is scanned until either DEADBEEF is found (=ERR double
spent), the transaction of DEADBEEF is found (=all ok!), or the start
of the chain is reached (=ERR spending unknown output!)

To prevent actually having to scan to genesis, the spent-index "catches"
the search after a few blocks and performs the same lookup (positive for
tx, negative for output) on a simple bit index.
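
In rough Rust, the scan looks like this (illustrative types, not the
actual bitcrust code):

    /// Records in the spend-tree, newest first: either a transaction or
    /// a spend of a particular output, both identified by file pointers.
    enum Rec {
        Tx(u64),
        Spend(u64),
    }

    enum Order {
        DoubleSpent,
        Ok,
        UnknownOutput,
    }

    /// Scan towards genesis; in practice the spent-index "catches" the
    /// scan after a few blocks so it never gets anywhere near the start.
    fn verify_order(tree: &[Rec], spent_output: u64, creating_tx: u64) -> Order {
        for rec in tree {
            match *rec {
                Rec::Spend(o) if o == spent_output => return Order::DoubleSpent,
                Rec::Tx(t) if t == creating_tx => return Order::Ok,
                _ => {}
            }
        }
        Order::UnknownOutput
    }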




Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Tomas via bitcoin-dev


On Fri, Apr 7, 2017, at 02:32, Gregory Maxwell wrote:

> Perhaps a simple question would help:
> 
> What is the minimal amount of space your system requires to take a new
> block received from the P2P network and verifying that all its spends
> were valid spends of existing coins unspent coins today?
> 
> For Bitcoin Core the answer is ~2GB (plus the configuration handling
> currently forces you to keep another 550MB of blocks for reorgs).

Bitcrust separates script validation (base load, when transactions come
in) from order validation (peak load, when blocks come in).

For script validation it would obviously need the ~2GB (or I think
~1.5GB) of outputs needed to validate these. For order validation it
needs ~200mb for the spent-index (for bit-lookups) and I would guess
roughly ~500mb of the spent-tree (for scanning), though I don't think
the 5.7GB full spend tree is worth pruning anytime soon.

Then it is currently using a ~1.5GB index for transaction hash to
fileptr lookups, though this could be made more space efficient.


Re: [bitcoin-dev] Using a storage engine without UTXO-index

2017-04-06 Thread Tomas via bitcoin-dev
Hi Eric,

Thanks, but I get the impression that the similarity is rather
superficial.  

To address your points:

> (1) higher than necessary storage space requirement due to storing the
> indexing data required for correlate the spends, and

Hmm. No. Spends are simply scanned in the spend-tree (full tree,
prunable, 5.6gb in full), or caught by the spend-index (bit index,
non-prunable, 180mb in full). Neither imposes significant storage
requirements.

> 2) higher than necessary validation complexity and cost in terms of
> computing the spent-ness (including spender height) of an output.
>
> With the exception of de-linking (not deleted) in the case of reorgs, the
> entire store is append only, implemented in a small set of memory
> mapped file

I guess this is the key difference. As the spend-tree stores the spend
information in a tree structure, no reorgs are required, and the
resulting code is actually much less complex.

Bitcrust simply scans the tree. Although earlier designs used a
skip-list, it turns out that, accompanied by a spent-index lagging a few
blocks behind, raw scanning is faster than anything else. Even though it
needs to scan ~5 blocks times ~4000 inputs before reaching the first
spent-index entry, the actual scan is highly cache efficient and little
more than a "REP SCASQ", reaching sub-microsecond per input on each core
*including* the lookup in the spend index.

 > I don't follow this part, maybe you could clarify. A spends index
> grows with the size of the spend set (forever) as it cannot be pruned,
> which certainly exceeds the size of the UTXO set (unless nothing is
> spent). The advantage is that you don't have to keep rewriting the
> store when you use a spends set (because the store can be append only).

My point is that the spend tree grows per *input* of a transaction
instead of per *output* of a transaction, because this is what is
scanned on order validation.

The spend tree can be pruned because the spend index (~200mb) catches
early spends.

Disregarding the baseload script validation, the peak load order
validation of bitcrust is more negatively affected by a transaction with
many inputs than by a transaction with many outputs.

I encourage you to check out the results at https://bitcrust.org

Regards,
Tomas

On Fri, Apr 7, 2017, at 01:38, Eric Voskuil wrote:
> 
> On 04/06/2017 03:12 PM, Tomas via bitcoin-dev wrote:
> 
> Hi Tomas,
> 
> > I have been working on a bitcoin implementation that uses a
> > different approach to indexing for verifying the order of
> > transactions. Instead of using an index of unspent outputs, double
> > spends are verified by using a spend-tree where spends are scanned
> > against spent outputs instead of unspent outputs.
> 
> This is the approach that genjix used in libbitcoin version2. With the
> exception of de-linking (not deleted) in the case of reorgs, the
> entire store is append only, implemented in a small set of memory
> mapped files. The downsides to the approach are:
> 
> (1) higher than necessary storage space requirement due to storing the
> indexing data required for correlate the spends, and
> 
> (2) higher than necessary validation complexity and cost in terms of
> computing the spent-ness (including spender height) of an output.
> 
> His implementation used a hash table, so performance-wise it did quite
> well and would theoretically outperform a tree, O(1) vs. O(log2(N)).
> 
> > This allows for much better concurrency, as not only blocks, but
> > also individual inputs can be verified fully in parallel.
> 
> I was successful in parallelizing input validation (across the inputs
> of an unconfirmed tx and across the set of all inputs in a block)
> using the v2 store. However, it is not the case that the spends
> approach is necessary for concurrency.
> 
> To resolve the above two problems the version3 store does not use a
> spends table/index. Nor does it store any table of UTXOs. Yet
> validation is highly parallelized. Instead of additional indexes it
> uses the tx hash table, augmented with 32 bits per output for spender
> height. So there is a O(1) cost of finding the tx and a O(N) cost of
> finding the spender height where N is the number of outputs in the tx.
> But because the number of outputs in a tx is bounded (by block size)
> this is constant time in the number of transactions.
> 
> This works out much faster than the spends table, and without the
> storage cost or complexity disadvantages. It also scales with
> available hardware, as the memory mapped files become in-memory hash
> tables. For low memory machines we found it was important to implement
> an opaque UTXO cache to limit paging, but for higher end systems zero
> cache is optimal.
> 
> > I am 

[bitcoin-dev] [BIP Draft] Decentralized Improvement Proposals

2015-12-30 Thread Tomas via bitcoin-dev
In an attempt to reduce developer centralization, and to reduce the risk
of forks introduced by implementations other than bitcoin-core, I have
drafted a BIP to support changes to the protocol from different
implementations.

The BIP can be found at:
https://github.com/tomasvdw/bips/blob/master/decentralized-improvement-proposals.mediawiki

I believe this BIP could mitigate the risk of forks, and decentralize
the development of the protocol.

If you consider the proposal worthy of discussion, please assign a
BIP-number.

Regards,
Tomas van der Wansem


Re: [bitcoin-dev] [BIP Draft] Decentralized Improvement Proposals

2015-12-30 Thread Tomas via bitcoin-dev

> The specification itself looks like an inefficient and bloaty reinvention
> of 
> version bits.

The actual assignment of version bits isn't clear from the
specification. Are you saying that any implementation that wants to
propose a change is encouraged to pick a free version bit and use it?

Furthermore, my proposal addresses the danger of forward-incompatible
changes; a hard fork can no longer occur, as every implementation will
agree on the active set of rules even if it has not implemented them.
This seems to be lacking in the version bits proposal.

Tomas