Re: [Bitcoin-development] BIP proposal: Authenticated prefix trees

2014-01-05 Thread Thomas Voegtlin
Hello and happy new year to this mailing list!


Thank you Mark for the incredible work you've been doing on this.
I am following this very closely, because it is of primary importance
for Electrum.

I have written a Python-levelDB implementation of this UTXO hashtree,
which is currently being tested, and will be added to Electrum servers.

My implementation follows Alan Reiner's idea to store the tree as items
in a key-value database. I believe that a C++ implementation like yours
will be at least an order of magnitude faster, and I am looking forward 
to it.

I too believe that BIPs should define interoperability points, but probably
not implementation details. For the UTXO hashtree, this means that a BIP
should at least specify how the root hash is constructed. This might be the
only thing that needs to be specified.

However, I see no pressing issue with writing a BIP; it might be preferable
to implement and test different options first, and learn from that.

Thomas



Le 20/12/2013 02:47, Mark Friedenbach a écrit :
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1

 Hello fellow bitcoin developers. Included below is the first draft of
 a BIP for a new Merkle-compressed data structure. The need for this
 data structure arose out of the misnamed Ultimate blockchain
 compression project, but it has since been recognized to have many
 other applications.

 In addition to this BIP I am preparing three additional BIPs
 describing the use of this data structure in stateless validation 
 mining, the UBC address index for SPV+ operating modes, document
 timestamping and merged mining.

 A Python implementation of this data structure is available here:

 https://github.com/monetizeio/python-bitcoin

 A C++ implementation is being worked on.

 As per the BIP-1 procedure, I am submitting this rough draft to the
 community for discussion. I welcome all comments and criticisms of
 both form and content.

 - -Mark


 ==Abstract==

 This BIP describes a [http://en.wikipedia.org/wiki/Hash_tree Merkle
 hash tree] variant of the [http://en.wikipedia.org/wiki/Trie
 prefix-tree data structure], ideally suited for encoding key-value
 indices which support memory-efficient proofs.

 ==Motivation==

 There are a number of applications which would benefit from having a
 data structure with the following properties:

 * '''Arbitrary mapping of keys to values.''' A ''key'' can be any
 bytestring, and its ''value'' any other bytestring.
 * '''Duplicate keys disallowed.''' Every key has one, and only one
 value associated with it. Some applications demand assurance that no
 key value is reused, and that this constraint can be checked without
 requiring access to the entire data structure.
 * '''Efficient look-up by key.''' The data structure should support
 sub-linear lookup operations with respect to the number of keys in the
 mapping. Logarithmic time or linear with respect to the length of the
 key should be achievable and would be sufficient for realistic
 applications.
 * '''Merkle compression of mapping structure.''' It should be possible
 to produce a reduced description of the tree consisting of a single
 root hash value which is deterministically calculated from the mapping
 structure.
 * '''Efficient proofs of inclusion.''' It should be possible to
 extract a proof of key/value mapping which is limited in size and
 verification time by the length of the key in the worst case.
 * '''Computation of updates using local information.''' Given a set of
 inclusion proofs, it should be possible to calculate adjustments to
 the local mapping structure (update or deletion of included mappings,
 or insertion between two included mappings which are adjacent in the
 global structure).

 Such applications include committed validation indices which enable
 stateless mining nodes, committed wallet indices which enable
 trust-less querying of the unspent transaction output set by
 codescriptPubKey/code, efficient document time-stamping, and
 secure  efficient merged mining. This BIP describes an authenticated
 prefix tree which has the above properties, but leaves the myriad
 applications to be formalized in future BIPs.

 ==Data structure==

 This BIP defines a binary prefix tree. Such a structure provides a
 mapping of bitstrings (the ''keys'') to bytestrings (the ''values'').
 It is an acyclic binary tree which implicitly encodes keys within the
 traversal path -- a left branch is a 0, and a right branch is a 1.
 Each node is reachable by only one unique path, and reading off the
 branches taken (0 for each left, 1 for each right) as one follows the
 path from root to target yields the node's key.

 The particular binary prefix tree defined by this BIP is a hybrid
 PATRICIA / de la Brandais tree structure.
 [http://en.wikipedia.org/wiki/Radix_tree PATRICIA trees] compress a
 long sequence of non-branching nodes into a single interior node with
 a per-branch ''skip prefix''. This achieves significant savings in
 storage space, root 

[Bitcoin-development] Privacy and blockchain data

2014-01-05 Thread Peter Todd
* Summary

CoinJoin, CoinSwap and similar technologies improve your privacy by
making sure information about what coins you own doesn't make it into
the blockchain, but syncing your wallet is a privacy risk in itself and
can easily leak that same info. Here's an overview of that risk, how to
quantify it, and how to reduce it efficiently.


* Background

In the most general sense a Bitcoin wallet is a collection of one or
more scriptPubKeys, often known as addresses.(*) The basic purpose of
the wallet is maintain the set of all transaction outputs (txouts)
matching the scriptPubKeys in the wallet.  Secondary to that purpose is
to maintain the set of all transactions associated with scriptPubKeys in
the wallet; almost all (all?) wallet software maintains transaction
information rather than only txout data. Usually, but not always, the
wallet will have some mechanism to spend transaction outputs, creating
new transactions. (if the wallet doesn't it is referred to as a
watch-only wallet)

Given a full set of blockchain data the task of keeping the set of all
relevant transactions and txouts up-to-date is simple: scan the
blockchain for the relevant data. The challenge is to devise systems
where wallets can be kept up to date without this requirement in a way
that is secure, efficient, scalable, and meets the user's privacy
requirements.

*) Alternatively addresses can be thought of as instructions to the
   payor as to how to generate a scriptPubKey that the payee can spend,
   a subtlety different concept.


* Threat Model and Goals

Currently the Bitcoin network consists of a large (low thousands) number
of allegedly independent nodes. There is no mechanism to prevent an
attacker from sybil attacking the network other than the availability of
IP addresses. This protection is made even weaker by the difficulty of
being sure you have a non-sybilled list of nodes to connect too; IP
addresses are passed gossip-style with no authentication.

From a privacy perspective we are conservative and assume an active,
internal, and global attacker - using the terminology of Diaz et al.(1)
- that controls up to 100% of the nodes you are connected too. With
regard to retrieval of blockchain data we can use the Sweeney's notion
of k-anonymity(2) where the privacy-sensitive data for an individual is
obscured by it's inclusion in a data of a large set of individuals, the
anonymity set.


* Basic Functionality

With regard to blockchain data we have the following basic functions:

** Spending funds

The user creates a transaction and gets it to miners by some method,
usually the P2P network although also possibly by direct submission.
Either way privacy can be achieved through a mix network such as Tor
and/or relaying other users' transactions so as to embed yours within a
larger anonymity set. In some cases payment protocols can shift the
problem to the recipient of the funds. Using CoinJoin also helps
increase the anonymity set.

Usually the sender will want to determine when the transaction confirms;
once the transaction has confirmed modulo a reorganization the
confirmation count can only increase. Transaction mutability and
double-spends by malicious CoinJoin participants complicate the task of
detecting confirmation: ideally we could simply query for the presence
of a given txid in each new block, however the transaction could be
mutated, changing the txid. The most simple way to detect confirmation
is then to query for spends of the txouts spend by the transaction.


** Receiving new funds

While in the future payment protocols may give recipients transaction
information directly it is most likely that wallets will continue to
have to query peers for new transactions paying scriptPubKey's under the
user's control for the forseeable future.


** Detection of unauthorized spends

Users' want early detection of private key compromise, accomplished by
querying blockchain data for spends from txouts in their wallets. This
has implications for how change must be handled, discussed below.


* Scalability/Efficiency

The total work done by the system as a whole for all queries given some
number of transactions n is the scalability of the scheme. In addition
scalability, and privacy in some cases, is improved if work can be
easily spread out across multiple nodes both at a per-block and
within-block level.


* Reliability/Robustness

Deterministic wallets using BIP32 or similar, where all private keys are
derived from a fixed seed, have proven to be extremely popular with
users for their simple backup model. While losing transaction metadata
after a data-loss event is unfortunate, losing access to all funds is a
disaster. Any address generation scheme must take this into account and
make it possible for all funds to be recovered quickly and efficiently
from blockchain data. Preserving privacy during this recovery is a
consideration, but 100% recovery of funds should not be sacrificed for
that goal.


* Query schemes

** Bloom