Re: [Bitcoin-development] Privacy and blockchain data

2014-01-10 Thread Peter Todd
On Tue, Jan 07, 2014 at 10:34:46PM -0800, Jeremy Spilman wrote:
 
 2) Common prefixes: Generate addresses such that for a given wallet they
all share a fixed prefix. The length of that prefix determines the
anonymity set and associated privacy/bandwidth tradeoff, which
remainds a fixed ratio of all transactions for the life of the
wallet.
 
 
 Interesting thought to make the privacy/bandwidth trade-off using
 vanitygen and prefix filters.
 
 But doesn't this effectively expand the universe of potential spies
 from 'the global attacker' who is watching your SPV queries, to
 simply 'the globe' -- anyone with a copy of the blockchain?

It's a trade-off. Most people are going to use public peers for their
SPV nodes - they're not going to run full nodes. They also are going to
want to limit how much bandwidth they use to sync their wallets; if they
don't care the can use a very short, or no, prefix and the problem goes
away.

If you make that bandwidth/privacy trade-off by using very specific
filters and non-specific addresses then you have a situation where those
public peers are learning a lot of potentially valuable data. It's easy
to imagine, say, the IRS being willing to pay for data on how many
Bitcoins people have in their wallets to try to catch tax cheats for
instance, and that can easily fund a lot of fast and high-quality peers
that don't advertise the fact that they're selling data on the contents
of your wallet.

On the other hand if you use non-specific filters, and prefixed
addresses for incoming payments, then you're not leaking high-quality
information to anyone. I think this makes for a more robust Bitcoin
system, especially as we need things like CoinJoin for privacy that make
*everyones* privacy matter to you; CoinJoin could easily be defeated by
aquiring lots of good info on the contents of wallets through SPV
queries.

 Some stats on UTXO set size:  (slightly stale -- as of block 270733)
 
7.4m unspent outputs
2.2m transactions with unspent outputs
2.1m unique unspent scriptPubKeys
Side note: the top 1,000 scriptPubKeys have 10% of all unspent outputs.
 
 Let's say you use an 8-bit prefix (1/256) that would be ~10,000
 transactions in the UTXO you would be monitoring. But if I knew a
 few different days / time-periods you transacted, I could figure out
 your prefix.

Actually UTXO isn't the right way to look at this; prefix filters would
be almost certainly matched against all txouts in blocks. Or put another
way, UTXO isn't the right way to look at it because the attacker will
have some rough idea of the time period, and wants to know about
transactions made.

 Of course, anyone you transact with would know your prefix outright.

Well what good, in your example, is it for the attacker to go from I
know my target gets a paycheck every two weeks for $x to His wallet
prefix is abcd with y% probability? Even once you learn the prefix of
your target's wallet, what funds they actually own is still embedded in
a much larger anonymity set of hundreds to thousands of transactions
that had nothing to do with them.

 Wouldn't this also allow obvious identification of spend versus
 change addresses in a transaction?

No, I specifically said that you don't want to use prefixes with change
txouts for that reason. Fortunately while the set of all scriptPubKey's
ever used for change txouts will grow over time, as long as you are not
watching for new payments on any key in that set you only need to query
for the ones that still have funds on them, and that's only because you
want to be able to detect unauthorized spends of them.

-- 
'peter'[:-1]@petertodd.org
00028a5c9edabc9697fe96405f667be1d6d558d1db21d49b8857


signature.asc
Description: Digital signature
--
CenturyLink Cloud: The Leader in Enterprise Cloud Services.
Learn Why More Businesses Are Choosing CenturyLink Cloud For
Critical Workloads, Development Environments  Everything In Between.
Get a Quote or Start a Free Trial Today. 
http://pubads.g.doubleclick.net/gampad/clk?id=119420431iu=/4140/ostg.clktrk___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Privacy and blockchain data

2014-01-07 Thread Jeremy Spilman

 2) Common prefixes: Generate addresses such that for a given wallet they
all share a fixed prefix. The length of that prefix determines the
anonymity set and associated privacy/bandwidth tradeoff, which
remainds a fixed ratio of all transactions for the life of the
wallet.


Interesting thought to make the privacy/bandwidth trade-off using  
vanitygen and prefix filters.

But doesn't this effectively expand the universe of potential spies from  
'the global attacker' who is watching your SPV queries, to simply 'the  
globe' -- anyone with a copy of the blockchain?

Some stats on UTXO set size:  (slightly stale -- as of block 270733)

7.4m unspent outputs
2.2m transactions with unspent outputs
2.1m unique unspent scriptPubKeys
Side note: the top 1,000 scriptPubKeys have 10% of all unspent outputs.

Let's say you use an 8-bit prefix (1/256) that would be ~10,000  
transactions in the UTXO you would be monitoring. But if I knew a few  
different days / time-periods you transacted, I could figure out your  
prefix.

Of course, anyone you transact with would know your prefix outright.

Wouldn't this also allow obvious identification of spend versus change  
addresses in a transaction?


--
Rapidly troubleshoot problems before they affect your business. Most IT 
organizations don't have a clear picture of how application performance 
affects their revenue. With AppDynamics, you get 100% visibility into your 
Java,.NET,  PHP application. Start your 15-day FREE TRIAL of AppDynamics Pro!
http://pubads.g.doubleclick.net/gampad/clk?id=84349831iu=/4140/ostg.clktrk
___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


[Bitcoin-development] Privacy and blockchain data

2014-01-05 Thread Peter Todd
* Summary

CoinJoin, CoinSwap and similar technologies improve your privacy by
making sure information about what coins you own doesn't make it into
the blockchain, but syncing your wallet is a privacy risk in itself and
can easily leak that same info. Here's an overview of that risk, how to
quantify it, and how to reduce it efficiently.


* Background

In the most general sense a Bitcoin wallet is a collection of one or
more scriptPubKeys, often known as addresses.(*) The basic purpose of
the wallet is maintain the set of all transaction outputs (txouts)
matching the scriptPubKeys in the wallet.  Secondary to that purpose is
to maintain the set of all transactions associated with scriptPubKeys in
the wallet; almost all (all?) wallet software maintains transaction
information rather than only txout data. Usually, but not always, the
wallet will have some mechanism to spend transaction outputs, creating
new transactions. (if the wallet doesn't it is referred to as a
watch-only wallet)

Given a full set of blockchain data the task of keeping the set of all
relevant transactions and txouts up-to-date is simple: scan the
blockchain for the relevant data. The challenge is to devise systems
where wallets can be kept up to date without this requirement in a way
that is secure, efficient, scalable, and meets the user's privacy
requirements.

*) Alternatively addresses can be thought of as instructions to the
   payor as to how to generate a scriptPubKey that the payee can spend,
   a subtlety different concept.


* Threat Model and Goals

Currently the Bitcoin network consists of a large (low thousands) number
of allegedly independent nodes. There is no mechanism to prevent an
attacker from sybil attacking the network other than the availability of
IP addresses. This protection is made even weaker by the difficulty of
being sure you have a non-sybilled list of nodes to connect too; IP
addresses are passed gossip-style with no authentication.

From a privacy perspective we are conservative and assume an active,
internal, and global attacker - using the terminology of Diaz et al.(1)
- that controls up to 100% of the nodes you are connected too. With
regard to retrieval of blockchain data we can use the Sweeney's notion
of k-anonymity(2) where the privacy-sensitive data for an individual is
obscured by it's inclusion in a data of a large set of individuals, the
anonymity set.


* Basic Functionality

With regard to blockchain data we have the following basic functions:

** Spending funds

The user creates a transaction and gets it to miners by some method,
usually the P2P network although also possibly by direct submission.
Either way privacy can be achieved through a mix network such as Tor
and/or relaying other users' transactions so as to embed yours within a
larger anonymity set. In some cases payment protocols can shift the
problem to the recipient of the funds. Using CoinJoin also helps
increase the anonymity set.

Usually the sender will want to determine when the transaction confirms;
once the transaction has confirmed modulo a reorganization the
confirmation count can only increase. Transaction mutability and
double-spends by malicious CoinJoin participants complicate the task of
detecting confirmation: ideally we could simply query for the presence
of a given txid in each new block, however the transaction could be
mutated, changing the txid. The most simple way to detect confirmation
is then to query for spends of the txouts spend by the transaction.


** Receiving new funds

While in the future payment protocols may give recipients transaction
information directly it is most likely that wallets will continue to
have to query peers for new transactions paying scriptPubKey's under the
user's control for the forseeable future.


** Detection of unauthorized spends

Users' want early detection of private key compromise, accomplished by
querying blockchain data for spends from txouts in their wallets. This
has implications for how change must be handled, discussed below.


* Scalability/Efficiency

The total work done by the system as a whole for all queries given some
number of transactions n is the scalability of the scheme. In addition
scalability, and privacy in some cases, is improved if work can be
easily spread out across multiple nodes both at a per-block and
within-block level.


* Reliability/Robustness

Deterministic wallets using BIP32 or similar, where all private keys are
derived from a fixed seed, have proven to be extremely popular with
users for their simple backup model. While losing transaction metadata
after a data-loss event is unfortunate, losing access to all funds is a
disaster. Any address generation scheme must take this into account and
make it possible for all funds to be recovered quickly and efficiently
from blockchain data. Preserving privacy during this recovery is a
consideration, but 100% recovery of funds should not be sacrificed for
that goal.


* Query schemes

** Bloom