Re: [Bitcoin-development] bloom filtering, privacy

2015-02-21 Thread Chris Pacia
Yeah that overhead is pretty high. I wasn't thinking about 10 years out.

On Sat, Feb 21, 2015, 11:47 AM Mike Hearn  wrote:

> Adam seems to be making sense to me. Only querying a single node when an
>> address in my wallet matches the block filter seems to be pretty efficient.
>>
>
> No, I think it's less efficient (for the client).
>
> Quick sums:  blocks with 1500 transactions in them are common today. But
> Bitcoin is growing. Let's imagine a system 10x larger than today. Doesn't
> seem implausible to reach that in the next 5-10 years, so 15,000
> transactions. Each transaction has multiple elements we might want to match
> (addresses, keys, etc).
>
> Let's say the average tx contains 5 unique keys/elements. That's the base
> case of {1 input, 2 outputs} without address reuse, plus fudged up a bit
> for multi-sends then down a bit again for address reuse.
>
> 15,000*5=75,000 unique elements per block. With an FP rate of 0.1% we get:
>
> http://hur.st/bloomfilter?n=75000&p=0.001
>
> 131.63KB per block extra overhead.
>
> 144 blocks in a day, so that's 18mb of data per day's worth of sync to
> pull down over the network. If you don't start your wallet for a week
> that's 126 megabytes of data just to get started.
>
> Affordable, yes (in the west). Fast enough to be competitive? Doubtful. I
> don't believe that even in five years mobiles will be pulling down and
> processing that much data within a few seconds, not even in developed
> countries.
>
> But like I said, I don't see why it matters. Anyone who is watching the
> wire close to you learns which transactions are yours, still, so it doesn't
> stop intelligence agencies. Anyone who is running a node learns which
> transactions in the requested block were yours and thus can follow the tx
> chain to learn which other transactions might be yours too, no different to
> today. If you connect to a single node and say "give me the transactions
> sending money to key A in block N", it doesn't matter if you then don't
> request block N+6 from the same peer - they know you will request it
> eventually anyway, just by virtue of the fact that one of the transactions
> they gave you was spent in that block.
>
>
>


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-21 Thread Mike Hearn
>
> Adam seems to be making sense to me. Only querying a single node when an
> address in my wallet matches the block filter seems to be pretty efficient.
>

No, I think it's less efficient (for the client).

Quick sums:  blocks with 1500 transactions in them are common today. But
Bitcoin is growing. Let's imagine a system 10x larger than today. Doesn't
seem implausible to reach that in the next 5-10 years, so 15,000
transactions. Each transaction has multiple elements we might want to match
(addresses, keys, etc).

Let's say the average tx contains 5 unique keys/elements. That's the base
case of {1 input, 2 outputs} without address reuse, plus fudged up a bit
for multi-sends then down a bit again for address reuse.

15,000*5=75,000 unique elements per block. With an FP rate of 0.1% we get:

http://hur.st/bloomfilter?n=75000&p=0.001

131.63KB per block extra overhead.

144 blocks in a day, so that's 18mb of data per day's worth of sync to pull
down over the network. If you don't start your wallet for a week that's 126
megabytes of data just to get started.
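
For reference, a quick Python sketch of that arithmetic, using the textbook
optimal Bloom filter sizing m = -n*ln(p)/(ln 2)^2; the 75,000 elements and
0.1% FP rate are just the assumptions above, not measurements:

    import math

    def bloom_bytes(n, p):
        """Optimal Bloom filter size in bytes for n elements at false positive rate p."""
        m_bits = -n * math.log(p) / (math.log(2) ** 2)
        return m_bits / 8

    per_block = bloom_bytes(75_000, 0.001)
    print(f"per block: {per_block / 1024:.1f} KiB")             # ~131.6 KiB, as above
    print(f"per day:   {per_block * 144 / 2**20:.1f} MiB")      # ~18.5 MiB across 144 blocks
    print(f"per week:  {per_block * 144 * 7 / 2**20:.0f} MiB")  # ~130 MiB for a week offline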

Affordable, yes (in the west). Fast enough to be competitive? Doubtful. I
don't believe that even in five years mobiles will be pulling down and
processing that much data within a few seconds, not even in developed
countries.

But like I said, I don't see why it matters. Anyone who is watching the
wire close to you learns which transactions are yours, still, so it doesn't
stop intelligence agencies. Anyone who is running a node learns which
transactions in the requested block were yours and thus can follow the tx
chain to learn which other transactions might be yours too, no different to
today. If you connect to a single node and say "give me the transactions
sending money to key A in block N", it doesn't matter if you then don't
request block N+6 from the same peer - they know you will request it
eventually anyway, just by virtue of the fact that one of the transactions
they gave you was spent in that block.


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-21 Thread Chris Pacia
Adam seems to be making sense to me. Only querying a single node when an
address in my wallet matches the block filter seems to be pretty
efficient. The downside is it relies entirely on Tor for privacy, but
then again it's not the only aspect of SPV clients that requires it for
privacy (there's broadcasting, for example).

And on a related note, if we eventually do end up receiving BIP70
payments directly, we still need to query for block inclusion, and that
would seem to be an easy way to do it.

On 02/20/2015 12:53 PM, Mike Hearn wrote:
>
> This is talking about a committed bloom filter. Not a committed
> UTXO set.
>
>
> I read the following comment to mean it requires the UTXO commitments.
> Otherwise I'm not sure how you prove absence of withholding with just
> current block structures+an extra filter included in the block:
>
> but with the bloom commitment (and UTXO trie organised commitment) he
> can verify that the positive hits are correct via the merkle path, and
> that the false positives are not being wrongly withheld by obtaining
> merkle path proof that they are not in the trie 



Re: [Bitcoin-development] bloom filtering, privacy

2015-02-21 Thread Mike Hearn
>
> For downloading transactions unless you frequently receive
> transactions you wont be fetching every block.  Or are you assuming
> bloom filters dialled up to the point of huge false positives?  You
> said otherwise.
>

Well, what I mean is, bitcoinj already gets criticised for having very low
FP rates, but even with those rates we're applying them to hundreds of
thousands of transactions per sync. So it's still enough FPs to trigger at
least one per block, often several, yet people tell us this isn't enough to
give meaningful privacy.


> Relatedly I think bitcoin could do with a store-and-forward message
> bus with privacy and strong reliability via redundancy (but less
> redundancy maybe than consensus all-nodes must receiving and agree and
> store forever).
>

Yup, see here:

https://www.bitcoinauthenticator.org/subspace/
https://groups.google.com/forum/#!topic/bitcoinj/_S15jo5mcDI

Subspace looks like it's developing into what we need.


> You seem to be saying at one point that Tor is useless against
> pervasive eavesdropper threat model


No, Tor is effective in that threat model. What I meant is that
without Tor, someone doing wire intercepts isn't going to be fazed by using
multiple peers together, and with Tor it's not clear that syncing from
multiple peers in parallel gives much of an additional win.

Also, getting Tor practical enough to activate by default is tricky. Though
the same people who are doing Subspace are trying it out to see what
happens.

> secondly that other types of attackers are disinterested (how do we know
> that?) or maybe that you
> don't care about privacy vs them (maybe some users do!)
>

Some of my opinions are based on experience of HTTPS deployments, where
many of the same issues apply.


> It would certainly be nice to get real privacy from a wider range of
> attackers but nothing (current situation) is clearly worse; using
> block bloom filters we'd make the pervasive case harder work, and the
> nosy full node learn nothing.


Yes, but what's the best way to fix that?

The calculation goes like this:  we have ~80 hours of hacking time to spend
on privacy this quarter. Do we:

a) Do wire encryption
b) Make Bloom filter clients smarter
c) Optimise Tor
d) Do a new PIR protocol from scratch and possibly run out of time having
failed to launch

Of these (d) is the least appealing to me, especially because I don't feel
like submitting SPV related stuff to Bitcoin Core any more. If I were to
work on the protocol it'd be in the context of Bitcoin XT, which rules out
consensus changes or other things that rely on miners. Wire encryption
would probably raise the bar for our spooky friends quite a lot, with
minimal effort. The ROI looks good, compared to more complex PIR.


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-21 Thread Adam Back
If you want to be constructive and index transactions that are not
P2SH but are non-simple and contain a CHECKSIG so the address is visible,
you could do that with a block bloom filter also.

I wasn't sure if the comments about the need to batch requests were
about downloading headers & filters, or about transactions. There is
no harm downloading headers & bloom filters without Tor - no
identity or addresses are revealed by doing so.  So over Tor you would
just be fetching transactions that match the address.

For downloading transactions: unless you frequently receive
transactions you won't be fetching every block.  Or are you assuming
bloom filters dialled up to the point of huge false positives?  You
said otherwise.

Mid-term I'd say you want some basic request tunneling as part of
bitcoin, that maybe isn't Tor, to avoid sharing Tor's fate if Tor
controversies are a risk to the Tor service.  Some of the bitcoin-Tor
specific weak points could maybe then be addressed.

Relatedly I think bitcoin could do with a store-and-forward message
bus with privacy and strong reliability via redundancy (but maybe less
redundancy than the consensus requirement that all nodes must receive,
agree and store forever).  That provides an efficient store-and-forward
SPV-receivable stealth-address solution that doesn't suck: send the
recipient their payment, and if they like it they broadcast it themselves.
As a bonus, store-and-forward message mixes are better able to provide
meaningful network privacy than interactive privacy networks.  You
could spend over the same channel.

You seem to be saying at one point that Tor is useless against the
pervasive eavesdropper threat model (which I am not sure I agree with;
minimally it makes them work for the info and adds uncertainty, and
while I have not been paying super close attention, I think some of the
Snowden releases suggest Tor is a net win), and secondly that other
types of attackers are disinterested (how do we know that?) or maybe
that you don't care about privacy vs them (maybe some users do!)

It would certainly be nice to get real privacy from a wider range of
attackers, but the current situation (nothing) is clearly worse; using
block bloom filters we'd make the pervasive case harder work, and the
nosy full node would learn nothing.

Adam

On 21 February 2015 at 13:28, Mike Hearn  wrote:
> Let's put the UTXO commitments/anti-fraud proofs to one side for a moment. I
> would like to see them happen one day, but they aren't critical to these
> protocols and are just proving to be a distraction.
>
>
>>
>> Then they make fresh random connections to different nodes and request
>> download of the respective individual transactions from the full node.
>
>
> ...
>
>> About privacy the node can make different random connections to
>> different nodes to fetch addresses . The full node cant
>> correlate the addresses as belonging to the same person by correlating
>> the download requests for them, because they are made via different
>> nodes.
>
>
> Apologies for the wall of text, but I don't think this will work nor solve
> any real problem. And I must justify such a strong statement clearly.
>
> First: technical issues
>
> When you download the per-block Bloom filter and test, what you get back is
> a set of script elements (addresses, keys, OP_RETURN tags etc). But then in
> the next step you are saying that you connect to random peers and request
> individual transactions. We don't know that at this point. All we know are a
> set of addresses that possibly matched. So I think what you mean is "wallets
> connect to random peers and request transactions in block N that match a
> given set of addresses".
>
> This is what Bloom filtering already does, of course. Doing the test against
> the per-block filter first doesn't seem to buy us much because with
> thousands of transactions per block, even a very tiny FP rate will still
> trigger a match on every single one.
>
> The second problem I see is that we can't do this in parallel because of the
> following edge case: wallet contains key K and someone sends it money using
> an OP_CHECKSIG output. The input which spends this output does not contain
> any predictable data, thus we do not know what to look for in the following
> blocks to detect a spend of it until we have seen the first transaction and
> know its hash.
>
> In practice this means we must either scan through the chain in sequence and
> update our matching criteria if we see such an output (this is what the
> Bloom filtering protocol already does server-side), or we must constrain the
> user such that output scripts always force repetition of predictable data -
> this is what mostly happens today due to pay-to-address outputs, but not
> always, and correctness is more important than completeness.
>
> If we can't do it in parallel then we must suffer a node round-trip for
> every single block we traverse, because we can't request long runs of blocks
> with a single command. That latency will kill performance dead. It's a non starter.

Re: [Bitcoin-development] bloom filtering, privacy

2015-02-21 Thread Mike Hearn
Let's put the UTXO commitments/anti-fraud proofs to one side for a moment.
I would like to see them happen one day, but they aren't critical to these
protocols and are just proving to be a distraction.



> Then they make fresh random connections to different nodes and request
> download of the respective individual transactions from the full node.
>

...

About privacy the node can make different random connections to
> different nodes to fetch addresses . The full node cant
> correlate the addresses as belonging to the same person by correlating
> the download requests for them, because they are made via different
> nodes.


Apologies for the wall of text, but I don't think this will work nor solve
any real problem. And I must justify such a strong statement clearly.

*First: technical issues*

When you download the per-block Bloom filter and test, what you get back is
a set of script elements (addresses, keys, OP_RETURN tags etc). But then in
the next step you are saying that you connect to random peers and request
individual transactions. We don't know those at this point. All we know is
a set of addresses that possibly matched. So I think what you mean is
"wallets connect to random peers and request transactions in block N that
match a given set of addresses".

This is what Bloom filtering already does, of course. Doing the test
against the per-block filter first doesn't seem to buy us much because with
thousands of transactions per block, even a very tiny FP rate will still
trigger a match on every single one.
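
As a rough Python sketch of why (the wallet sizes here are assumptions for
illustration, not measured figures):

    # Chance that a per-block filter with element FP rate p flags a block as
    # interesting for a wallet watching k script elements, with no real match.
    p = 0.001                       # 0.1% per-element false positive rate
    for k in (100, 500, 5000):      # assumed wallet sizes; HD lookahead easily reaches thousands
        print(k, f"{1 - (1 - p) ** k:.0%}")   # ~10%, ~39%, ~99% of blocks match spuriously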

The second problem I see is that we can't do this in parallel because of
the following edge case: wallet contains key K and someone sends it money
using an OP_CHECKSIG output. The input which spends this output does not
contain any predictable data, thus we do not know what to look for in the
following blocks to detect a spend of it until we have seen the first
transaction and know its hash.

In practice this means we must either scan through the chain in sequence
and update our matching criteria if we see such an output (this is what the
Bloom filtering protocol already does server-side), or we must constrain
the user such that output scripts always force repetition of predictable
data - this is what mostly happens today due to pay-to-address outputs, but
not always, and correctness is more important than completeness.

If we can't do it in parallel then we must suffer a node round-trip for
every single block we traverse, because we can't request long runs of
blocks with a single command. That latency will kill performance dead. It's
a non starter.

But let's imagine we don't care about OP_CHECKSIG outputs and are willing
to ignore them. There are cases where they are the best and most efficient
technical solution, but let's put that to one side.

The primary difference after making the above changes is that no one node
gets a filter containing *all* our keys and addresses. I don't think a
per-block pre-test filter would gain us much efficiency, so from a privacy
perspective this is what it boils down to - sharding of the scan.

But we can already do this with the current Bloom filtering protocol.
BitcoinJ doesn't do so because having multiple parallel scans uses up
network IOPs which are a resource of unknown quantity, and because stepping
through the chain in parallel with multiple peers complicates the chain
sync implementation quite a bit.

*Second: this doesn't solve any real problem*

Who cares about collecting Bloom filters off the wire?

Commercial fraudsters? Doubtful. There are much easier ways to steal money.

Spies? Yes! Without a doubt NSA/GCHQ are building or have built databases
of IP addresses to Bitcoin addresses and are correlating it via XKEYSCORE
with other identifiable information.

However, just requesting data from different nodes doesn't help with that,
because they are doing DPI and can still see all the connections, so can
still combine all the filters or received transactions.

Ah, you say, but we're requesting everything via Tor.

Yes, about that. We've implemented that already. Some wallets even use it
by default, like Alon & Chris' Bitcoin Authenticator wallet. It's just one
line of code to activate.

Unfortunately there are severe practical problems to using Tor:

   1. If you don't have a warm consensus then booting it up is very slow.
   We're already slower than our competitors like blockchain.info and
   VISA/MasterCard, we can't make this any worse.

   This one is possibly not that big a deal and can be solved with more
   technical tricks.

   2. Bitcoin Core's DoS strategy means anyone can block all of Tor quite
   trivially. So we'd need some complicated fallback mechanism to disable Tor
   remotely, in case someone did this.

   3. Bitcoin wire traffic isn't encrypted or authenticated, so Tor makes it
   much easier for trolls to tamper with lots of wire traffic at once, whereas
   without Tor it's much harder.

Let's ignore the fact that the Tor pro

Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Adam Back
Seems like Greg & I may be saying different things.  Maybe I am
misunderstanding something at the wire level or in size/scalability
but what I was trying to say is I think simpler.

By UTXO commitment I mean a merkle tree of unspent addresses is
committed in each block.  Also a bloom filter containing addresses in
the block is committed.

Now the user downloads the bloom filter for each block, and searches
it locally.  They see which addresses of theirs may be in the block
(with some false positives).

Then they make fresh random connections to different nodes and request
download of the respective individual transactions from the full node.
The node can respond either a) here is the transaction and here is its
merkle path in the merkle tree (this is the way SPV works today); or
b) there is no such transaction, this is a false positive, and here is
a pair of merkle trie paths in the UTXO commitment (a trie) that
proves the full node is not withholding and it's true that no such
transaction is in the block.
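
For reference, a minimal Python sketch of checking such a merkle path. (Real
Bitcoin hashing is double-SHA256 over little-endian txids with the last node
duplicated on odd levels, so treat this as the shape of the check, not a
consensus-exact implementation.)

    import hashlib

    def dsha256(b: bytes) -> bytes:
        return hashlib.sha256(hashlib.sha256(b).digest()).digest()

    def verify_merkle_path(txid: bytes, branch, merkle_root: bytes) -> bool:
        """branch is a list of (sibling_hash, sibling_is_right) pairs, leaf to root."""
        h = txid
        for sibling, sibling_is_right in branch:
            h = dsha256(h + sibling) if sibling_is_right else dsha256(sibling + h)
        return h == merkle_root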

Additionally with UTXO commitments in case a) the user can keep up to
date with the chain tip and request from the full node a merkle path
in the UTXO commitment to show that the coin is still unspent.
(Otherwise you get long range attacks where someone can grind away
until they belatedly find a PoW on an old low-hashrate block containing the
UTXO and fool an SPV node they know the address for into accepting a
spend of something long since spent).

About privacy: the node can make different random connections to
different nodes to fetch addresses.  Nothing is leaked by downloading
the bloom filter.  Scanning happens locally.  The full node can't
correlate the addresses as belonging to the same person by correlating
the download requests for them, because they are made via different
nodes.  It's not a surprise nor privacy-revealing that someone would
want to check receipt of the funds.  The limit is the interactive
nature of Tor, which isn't very secure against pervasive monitoring.
But for basic full-node passive-attack-resistant privacy it's pretty good.

Contrast with increasing the false positive rate on bloom queries: here the
full node can test correlation (modulo the false positive ratio) and
combine that with network flow analysis to narrow down who the user
might be.  Plus query size and privacy are in conflict.  Plus the
query size has to be continually retuned to even create a reliable
false positive rate relative to the current UTXO set.  Is that even
happening now (other than parameter sets)?

About the bitmap:

>Greg Maxwell wrote:
>> there is a scriptpubkey bitmap per block
>> which is committed. Users can request the map, and be SPV confident
>> that they received a faithful one. If there are hits, they can request
>> the block and be confident there was no censoring.

how does the SPV client know what the bits in this map mean to scan?
I presume these would be one bit per address and one would need to
know the full UTXO set in order to know what's in there.  I am not sure
an SPV node would want the hit of keeping up to date with the full
UTXO set?

(s/address/scriptpubkey/ for accuracy)

Adam

On 20 February 2015 at 19:03, Mike Hearn  wrote:
>> It's a straight forward idea: there is a scriptpubkey bitmap per block
>> which is committed. Users can request the map, and be SPV confident
>> that they received a faithful one. If there are hits, they can request
>> the block and be confident there was no censoring.
>
>
> OK, I see now, thanks Gregory. You're right, the use of UTXO set in that
> context was confusing me.
>
> If I go back to when we first did Bloom filtering and imagine the same
> proposal being made, I guess I would have been wondering about the following
> issues. Perhaps they have solutions:
>
> 1. Miners have to upgrade to generate the per-block filters. Any block that
> doesn't have such a filter has to be downloaded in full, still. So it would
> have taken quite a while for the bandwidth savings to materialise.
>
> 2. If checking the filter isn't a consensus rule, any miner who builds a
> wrong filter breaks the entire SPV userbase. With per-node filtering, a
> malicious or wrong node breaks an individual sync, but if the wallet user
> notices they don't seem to be properly synchronised they can just press
> "Rescan chain" and most likely get fixed. In practice broken nodes have
> never been reported, but it's worth considering.
>
> 3. Downloading full blocks is still a lot of data. If you have a wallet that
> receives tips a couple of times per day, and you open your wallet once per
> week, then with 1mb blocks you would be downloading ~14mb of data each time.
> Pretty pokey even on a desktop. Much sadness if you're on mobile.
>
> 4. Header size is constant, but per-block filters wouldn't be. So even the
> null sync would download more data as the network got busier. Of course
> Bloom filtering has the same scaling problem, but only between hard disk ->
> RAM -> CPU rather than across the network.
>
> 5.

Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Mike Hearn
Hey Adam,


> Mike had posted a detailed response on the topic on why its complex
> and becomes bandwidth inefficient to improve it usefully.
>

To clarify, we *could* improve privacy and still preserve usefully high
performance, it's just a lot of complicated programming work. You need to
find out from the OS how much bandwidth you have to play with, for example,
and do all the very complex tracking to surf the wave and keep yourself in
roughly the right place.

> The basic summary of which I think is that its not even intended to
> provide any practical privacy protection, its just about compacting
> the query for a set of addresses.
>

The original intent of Bloom filtering was to allow both. We want our cake
and we want to eat it.

The protocol can still do that, with sufficiently smart clients. The
problem is that being sufficiently smart in this regard has never come to
the top of the TODO list - users are always complaining about other things,
so those things are what gets priority.

It's not IMO a protocol issue per se. It's a code complexity and manpower
issue.


> It seems surprising no one thought of it
> that way before (as it seems obvious when you hear it) but that seems
> to address the privacy issues as the user can fetch the block bloom
> filters and then scan it in complete privacy.


And then what? So you know the block matches. But with reasonable FP rates
every block will match at least a few transactions (this is already the
case - the FP rate is low but high enough that we get back FPs on nearly
every block). So you end up downloading every block? That won't work.

Eventually, wallets need to stop doing linear scans of the entire block
chain to find tx data. That worked fine when blocks were 10kb, it's still
working OK even though we scaled through two orders of magnitude, but we
can imagine that if we reach 10mb blocks then this whole approach will just
be too slow.

The main reason wallets are scanning the chain today (beyond lack of
protocol support for querying the UTXO set by script), is that they want to
show users time-ordered lists of transactions. Financial apps should show
you payment histories, everyone knows this, and without knowing roughly
when a tx happened and which inputs/outputs were mine, providing a useful
rendering is hard. Even with this data the UI is pretty useless, but at
least it's not actually missing.

By combining Subspace and BIP70 we can finally replace the payments list UI
with actual proper metadata that isn't extracted from the block chain, and
at that point non-scanning architectures become a lot more deployable.


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Mike Hearn
>
> It's a straight forward idea: there is a scriptpubkey bitmap per block
> which is committed. Users can request the map, and be SPV confident
> that they received a faithful one. If there are hits, they can request
> the block and be confident there was no censoring.


OK, I see now, thanks Gregory. You're right, the use of UTXO set in that
context was confusing me.

If I go back to when we first did Bloom filtering and imagine the same
proposal being made, I guess I would have been wondering about the
following issues. Perhaps they have solutions:

1. Miners have to upgrade to generate the per-block filters. Any block that
doesn't have such a filter has to be downloaded in full, still. So it would
have taken quite a while for the bandwidth savings to materialise.

2. If checking the filter isn't a consensus rule, any miner who builds a
wrong filter breaks the entire SPV userbase. With per-node filtering, a
malicious or wrong node breaks an individual sync, but if the wallet user
notices they don't seem to be properly synchronised they can just press
"Rescan chain" and most likely get fixed. In practice broken nodes have
never been reported, but it's worth considering.

3. Downloading full blocks is still a lot of data. If you have a wallet
that receives tips a couple of times per day, and you open your wallet once
per week, then with 1mb blocks you would be downloading ~14mb of data each
time. Pretty pokey even on a desktop. Much sadness if you're on mobile.

4. Header size is constant, but per-block filters wouldn't be. So even the
null sync would download more data as the network got busier. Of course
Bloom filtering has the same scaling problem, but only between hard disk ->
RAM -> CPU rather than across the network.

5. Is it really more private? Imagine we see a hit in block 100, so we
download the full block and find our transaction. But now we must also
learn when that transaction is spent, so we can keep the wallet-local UTXO
set up to date. So we scan forward and see another hit in block 105, so now
we download block 105. The remote node can now select all transactions in
block 105 that spend transactions in block 100 and narrow down which txs
are ours. If we are watching a wallet that does multi-sends then it feels
like this problem gets much worse.



I'd really like to find a solution that has O(1) scaling on sync. If we see
nodes just as sources of UTXO data, then shovelling the client (tx, existing
merkle path) pairs keyed off script prefixes would (with one additional
index) be O(1) and provide the same security guarantees as Bloom filtering
today. It creates lots of other problems to solve, but at least it would
scale into the foreseeable future.


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Gregory Maxwell
On Fri, Feb 20, 2015 at 5:59 PM, Adam Back  wrote:
> So now they ask a full node for merkle paths + transactions for the
> addresses from the UTXO set from the block(s) that it was found in.

Use of the words "UTXO set" here is probably confusing people as it's
likely to make people think of the complete verification state. In
this case it's simply referring to block-local data. (and thus avoids
the large IO overheads of an actual UTXO set).

It's a straight forward idea: there is a scriptpubkey bitmap per block
which is committed. Users can request the map, and be SPV confident
that they received a faithful one. If there are hits, they can request
the block and be confident there was no censoring.

It's possible to tree-structure additional layers to the bitmap, so
one can incrementally query to trade off map size for false request
overhead; it's not clear to me how much of a win this would be for
normal parameters. It's also possible to extract the txout list for
the whole block and commit to that too, so it's possible to request
just the outputs and get a faithful copy of them, which is _much_
smaller than the block overall.



Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Mike Hearn
>
> So now they ask a full node for merkle paths + transactions for the
> addresses from the UTXO set from the block(s) that it was found in.


This is the part where I get lost. How does this improve privacy? If I have
to specify which addresses are mine in this block, to get the tx data, the
node learns which addresses are mine at this point, no?

Also, are you saying each block needs a record of the entire UTXO set at
the time the block was made? I'm not sure how to parse this sentence.

Could you please walk me through precisely what happens and what data is
sent, once I learn that a block has interesting data in it?


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Adam Back
The idea is not mine; some random guy appeared in #bitcoin-wizards one
day and said something about it, and lots of people reacted: wow, why
didn't we think of that before.

It goes something like each block contains a commitment to a bloom
filter that has all of the addresses in the block stored in it.

Now the user downloads the headers and bloom data for all blocks.  They
know the bloom data is correct in an SPV sense because of the
commitment.  They can scan it offline and locally by searching for
addresses from their wallet in it.  Not sure offhand what is the most
efficient strategy; probably it's pretty fast locally anyway.

Now they know (modulo false positives) which addresses of theirs may be
in the block.
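
A toy Python sketch of the node/client split, under assumed filter parameters
(the real proposal would fix the size, hash count and commitment location as
consensus rules):

    import hashlib

    class BlockBloom:
        """Toy per-block bloom filter over output scripts (illustration only)."""
        def __init__(self, m_bits=1_048_576, k=10):
            self.m, self.k, self.bits = m_bits, k, bytearray(m_bits // 8)

        def _positions(self, element: bytes):
            for i in range(self.k):
                h = hashlib.sha256(bytes([i]) + element).digest()
                yield int.from_bytes(h[:8], "little") % self.m

        def add(self, element: bytes):
            for pos in self._positions(element):
                self.bits[pos // 8] |= 1 << (pos % 8)

        def maybe_contains(self, element: bytes) -> bool:
            return all(self.bits[pos // 8] >> (pos % 8) & 1 for pos in self._positions(element))

        def commitment(self) -> bytes:
            return hashlib.sha256(hashlib.sha256(bytes(self.bits)).digest()).digest()

    # Node side: add every scriptPubKey in the block, commit commitment() in the block.
    # Client side: fetch headers + filter bytes, check the commitment, then scan locally;
    # the wallet's addresses never leave the device at this stage.
    def matching_scripts(committed: bytes, filter_bytes: bytes, wallet_scripts):
        f = BlockBloom()
        f.bits = bytearray(filter_bytes)
        if f.commitment() != committed:
            raise ValueError("bloom filter does not match its commitment")
        return [s for s in wallet_scripts if f.maybe_contains(s)]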

So now they ask a full node for merkle paths + transactions for the
addresses from the UTXO set from the block(s) that it was found in.

Separately UTXO commitments could optionally be combined to improve
security in two ways:

- the normal SPV security increase: you can also see that the transaction
is actually in the last block's UTXO set.

- to avoid withholding by the full node, if the UTXO commitment is a
trie (sorted) they can expect a merkle path to the lexically adjacent
nodes on either side of where the claimed missing address would be, as
proof that there really are no transactions for that address in the
block.  (Distinguishing a false positive from node withholding.)
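
A rough Python sketch of that absence check, assuming a balanced binary tree
over lexically sorted keys; the real trie commitment would differ in layout,
this only illustrates the sandwich-plus-adjacency argument:

    import hashlib

    def dsha256(b: bytes) -> bytes:
        return hashlib.sha256(hashlib.sha256(b).digest()).digest()

    def fold(leaf_hash: bytes, branch) -> bytes:
        """Hash a leaf up through its branch of (sibling_hash, sibling_is_right) pairs."""
        h = leaf_hash
        for sibling, sibling_is_right in branch:
            h = dsha256(h + sibling) if sibling_is_right else dsha256(sibling + h)
        return h

    def leaf_index(branch) -> int:
        """Recover the leaf's position from the left/right flags along its branch."""
        idx = 0
        for level, (_sib, sibling_is_right) in enumerate(branch):
            if not sibling_is_right:       # sibling on our left => we are the right child here
                idx |= 1 << level
        return idx

    def verify_absence(query_key, left_key, right_key, left_branch, right_branch, root) -> bool:
        """Accept only if two committed keys sandwich query_key at adjacent leaf positions."""
        return (left_key < query_key < right_key
                and leaf_index(right_branch) == leaf_index(left_branch) + 1
                and fold(dsha256(left_key), left_branch) == root
                and fold(dsha256(right_key), right_branch) == root)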

Adam

On 20 February 2015 at 17:43, Mike Hearn  wrote:
> Ah, I see, I didn't catch that this scheme relies on UTXO commitments
> (presumably with Mark's PATRICIA tree system?).
>
> If you're doing a binary search over block contents then does that imply
> multiple protocol round trips per synced block? I'm still having trouble
> visualising how this works. Perhaps you could write down an example run for
> me.
>
> How does it interact with the need to download chains rather than individual
> transactions, and do so without round-tripping to the remote node for each
> block? Bloom filtering currently pulls down blocks in batches without much
> client/server interaction and that is useful for performance.
>
> Like I said, I'd rather just junk the whole notion of chain scanning and get
> to a point where clients are only syncing headers. If nodes were calculating
> a script->(outpoint, merkle branch) map in LevelDB and allowing range
> queries over it, then you could quickly pull down relevant UTXOs along with
> the paths that indicated they did at one point exist. Nodes can still
> withhold evidence that those outputs were spent, but the same is true today
> and in practice this doesn't seem to be an issue.
>
> The primary advantage of that approach is it does not require a change to
> the consensus rules. But there are lots of unanswered questions about how it
> interacts with HD lookahead and so on.
>



Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Mike Hearn
>
> This is talking about a committed bloom filter. Not a committed UTXO set.
>

I read the following comment to mean it requires the UTXO commitments.
Otherwise I'm not sure how you prove absence of withholding with just
current block structures+an extra filter included in the block:

but with the bloom commitment (and UTXO trie organised commitment) he
> can verify that the positive hits are correct via the merkle path, and
> that the false positives are not being wrongly withheld by obtaining
> merkle path proof that they are not in the trie


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Gregory Maxwell
On Fri, Feb 20, 2015 at 4:54 PM, Mike Hearn  wrote:
> And then what? So you know the block matches. But with reasonable FP rates
> every block will match at least a few transactions (this is already the case

This approach needs a filter set with a lower FP rate. It doesn't
depend on having a high FP rate for privacy (which is good, since
counting on filter false positives seems to more or less fail to
deliver actual privacy in any case).

Larger filters mean a somewhat higher baseline bandwidth, though when
users do not reuse addresses and have more addresses than there are
txouts in the block the gap is narrower.

> Ah, I see, I didn't catch that this scheme relies on UTXO commitments

This is talking about a committed bloom filter. Not a committed UTXO set.



Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Mike Hearn
Ah, I see, I didn't catch that this scheme relies on UTXO commitments
(presumably with Mark's PATRICIA tree system?).

If you're doing a binary search over block contents then does that imply
multiple protocol round trips per synced block? I'm still having trouble
visualising how this works. Perhaps you could write down an example run for
me.

How does it interact with the need to download chains rather than
individual transactions, and do so without round-tripping to the remote
node for each block? Bloom filtering currently pulls down blocks in batches
without much client/server interaction and that is useful for performance.

Like I said, I'd rather just junk the whole notion of chain scanning and
get to a point where clients are only syncing headers. If nodes were
calculating a script->(outpoint, merkle branch) map in LevelDB and allowing
range queries over it, then you could quickly pull down relevant UTXOs
along with the paths that indicated they did at one point exist. Nodes can
still withhold evidence that those outputs were spent, but the same is true
today and in practice this doesn't seem to be an issue.
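
A rough Python sketch of the shape of that index; an in-memory sorted list
stands in for LevelDB, and the key layout is an assumption for illustration:

    import bisect

    class ScriptIndex:
        """script||outpoint -> merkle branch, kept sorted so that prefix range
        queries are cheap (the access pattern a LevelDB iterator would give)."""
        def __init__(self):
            self._keys = []      # sorted list of script||outpoint keys
            self._values = {}    # key -> (outpoint, merkle_branch)

        def add(self, script: bytes, outpoint: bytes, merkle_branch):
            key = script + outpoint
            bisect.insort(self._keys, key)
            self._values[key] = (outpoint, merkle_branch)

        def range_query(self, script_prefix: bytes):
            """Yield (outpoint, merkle_branch) for entries whose script starts with the prefix."""
            start = bisect.bisect_left(self._keys, script_prefix)
            for key in self._keys[start:]:
                if not key.startswith(script_prefix):
                    break
                yield self._values[key]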

The primary advantage of that approach is it does not require a change to
the consensus rules. But there are lots of unanswered questions about how
it interacts with HD lookahead and so on.


Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Adam Back
Mike Hearn wrote:
> Adam Back wrote:
> > It seems surprising no one thought of it
> > that way before (as it seems obvious when you hear it) but that seems
> > to address the privacy issues as the user can fetch the block bloom
> > filters and then scan it in complete privacy.
>
> And then what? So you know the block matches. But with reasonable FP
> rates every block will match at least a few transactions (this is already the
> case - the FP rate is low but high enough that we get back FPs on nearly
> every block). So you end up downloading every block?

I mean that because the user is scanning locally, he can binary-search
which set of addresses from his wallet are possibly in the block and then
request those specific addresses. Some will be false positives and some
real, but with the bloom commitment (and a UTXO trie organised commitment)
he can verify that the positive hits are correct via the merkle path, and
that the false positives are not being wrongly withheld, by obtaining a
merkle path proof that they are not in the trie.

Adam

On 20 February 2015 at 16:54, Mike Hearn  wrote:
> Hey Adam,
>
>>
>> Mike had posted a detailed response on the topic on why its complex
>> and becomes bandwidth inefficient to improve it usefully.
>
>
> To clarify, we could improve privacy and still preserve usefully high
> performance, it's just a lot of complicated programming work. You need to
> find out from the OS how much bandwidth you have to play with, for example,
> and do all the very complex tracking to surf the wave and keep yourself in
> roughly the right place.
>
>> The basic summary of which I think is that its not even intended to
>> provide any practical privacy protection, its just about compacting
>> the query for a set of addresses.
>
>
> The original intent of Bloom filtering was to allow both. We want our cake
> and we want to eat it.
>
> The protocol can still do that, with sufficiently smart clients. The problem
> is that being sufficiently smart in this regard has never come to the top of
> the TODO list - users are always complaining about other things, so those
> things are what gets priority.
>
> It's not IMO a protocol issue per se. It's a code complexity and manpower
> issue.
>
>>
>> It seems surprising no one thought of it
>> that way before (as it seems obvious when you hear it) but that seems
>> to address the privacy issues as the user can fetch the block bloom
>> filters and then scan it in complete privacy.
>
>
> And then what? So you know the block matches. But with reasonable FP rates
> every block will match at least a few transactions (this is already the case
> - the FP rate is low but high enough that we get back FPs on nearly every
> block). So you end up downloading every block? That won't work.
>
> Eventually, wallets need to stop doing linear scans of the entire block
> chain to find tx data. That worked fine when blocks were 10kb, it's still
> working OK even though we scaled through two orders of magnitude, but we can
> imagine that if we reach 10mb blocks then this whole approach will just be
> too slow.
>
> The main reason wallets are scanning the chain today (beyond lack of
> protocol support for querying the UTXO set by script), is that they want to
> show users time-ordered lists of transactions. Financial apps should show
> you payment histories, everyone knows this, and without knowing roughly when
> a tx happened and which inputs/outputs were mine, providing a useful
> rendering is hard. Even with this data the UI is pretty useless, but at
> least it's not actually missing.
>
> By combining Subspace and BIP70 we can finally replace the payments list UI
> with actual proper metadata that isn't extracted from the block chain, and
> at that point non-scanning architectures become a lot more deployable.



Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Tamas Blummer
On Feb 20, 2015, at 5:18 PM, Wladimir  wrote:

> On Fri, 20 Feb 2015, Adam Back wrote:
> 
>> So I was wondering what about changing to committing a bloom filter of
>> the addresses in the block.  It seems surprising no one thought of it
>> that way before (as it seems obvious when you hear it) but that seems

Such a bloom filter was present in the Bits of Proof block store in its last
public version, so the idea is obvious, but not new.

It supported scanning for BIP32 addresses well, as the query set extends
while progressing.

Tamas Blummer





Re: [Bitcoin-development] bloom filtering, privacy

2015-02-20 Thread Wladimir
Hello Adam,

On Fri, 20 Feb 2015, Adam Back wrote:

> So I was wondering what about changing to committing a bloom filter of
> the addresses in the block.  It seems surprising no one thought of it
> that way before (as it seems obvious when you hear it) but that seems
> to address the privacy issues as the user can fetch the block bloom
> filters and then scan it in complete privacy.  (Someone appeared on
> bitcoin wizards IRC a while back and made this observation.)

I have heard this idea of inverting the bloom filter before (possibly in
#bitcoin-wizards), and as I see it it would indeed improve privacy.
Apart from privacy it would also lower the burden for nodes. A block scan
with a bloom filter is effectively a cheap DoS on a node.

In addition to that it will also avoid the 'transaction withholding 
attack' that is possible with the current bloom filtering, at least if the 
filter is e.g. committed to in the block header.

The drawback would be overhead - the bloom filter per block will have a
significant size (to avoid false positives), and the client would have to
fetch entire blocks that contain its transactions.

I don't think that is so bad in practice; after all, the % of blocks that
will have transactions for a given wallet will generally be low, so the
block size is amortized in a way. Of course, if the block size were
increased this would become worse.

Wladimir
