Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Olaoluwa Osuntokun via bitcoin-dev
On Thu, May 17, 2018 at 2:44 PM Jim Posen via bitcoin-dev  Monitoring inputs by scriptPubkey vs input-txid also has a massive
>> advantage for parallel filtering:  You can usually known your pubkeys
>> well in advance, but if you have to change what you're watching block
>>  N+1 for based on the txids that paid you in N you can't filter them
>> in parallel.
>>
>
> Yes, I'll grant that this is a benefit of your suggestion.
>

Yeah parallel filtering would be pretty nice. We've implemented a serial
filtering for btcwallet [1] for the use-case of rescanning after a seed
phrase import. Parallel filtering would help here, but also we don't yet
take advantage of batch querying for the filters themselves. This would
speed up the scanning by quite a bit.

I really like the filtering model though, it really simplifies the code,
and we can leverage identical logic for btcd (which has RPCs to fetch the
filters) as well.

[1]:
https://github.com/Roasbeef/btcwallet/blob/master/chain/neutrino.go#L180
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Olaoluwa Osuntokun via bitcoin-dev
Riccardo wrote:
> The BIP recall some go code for how the parameter has been selected which
> I can hardly understand and run

The code you're linking to is for generating test vectors (to allow
implementations to check the correctness of their gcs filters. The name of
the file is 'gentestvectors.go'. It produces CSV files which contain test
vectors of various testnet blocks and at various false positive rates.

> it's totally my fault but if possible I would really like more details on
> the process, like charts and explanations

When we published the BIP draft last year (wow, time flies!), we put up code
(as well as an interactive website) showing the process we used to arrive at
the current false positive rate. The aim was to minimize the bandwidth
required to download each filter plus the expected bandwidth from
downloading "large-ish" full segwit blocks. The code simulated a few wallet
types (in terms of number of addrs, etc) focusing on a "mid-sized" wallet.
One could also model the selection as a Bernoulli process where we attempt
to compute the probability that after k queries (let's say you have k
addresses) we have k "successes". A success would mean the queries item
wasn't found in the filter, while a failure is a filter match (false
positive or not). A failure in the process requires fetching the entire
block.

-- Laolu

On Fri, May 18, 2018 at 5:35 AM Riccardo Casatta via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Another parameter which heavily affects filter size is the false positive
> rate which is empirically set
> 
> to 2^-20
> The BIP recall some go code
> 
> for how the parameter has been selected which I can hardly understand and
> run, it's totally my fault but if possible I would really like more details
> on the process, like charts and explanations (for example, which is the
> number of elements to search for which the filter has been optimized for?)
>
> Instinctively I feel 2^-20 is super low and choosing a lot higher alpha
> will shrink the total filter size by gigabytes at the cost of having to
> wastefully download just some megabytes of blocks.
>
>
> 2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <
> bitcoin-dev@lists.linuxfoundation.org>:
>
>> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
>>  wrote:
>> > I believe (1) could be skipped entirely - there is almost no reason why
>> > you'd not be able to filter for, eg, the set of output scripts in a
>> > transaction you know about
>>
>> I think this is convincing for the txids themselves.
>>
>> What about also making input prevouts filter based on the scriptpubkey
>> being _spent_?  Layering wise in the processing it's a bit ugly, but
>> if you validated the block you have the data needed.
>>
>> This would eliminate the multiple data type mixing entirely.
>> ___
>> bitcoin-dev mailing list
>> bitcoin-dev@lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>>
>
>
>
> --
> Riccardo Casatta - @RCasatta 
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Pieter Wuille via bitcoin-dev
On Fri, May 18, 2018, 19:57 Olaoluwa Osuntokun via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Greg wrote:
> > What about also making input prevouts filter based on the scriptpubkey
> being
> > _spent_?  Layering wise in the processing it's a bit ugly, but if you
> > validated the block you have the data needed.
>
> AFAICT, this would mean that in order for a new node to catch up the filter
> index (index all historical blocks), they'd either need to: build up a
> utxo-set in memory during indexing, or would require a txindex in order to
> look up the prev out's script. The first option increases the memory load
> during indexing, and the second requires nodes to have a transaction index
> (and would also add considerable I/O load). When proceeding from tip, this
> doesn't add any additional load assuming that your synchronously index the
> block as you validate it, otherwise the utxo set will already have been
> updated (the spent scripts removed).
>

I was wondering about that too, but it turns out that isn't necessary. At
least in Bitcoin Core, all the data needed for such a filter is in the
block + undo files (the latter contain the scriptPubKeys of the outputs
being spent).

I have a script running to compare the filter sizes assuming the regular
> filter switches to include the prev out's script rather than the prev
> outpoint itself. The script hasn't yet finished (due to the increased I/O
> load to look up the scripts when indexing), but I'll report back once it's
> finished.
>

That's very helpful, thank you.

Cheers,

-- 
Pieter
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Olaoluwa Osuntokun via bitcoin-dev
Greg wrote:
> What about also making input prevouts filter based on the scriptpubkey
being
> _spent_?  Layering wise in the processing it's a bit ugly, but if you
> validated the block you have the data needed.

AFAICT, this would mean that in order for a new node to catch up the filter
index (index all historical blocks), they'd either need to: build up a
utxo-set in memory during indexing, or would require a txindex in order to
look up the prev out's script. The first option increases the memory load
during indexing, and the second requires nodes to have a transaction index
(and would also add considerable I/O load). When proceeding from tip, this
doesn't add any additional load assuming that your synchronously index the
block as you validate it, otherwise the utxo set will already have been
updated (the spent scripts removed).

I have a script running to compare the filter sizes assuming the regular
filter switches to include the prev out's script rather than the prev
outpoint itself. The script hasn't yet finished (due to the increased I/O
load to look up the scripts when indexing), but I'll report back once it's
finished.

-- Laolu


On Thu, May 17, 2018 at 9:37 AM Gregory Maxwell via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
>  wrote:
> > I believe (1) could be skipped entirely - there is almost no reason why
> > you'd not be able to filter for, eg, the set of output scripts in a
> > transaction you know about
>
> I think this is convincing for the txids themselves.
>
> What about also making input prevouts filter based on the scriptpubkey
> being _spent_?  Layering wise in the processing it's a bit ugly, but
> if you validated the block you have the data needed.
>
> This would eliminate the multiple data type mixing entirely.
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Olaoluwa Osuntokun via bitcoin-dev
Matt wrote:
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about

Depending on the use-case, the txid is more precise than searching for the
output script as it doesn't need to deal with duplicated output scripts. To
my knowledge, lnd is the only major project that currently utilizes BIP
157+158. At this point, we use the txid in the regular filter for
confirmations (channel confirmed, sweep tx confirmed, cltv confirmed, etc).
Switching to use output scripts instead wouldn't be _too_ invasive w.r.t
changes required in the codebase, only the need to deal with output script
duplication could be annoying.

> (2) and (3) may want to be split out - many wallets may wish to just find
> transactions paying to them, as transactions spending from their outputs
> should generally be things they've created.

FWIW, in the "rescan after importing by seed phrase" both are needed in
order to ensure the wallet ends up with the proper output set after the
scan. In lnd we actively use both (2) to detect deposits to the internal
wallet, and (3) to be notified when our channel outputs are spent on-chain
(and also generally when any of our special scripts are spent).

> In general, I'm concerned about the size of the filters making existing
SPV
> clients less willing to adopt BIP 158 instead of the existing bloom filter
> garbage and would like to see a further exploration of ways to split out
> filters to make them less bandwidth intensive.

Agreed that the current filter size may prevent adoption amongst wallets.
However, the other factor that will likely prevent adoption amongst current
BIP-37 mobile wallets is the lack of support for notifying _unconfirmed_
transactions. When we drafted up the protocol last year and asked around,
this was one of the major points of contention amongst existing mobile
wallets that utilize BIP 37.

On the other hand, the two "popular" BIP 37 wallets I'm aware of
(Breadwallet, and Andreas Schildbach's Bitcoin Wallet) have lagged massively
behind the existing set of wallet related protocol upgrades. For example,
neither of them have released versions of their applications that take
advantage of segwit in any manner. Breadwallet has more or less "pivoted"
(they did an ICO and have a token) and instead is prioritizing things like
adding random ICO tokens over catching up with the latest protocol updates.
Based on this behavior, even if the filter sizes were even _more_ bandwidth
efficient that BIP 37, I don't think they'd adopt the protocol.

> Some further ideas we should probably play with before finalizing moving
> forward is providing filters for certain script templates, eg being able
to
> only get outputs that are segwit version X or other similar ideas.

Why should this block active deployment of BIP 157+158 as is now? As
defined, the protocol already allows future updates to add additional filter
types. Before the filters are committed, each filter type requires a new
filter header. We could move to a single filter header that commits to the
hashes of _all_ filters, but that would mean that a node couldn't serve the
headers unless they had all currently defined features, defeating the
optionality offered.

Additionally, more filters entails more disk utilization for nodes serving
these filters. Nodes have the option to instead create the filters at "query
time", but then this counters the benefit of simply slinging the filters
from disk (or a memory map or w/e). IMO, it's a desirable feature that
serving light clients no longer requires active CPU+I/O and instead just
passive I/O (nodes could even write the filters to disk in protocol msg
format).

To get a feel for the current filter sizes, a txid-only filter size, and a
regular filter w/o txid's, I ran some stats on the last 10k blocks:

regular size:217107653  bytes
regular avg: 21710.7653 bytes
regular median:  22332  bytes
regular max: 61901  bytes

txid-only size:34518463  bytes
txid-only avg: 3451.8463 bytes
txid-only median:  3258  bytes
txid-only max: 10193 bytes

reg-no-txid size:182663961  bytes
reg-no-txid avg: 18266.3961 bytes
reg-no-txid median:  19198  bytes
reg-no-txid max: 60172  bytes

So the median regular filter size over the past 10k blocks is 20KB. If we
extract the txid from the regular filter and add a txid-only filter, the
median size of that is 3.2KB. Finally, the median size of a modified regular
filter (no txid) is 19KB.

-- Laolu


On Thu, May 17, 2018 at 8:33 AM Matt Corallo via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> BIP 158 currently includes the following in the "basic" filter: 1)
> txids, 2) output scripts, 3) input prevouts.
>
> I believe (1) could be skipped entirely - there is almost no reason why
> you'd not be able to filter for, eg, the set of output scripts in a
> transaction you know about and (2) 

Re: [bitcoin-dev] UHS: Full-node security without maintaining a full UTXO set

2018-05-18 Thread Alex Mizrahi via bitcoin-dev
You should read this:
https://bitcointalk.org/index.php?topic=153662.10

On Wed, May 16, 2018 at 7:36 PM, Cory Fields via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Tl;dr: Rather than storing all unspent outputs, store their hashes.
> Untrusted
> peers can supply the full outputs when needed, with very little overhead.
> Any attempt to spoof those outputs would be apparent, as their hashes
> would not
> be present in the hash set. There are many advantages to this, most
> apparently
> in disk and memory savings, as well as a validation speedup. The primary
> disadvantage is a small increase in network traffic. I believe that the
> advantages outweigh the disadvantages.
>
> --
>
> Bitcoin’s unspent transaction output set (usually referred to as “The UTXO
> set”) has two primary roles: providing proof that previous outputs exist
> to be
> spent, and providing the actual previous output data for verification when
> new
> transactions attempts to spend them. These roles are not usually discussed
> independently, but as Bram Cohen's TXO Bitfield [0] idea hints, there are
> compelling reasons to consider them this way.
>
> To see why, consider running a node with the following changes:
>
> - For each new output, gather all extra data that will be needed for
>   verification when spending it later as an input: the amount,
> scriptPubKey,
>   creation height, coinbaseness, and output type (p2pkh, p2sh, p2wpkh,
> etc.).
>   Call this the Dereferenced Prevout data.
> - Create a hash from the concatenation of the new outpoint and the
> dereferenced
>   prevout data. Call this a Unspent Transaction Output Hash.
> - Rather than storing the full dereferenced prevout entries in a UTXO set
> as is
>   currently done, instead store their hashes to an Unspent Transaction
> Output
>   Hash Set, or UHS.
> - When relaying a transaction, append the dereferenced prevout for each
> input.
>
> Now when a transaction is received, it contains everything needed for
> verification, including the input amount, height, and coinbaseness, which
> would
> have otherwise required a lookup the UTXO set.
>
> To verify an input's unspentness, again create a hash from the
> concatenation of
> the referenced outpoint and the provided dereferenced prevout, and check
> for
> its presence in the UHS. The hash will only be present if a hash of the
> exact
> same data was previously added to (and not since removed from) the UHS. As
> such, we are protected from a peer attempting to lie about the dereferenced
> prevout data.
>
> ### Some benefits of the UHS model
>
> - Requires no consensus changes, purely a p2p/implementation change.
>
> - UHS is substantially smaller than a full UTXO set (just over half for the
>   main chain, see below). In-memory caching can be much more effective as a
>   result.
>
> - A block’s transactions can be fully verified before doing a potentially
>   expensive database lookup for the previous output data. The UHS can be
>   queried afterwards (or in parallel) to verify previous output inclusion.
>
> - Entire blocks could potentially be verified out-of-order because all
> input
>   data is provided; only the inclusion checks have to be in-order.
> Admittedly
>   this is likely too complicated to be realistic.
>
> - pay-to-pubkey outputs are less burdensome on full nodes, since they use
> no
>   more space on-disk than pay-to-pubkey-hash or pay-to-script-hash.
> Taproot and
>   Graftroot outputs may share the same benefits.
>
> - The burden of holding UTXO data is technically shifted from the
> verifiers to
>   the spender. In reality, full nodes will likely always have a copy as
> well,
>   but conceptually it's a slight improvement to the incentive model.
>
> - Block data from peers can also be used to roll backwards during a reorg.
> This
>   potentially enables an even more aggressive pruning mode.
>
> - UTXO storage size grows exactly linearly with UTXO count, as opposed to
>   growing linearly with UTXO data size. This may be relevant when
> considering
>   new larger output types which would otherwise cause the UTXO Set size to
>   increase more quickly.
>
> - The UHS is a simple set, no need for a key-value database. LevelDB could
>   potentially be dropped as a dependency in some distant future.
>
> - Potentially integrates nicely with Pieter Wuille's Rolling UTXO set
> hashes
>   [1]. Unspent Transaction Output Hashes would simply be mapped to points
> on a
>   curve before adding them to the set.
>
> - With the help of inclusion proofs and rolling hashes, libbitcoinconsensus
>   could potentially safely verify entire blocks. The size of the required
>   proofs would be largely irrelevant as they would be consumed locally.
>
> - Others?
>
> ### TxIn De-duplication
>
> Setting aside the potential benefits, the obvious drawback of using a UHS
> is a
> significant network traffic increase. Fortunately, some properties of
> transactions can be exploited to offset most of the difference.
>
> For 

Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Riccardo Casatta via bitcoin-dev
Another parameter which heavily affects filter size is the false positive
rate which is empirically set

to 2^-20
The BIP recall some go code

for how the parameter has been selected which I can hardly understand and
run, it's totally my fault but if possible I would really like more details
on the process, like charts and explanations (for example, which is the
number of elements to search for which the filter has been optimized for?)

Instinctively I feel 2^-20 is super low and choosing a lot higher alpha
will shrink the total filter size by gigabytes at the cost of having to
wastefully download just some megabytes of blocks.


2018-05-17 18:36 GMT+02:00 Gregory Maxwell via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org>:

> On Thu, May 17, 2018 at 3:25 PM, Matt Corallo via bitcoin-dev
>  wrote:
> > I believe (1) could be skipped entirely - there is almost no reason why
> > you'd not be able to filter for, eg, the set of output scripts in a
> > transaction you know about
>
> I think this is convincing for the txids themselves.
>
> What about also making input prevouts filter based on the scriptpubkey
> being _spent_?  Layering wise in the processing it's a bit ugly, but
> if you validated the block you have the data needed.
>
> This would eliminate the multiple data type mixing entirely.
> ___
> bitcoin-dev mailing list
> bitcoin-dev@lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev
>



-- 
Riccardo Casatta - @RCasatta 
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] BIP 158 Flexibility and Filter Size

2018-05-18 Thread Karl-Johan Alm via bitcoin-dev
On Fri, May 18, 2018 at 12:25 AM, Matt Corallo via bitcoin-dev
 wrote:
> In general, I'm concerned about the size of the filters making existing
> SPV clients less willing to adopt BIP 158 instead of the existing bloom
> filter garbage and would like to see a further exploration of ways to
> split out filters to make them less bandwidth intensive. Some further
> ideas we should probably play with before finalizing moving forward is
> providing filters for certain script templates, eg being able to only
> get outputs that are segwit version X or other similar ideas.

There is also the idea of multi-block filters. The idea is that light
clients would download a pair of filters for blocks X..X+255 and
X+256..X+511, check if they have any matches and then grab pairs for
any that matched, e.g. X..X+127 & X+128..X+255 if left matched, and
iterate down until it ran out of hits-in-a-row or it got down to
single-block level.

This has an added benefit where you can accept a slightly higher false
positive rate for bigger ranges, because the probability of a specific
entry having a false positive in each filter is (empirically speaking)
independent. I.e. with a FP probability of 1% in the 256 range block
and a FP probability of 0.1% in the 128 range block would mean the
probability is actually 0.001%.

Wrote about this here: https://bc-2.jp/bfd-profile.pdf (but the filter
type is different in my experiments)
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev