Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Rusty Russell via bitcoin-dev
Pieter Wuille via bitcoin-dev writes:
> On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
>> Hi all,
>> 
>> The following is a BIP-formatted design spec for compact block relay
>> designed to limit on wire bytes during block relay. You can find the
>> latest version of this document at
>> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.
>
> Hi Matt,
>
> thank you for working on this!

Indeed!  Sorry for the delayed feedback.

>> |shortids||List of uint64_ts||8*shortids_length bytes||Little
>> Endian||The short transaction IDs calculated from the transactions which
>> were not provided explicitly in prefilledtxn
>
> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).

I did this for IBLT testing.

I used variable-length bit encodings, and used the shortest encoding
which is unique to you (including mempool).  It's a little more work,
but for an average node transmitting a block with 1300 txs and another
~3000 in the mempool, you expect about 12 bits per transaction.  IOW,
about 1/5 of your current size.  Critically, we might be able to fit in
two or three TCP packets.
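
A rough sketch of the "shortest encoding which is unique" idea, under assumptions that are not in the message: IDs are treated as fixed-width integers, and `shortest_unique_prefix_bits` is a hypothetical helper name. It relies on the fact that, in sorted order, the longest shared prefix of any ID is with one of its two neighbours.

```python
def shortest_unique_prefix_bits(ids, width=64):
    """For each ID, return the fewest leading bits that tell it apart from all others."""
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate IDs cannot be distinguished by any prefix")
    order = sorted(range(len(ids)), key=lambda i: ids[i])
    result = [1] * len(ids)  # assumption: always send at least one bit
    for pos, i in enumerate(order):
        for npos in (pos - 1, pos + 1):
            if 0 <= npos < len(order):
                diff = ids[i] ^ ids[order[npos]]
                common = width - diff.bit_length()  # number of shared leading bits
                result[i] = max(result[i], common + 1)
    return result


# Toy 4-bit example: 0000 and 0001 need all four bits, 1000 needs only one.
print(shortest_unique_prefix_bits([0b0000, 0b0001, 0b1000], width=4))
```

In the setting described above, the input would be the block's IDs together with the sender's mempool IDs, and only the lengths for the block's transactions would be used.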

The wire encoding of all those bit arrays was:
  [varint-min-numbits] - Shortest bit array length
  [varint-array-size]  - Number of bit arrays.
  [varint-num] - Number of entries in array N (x varint-array-size)
  [packed-bit-arrays...]

  Last byte was padded with zeros.
  See: https://github.com/rustyrussell/bitcoin-iblt/blob/master/wire_encode.cpp#L12

I would also avoid the nonce to save recalculating for each node, and
instead define an id as:

[<64-bit-short-id>][txid]

Since you only ever send as many bits as needed to distinguish, this only
makes a difference if there actually are collisions.

As Peter R points out, we could later enhance the receiver to brute-force
collisions (you could speed that up by sending an XOR of all the txids, but
really, if there are more than a few collisions, give up).

And a prototype could just always send 64-bit ids to start.

Cheers,
Rusty.
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Peter R via bitcoin-dev
[9 May 16 @ 6:40 PDT]

For those interested in the hash collision attack discussion, it turns out 
there is a faster way to scan your set to find the collision:  you’d keep a 
sorted list of the hashes for each TX you generate and then use binary search 
to check that list for a collision for each new TX you randomly generate. 
Performing these operations can probably be reduced to N lg N complexity, which 
is doable for N ~2^32.   In other words, I now agree that the attack is 
feasible.  
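
A minimal sketch of the sort-based search, scaled down so it runs instantly. Assumptions: candidates are arbitrary byte strings standing in for ground transactions, the hash is truncated to a toy 16 bits rather than 64, and the helper names are made up. Instead of a binary search per new candidate, it sorts once and compares neighbours, which has the same N lg N cost.

```python
import hashlib


def short_hash(data, bits):
    """Top `bits` bits of SHA-256(data), as an integer."""
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def find_collision_by_sorting(candidates, bits):
    """Return two distinct candidates with the same truncated hash, or None."""
    tagged = sorted((short_hash(c, bits), c) for c in candidates)
    # After sorting, equal hashes are adjacent, so one linear pass suffices.
    for (h1, c1), (h2, c2) in zip(tagged, tagged[1:]):
        if h1 == h2 and c1 != c2:
            return c1, c2
    return None


# 2^16 + 1 distinct inputs into a 16-bit space must collide (pigeonhole).
candidates = [b"tx-%d" % i for i in range(2**16 + 1)]
print(find_collision_by_sorting(candidates, 16))
```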

Cheers,
Peter

hat tip to egs

> On May 9, 2016, at 4:37 PM, Peter R via bitcoin-dev wrote:
> 
> Greg Maxwell wrote:
> 
>> What are you talking about? You seem profoundly confused here...
>> 
>> I obtain some txouts. I write a transaction spending them in malleable
>> form (e.g. sighash single and an op_return output).. then grind the
>> extra output to produce different hashes.  After doing this 2^32 times
>> I am likely to find two which share the same initial 8 bytes of txid.
> 
> [9 May 16 @ 4:30 PDT]
> 
> I’m trying to understand the collision attack that you're explaining to Tom 
> Zander.  
> 
> Mathematica is telling me that if I generated 2^32 random transactions, the 
> chance that the initial 64 bits match for at least one pair of transactions 
> is about 40%.  So I am following you up to this point.  Indeed, there is a 
> good chance that a pair of transactions from a set of 2^32 will have a 
> collision in the first 64 bits.  
> 
> But how do you actually find that pair from within your large set?  The only 
> way I can think of is to check if the first 64-bits is equal for every 
> possible pair until I find it.  How many possible pairs are there?  
> 
> It is a standard result that there are 
> 
> m! / [n! (m-n)!] 
> 
> ways of picking n numbers from a set of m numbers, so there are
> 
> (2^32)! / [2! (2^32 - 2)!] ~ 2^63
> 
> possible pairs in a set of 2^32 transactions.  So wouldn’t you have to 
> perform approximately 2^63 comparisons in order to identify which pair of 
> transactions are the two that collide?
> 
> Perhaps I made an error or there is a faster way to scan your set to find the 
> collision.  Happy to be corrected…
> 
> Best regards,
> Peter


Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Gregory Maxwell via bitcoin-dev
On Mon, May 9, 2016 at 11:37 PM, Peter R wrote:
> It is a standard result that there are
> m! / [n! (m-n)!]
> ways of picking n numbers from a set of m numbers, so there are
>
> (2^32)! / [2! (2^32 - 2)!] ~ 2^63
> possible pairs in a set of 2^32 transactions.  So wouldn’t you have to 
> perform approximately 2^63 comparisons in order to identify which pair of 
> transactions are the two that collide?
>
> Perhaps I made an error or there is a faster way to scan your set to find the 
> collision.  Happy to be corrected…

$ echo -n Perhaps. f2736d91 |sha256sum
359dfa6d4c2eb2ac81535392d68af4b5e1cb6d9c6321e8f111d3244329b6a4d8
$ echo -n Perhaps. 11ac0388 |sha256sum
359dfa6d4c2eb2ac44d54d0ceeb2212500cb34617b9360695432f6c0fde9b006

Try search term "collision", or there may be an undergrad data
structures and algorithms course online-- you want something covering
"cycle finding".

(Though even ignoring efficient cycle finding, your factorial argument
doesn't hold... you can simply sort the data... Search term
"quicksort" for a relevant algorithm).
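
For readers unfamiliar with the "cycle finding" approach, here is a small sketch using Floyd's tortoise-and-hare method. It is an illustration under assumptions, not the method used to produce the hashes above: the iterated function hashes the 8-byte encoding of the previous truncated value, the truncation is a toy 20 bits, and `step` and `rho_collision` are hypothetical names. The point is that a collision can be found with constant memory, with no stored list at all.

```python
import hashlib


def step(x, bits):
    """One iteration: hash the 8-byte encoding of x and keep the top `bits` bits."""
    digest = hashlib.sha256(x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big") >> (64 - bits)


def rho_collision(bits, seed=0):
    """Return (a, b) with a != b and step(a) == step(b), using O(1) memory."""
    while True:
        # Phase 1: tortoise and hare meet somewhere on the cycle.
        tortoise = step(seed, bits)
        hare = step(step(seed, bits), bits)
        while tortoise != hare:
            tortoise = step(tortoise, bits)
            hare = step(step(hare, bits), bits)
        # Phase 2: restart the tortoise at the seed. The two values just
        # before the cycle entry are distinct but map to the same output.
        tortoise = seed
        if tortoise != hare:  # if equal, the seed was already on the cycle
            while step(tortoise, bits) != step(hare, bits):
                tortoise = step(tortoise, bits)
                hare = step(hare, bits)
            return tortoise, hare
        seed += 1  # rare case: no tail, so try another starting point


a, b = rho_collision(20)
print(a, b, step(a, 20) == step(b, 20))
```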


Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Peter R via bitcoin-dev
Greg Maxwell wrote:

> What are you talking about? You seem profoundly confused here...
> 
> I obtain some txouts. I write a transaction spending them in malleable
> form (e.g. sighash single and an op_return output).. then grind the
> extra output to produce different hashes.  After doing this 2^32 times
> I am likely to find two which share the same initial 8 bytes of txid.

[9 May 16 @ 4:30 PDT]

I’m trying to understand the collision attack that you're explaining to Tom 
Zander.  

Mathematica is telling me that if I generated 2^32 random transactions, the 
chance that the initial 64 bits match for at least one pair of transactions is 
about 40%.  So I am following you up to this point.  Indeed, there is a good 
chance that a pair of transactions from a set of 2^32 will have a collision in 
the first 64 bits.  

But how do you actually find that pair from within your large set?  The only 
way I can think of is to check if the first 64-bits is equal for every possible 
pair until I find it.  How many possible pairs are there?  

It is a standard result that there are 

m! / [n! (m-n)!] 

ways of picking n numbers from a set of m numbers, so there are

(2^32)! / [2! (2^32 - 2)!] ~ 2^63

possible pairs in a set of 2^32 transactions.  So wouldn’t you have to perform 
approximately 2^63 comparisons in order to identify which pair of transactions 
are the two that collide?

Perhaps I made an error or there is a faster way to scan your set to find the 
collision.  Happy to be corrected…
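
The two figures above can be checked with a few lines. This is a sketch that uses the standard birthday approximation p ~ 1 - exp(-n(n-1) / 2d), which is an assumption about how the 40% was obtained; `collision_probability` is a hypothetical name.

```python
import math


def collision_probability(n, space_bits):
    """Approximate chance that n random values in a 2^space_bits space contain a collision."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2**space_bits)


n = 2**32
print(collision_probability(n, 64))   # a bit under 0.4
print(math.log2(math.comb(n, 2)))     # number of pairs, very close to 2^63
```

The pair count only bounds a naive all-pairs comparison; it says nothing about smarter searches.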

Best regards,
Peter



Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Peter R via bitcoin-dev
Hi Pieter,

> I tried to derive what length of short ids is actually necessary (some
> write-up is on
> https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
> incomplete).
> 
> For any reasonable numbers I can come up with (in a very wide range),
> the number of bits needed is very well approximated by:
> 
>  log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
> acceptable_per_block_failure_rate)
> 
> For example, with 20000 mempool transactions, 2500 transactions in a
> block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
> reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
> 34.54, or 5 byte txids would suffice.
> 
> Note that 1 in 10000 failures may sound like a lot, but this is for each
> individual connection, and since every transmission uses separately
> salted identifiers, occasional failures should not affect global
> propagation. Given that transmission failures due to timeouts, network
> connectivity, ... already occur much more frequently than once every few
> gigabytes (what 10000 blocks corresponds to), that's probably already
> more than enough.
> 
> In short: I believe 5 or 6 byte txids should be enough, but perhaps it
> makes sense to allow the sender to choose (so he can weigh trying
> multiple nonces against increasing the short txid length).

[9 May 16 @ 11am PDT]  

We worked on this with respect to “Xthin” for Bitcoin Unlimited, and came to a 
similar conclusion.  

But we (I think it was theZerg) also noticed another trick: if the node 
receiving the thin blocks has a small number of collisions with transactions in 
its mempool (e.g., 1 or 2), then it can test each possible block against the 
Merkle root in the block header to determine the correct one.  Using this 
technique, it should be possible to further reduce the number of bytes used for 
the txids.  That being said, even thin blocks built from 64-bit short IDs 
represent a tremendous savings compared to standard block propagation.  So we 
(Bitcoin Unlimited) decided not to pursue this optimization any further at that 
time.
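
A small sketch of the Merkle-root trick described above, under stated assumptions: txids are 32-byte values in internal byte order, each block position carries a list of candidate txids (more than one where a short ID is ambiguous), and `resolve_collisions` is a hypothetical helper. The Merkle construction is the usual Bitcoin one (double SHA-256, last element duplicated on odd levels).

```python
import hashlib
from itertools import product


def dsha256(data):
    """Double SHA-256, as used for Bitcoin txids and Merkle nodes."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(txids):
    """Bitcoin-style Merkle root of a non-empty list of 32-byte txids."""
    level = list(txids)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [dsha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def resolve_collisions(slots, expected_root):
    """Try every combination of candidates; return the one matching the header's root."""
    for choice in product(*slots):
        if merkle_root(choice) == expected_root:
            return list(choice)
    return None


txids = [dsha256(bytes([i])) for i in range(5)]
root = merkle_root(txids)
slots = [[t] for t in txids]
slots[2] = [dsha256(b"decoy"), txids[2]]  # one ambiguous position
print(resolve_collisions(slots, root) == txids)
```

The number of combinations grows multiplicatively with each ambiguous position, which fits the remark that this only makes sense for one or two collisions.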

***

It’s also interesting to ask what the information-theoretic minimum amount of 
information necessary for a node to re-construct a block is. The way I’m 
thinking about this currently[1] is that the node needs all of the transactions 
in the block that were not initially part of its mempool, plus enough 
information to select an ordered subset from that mempool that represents the 
block.  If m is the number of transactions in mempool and n is the number of 
transactions in the block, then the number of possible subsets (C') is given by 
the binomial coefficient:

  C' =  m! / [n! (m - n)!]

Since there are n! possible orderings for each subset, the total number of 
possible blocks (C) of size n from a mempool of size m is

  C = n! C’ = m! / (m-n)!

Assuming that all possible blocks are equally likely, the Shannon entropy (the 
information that must be communicated) is the base-2 logarithm of the number of 
possible blocks.  After making some approximations, this works out very close to

   minimum information ~= n * log2(m),

which for your case of 20,000 transactions in mempool (m = 20,000) and a 
2500-transaction block (n = 2500), yields

   minimum information = 2500 * log2(20,000) ~ 2500 * 15 bits.

In other words, a lower bound on the information required is about 2 bytes per 
transaction for every transaction in the block that the node is already aware 
of, as well as all the missing transactions in full. 

Of course, this assumes an unlimited number of round trips, and it is probably 
complicated by other factors that I haven’t considered (cue the “spherical 
cow” jokes :), but I thought it was interesting that a technique like Xthin or 
compact blocks is already pretty close to this limit.  
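
As a numerical check of the estimate above, the following sketch compares log2(m! / (m-n)!) with the approximation n * log2(m) for m = 20,000 and n = 2500. It uses `math.lgamma` to avoid computing huge factorials; `ordered_subset_bits` is a hypothetical name.

```python
import math


def ordered_subset_bits(m, n):
    """log2 of m! / (m - n)!, the number of ordered n-subsets of m items."""
    return (math.lgamma(m + 1) - math.lgamma(m - n + 1)) / math.log(2)


m, n = 20000, 2500
exact = ordered_subset_bits(m, n)
approx = n * math.log2(m)
# Bits per transaction: both values come out a little over 14.
print(exact / n, approx / n)
```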

Cheers,
Peter 

[1] There are still some things that I can’t wrap my mind around that I’d love 
to discuss with another math geek :)




Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Pieter Wuille via bitcoin-dev
On 05/03/2016 12:13 AM, lf-lists at mattcorallo.com (Matt Corallo) wrote:
> Hi all,
> 
> The following is a BIP-formatted design spec for compact block relay
> designed to limit on wire bytes during block relay. You can find the
> latest version of this document at
> https://github.com/TheBlueMatt/bips/blob/master/bip-TODO.mediawiki.

Hi Matt,

thank you for working on this!

> ===New data structures===
> Several new data structures are added to the P2P network to relay
> compact blocks: PrefilledTransaction, HeaderAndShortIDs,
> BlockTransactionsRequest, and BlockTransactions. Additionally, we
> introduce a new variable-length integer encoding for use in these data
> structures.
> 
> For the purposes of this section, CompactSize refers to the
> variable-length integer encoding used across the existing P2P protocol
> to encode array lengths, among other things, in 1, 3, 5 or 9 bytes.

This is a nit, but I think it's a bit strange to have two separate
variable length integers in the same specification. I understand that one
is already the default for variable-length integers currently, and there
are reasons to use the other one for efficiency reasons in some places,
but perhaps we should aim to get everything using the latter?
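
For reference, a minimal sketch of the existing CompactSize encoding mentioned in the quoted text (1, 3, 5 or 9 bytes). The function name is hypothetical; the layout follows the commonly documented P2P format with little-endian payloads.

```python
import struct


def encode_compact_size(n):
    """Encode a non-negative integer below 2^64 as a CompactSize."""
    if n < 0xFD:
        return bytes([n])                       # 1 byte
    if n <= 0xFFFF:
        return b"\xfd" + struct.pack("<H", n)   # marker + 2 bytes
    if n <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack("<I", n)   # marker + 4 bytes
    return b"\xff" + struct.pack("<Q", n)       # marker + 8 bytes


for value in (252, 253, 70000, 2**40):
    print(value, encode_compact_size(value).hex())
```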

> New VarInt
> Variable-length integers: bytes are a MSB base-128 encoding of the number.
> The high bit in each byte signifies whether another digit follows. To make
> sure the encoding is one-to-one, one is subtracted from all but the last
> digit.

Maybe it's worth mentioning that it is based on ASN.1 BER's compressed
integer format (see
https://www.itu.int/ITU-T/studygroups/com17/languages/X.690-0207.pdf
section 8.1.3.5), though with a small modification to make every integer
have a single unique encoding.
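
A short sketch of the VarInt described in the quoted text: MSB-first base-128 digits, a continuation bit in the high bit of every byte but the last, and one subtracted from all but the last digit so that each integer has exactly one encoding. The function names are hypothetical and the code is written from that description, so treat it as an illustration rather than the reference implementation.

```python
def encode_varint(n):
    """Encode a non-negative integer as an MSB base-128 VarInt."""
    if n < 0:
        raise ValueError("VarInt encodes non-negative integers only")
    out = [n & 0x7F]                    # last digit: no continuation bit
    while n > 0x7F:
        n = (n >> 7) - 1                # the "subtract one" step
        out.append((n & 0x7F) | 0x80)   # earlier digits carry the high bit
    return bytes(reversed(out))


def decode_varint(data):
    """Return (value, bytes_consumed) for a VarInt at the start of data."""
    n = 0
    for i, byte in enumerate(data):
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1                      # undo the subtraction
        else:
            return n, i + 1
    raise ValueError("truncated VarInt")


for value in (0, 127, 128, 255, 16511, 16512):
    print(value, encode_varint(value).hex())
```

Without the subtraction, 0x80 0x00 would be a second encoding of zero; with it, 0x80 0x00 is the unique encoding of 128.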

> HeaderAndShortIDs
> A HeaderAndShortIDs structure is used to relay a block header, the short
> transactions IDs used for matching already-available transactions, and a
> select few transactions which we expect a peer may be missing.
> 
> |shortids||List of uint64_ts||8*shortids_length bytes||Little
> Endian||The short transaction IDs calculated from the transactions which
> were not provided explicitly in prefilledtxn

I tried to derive what length of short ids is actually necessary (some
write-up is on
https://gist.github.com/sipa/b2eb2e486156b5509ac711edd16153ed but it's
incomplete).

For any reasonable numbers I can come up with (in a very wide range),
the number of bits needed is very well approximated by:

  log2(#receiver_mempool_txn * #block_txn_not_in_receiver_mempool /
acceptable_per_block_failure_rate)

For example, with 20000 mempool transactions, 2500 transactions in a
block, 95% hitrate, and a chance of 1 in 10000 blocks to fail to
reconstruct, needed_bits = log2(20000 * 2500 * (1 - 0.95) / 0.0001) =
34.54, or 5 byte txids would suffice.

Note that 1 in 10000 failures may sound like a lot, but this is for each
individual connection, and since every transmission uses separately
salted identifiers, occasional failures should not affect global
propagation. Given that transmission failures due to timeouts, network
connectivity, ... already occur much more frequently than once every few
gigabytes (what 10000 blocks corresponds to), that's probably already
more than enough.

In short: I believe 5 or 6 byte txids should be enough, but perhaps it
makes sense to allow the sender to choose (so he can weigh trying
multiple nonces against increasing the short txid length).
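
The approximation above is easy to evaluate for other parameters. A minimal sketch, with a hypothetical function name, using a 20000-transaction mempool and a 1-in-10000 failure rate (the values that reproduce the 34.54 figure):

```python
import math


def needed_bits(mempool_txn, block_txn, hitrate, failure_rate):
    """Approximate short-ID length in bits for a target per-block failure rate."""
    missing = block_txn * (1 - hitrate)   # block transactions not in the receiver's mempool
    return math.log2(mempool_txn * missing / failure_rate)


bits = needed_bits(20000, 2500, 0.95, 0.0001)
print(round(bits, 2), math.ceil(bits / 8))  # 34.54 bits, so 5 bytes
```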

> Short transaction IDs
> Short transaction IDs are used to represent a transaction without
> sending a full 256-bit hash. They are calculated by:
> # single-SHA256 hashing the block header with the nonce appended (in
> little-endian)
> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
> each corresponding 8-byte chunk of the hash from the previous step
> # Adding each of the XORed 8-byte chunks together (in little-endian)
> iteratively to find the short transaction ID
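
A sketch of the three quoted steps, with several assumptions that the quoted text does not pin down: the nonce is serialized as 8 little-endian bytes, the header is the 80-byte serialized block header, and the final sum wraps modulo 2^64. `draft_short_id` is a hypothetical name.

```python
import hashlib
import struct


def draft_short_id(header, nonce, txid):
    """Short ID per the quoted draft: salt by SHA256(header || nonce), XOR chunks, sum."""
    salt = hashlib.sha256(header + struct.pack("<Q", nonce)).digest()
    total = 0
    for offset in range(0, 32, 8):
        tx_chunk = int.from_bytes(txid[offset:offset + 8], "little")
        salt_chunk = int.from_bytes(salt[offset:offset + 8], "little")
        # Assumption: 64-bit wraparound addition of the XORed chunks.
        total = (total + (tx_chunk ^ salt_chunk)) & 0xFFFFFFFFFFFFFFFF
    return total


header = bytes(80)  # placeholder header for illustration only
txid = hashlib.sha256(hashlib.sha256(b"example tx").digest()).digest()
print(hex(draft_short_id(header, 1, txid)))
```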

An alternative would be using SipHash-1-3 (a form of SipHash with
reduced iteration counts; the default is SipHash-2-4). SipHash was
designed as a Message Authentication Code, where the security
requirements are much stronger than in our case (in particular, we don't
care about observers being able to find the key, as the key is just
public knowledge here). One of the designers of SipHash has commented
that SipHash-1-3 for collision resistance in hash tables may be enough:
https://github.com/rust-lang/rust/issues/29754#issuecomment-156073946

Using SipHash-1-3 on modern hardware would take ~32 CPU cycles per txid.
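
For illustration, a compact pure-Python SipHash with configurable round counts, defaulting to the 1-3 variant suggested above. How the two key halves would be derived from the block header and nonce is not specified here, so `k0` and `k1` are simply parameters; the function names are hypothetical.

```python
MASK = 0xFFFFFFFFFFFFFFFF


def rotl(x, b):
    """Rotate a 64-bit value left by b bits."""
    return ((x << b) | (x >> (64 - b))) & MASK


def siphash(k0, k1, data, c=1, d=3):
    """SipHash-c-d of `data` under the 128-bit key (k0, k1); returns a 64-bit int."""
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    def sipround():
        nonlocal v0, v1, v2, v3
        v0 = (v0 + v1) & MASK; v1 = rotl(v1, 13); v1 ^= v0; v0 = rotl(v0, 32)
        v2 = (v2 + v3) & MASK; v3 = rotl(v3, 16); v3 ^= v2
        v0 = (v0 + v3) & MASK; v3 = rotl(v3, 21); v3 ^= v0
        v2 = (v2 + v1) & MASK; v1 = rotl(v1, 17); v1 ^= v2; v2 = rotl(v2, 32)

    # Pad to a multiple of 8 bytes; the final byte holds the length mod 256.
    padded = data + bytes(7 - len(data) % 8) + bytes([len(data) & 0xFF])
    for offset in range(0, len(padded), 8):
        m = int.from_bytes(padded[offset:offset + 8], "little")
        v3 ^= m
        for _ in range(c):
            sipround()
        v0 ^= m
    v2 ^= 0xFF
    for _ in range(d):
        sipround()
    return v0 ^ v1 ^ v2 ^ v3


txid = bytes(range(32))  # placeholder txid
print(hex(siphash(1, 2, txid) & 0xFFFFFFFFFFFF))  # e.g. keep 6 bytes as the short ID
```

Pure Python is far slower than the cycle count quoted above; the sketch only shows the structure. Calling it with `c=2, d=4` gives standard SipHash-2-4, which can be checked against published test vectors.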

> ===Implementation Notes===

There are a few more heuristics that MAY be used to improve performance:

* Receivers should treat short txids in blocks that match multiple
mempool transactions as non-matches, and request the transactions. This
significantly reduces the failure to reconstruct.

* When constructing a compact block to send, the sender can verify it
against its own mempoo

Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Bryan Bishop via bitcoin-dev
On Mon, May 9, 2016 at 8:57 AM, Tom via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> The moderators failed to catch his aggressive tone while moderating my post
> (see archives) for being too aggressive.
>

IIRC you were previously informed by moderators (on the same reddit thread
to which you refer) that it would seem you had canceled your email from the
moderation queue, contrary to your retelling above. This is now reaching
far into off-topic and further posts on this subject should be sent to
bitcoin-disc...@lists.linuxfoundation.org or
bitcoin-dev-own...@lists.linuxfoundation.org instead of the bitcoin-dev
mailing list.

- Bryan
http://heybryan.org/
1 512 203 0507


Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Tom via bitcoin-dev
On Monday 09 May 2016 13:40:55 Peter Todd wrote:
> >> [It's a little disconcerting that you appear to be maintaining a fork
> >> and are unaware of this.]
> >
> >ehm...
> 
> Can you please explain why you moved the above part of gmaxwell's reply to
> here,

A personal attack had no place in the technical discussion, I moved it out.



Initially I asked him to please avoid personal attacks, but I thought better 
of it and edited my reply to just "ehm...".


The moderators failed to catch his aggressive tone while moderating my post 
(see archives) for being too aggressive.

I'm sure this message will also not be allowed through. I would not even blame 
the moderators, since this message and Peter's were both off-topic.

I thank you for today's talks; they make me certain of what I said this 
weekend on Reddit: that this list is not a suitable place for all the different 
stakeholders to talk on a level playing field.

If any of you agree, please urge the approach that we replace the entire 
moderation team with a new one. This will be the least painful solution for 
everyone in the ecosystem.

Thanks again.


Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Peter Todd via bitcoin-dev
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512



On 9 May 2016 07:32:59 GMT-04:00, Tom via bitcoin-dev wrote:
>On Monday 09 May 2016 10:43:02 Gregory Maxwell wrote:
>> Service bits are not generally a good mechanism for negotiating optional
>> peer-local parameters.
>
>Service bits are exactly the right solution to indicate additional p2p
>feature-support.
>
>
>> [It's a little disconcerting that you appear to be maintaining a fork
>> and are unaware of this.]
>
>ehm...

Can you please explain why you moved the above part of gmaxwell's reply to 
here, when previously it was right after:

>> > Wait, you didn't steal the variable length encoding from an existing
>> > standard and you programmed a new one?
>>
>> This is one of the two variable length encodings used for years in
>> Bitcoin Core. This is just the first time it's shown up in a BIP.

here?

Editing gmaxwell's reply like that changes the tone of the message significantly.
-----BEGIN PGP SIGNATURE-----

iQE9BAEBCgAnIBxQZXRlciBUb2RkIDxwZXRlQHBldGVydG9kZC5vcmc+BQJXMJNd
AAoJEGOZARBE6K+yz4MH/0fQNM8SQdT7a1zljOSJW17ZLs6cEwVXZc/fOtvrNnOa
CkzXqylPrdT+BWBhPOwDlrzRa/2w5JAJDHRFoR8ZEidasxNDuSfhT3PwulBxmBqs
qoXhg0ujzRv9736vKENzMI4y2HbfHmqOrlLSZrlk8zqBGmlp1fMqVjFriQN66dnV
6cYFVyMVz0x/e4mXw8FigSQxkDAJ6gnfSInecQuZLT7H4g2xomIs6kQbqULHAylS
sFaK4uXy7Vr/sgBbitEQPDHGwywRoA+7EhExb2XpvL6hdyQbL1G1i6SPxGkwKg7R
MAuBPku/FraGo+qfcaA8R7eYKmyP4qZfZly317Aoo6Q=
=NtSN
-----END PGP SIGNATURE-----



Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Tom via bitcoin-dev
On Monday 09 May 2016 10:43:02 Gregory Maxwell wrote:
> On Mon, May 9, 2016 at 9:35 AM, Tom Zander via bitcoin-dev wrote:
> > You misunderstand the networking effects.
> > The fact that your node is required to choose which one to set the
> > announce
> > bit on implies that it needs to predict which node will have the best data
> > in the future.
> 
> Not required. It may. 

It is required, in the reference of wanting to actually use compact block 
relay.


> Testing on actual nodes in the actual network (not a "lab") shows

Apologies, I thought that the term was wider known.  "Laboratory situations" 
is used where I am from as the opposite of real-world messy and unpredictable 
situations.

So, your measurements may be true, but are not useful to decide how well it 
behaves under less optimal situations. aka "the real world".

> This also _increases_ robustness. Right now a single peer failing at
> the wrong time will delay blocks with a long time out.

If your peers that were supposed to send you a compact block fail, then you'll 
end up in exactly that same situation again.  Only with various timeouts in 
between before you get your block making it a magnitude slower.

In networking this is solved by reacting instead of predicting. The network is 
not stable. Your protocol design assumes it to be.


> > Another problem with your solution is that nodes send a much larger amount
> > of unsolicited data to peers in the form of the thin-block compared to
> > the normal inv or header-first data.
> 
> "High bandwidth" mode 

Another place where I may have explained better.
This is not about the difference about the two modes of your design.
This is about the design as a whole. As compared to current.


> > Am I to understand that you choose the solution based on the fact that
> > service bits are too expensive to extend? (if not, please respond to my
> > previous question actually answering the suggestion)
> > 
> > That sounds like a rather bad way of doing design. Maybe you can add a
> > second service bits field of message instead and then do the compact
> > blocks correctly.
> Service bits are not generally a good mechanism for negotiating optional
> peer-local parameters.

Service bits are exactly the right solution to indicate additional p2p 
feature-support.


> [It's a little disconcerting that you appear to be maintaining a fork
> and are unaware of this.]

ehm...


> > Wait, you didn't steal the variable length encoding from an existing
> > standard and you programmed a new one?
> 
> This is one of the two variable length encodings used for years in
> Bitcoin Core. This is just the first time it's shown up in a BIP.
>
> > Look at UTF-8 on wikipedia, you may have "invented" the same encoding that
> > IBM published in 1992.
> 
> The similarity with UTF-8 is that both are variable length and some
> control information is in the high bits. The similarity ends there.

That's all fine and well, it doesn't at any point take away from my point that 
any specification should NOT invent something new that has for decades had a 
great specification already.

If you make a spec to be used by all nodes, on the wire, don't base it on your 
proprietary implementation. Please.


> > Just the first (highest) 8 bytes of a sha256 hash.
> > 
> > The amount of collisions will not be less if you start xoring the rest.
> > The whole reason for doing this extra work is also irrelevant as a spam
> > protection.
> 
> Then you expose it to a trivial collision attack:  To find two 64 bit
> hashes that collide I need perform only roughly 2^32 computation. Then
> I can send them to the network.

No, you still need to have done a POW.

Next to that, your scheme is 2^32 computations *and* some XORs. The XORs are 
percentage wise a rounding error on the total time. So your argument also 
destroys your own addition.

> This issue is eliminated by salting the hash. 

The issue is better eliminated by not allowing nodes to send uninvited large 
messages.

I don't think we're getting anywhere.

I'm not sold on your design and I explained why. I tried explaining in this 
email some misconceptions that may have appeared after my initial emails. I 
hope things are more clear.




[bitcoin-dev] Fwd: Compact Block Relay BIP

2016-05-09 Thread Gregory Maxwell via bitcoin-dev
On Mon, May 9, 2016 at 11:32 AM, Tom wrote:
> On Monday 09 May 2016 10:43:02 Gregory Maxwell wrote:
>> On Mon, May 9, 2016 at 9:35 AM, Tom Zander via bitcoin-dev wrote:
>> > You misunderstand the networking effects.
>> > The fact that your node is required to choose which one to set the announce
>> > bit on implies that it needs to predict which node will have the best data
>> > in the future.
>>
>> Not required. It may.
>
> It is required, in the reference of wanting to actually use compact block
> relay.

I cannot parse this sentence.

A node implementing this does not have to ask peers to send blocks
without further solicitation.

If they don't, their minimum transfer time increases to the current
1.5 RTT (but sending massively less data).

> Apologies, I thought that the term was wider known.  "Laboratory situations"
> is used where I am from as the opposite of real-world messy and unpredictable
> situations.
>
> So, your measurements may be true, but are not useful to decide how well it
> behaves under less optimal situations. aka "the real world".

My measurements were made in the real world, on a collection of nodes
around the network which were not setup for this purpose and are
running standard configurations, over many weeks of logs.

This doesn't guarantee that they're representative of everything-- but
they don't need to be.

>> This also _increases_ robustness. Right now a single peer failing at
>> the wrong time will delay blocks with a long time out.
>
> If your peers that were supposed to send you a compact block fail, then you'll
> end up in exactly that same situation again.  Only with various timeouts in
> between before you get your block making it a magnitude slower.

That is incorrect.

If a header shows up and a compact block has not shown up, a compact
block will be requested.

If a compact block shows up, reconstruction will be attempted.

If any of the requested compact blocks show up (the three in advance,
if high bandwidth mode is used, or a requested one, if there was one)
then reconstruction proceeds without delay.

The addition of the unsolicited input causes no additional timeouts or
delays (ignoring bandwidth usage). It does use some more bandwidth
than not having it, but still massively less than the status quo.

>> > Another problem with your solution is that nodes send a much larger amount
>> > of unsolicited data to peers in the form of the thin-block compared to
>> > the normal inv or header-first data.
>>
>> "High bandwidth" mode
>
> Another place where I may have explained better.
> This is not about the difference about the two modes of your design.
> This is about the design as a whole. As compared to current.

It is massively more efficient than the current protocol, even under
fairly poor conditions. In the absolute worst possible case (a miner
sends a block of completely unexpected transactions, and three peers
send compact blocks), it adds about 6% overhead.

> Service bits are exactly the right solution to indicate additional p2p
> feature-support.

With this kind of unsubstantiated axiomatic assertion, I don't think
further discussion with you is likely to be productive-- at least I
gave a reason.

> That's all fine and well, it doesn't at any point take away from my point that
> any specification should NOT invent something new that has for decades had a
> great specification already.

UTF-8 would be a poor fit here for the reasons I explained and other
less significant ones (including the additional error cases that must
be handled resulting from the inefficient encoding; poor handling of
invalid UTF-8 has even resulted in security issues in some
applications).

I am a bit baffled that you'd suggest using UTF-8 as a general compact
integer encoding in a binary protocol in the first place.

>> > Just the first (highest) 8 bytes of a sha256 hash.
>> >
>> > The amount of collisions will not be less if you start xoring the rest.
>> > The whole reason for doing this extra work is also irrelevant as a spam
>> > protection.
>>
>> Then you expose it to a trivial collision attack:  To find two 64 bit
>> hashes that collide I need perform only roughly 2^32 computation. Then
>> I can send them to the network.
>
> No, you still need to have done a POW.
>
> Next to that, your scheme is 2^32 computations *and* some XORs. The XORs are
> percentage wise a rounding error on the total time. So your argument also
> destroys your own addition.
>
>> This issue is eliminated by salting the hash.
>
> The issue is better eliminated by not allowing nodes to send uninvited large
> messages.

What are you talking about? You seem profoundly confused here. There
is no proof of work involved anywhere.

I obtain some txouts. I write a transaction spending them in malleable
form (e.g. sighash single and an op_return output).. then grind the
extra output to produce different hashes.  After doing this 2^32 times
I am likely to find two which share the same initial 8 bytes of txid.

I

Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Gregory Maxwell via bitcoin-dev
On Mon, May 9, 2016 at 9:35 AM, Tom Zander via bitcoin-dev wrote:
> You misunderstand the networking effects.
> The fact that your node is required to choose which one to set the announce
> bit on implies that it needs to predict which node will have the best data in
> the future.

Not required. It may. If it chooses fortunately, latency is reduced--
to 0.5 RTT in many cases. If not-- nothing harmful happens.

Testing on actual nodes in the actual network (not a "lab") shows that
blocks are normally requested from one of the last three peers they
were requested from 70% of the time, with no special affordances or
skipping samples when peers disconnected.

(77% for last 4, 88% for last 8)

This also _increases_ robustness. Right now a single peer failing at
the wrong time will delay blocks with a long time out. In high
bandwidth mode the redundancy means that node will be much more likely
to make progress without timeout delays-- so long as at least one of
the selected opportunistic mode peers was successful.

Because the decision is non-normative to the protocol, nodes can
decide based on better criteria if better criteria are discovered in
the future.

> Another problem with your solution is that nodes send a much larger amount of
> unsolicited data to peers in the form of the thin-block compared to the normal
> inv or header-first data.

"High bandwidth" mode uses somewhat more bandwidth than low
bandwidth... but still >>10 times less than an ordinary getdata relay
which is used ubiquitously today.

If a node is trying to minimize bandwidth usage, it can choose to not
request the "high bandwidth" mode.

The latency bound cannot be achieved without unsolicited data. The
best we can do while achieving 0.5 RTT is to arrange things so that
the information received is maximally useful and as small as
reasonably possible.

If receivers implemented joint decoding (combining multiple
comprblocks in the event of failed decoding) 4 byte IDs would be
completely reasonable, and were what I originally suggested (along
with forward error correction data, in that case).

> Am I to understand that you choose the solution based on the fact that service
> bits are too expensive to extend? (if not, please respond to my previous
> question actually answering the suggestion)
>
> That sounds like a rather bad way of doing design. Maybe you can add a second
> service bits field of message instead and then do the compact blocks 
> correctly.

Service bits are not generally a good mechanism for negotiating optional
peer-local parameters.

The settings for compactblocks can change at runtime, having to
reconnect to change them would be obnoxious.

> Wait, you didn't steal the variable length encoding from an existing standard
> and you programmed a new one?

This is one of the two variable length encodings used for years in
Bitcoin Core. This is just the first time it's shown up in a BIP.

[It's a little disconcerting that you appear to be maintaining a fork
and are unaware of this.]

> Look at UTF-8 on wikipedia, you may have "invented" the same encoding that IBM
> published in 1992.

The similarity with UTF-8 is that both are variable length and some
control information is in the high bits. The similarity ends there.

UTF-8 is more complex and less efficient for this application (coding
small numbers), as it has to handle things like resynchronization
which are critical in text but irrelevant in our framed, checksummed,
reliably transported binary protocol.
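
For readers who have not met this encoding, here is a minimal sketch of
an MSB-first base-128 integer coding in which the high bit of each byte
says whether another byte follows. The subtraction of one per
continuation byte (which makes every number have exactly one encoding)
reflects my recollection of Bitcoin Core's serialize.h and is not spelled
out in this thread, so treat it as an assumption. The function names are
made up for the example.

```python
def write_varint(n):
    """Encode a non-negative integer, most significant group first."""
    if n < 0:
        raise ValueError("varint cannot encode negative numbers")
    groups = []
    while True:
        # The first group computed is the least significant one and is
        # emitted last, so it is the only byte without the continuation bit.
        groups.append((n & 0x7F) | (0x80 if groups else 0x00))
        if n <= 0x7F:
            break
        n = (n >> 7) - 1  # assumed detail: the -1 removes redundant encodings
    return bytes(reversed(groups))


def read_varint(data):
    """Decode one varint from the front of data; return (value, bytes_used)."""
    n = 0
    for used, byte in enumerate(data, start=1):
        n = (n << 7) | (byte & 0x7F)
        if byte & 0x80:
            n += 1  # undo the -1 applied by the encoder
        else:
            return n, used
    raise ValueError("truncated varint")


print(write_varint(300).hex())           # 812c
print(read_varint(bytes.fromhex("812c")))  # (300, 2)
```

Unlike UTF-8 there is no length prefix in the first byte and no attempt
at self-synchronization, which is why small numbers cost a single byte.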

> Just the first (highest) 8 bytes of a sha256 hash.
>
> The amount of collisions will not be less if you start xoring the rest.
> The whole reason for doing this extra work is also irrelevant as a spam
> protection.

Then you expose it to a trivial collision attack:  To find two 64 bit
hashes that collide I need to perform only roughly 2^32 computations. Then
I can send them to the network.  You cannot reason about these systems
just by assuming that bad things happen only according to pure chance.
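
The birthday effect behind that number can be seen at a smaller scale.
The sketch below grinds a counter until two inputs share the first 3
bytes (24 bits) of their double-SHA256, which typically takes a few
thousand attempts, on the order of 2^12, instead of 2^24. The candidate
byte strings merely stand in for a malleable transaction; they are not
real serialized transactions, and the function names are invented for
this example.

```python
import hashlib


def txid_prefix(data, prefix_bytes):
    """First prefix_bytes of the double-SHA256 of data."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:prefix_bytes]


def grind_collision(prefix_bytes):
    """Vary a counter until two inputs share a hash prefix.

    Returns (input_a, input_b, attempts). Every prefix seen is stored, so
    memory use grows with the number of attempts.
    """
    seen = {}
    counter = 0
    while True:
        # Placeholder for "rewrite the extra output and re-hash".
        candidate = b"malleable-output-" + counter.to_bytes(8, "little")
        prefix = txid_prefix(candidate, prefix_bytes)
        if prefix in seen:
            return seen[prefix], candidate, counter + 1
        seen[prefix] = candidate
        counter += 1


a, b, attempts = grind_collision(prefix_bytes=3)
print(attempts, "attempts for a 24-bit collision; 2**12 =", 2 ** 12)
```

Scaling the prefix to 8 bytes moves the work to roughly 2^32 hashes,
which is well within reach of an attacker.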

This issue is eliminated by salting the hash.  Moreover, with
per-source randomization of the hash, when a rare chance collision
happens it only impacts a single node at a time, so the propagation
doesn't stall network wide on an unlucky block; it just goes slower on
a tiny number of links a tiny percent of the time (instead of breaking
everywhere an even tinier amount of the time)-- in the non-attacker,
chance event case.
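
To make the salting concrete, here is a sketch of the short ID
calculation as the draft text quoted in this thread describes it: hash
the block header with a nonce appended, XOR each 8-byte chunk of the
transaction hash with the matching chunk of that salt, and add the
chunks together. The 8-byte nonce width, reading chunks as little-endian
integers, and wrapping the sum to 64 bits are assumptions on my part, as
are the placeholder header and transaction bytes.

```python
import hashlib
import struct


def short_txid(block_header, nonce, txid):
    """Draft-style short ID: salt = SHA256(header || nonce), then XOR and add."""
    # Assumed: the nonce is serialized as an 8-byte little-endian integer.
    salt = hashlib.sha256(block_header + struct.pack("<Q", nonce)).digest()
    total = 0
    for offset in range(0, 32, 8):
        tx_chunk, = struct.unpack_from("<Q", txid, offset)
        salt_chunk, = struct.unpack_from("<Q", salt, offset)
        total += tx_chunk ^ salt_chunk
    return total & 0xFFFFFFFFFFFFFFFF  # assumed: the sum wraps to 64 bits


header = bytes(80)  # placeholder for an 80-byte block header
txid = hashlib.sha256(hashlib.sha256(b"example transaction").digest()).digest()

# The same transaction gets unrelated short IDs under different nonces,
# so a collision ground out in advance does not carry over between senders.
print(hex(short_txid(header, 1, txid)))
print(hex(short_txid(header, 2, txid)))
```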
___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] Compact Block Relay BIP

2016-05-09 Thread Tom Zander via bitcoin-dev
On Sunday, May 08, 2016 03:24:22 AM Matt Corallo wrote:
> >> ===Intended Protocol Flow===
> > 
> > I'm not a fan of the solution that a CNode should keep state and talk to
> > its remote nodes differently while announcing new blocks.
> > It's too complicated and ultimately counter-productive.
> > 
> > The problem is that an individual node needs to predict network behaviour
> > in advance. With the downside that if it guesses wrong that both nodes
> > end up paying for the wrong guess.
> > This is not a good way to design a p2p layer.
> 
> Nodes don't need to predict much in advance, and the cost for predicting
> wrong is 0 if your peers receive blocks with a few hundred ms between
> them (as we should expect) and you haven't set the announce bit on more
> than a few peers (as the spec requires for this reason).

You misunderstand the networking effects.
The fact that your node is required to choose which one to set the announce 
bit on implies that it needs to predict which node will have the best data in 
the future.
It needs to predict which nodes will not start being incommunicado and it 
requires them to predict all the things that are not possible to predict in a 
network.
In networking it is even more true than in stocks; results of the past are no 
guarantee for the future.

This means you are creating a fragile system. Your system will only work in 
laboratory situations.  It will fail spectacularly when the network or the 
internet is under stress or some parts fall away.


Another problem with your solution is that nodes send a much larger amount of 
unsolicited data to peers in the form of the thin-block compared to the normal 
inv or header-first data.

Saying this is mitigated by only subscribing to this data from a small 
subset of nodes puts you in the situation that I described above: a 
tradeoff between fragile and fast, with no possible way to make 
a node automatically decide on a good equilibrium.


> It seems I forgot to add a suggested peer-preforwarding-selection
> algorithm in the text, but the intended use-case is to set the bit on
> peers which recently provided you blocks faster than other peers, up to
> only one or three peers. This is both simple and should be incredibly
> effective.

Network autorepair systems have been researched for decades, no real solution 
has as of yet appeared. 
PHDs are written on the subject and you want to make this a design for Bitcoin 
based on "[it] should be incredibly effective", I think you are underestimating 
the subject matter you are dealing with.


> > I would suggest that a new block is announced to all nodes equally and
> > then
> > individual nodes can respond with a request of either a 'compact' or a
> > normal block.
> > This is much more in line with the current design as well.
> > 
> > Detection if remote nodes support compact blocks, for the purpose of
> > requesting a compact-block, can be done either via a network-bit or just a
> > protocol version. Or something else entirely, if you have better
> > suggestions.
> 
> In line with recent trends, neither service bits nor protocol versions
> are particularly well-suited for this purpose.

Am I to understand that you choose the solution based on the fact that service 
bits are too expensive to extend? (if not, please respond to my previous 
question actually answering the suggestion)

That sounds like a rather bad way of doing design. Maybe you can add a second 
service bits field of message instead and then do the compact blocks correctly.


> >> Variable-length integers: bytes are a MSB base-128 encoding of the
> >> number.
> >> The high bit in each byte signifies whether another digit follows.
> >> [snip bitwise spec]
> > 
> > I suggest just referring to UTF-8 which describes this just fine.
> > it is good practice to refer to existing specs when possible and not copy
> > the details.
> 
> Hmm? There is no UTF anywhere in this protocol. Indeed this section
> needs to be rewritten, as indicated. I'd recommend you read the code
> until I update the section with better text if you're confused.

Wait, you didn't steal the variable length encoding from an existing standard 
and you programmed a new one?
I strongly suggest you don't reinvent this kind of protocol level encoding 
but instead steal from something like UTF8. Which has been around for decades.

Please base your standard on other standards where possible.

Look at UTF-8 on wikipedia, you may have "invented" the same encoding that IBM 
published in 1992.


> >> Short transaction IDs
> >> Short transaction IDs are used to represent a transaction without
> >> sending a full 256-bit hash. They are calculated by:
> >> # single-SHA256 hashing the block header with the nonce appended (in
> >> little-endian)
> >> # XORing each 8-byte chunk of the double-SHA256 transaction hash with
> >> each corresponding 8-byte chunk of the hash from the previous step
> >> # Adding each of the XORed 8-byte chunks together 

Re: [bitcoin-dev] Committed bloom filters for improved wallet performance and SPV security

2016-05-09 Thread Gregory Maxwell via bitcoin-dev
On Mon, May 9, 2016 at 8:26 AM, bfd--- via bitcoin-dev
 wrote:
> We introduce several concepts that rework the lightweight Bitcoin
> client model in a manner which is secure, efficient and privacy
> compatible.
[...]
> A Bloom Filter Digest is deterministically created of every block

I think this is a fantastic idea.

Some napkin work shows that it has pretty good communications
bandwidth so long as you assume that the wallet has many keys (e.g.
more than the number of the outputs in the block)-- otherwise BIP37
uses less bandwidth, but you note its terrible privacy problems.

You should be aware that when the filter is transmitted but not
updated, as it is in these filtering applications, the bloom filter is
not the most communication efficient data structure.

The most efficient data structure is similar to a bloom filter, but
you use more bits and only one hash function. The result will be
mostly zero bits. Then you entropy code it using RLE+Rice coding or an
optimal binomial packer (e.g.
https://people.xiph.org/~greg/binomial_codec.c).  This is about 45%
more space efficient than a bloom filter. ... it's just a PITA to
update, though that is inapplicable here.  Entropy coding for this can
be quite fast, if many lookups are done the decompression could even
be faster than having to use two dozen hash functions for each lookup.

The intuition is that this kind of simple hash-bitmap is great, but
space inefficient if you don't have compression: since most of the bits
are 0, you end up spending a bit to send less than a bit of
information. A bloom filter improves the situation by using
multiple hash functions to increase the ones density to 50%, but the
increased collisions create overhead. This is important when it's an
in-memory data structure that you're updating often, but not here.
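
A toy version of the one-hash-function structure may help. Each item is
hashed to a position in a sparse range, the sorted positions are turned
into gaps, and each gap is Rice coded (unary quotient plus a k-bit
remainder). This is only a sketch under my own assumptions: SHA256 as
the hash, about 2^k slots per item, and a string of '0'/'1' characters
in place of a packed bit array. It is not the binomial codec linked
above.

```python
import hashlib


def hash_to_range(item, modulus):
    """Map an item to one position in [0, modulus) with a single hash."""
    digest = hashlib.sha256(item).digest()
    return int.from_bytes(digest[:8], "big") % modulus


def rice_encode(values, k):
    """Encode sorted integers as Rice-coded gaps; returns a '0'/'1' string."""
    bits = []
    previous = 0
    for value in values:
        gap = value - previous
        previous = value
        bits.append("1" * (gap >> k) + "0")  # unary quotient, 0-terminated
        if k:
            bits.append(format(gap & ((1 << k) - 1), "0{}b".format(k)))
    return "".join(bits)


def rice_decode(bitstring, k):
    """Inverse of rice_encode: recover the sorted integers."""
    values, pos, current = [], 0, 0
    while pos < len(bitstring):
        quotient = 0
        while bitstring[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1  # skip the terminating 0
        remainder = int(bitstring[pos:pos + k], 2) if k else 0
        pos += k
        current += (quotient << k) | remainder
        values.append(current)
    return values


def build_filter(items, k):
    """Return (modulus, bitstring) for a non-empty list of byte strings."""
    if not items:
        raise ValueError("need at least one item")
    modulus = len(items) << k  # about 2**k slots per item
    positions = sorted({hash_to_range(item, modulus) for item in items})
    return modulus, rice_encode(positions, k)


items = [b"output-%d" % i for i in range(1000)]
modulus, bits = build_filter(items, k=10)
decoded = set(rice_decode(bits, 10))
print(len(bits) / len(items), "bits per item")
print(hash_to_range(b"output-17", modulus) in decoded)  # True
```

With these parameters the false positive rate is roughly 2^-k, and the
coded size stays within k + 2 bits per item because the quotients can
sum to at most the number of items.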

One thing to do with matching blocks is after finding the matches the
node could potentially consult some PIR to get the blocks it cares
about... thus preventing a leak of which blocks it was interested in,
but not taking PIR costs for the whole chain or requiring the
implementation of PIR tree search (which is theoretically simple but
in practice hard to implement).


[bitcoin-dev] Committed bloom filters for improved wallet performance and SPV security

2016-05-09 Thread bfd--- via bitcoin-dev

We introduce several concepts that rework the lightweight Bitcoin
client model in a manner which is secure, efficient and privacy
compatible.

The properties of BIP37 SPV [0] are unfortunately not as strong as
originally thought:

* The expected privacy of the probabilistic nature of bloom
  filters does not exist [1][2], any user with a BIP37 SPV wallet
  should be operating under no expectation of privacy.
  Implementation flaws make this effect significantly worse, the
  behavior meaning that no matter how high the false positive
  rate (up to simply downloading the whole blocks verbatim) the
  intent of the client connection is recoverable.

* Significant processing load is placed on nodes in the Bitcoin
  network by lightweight clients, a single syncing wallet causes
  (at the time of writing) 80GB of disk reads and a large amount
  of CPU time to be consumed processing this data. This carries
  significant denial of service risk [3], non-distinguishable
  clients can repeatedly request taxing blocks causing
  reprocessing on every request. Processed data is unique to every
  client, and can not be cached or made more efficient while
  staying within specification.

* Wallet clients can not have strong consistency or security
  expectations, BIP37 merkle paths allow for a wallet to validate
  that an output was spendable at some point in time but does not
  prove that this output is not spent today.

* Nodes in the network can denial of service attack all BIP37 SPV
  wallet clients by simply returning null filter results for
  requests, the wallet has no way of discerning if it has been
  lied to and may be made simply unaware that any payment has been
  made to them. Many nodes can be queried in a probabilistic manner
  but this increases the already heavy network load with little
  benefit.



We propose a new concept which can work towards addressing these
shortcomings.


A Bloom Filter Digest is deterministically created of every block
encompassing the inputs and outputs of the containing transactions,
the filter parameters being tuned such that the filter is a small
portion of the size of the total block data. To determine if a block
has contents which may be interesting a second bloom filter of all
relevant key material is created. A binary comparison between the two
filters returns true if there are probably matching transactions, and
false if there are certainly no matching transactions. Any matched
blocks can be downloaded in full and processed for transactions which
may be relevant.
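
A minimal sketch of the matching step follows. The proposal does not fix
any parameters, so the filter size, the number of hash functions, and
the SHA256-based hashing are assumptions chosen for illustration. The
sketch also tests each wallet item against the block filter directly
rather than comparing two filters bit for bit, which keeps the "no false
negatives" property easy to see. All names are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits, num_hashes):
        self.size_bits = size_bits
        self.num_hashes = num_hashes  # must be at most 256 in this sketch
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes positions by prefixing the item with an index byte.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def block_digest(block_items, size_bits=4096, num_hashes=5):
    """Build the per-block filter over its inputs and outputs."""
    bfd = BloomFilter(size_bits, num_hashes)
    for item in block_items:
        bfd.add(item)
    return bfd


def block_may_be_relevant(bfd, wallet_items):
    """True: probably a match, fetch the block. False: certainly no match."""
    return any(bfd.might_contain(item) for item in wallet_items)


block_items = [b"script-%d" % i for i in range(200)]
bfd = block_digest(block_items)
print(block_may_be_relevant(bfd, [b"script-17", b"some-other-key"]))  # True
```

Because the digest depends only on the block, it can be computed once
and served to every client, unlike a BIP37 filter match.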

The BFD can be used verbatim in replacement of BIP37, where the filter
can be cached between clients without needing to be recomputed. It can
also be used by normal pruned nodes to do re-scans locally of their
wallet without needing to have the block data available to scan, or
without reading the entire block chain from disk.

-

For improved probabilistic security the bloom filters can be presented
to lightweight clients by semi-trusted oracles. A client wallet makes
an assumption that they trust a set, or subset of remote parties
(wallet vendors, services) which all sign the BFD for each block.
The BFD can be downloaded from a single remote source, and the hash of
the filters compared against others in the trust set. Agreement is a
weak suggestion that the filter has not been tampered with, assuming
that these parties are not conspiring to defraud the client.

The oracles do not learn any additional information about the client
wallet, the client can download the block data from either nodes on
the network, HTTP services, NNTP, or any other out of band
communication method that provides the privacy desired by the client.

-

The security model of the oracle bloom filter can be vastly improved
by instead committing a hash of the BFD inside every block as a soft-
fork consensus rule change. After this, every node in the network would
build the filter and validate that the hash in the block is correct,
then make a conscious choice to discard it for space savings or cache the
data to disk.

With a commitment to the filter it becomes impossible to lie to
lightweight clients by omission. Lightweight clients are provided with
a block header, merkle path, and the BFD. Altering the BFD invalidates
the merkle proof; its validity is a strong indicator that the client
has an unadulterated picture of the UTXO condition without needing to
build one itself. A strong assurance of the hash of the BFD means
that the filters can be downloaded out of band along with the block
data at the leisure of the client, allowing for significantly greater
privacy and taking load away from the P2P Bitcoin network.

Committing the BFD is not a hard forking change, and does not require
alterations to mining software so long as the coinbase transaction
scriptSig is not included in the bloom filter.


[0] https://github.com/bitcoin/bips