Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-13 Thread Tier Nolan
On Wed, May 13, 2015 at 6:19 AM, Daniel Kraft d...@domob.eu wrote:

 2) Divide the range of all blocks into intervals with exponentially
 growing size.  I. e., something like this:

 1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...


Interesting.  This can be combined with the system I suggested.

A node broadcasts 3 pieces of information

Seed (16 bits): This is the seed
M_bits_lsb (1 bit):  Used to indicate M during a transition
N (7 bits):  This is the count of the last range held (or partially held)

M = 1 << M_bits

M should be set to the lowest power of 2 greater than double the block
chain height

That gives M = 1 million at the moment.  While M is changing, some nodes
will be using the higher M and others will use the lower M.

The M_bits_lsb field allows those to be distinguished.

As the block height approaches 512k, nodes can begin to upgrade.  For a
period around block 512k, some nodes could use M = 1 million and others
could use M = 2 million.
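A minimal sketch (my own, not from the post) of the M selection rule described above, assuming `height` is the current block height:

```python
def choose_m(height):
    """Return the lowest power of 2 greater than double the chain height."""
    m = 1
    while m <= 2 * height:
        m *= 2
    return m

# At a mid-2015 height of roughly 355,000 this gives 2**20, about 1 million,
# and the answer flips to 2 million as the height passes 512k.
```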

Assuming M is around 3 times higher than the block height, the odds of
a start being less than the block height are around 35%.  If the run sizes
grow by 25% each step, that is approximately a double for each hit.

Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB

This gives an exponential increase, but groups of 4 are linearly
interpolated.


Size(0) = 10MB
Size(1) = 12.5MB
Size(2) = 15MB
Size(3) = 17.5MB
Size(4) = 20MB

Size(5) = 25MB
Size(6) = 30MB
Size(7) = 35MB

Size(8) = 40MB
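As a sketch, the schedule can be reproduced with the formula's bitwise operators restored (`&` for the mask, `>>` and `<<` for the shifts, which the archive appears to have stripped):

```python
def size_mb(n):
    # Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5 MB:
    # groups of 4 interpolate linearly, and each group doubles the base.
    return (4 + (n & 0x3)) * (1 << (n >> 2)) * 2.5

# size_mb(0) -> 10.0, size_mb(5) -> 25.0, size_mb(8) -> 40.0
```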

Start(n) = Hash(seed + n) mod M

A node should store as much of its last run as possible.  Assuming runs
0, 5, and 8 were hits but the node had a max size of 60MB, it can store
runs 0 and 5 and have 25MB left.  That isn't enough to store all of run 8,
but it should store 25MB of the blocks in run 8 anyway.
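A rough sketch of that run-selection logic; SHA-256 stands in for the unspecified Hash, and the byte encoding of seed and n is my own assumption:

```python
import hashlib

def size_mb(n):
    # Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB, per the schedule above
    return (4 + (n & 0x3)) * (1 << (n >> 2)) * 2.5

def start(seed, n, m):
    # Start(n) = Hash(seed + n) mod M; SHA-256 is a placeholder hash
    h = hashlib.sha256(seed.to_bytes(2, 'big') + bytes([n])).digest()
    return int.from_bytes(h, 'big') % m

def runs_to_store(seed, n_last, m, height, budget_mb):
    """Walk runs 0..n_last, keeping each hit (start below the chain height)
    until the budget runs out; the final run may be kept only partially."""
    kept = []
    for n in range(n_last + 1):
        if start(seed, n, m) >= height:
            continue  # start beyond the chain tip: not a hit
        take = min(size_mb(n), budget_mb)
        kept.append((n, take))
        budget_mb -= take
        if budget_mb <= 0:
            break
    return kept
```

With height equal to M every run is a hit, which reproduces the 60MB example: full runs until the budget is short, then a partial run.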

Size(127) = pow(2, 31) * 17.5MB = 35,840 TB

Decreasing N only causes previously accepted runs to be invalidated.

When a node approaches a transition point for N, it would select a block
height within 25,000 of the transition point.  Once it reaches that block,
it will begin downloading the new runs that it needs.  When updating, it
can set N to zero.  This spreads out the upgrade (over around a year), with
only a small number of nodes upgrading at any time.

New nodes should use the higher M, if near a transition point (say within
100,000).
--
One dashboard for servers and applications across Physical-Virtual-Cloud 
Widest out-of-the-box monitoring support with 50+ applications
Performance metrics, stats and reports that give you Actionable Insights
Deep dive visibility with transaction tracing using APM Insight.
http://ad.doubleclick.net/ddm/clk/290420510;117567292;y___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development


Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread gabe appleton
Yes, but that just increases the incentive for partially-full nodes. It
would add to the assumed-small number of full nodes.

Or am I misunderstanding?

On Tue, May 12, 2015 at 12:05 PM, Jeff Garzik jgar...@bitpay.com wrote:

 A general assumption is that you will have a few archive nodes with the
 full blockchain, and a majority of nodes are pruned, able to serve only the
 tail of the chains.


 On Tue, May 12, 2015 at 8:26 AM, gabe appleton gapplet...@gmail.com
 wrote:

 Hi,

 There's been a lot of talk in the rest of the community about how the
 20MB step would increase storage needs, and that switching to pruned nodes
 (partially) would reduce network security. I think I may have a solution.

 There could be a hybrid option in nodes. Selecting this would do the
 following:
 1. Flip the --no-wallet toggle
 2. Select a section of the blockchain to store fully (percentage based,
 possibly on hash % sections?)
 3. Begin pruning all sections not included in 2
 The idea is that you can implement it similar to how a Koorde is done, in
 that the network will decide which sections it retrieves. So if the user
 prompts it to store 50% of the blockchain, it would look at its peers, and
 at their peers (if secure), and choose the least-occurring options from
 them.

 This would allow them to continue validating all transactions, and still
 store a full copy, just distributed among many nodes. It should overall
 have little impact on security (unless I'm mistaken), and it would
 significantly reduce storage needs on a node.

 It would also allow for a retroactive --max-size flag, where it will
 prune until it is at the specified size, and continue to prune over time,
 while keeping to the sections defined by the network.

 What sort of side effects or network vulnerabilities would this
 introduce? I know some said it wouldn't be Sybil resistant, but how would
 this be less so than a fully pruned node?






 --
 Jeff Garzik
 Bitcoin core developer and open source evangelist
 BitPay, Inc.  https://bitpay.com/



Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Peter Todd
On Tue, May 12, 2015 at 09:05:44AM -0700, Jeff Garzik wrote:
 A general assumption is that you will have a few archive nodes with the
 full blockchain, and a majority of nodes are pruned, able to serve only the
 tail of the chains.

Hmm?

Lots of people are tossing around ideas for partial archival nodes that
would store a subset of blocks, such that collectively the whole
blockchain would be available even if no one node had the entire chain.

-- 
'peter'[:-1]@petertodd.org
156d2069eeebb3309455f526cfe50efbf8a85ec630df7f7c




Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Jeff Garzik
A general assumption is that you will have a few archive nodes with the
full blockchain, and a majority of nodes are pruned, able to serve only the
tail of the chains.


On Tue, May 12, 2015 at 8:26 AM, gabe appleton gapplet...@gmail.com wrote:

 Hi,

 There's been a lot of talk in the rest of the community about how the 20MB
 step would increase storage needs, and that switching to pruned nodes
 (partially) would reduce network security. I think I may have a solution.

 There could be a hybrid option in nodes. Selecting this would do the
 following:
 1. Flip the --no-wallet toggle
 2. Select a section of the blockchain to store fully (percentage based,
 possibly on hash % sections?)
 3. Begin pruning all sections not included in 2
 The idea is that you can implement it similar to how a Koorde is done, in
 that the network will decide which sections it retrieves. So if the user
 prompts it to store 50% of the blockchain, it would look at its peers, and
 at their peers (if secure), and choose the least-occurring options from
 them.

 This would allow them to continue validating all transactions, and still
 store a full copy, just distributed among many nodes. It should overall
 have little impact on security (unless I'm mistaken), and it would
 significantly reduce storage needs on a node.

 It would also allow for a retroactive --max-size flag, where it will prune
 until it is at the specified size, and continue to prune over time, while
 keeping to the sections defined by the network.

 What sort of side effects or network vulnerabilities would this introduce?
 I know some said it wouldn't be Sybil resistant, but how would this be less
 so than a fully pruned node?






-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.  https://bitpay.com/


Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Jeff Garzik
True.  Part of the issue rests on the block sync horizon/cliff.  There is a
value X which is the average number of blocks the 90th percentile of nodes
need in order to sync.  It is sufficient for the [semi-]pruned nodes to
keep X blocks, after which nodes must fall back to archive nodes for older
data.

There is simply far, far more demand for recent blocks, and the demand for
old blocks very rapidly falls off.

There was even a more radical suggestion years ago - refuse to sync if too
old (2 weeks?), and force the user to download ancient data via torrent.



On Tue, May 12, 2015 at 1:02 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

 On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik jgar...@bitpay.com wrote:
  One general problem is that security is weakened when an attacker can
 DoS a
  small part of the chain by DoS'ing a small number of nodes - yet the
 impact
  is a network-wide DoS because nobody can complete a sync.

 It might be more interesting to think of that attack as a bandwidth
 exhaustion DOS attack on the archive nodes... if you can't get a copy
 without them, thats where you'll go.

 So the question arises: does the option make some nodes that would
 have been archive nodes not be? Probably some-- but would it do so much
 that it would offset the gain of additional copies of the data when those
 attacks are not going on? I suspect not.

 It's also useful to give people incremental ways to participate even
 when they can't swallow the whole pill; or choose to provide the
 resource that's cheap for them to provide.  In particular, if there are
 only two kinds of full nodes-- archive and pruned-- then the archive
 nodes take both a huge disk and bandwidth cost; whereas if there are
 fractional nodes then archives take low(er) bandwidth unless the
 fractionals get DOS attacked.




-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.  https://bitpay.com/


Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread gabe appleton
Yet this holds true in our current assumptions of the network as well: that
it will become a collection of pruned nodes with a few storage nodes.

A hybrid option makes this better, because it spreads the risk, rather than
concentrating it in full nodes.
On May 12, 2015 3:38 PM, Jeff Garzik jgar...@bitpay.com wrote:

 One general problem is that security is weakened when an attacker can DoS
 a small part of the chain by DoS'ing a small number of nodes - yet the
 impact is a network-wide DoS because nobody can complete a sync.


 On Tue, May 12, 2015 at 12:24 PM, gabe appleton gapplet...@gmail.com
 wrote:

 0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e.,
 give the signed (by sender) hash of the first and last block in your range.
 This is less data dense than the idea above, but it might work better.

 That said, this is likely a less secure way to do it. To improve upon
 that, a node could request a block of random height within that range and
 verify it, but that violates point 2. And the scheme in itself definitely
 violates point 7.
 On May 12, 2015 3:07 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

 It's a little frustrating to see this just repeated without even
 paying attention to the desirable characteristics from the prior
 discussions.

 Summarizing from memory:

 (0) Block coverage should have locality; historical blocks are
 (almost) always needed in contiguous ranges.   Having random peers
 with totally random blocks would be horrific for performance; as you'd
 have to hunt down a working peer and make a connection for each block
 with high probability.

 (1) Block storage on nodes with a fraction of the history should not
 depend on believing random peers; because listening to peers can
 easily create attacks (e.g. someone could break the network; by
 convincing nodes to become unbalanced) and not useful-- it's not like
 the blockchain is substantially different for anyone; if you're to the
 point of needing to know coverage to fill then something is wrong.
 Gaps would be handled by archive nodes, so there is no reason to
 increase vulnerability by doing anything but behaving uniformly.

 (2) The decision to contact a node should need O(1) communications,
 not just because of the delay of chasing around just to find who has
 someone; but because that chasing process usually makes the process
 _highly_ sybil vulnerable.

 (3) The expression of what blocks a node has should be compact (e.g.
 not a dense list of blocks) so it can be rumored efficiently.

 (4) Figuring out what block (ranges) a peer has given should be
 computationally efficient.

 (5) The communication about what blocks a node has should be compact.

 (6) The coverage created by the network should be uniform, and should
 remain uniform as the blockchain grows; ideally you shouldn't need
 to update your state to know what blocks a peer will store in the
 future, assuming that it doesn't change the amount of data it's
 planning to use. (What Tier Nolan proposes sounds like it fails this
 point)

 (7) Growth of the blockchain shouldn't cause much (or any) need to
 refetch old blocks.

 I've previously proposed schemes which come close but fail one of the
 above.

 (e.g. a scheme based on reservoir sampling that gives uniform
 selection of contiguous ranges, communicating only 64 bits of data to
 know what blocks a node claims to have, remaining totally uniform as
 the chain grows, without any need to refetch -- but needs O(height)
 work to figure out what blocks a peer has from the data it
 communicated.;   or another scheme based on consistent hashes that has
 log(height) computation; but sometimes may result in a node needing to
 go refetch an old block range it previously didn't store-- creating
 re-balancing traffic.)

 So far something that meets all those criteria (and/or whatever ones
 I'm not remembering) has not been discovered; but I don't really think
 much time has been spent on it. I think its very likely possible.






Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Tier Nolan
On Tue, May 12, 2015 at 6:16 PM, Peter Todd p...@petertodd.org wrote:


 Lots of people are tossing around ideas for partial archival nodes that
 would store a subset of blocks, such that collectively the whole
 blockchain would be available even if no one node had the entire chain.


A compact way to describe which blocks are stored helps to mitigate against
fingerprint attacks.

It also means that a node could compactly indicate which blocks it stores
with service bits.

The node could pick two numbers

W = window = a power of 2
P = position = random value less than W

The node would store all blocks with a height of P mod W.  The block hash
could be used too.

This has the nice feature that the node can throw away half of its data and
still represent what is stored.

W_new = W * 2
P_new = (random_bool()) ? P + W : P;

Half of the stored blocks would match P_new mod W_new and the other half
could be deleted.  This means that the store would use up between 50% and
100% of the allocated size.
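A sketch of the halving step; note the position shifts by the old window W (equivalently W_new/2 after the doubling), which is what makes the new stored set a subset of the old one:

```python
import random

def stores(height, w, p):
    # A node with window W and position P stores blocks with height = P mod W
    return height % w == p

def halve(w, p):
    """Double the window; keep P or shift it by the old window.
    Either way, every height matching the new parameters matched the old."""
    w_new = w * 2
    p_new = p + w if random.random() < 0.5 else p
    return w_new, p_new

w, p = 8, 3
w2, p2 = halve(w, p)
kept = [h for h in range(1000) if stores(h, w2, p2)]
# Everything still stored was already stored before the halving.
assert all(stores(h, w, p) for h in kept)
```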

Another benefit is that it increases the probability that at least someone
has every block.

If N nodes each store 1% of the blocks, then the odds of a block being
stored by nobody is pow(0.99, N).  For 1000 nodes, that gives odds of 1 in
23,164 that a block will be missing.  That means that around 13 out of
300,000 blocks would be missing.  There would likely be more nodes than
that, and also storage nodes, so it is not a major risk.
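The arithmetic can be checked directly:

```python
# Chance that a given block is stored by none of N nodes, each holding a
# random 1% of the chain, and the expected number of gaps in 300,000 blocks.
N = 1000
p_none = 0.99 ** N                   # ~1 in 23,164
expected_missing = 300_000 * p_none  # ~13 blocks
```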

If everyone is storing 1% of blocks, then they would set W to 128.  As long
as all of the 128 buckets are covered by some nodes, then all blocks are
stored.  With 1000 nodes, that gives odds of 0.6% that at least one bucket
will be missed.  That is better than around 13 blocks being missing.

Nodes could inform peers of their W and P parameters on connection.  The
version message could be amended or a getparams message of some kind
could be added.

W could be encoded with 4 bits and P could be encoded with 16 bits, for 20
in total.  W = 1 << bits[19:16] and P = bits[15:0].  That gives a maximum W
of 32768, which is likely too many bits for P.
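A sketch of the 20-bit packing, under the reading that bits 19:16 carry the exponent of W and bits 15:0 carry P:

```python
def encode_params(w, p):
    """Pack W (as a 4-bit power-of-two exponent, bits 19:16) and
    P (16 bits, bits 15:0) into a single 20-bit value."""
    exp = w.bit_length() - 1
    assert w == 1 << exp and exp <= 15 and 0 <= p < (1 << 16)
    return (exp << 16) | p

def decode_params(v):
    return 1 << (v >> 16), v & 0xFFFF

assert decode_params(encode_params(128, 57)) == (128, 57)
assert decode_params(encode_params(32768, 12345)) == (32768, 12345)
```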

Initial download would be harder, since new nodes would have to connect to
at least 100 different nodes.  They could download from random nodes, and
just download the ones they are missing from storage nodes.  Even storage
nodes could have a range of W values.


Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Gregory Maxwell
It's a little frustrating to see this just repeated without even
paying attention to the desirable characteristics from the prior
discussions.

Summarizing from memory:

(0) Block coverage should have locality; historical blocks are
(almost) always needed in contiguous ranges.   Having random peers
with totally random blocks would be horrific for performance; as you'd
have to hunt down a working peer and make a connection for each block
with high probability.

(1) Block storage on nodes with a fraction of the history should not
depend on believing random peers; because listening to peers can
easily create attacks (e.g. someone could break the network; by
convincing nodes to become unbalanced) and not useful-- it's not like
the blockchain is substantially different for anyone; if you're to the
point of needing to know coverage to fill then something is wrong.
Gaps would be handled by archive nodes, so there is no reason to
increase vulnerability by doing anything but behaving uniformly.

(2) The decision to contact a node should need O(1) communications,
not just because of the delay of chasing around just to find who has
someone; but because that chasing process usually makes the process
_highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g.
not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has given should be
computationally efficient.

(5) The communication about what blocks a node has should be compact.

(6) The coverage created by the network should be uniform, and should
remain uniform as the blockchain grows; ideally you shouldn't need
to update your state to know what blocks a peer will store in the
future, assuming that it doesn't change the amount of data it's
planning to use. (What Tier Nolan proposes sounds like it fails this
point)

(7) Growth of the blockchain shouldn't cause much (or any) need to
refetch old blocks.

I've previously proposed schemes which come close but fail one of the above.

(e.g. a scheme based on reservoir sampling that gives uniform
selection of contiguous ranges, communicating only 64 bits of data to
know what blocks a node claims to have, remaining totally uniform as
the chain grows, without any need to refetch -- but needs O(height)
work to figure out what blocks a peer has from the data it
communicated.;   or another scheme based on consistent hashes that has
log(height) computation; but sometimes may result in a node needing to
go refetch an old block range it previously didn't store-- creating
re-balancing traffic.)

So far something that meets all those criteria (and/or whatever ones
I'm not remembering) has not been discovered; but I don't really think
much time has been spent on it. I think its very likely possible.



Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Jeff Garzik
One general problem is that security is weakened when an attacker can DoS a
small part of the chain by DoS'ing a small number of nodes - yet the impact
is a network-wide DoS because nobody can complete a sync.


On Tue, May 12, 2015 at 12:24 PM, gabe appleton gapplet...@gmail.com
wrote:

 0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e.,
 give the signed (by sender) hash of the first and last block in your range.
 This is less data dense than the idea above, but it might work better.

 That said, this is likely a less secure way to do it. To improve upon
 that, a node could request a block of random height within that range and
 verify it, but that violates point 2. And the scheme in itself definitely
 violates point 7.
 On May 12, 2015 3:07 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

 It's a little frustrating to see this just repeated without even
 paying attention to the desirable characteristics from the prior
 discussions.

 Summarizing from memory:

 (0) Block coverage should have locality; historical blocks are
 (almost) always needed in contiguous ranges.   Having random peers
 with totally random blocks would be horrific for performance; as you'd
 have to hunt down a working peer and make a connection for each block
 with high probability.

 (1) Block storage on nodes with a fraction of the history should not
 depend on believing random peers; because listening to peers can
 easily create attacks (e.g. someone could break the network; by
 convincing nodes to become unbalanced) and not useful-- it's not like
 the blockchain is substantially different for anyone; if you're to the
 point of needing to know coverage to fill then something is wrong.
 Gaps would be handled by archive nodes, so there is no reason to
 increase vulnerability by doing anything but behaving uniformly.

 (2) The decision to contact a node should need O(1) communications,
 not just because of the delay of chasing around just to find who has
 someone; but because that chasing process usually makes the process
 _highly_ sybil vulnerable.

 (3) The expression of what blocks a node has should be compact (e.g.
 not a dense list of blocks) so it can be rumored efficiently.

 (4) Figuring out what block (ranges) a peer has given should be
 computationally efficient.

 (5) The communication about what blocks a node has should be compact.

 (6) The coverage created by the network should be uniform, and should
 remain uniform as the blockchain grows; ideally you shouldn't need
 to update your state to know what blocks a peer will store in the
 future, assuming that it doesn't change the amount of data it's
 planning to use. (What Tier Nolan proposes sounds like it fails this
 point)

 (7) Growth of the blockchain shouldn't cause much (or any) need to
 refetch old blocks.

 I've previously proposed schemes which come close but fail one of the
 above.

 (e.g. a scheme based on reservoir sampling that gives uniform
 selection of contiguous ranges, communicating only 64 bits of data to
 know what blocks a node claims to have, remaining totally uniform as
 the chain grows, without any need to refetch -- but needs O(height)
 work to figure out what blocks a peer has from the data it
 communicated.;   or another scheme based on consistent hashes that has
 log(height) computation; but sometimes may result in a node needing to
 go refetch an old block range it previously didn't store-- creating
 re-balancing traffic.)

 So far something that meets all those criteria (and/or whatever ones
 I'm not remembering) has not been discovered; but I don't really think
 much time has been spent on it. I think its very likely possible.









-- 
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.  

Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread gabe appleton
0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e.,
give the signed (by sender) hash of the first and last block in your range.
This is less data dense than the idea above, but it might work better.

That said, this is likely a less secure way to do it. To improve upon that,
a node could request a block of random height within that range and verify
it, but that violates point 2. And the scheme in itself definitely violates
point 7.
On May 12, 2015 3:07 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

 It's a little frustrating to see this just repeated without even
 paying attention to the desirable characteristics from the prior
 discussions.

 Summarizing from memory:

 (0) Block coverage should have locality; historical blocks are
 (almost) always needed in contiguous ranges.   Having random peers
 with totally random blocks would be horrific for performance; as you'd
 have to hunt down a working peer and make a connection for each block
 with high probability.

 (1) Block storage on nodes with a fraction of the history should not
 depend on believing random peers; because listening to peers can
 easily create attacks (e.g. someone could break the network; by
 convincing nodes to become unbalanced) and not useful-- it's not like
 the blockchain is substantially different for anyone; if you're to the
 point of needing to know coverage to fill then something is wrong.
 Gaps would be handled by archive nodes, so there is no reason to
 increase vulnerability by doing anything but behaving uniformly.

 (2) The decision to contact a node should need O(1) communications,
 not just because of the delay of chasing around just to find who has
 someone; but because that chasing process usually makes the process
 _highly_ sybil vulnerable.

 (3) The expression of what blocks a node has should be compact (e.g.
 not a dense list of blocks) so it can be rumored efficiently.

 (4) Figuring out what block (ranges) a peer has given should be
 computationally efficient.

 (5) The communication about what blocks a node has should be compact.

 (6) The coverage created by the network should be uniform, and should
 remain uniform as the blockchain grows; ideally you shouldn't need
 to update your state to know what blocks a peer will store in the
 future, assuming that it doesn't change the amount of data it's
 planning to use. (What Tier Nolan proposes sounds like it fails this
 point)

 (7) Growth of the blockchain shouldn't cause much (or any) need to
 refetch old blocks.

 I've previously proposed schemes which come close but fail one of the
 above.

 (e.g. a scheme based on reservoir sampling that gives uniform
 selection of contiguous ranges, communicating only 64 bits of data to
 know what blocks a node claims to have, remaining totally uniform as
 the chain grows, without any need to refetch -- but needs O(height)
 work to figure out what blocks a peer has from the data it
 communicated.;   or another scheme based on consistent hashes that has
 log(height) computation; but sometimes may result in a node needing to
 go refetch an old block range it previously didn't store-- creating
 re-balancing traffic.)

 So far something that meets all those criteria (and/or whatever ones
 I'm not remembering) has not been discovered; but I don't really think
 much time has been spent on it. I think its very likely possible.


 --
 One dashboard for servers and applications across Physical-Virtual-Cloud
 Widest out-of-the-box monitoring support with 50+ applications
 Performance metrics, stats and reports that give you Actionable Insights
 Deep dive visibility with transaction tracing using APM Insight.
 http://ad.doubleclick.net/ddm/clk/290420510;117567292;y
 ___
 Bitcoin-development mailing list
 Bitcoin-development@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bitcoin-development



Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread gabe appleton
I suppose this raises two questions:

1) why not have a partial archive store the most recent X% of the
blockchain by default?

2) why not include some sort of torrent in QT, to mitigate this risk? I
don't think this is necessarily a good idea, but I'd like to hear the
reasoning.
On May 12, 2015 4:11 PM, Jeff Garzik jgar...@bitpay.com wrote:

 True.  Part of the issue rests on the block sync horizon/cliff.  There is
 a value X which is the average number of blocks the 90th percentile of
 nodes need in order to sync.  It is sufficient for the [semi-]pruned nodes
 to keep X blocks, after which nodes must fall back to archive nodes for
 older data.

 There is simply far, far more demand for recent blocks, and the demand for
 old blocks very rapidly falls off.

 There was even a more radical suggestion years ago - refuse to sync if too
 old (2 weeks?), and force the user to download ancient data via torrent.



 On Tue, May 12, 2015 at 1:02 PM, Gregory Maxwell gmaxw...@gmail.com
 wrote:

 On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik jgar...@bitpay.com wrote:
  One general problem is that security is weakened when an attacker can
  DoS a small part of the chain by DoS'ing a small number of nodes - yet
  the impact is a network-wide DoS because nobody can complete a sync.

 It might be more interesting to think of that attack as a bandwidth
 exhaustion DOS attack on the archive nodes... if you can't get a copy
 without them, that's where you'll go.

 So the question arises: does the option make some nodes that would
 have been archive not be? Probably some-- but would it do so much that
 it would offset the gain of additional copies of the data when those
 attacks are not going on? I suspect not.

 It's also useful to give people incremental ways to participate even
 when they can't swallow the whole pill; or choose to provide the
 resource that's cheap for them to provide.  In particular, if there are
 only two kinds of full nodes-- archive and pruned-- then the archive
 nodes take both a huge disk and bandwidth cost; whereas if there are
 fractionals then archives take low(er) bandwidth unless the fractionals
 get DOS attacked.




 --
 Jeff Garzik
 Bitcoin core developer and open source evangelist
 BitPay, Inc.  https://bitpay.com/



Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Gregory Maxwell
On Tue, May 12, 2015 at 8:10 PM, Jeff Garzik jgar...@bitpay.com wrote:
 True.  Part of the issue rests on the block sync horizon/cliff.  There is a
 value X which is the average number of blocks the 90th percentile of nodes
 need in order to sync.  It is sufficient for the [semi-]pruned nodes to keep
 X blocks, after which nodes must fall back to archive nodes for older data.


Prior discussion had things like the definition of pruned means "you
have and will serve at least the last 288 from your tip" (which is
what I put in the pruned service bip text); and another flag for "I
have at least the last 2016".  (2016 should be reevaluated-- it was
just a round number near where sipa's old data showed the fetch
probability flatlined.)

But that data was old; what it showed was that the probability of a
block being fetched vs. depth looked like an exponential drop-off (I
think with a 50% drop at 3-ish days), plus a constant low probability--
which is probably what we should have expected.
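That expected demand curve is easy to state as a toy model. The sketch below is illustrative only -- the half-life and floor constants are guesses standing in for sipa's measurements, not real data:

```python
def fetch_probability(depth_blocks: float,
                      half_life_days: float = 3.0,
                      floor: float = 0.01) -> float:
    """Toy model of block-fetch demand: an exponential drop-off in
    fetch probability with depth (50% per ~3 days, i.e. ~432 blocks
    at 10 min/block) plus a constant low floor from nodes doing a
    full historical sync.  All constants are illustrative guesses.
    """
    half_life_blocks = half_life_days * 24 * 6   # ~144 blocks per day
    recent = (1 - floor) * 0.5 ** (depth_blocks / half_life_blocks)
    return floor + recent
```

Under this model the tip is fetched with probability ~1, demand halves every few hundred blocks of depth, and deep history settles at the constant floor -- which is why keeping only the last few thousand blocks serves most requests.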

 There was even a more radical suggestion years ago - refuse to sync if too
 old (2 weeks?), and force the user to download ancient data via torrent.

I'm not fond of this; it makes the system dependent on centralized
services (e.g. trackers and sources of torrents). A torrent also
cannot very efficiently handle fractional copies, and cannot
efficiently grow over time. Bitcoin should be self-contained-- plus,
many nodes already have the data.



Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Tier Nolan
On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell gmaxw...@gmail.com wrote:


 (0) Block coverage should have locality; historical blocks are
 (almost) always needed in contiguous ranges.   Having random peers
 with totally random blocks would be horrific for performance; as you'd
 have to hunt down a working peer and make a connection for each block
 with high probability.

 (1) Block storage on nodes with a fraction of the history should not
 depend on believing random peers; because listening to peers can
 easily create attacks (e.g. someone could break the network by
 convincing nodes to become unbalanced) and is not useful-- it's not like
 the blockchain is substantially different for anyone; if you're to the
 point of needing to know coverage to fill then something is wrong.
 Gaps would be handled by archive nodes, so there is no reason to
 increase vulnerability by doing anything but behaving uniformly.

 (2) The decision to contact a node should need O(1) communications,
 not just because of the delay of chasing around just to find who has
 something; but because that chasing process usually makes the process
 _highly_ sybil vulnerable.

 (3) The expression of what blocks a node has should be compact (e.g.
 not a dense list of blocks) so it can be rumored efficiently.

 (4) Figuring out what block (ranges) a peer has given should be
 computationally efficient.

 (5) The communication about what blocks a node has should be compact.

 (6) The coverage created by the network should be uniform, and should
 remain uniform as the blockchain grows; ideally you shouldn't need
 to update your state to know what blocks a peer will store in the
 future, assuming that it doesn't change the amount of data it's
 planning to use. (What Tier Nolan proposes sounds like it fails this
 point)

 (7) Growth of the blockchain shouldn't cause much (or any) need to
 refetch old blocks.


M = 1,000,000
N = number of starts

S(0) = hash(seed) mod M
...
S(n) = hash(S(n-1)) mod M

This generates a sequence of start points.  If the start point is less than
the block height, then it counts as a hit.

The node stores the 50MB of data starting at the block at height S(n).

As the blockchain increases in size, new start points will fall below
the block height and become hits.  This means some older runs would be
deleted to stay within the storage cap.

A weakness is that the scheme is random with regard to block heights:
tiny blocks have the same priority as larger blocks.

0) Blocks are local, in 50MB runs
1) Agreed, nodes should download headers-first (or some other compact way
of finding the highest POW chain)
2) M could be fixed, N and the seed are all that is required.  The seed
doesn't have to be that large.  If 1% of the blockchain is stored, then 16
bits should be sufficient so that every block is covered by seeds.
3) N is likely to be less than 2 bytes and the seed can be 2 bytes
4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run.
That is 10 hashes.  They don't even necessarily need to be cryptographic
hashes
5) Isn't this the same as 3?
6) Every block has the same odds of being included.  There inherently needs
to be an update when a node deletes some info due to exceeding its cap.  N
can be dropped one run at a time.
7) When new starts drop below the tip height, N can be decremented and that
one run is deleted.

There would need to be a special rule to ensure the low-height blocks are
covered.  Nodes should keep the first 50MB of blocks with some probability
(10%?).
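For concreteness, the scheme above can be sketched in a few lines of Python. This is an illustrative sketch only: SHA-256, the byte encodings, and the `keeps_genesis_run` rule are arbitrary stand-ins for whatever the protocol would actually fix:

```python
import hashlib

M = 1_000_000   # fixed modulus, well above any plausible block height

def start_points(seed: bytes, n: int) -> list[int]:
    """Generate the chain of start points:
    S(0) = hash(seed) mod M; S(k) = hash(S(k-1)) mod M."""
    starts = []
    digest = hashlib.sha256(seed).digest()
    for _ in range(n):
        s = int.from_bytes(digest, "big") % M
        starts.append(s)
        digest = hashlib.sha256(s.to_bytes(4, "big")).digest()
    return starts

def runs_to_store(seed: bytes, n: int, tip_height: int) -> list[int]:
    """Start heights that count as hits (start point below the tip).
    The node stores ~50MB of contiguous blocks from each hit."""
    return [s for s in start_points(seed, n) if s < tip_height]

def keeps_genesis_run(seed: bytes) -> bool:
    """Sketch of the special low-height rule: ~10% of nodes also keep
    the first 50MB of blocks, decided deterministically from the seed."""
    h = hashlib.sha256(seed + b"genesis").digest()
    return int.from_bytes(h, "big") % 10 == 0
```

A peer that learns just (seed, N) can recompute `runs_to_store` locally, so advertising coverage needs only a few bytes and no per-block communication.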


Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread gabe appleton
This is exactly the sort of solution I was hoping for. It seems this is the
minimal modification to make it work, and, if someone is willing to work
with me, I would love to help implement this.

My only concern is that if the --max-size flag is not included, then this
delivers significantly less benefit to the end user.  Still a good chunk,
but possibly not enough.
On May 12, 2015 6:03 PM, Tier Nolan tier.no...@gmail.com wrote:



 On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell gmaxw...@gmail.com
 wrote:


 (0) Block coverage should have locality; historical blocks are
 (almost) always needed in contiguous ranges.   Having random peers
 with totally random blocks would be horrific for performance; as you'd
 have to hunt down a working peer and make a connection for each block
 with high probability.

 (1) Block storage on nodes with a fraction of the history should not
 depend on believing random peers; because listening to peers can
 easily create attacks (e.g. someone could break the network by
 convincing nodes to become unbalanced) and is not useful-- it's not like
 the blockchain is substantially different for anyone; if you're to the
 point of needing to know coverage to fill then something is wrong.
 Gaps would be handled by archive nodes, so there is no reason to
 increase vulnerability by doing anything but behaving uniformly.

 (2) The decision to contact a node should need O(1) communications,
 not just because of the delay of chasing around just to find who has
 something; but because that chasing process usually makes the process
 _highly_ sybil vulnerable.

 (3) The expression of what blocks a node has should be compact (e.g.
 not a dense list of blocks) so it can be rumored efficiently.

 (4) Figuring out what block (ranges) a peer has given should be
 computationally efficient.

 (5) The communication about what blocks a node has should be compact.

 (6) The coverage created by the network should be uniform, and should
 remain uniform as the blockchain grows; ideally you shouldn't need
 to update your state to know what blocks a peer will store in the
 future, assuming that it doesn't change the amount of data it's
 planning to use. (What Tier Nolan proposes sounds like it fails this
 point)

 (7) Growth of the blockchain shouldn't cause much (or any) need to
 refetch old blocks.


 M = 1,000,000
 N = number of starts

 S(0) = hash(seed) mod M
 ...
 S(n) = hash(S(n-1)) mod M

 This generates a sequence of start points.  If the start point is less
 than the block height, then it counts as a hit.

 The node stores the 50MB of data starting at the block at height S(n).

 As the blockchain increases in size, new starts will be less than the
 block height.  This means some other runs would be deleted.

 A weakness is that it is random with regards to block heights.  Tiny
 blocks have the same priority as larger blocks.

 0) Blocks are local, in 50MB runs
 1) Agreed, nodes should download headers-first (or some other compact way
 of finding the highest POW chain)
 2) M could be fixed, N and the seed are all that is required.  The seed
 doesn't have to be that large.  If 1% of the blockchain is stored, then 16
 bits should be sufficient so that every block is covered by seeds.
 3) N is likely to be less than 2 bytes and the seed can be 2 bytes
 4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run.
 That is 10 hashes.  They don't even necessarily need to be cryptographic
 hashes
 5) Isn't this the same as 3?
 6) Every block has the same odds of being included.  There inherently
 needs to be an update when a node deletes some info due to exceeding its
 cap.  N can be dropped one run at a time.
 7) When new starts drop below the tip height, N can be decremented and
 that one run is deleted.

 There would need to be a special rule to ensure the low height blocks are
 covered.  Nodes should keep the first 50MB of blocks with some probability
 (10%?)





Re: [Bitcoin-development] Proposed additional options for pruned nodes

2015-05-12 Thread Daniel Kraft
Hi all!

On 2015-05-12 21:03, Gregory Maxwell wrote:
 Summarizing from memory:

In the context of this discussion, let me also restate an idea I've
proposed in Bitcointalk for this.  It is probably not perfect and could
surely be adapted (I'm interested in that), but I think it meets
most/all of the criteria stated below.  It is similar to the idea with
start points, but gives O(log height) instead of O(height) for
determining which blocks a node has.

Let me for simplicity assume that the node wants to store 50% of all
blocks.  It is straightforward to extend the scheme so that this is
configurable:

1) Create some kind of seed that can be compact and will be sent to
other peers to define which blocks the node has.  Use it to initialise a
PRNG of some sort.

2) Divide the range of all blocks into intervals with exponentially
growing size.  I. e., something like this:

1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...

With this, only O(log height) intervals are necessary to cover height
blocks.

3) Using the PRNG, *one* of the two intervals of each length is
selected.  The node stores these blocks and discards the others.
(Possibly keeping the last 200 or 2,016 or whatever blocks additionally.)
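Steps 1-3 can be sketched in a few lines of Python. This is an illustrative sketch only; `random.Random` stands in for whatever seeded PRNG would actually be specified:

```python
import random

def stored_intervals(seed: int, height: int):
    """Yield the (lo, hi) block ranges this node stores: roughly 50%
    of all blocks, in O(log height) contiguous runs.

    Interval sizes grow as 1, 1, 2, 2, 4, 4, ...; the seeded PRNG
    picks one interval out of each equal-sized pair.  Because pairs
    are visited in a fixed order, the picks for a given seed never
    change as the chain grows.
    """
    rng = random.Random(seed)
    lo, size = 0, 1
    while lo < height:
        pick = rng.randrange(2)      # 0 = first of the pair, 1 = second
        chosen = lo + pick * size
        if chosen < height:
            yield (chosen, min(chosen + size, height))
        lo += 2 * size               # skip past both intervals of this pair
        size *= 2                    # next pair is twice as long

def has_block(seed: int, height: int, block: int) -> bool:
    """O(log height) membership check, computable locally from the seed."""
    return any(lo <= block < hi for lo, hi in stored_intervals(seed, height))
```

Note the stability property: growth of the chain only ever appends to (or adds) the newest interval, so a node never refetches old ranges, and a remote peer can evaluate `has_block` from the seed alone.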

 (0) Block coverage should have locality; historical blocks are
 (almost) always needed in contiguous ranges.   Having random peers
 with totally random blocks would be horrific for performance; as you'd
 have to hunt down a working peer and make a connection for each block
 with high probability.

You get contiguous block ranges (with at most O(log height) breaks).
Also ranges of newer blocks are longer, which may be an advantage if
those blocks are needed more often.

 (1) Block storage on nodes with a fraction of the history should not
 depend on believing random peers; because listening to peers can
 easily create attacks (e.g. someone could break the network by
 convincing nodes to become unbalanced) and is not useful-- it's not like
 the blockchain is substantially different for anyone; if you're to the
 point of needing to know coverage to fill then something is wrong.
 Gaps would be handled by archive nodes, so there is no reason to
 increase vulnerability by doing anything but behaving uniformly.

With my proposal, each node determines randomly and on its own which
blocks to store.  No believing anyone.

 (2) The decision to contact a node should need O(1) communications,
 not just because of the delay of chasing around just to find who has
 something; but because that chasing process usually makes the process
 _highly_ sybil vulnerable.

Not exactly sure what you mean by that, but I think that's fulfilled.
You can (locally) compute in O(log height) from a node's seed whether or
not it has the blocks you need.  This needs only communication about the
node's seed.

 (3) The expression of what blocks a node has should be compact (e.g.
 not a dense list of blocks) so it can be rumored efficiently.

See above.

 (4) Figuring out what block (ranges) a peer has given should be
 computationally efficient.

O(log height).  Not O(1), but that's probably not a big issue.

 (5) The communication about what blocks a node has should be compact.

See above.

 (6) The coverage created by the network should be uniform, and should
 remain uniform as the blockchain grows; ideally you shouldn't need
 to update your state to know what blocks a peer will store in the
 future, assuming that it doesn't change the amount of data it's
 planning to use. (What Tier Nolan proposes sounds like it fails this
 point)

Coverage will be uniform if the seed is created randomly and the PRNG
has good properties.  No need to update the seed if the other node's
fraction is unchanged.  (Not sure if you suggest for nodes to define a
fraction or rather an absolute size.)

 (7) Growth of the blockchain shouldn't cause much (or any) need to
 refetch old blocks.

No need to do that with the scheme.

What do you think about this idea?  Some random thoughts from myself:

*) I need to formulate it in a more general way so that the fraction can
be arbitrary and not just 50%.  This should be easy to do, and I can do
it if there's interest.

*) It is O(log height) and not O(1), but that should not be too
different for the heights that are relevant.

*) Maybe it would be better / easier to not use the PRNG at all; just
decide to *always* use the first or the second interval with a given
size.  Not sure about that.

*) With the proposed scheme, the node's actual fraction of stored blocks
will vary between 1/2 and 2/3 (if I got the mathematics right, it is
still early) as the blocks come in.  Not sure if that's a problem.  I
can do a precise analysis of this property for an extended scheme if you
are interested in it.

Yours,
Daniel

-- 
http://www.domob.eu/
OpenPGP: 1142 850E 6DFF 65BA 63D6  88A8 B249 2AC4 A733 0737
Namecoin: id/domob - https://nameid.org/?name=domob
--
Done:  Arc-Bar-Cav-Hea-Kni-Ran-Rog-Sam-Tou-Val-Wiz
To go: Mon-Pri


