Re: [Bitcoin-development] Proposed additional options for pruned nodes
On Wed, May 13, 2015 at 6:19 AM, Daniel Kraft d...@domob.eu wrote:
> 2) Divide the range of all blocks into intervals with exponentially growing size. I.e., something like this:
>
> 1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ...

Interesting. This can be combined with the system I suggested.

A node broadcasts 3 pieces of information:

Seed (16 bits): This is the seed
M_bits_lsb (1 bit): Used to indicate M during a transition
N (7 bits): This is the count of the last range held (or partially held)

M = 1 << M_bits

M should be set to the lowest power of 2 greater than double the block chain height.

That gives M = 1 million at the moment.

While M is changing, some nodes will be using the higher M and others will use the lower M. The M_bits_lsb field allows those to be distinguished.

As the block height approaches 512k, nodes can begin to upgrade. For a period around block 512k, some nodes could use M = 1 million and others could use M = 2 million.

Assuming M is around 3 times higher than the block height, the odds of a start being less than the block height are around 35%. If the runs grow by 25% each step, then that is approximately a doubling for each hit.

Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5MB

This gives an exponential increase, but groups of 4 are linearly interpolated.

*Size(0) = 10 MB*
Size(1) = 12.5MB
Size(2) = 15 MB
Size(3) = 17.5MB
*Size(4) = 20MB*
Size(5) = 25MB
Size(6) = 30MB
Size(7) = 35MB
*Size(8) = 40MB*

Start(n) = Hash(seed + n) mod M

A node should store as much of its last start as possible. Assume starts 0, 5, and 8 were hits but the node had a max size of 60MB. It can store runs 0 and 5 and have 25MB left. That isn't enough to store all of run 8, but it should store 25MB of the blocks in run 8 anyway.

Size(127) = pow(2, 31) * 17.5MB = 35,840 TB

Decreasing N only causes previously accepted runs to be invalidated.

When a node approaches a transition point for N, it would select a block height within 25,000 of the transition point. Once it reaches that block, it will begin downloading the new runs that it needs. When updating, it can set N to zero.

This spreads out the upgrade (over around a year), with only a small number of nodes upgrading at any time.

New nodes should use the higher M, if near a transition point (say within 100,000).

___
Bitcoin-development mailing list
Bitcoin-development@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bitcoin-development
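As a rough illustration of the run scheme in the message above (not code from the thread), the following Python sketch implements Size(n), Start(n), the choice of M, and the rule that the last run may be stored partially. The message only says Hash(seed + n), so the use of SHA-256 over an 8-byte big-endian encoding is an assumption, as are the helper names size_mb, start, lowest_m and plan_storage.

```python
import hashlib

UNIT_MB = 2.5  # base unit from Size(n) in the message


def size_mb(n: int) -> float:
    """Size(n) = ((4 + (n & 0x3)) << (n >> 2)) * 2.5 MB."""
    return ((4 + (n & 0x3)) << (n >> 2)) * UNIT_MB


def lowest_m(height: int) -> int:
    """Lowest power of 2 strictly greater than double the chain height."""
    m = 1
    while m <= 2 * height:
        m <<= 1
    return m


def start(seed: int, n: int, m: int) -> int:
    """Start(n) = Hash(seed + n) mod M.

    Assumption: Hash is SHA-256 over seed + n encoded as 8 big-endian bytes;
    the message does not specify the hash or the encoding.
    """
    digest = hashlib.sha256((seed + n).to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % m


def plan_storage(hits, max_mb):
    """Map each hit run index to the MB stored, filling runs in order.

    The last run that does not fit is stored partially, as the message suggests.
    """
    plan = {}
    remaining = max_mb
    for n in hits:
        if remaining <= 0:
            break
        stored = min(size_mb(n), remaining)
        plan[n] = stored
        remaining -= stored
    return plan
```

With hits at runs 0, 5 and 8 and a 60MB budget, plan_storage([0, 5, 8], 60) stores runs 0 and 5 in full and 25MB of run 8, matching the worked example in the message.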
Re: [Bitcoin-development] Proposed additional options for pruned nodes
Yes, but that just increases the incentive for partially-full nodes. It would add to the assumed-small number of full nodes.

Or am I misunderstanding?

On Tue, May 12, 2015 at 12:05 PM, Jeff Garzik jgar...@bitpay.com wrote:
> A general assumption is that you will have a few archive nodes with the full blockchain, and a majority of nodes are pruned, able to serve only the tail of the chains.
>
> On Tue, May 12, 2015 at 8:26 AM, gabe appleton gapplet...@gmail.com wrote:
>> Hi,
>>
>> There's been a lot of talk in the rest of the community about how the 20MB step would increase storage needs, and that switching to pruned nodes (partially) would reduce network security. I think I may have a solution.
>>
>> There could be a hybrid option in nodes. Selecting this would do the following:
>>
>> 1. Flip the --no-wallet toggle
>> 2. Select a section of the blockchain to store fully (percentage based, possibly on hash % sections?)
>> 3. Begin pruning all sections not included in 2
>>
>> The idea is that you can implement it similar to how a Koorde is done, in that the network will decide which sections it retrieves. So if the user prompts it to store 50% of the blockchain, it would look at its peers, and at their peers (if secure), and choose the least-occurring options from them.
>>
>> This would allow them to continue validating all transactions, and still store a full copy, just distributed among many nodes. It should overall have little impact on security (unless I'm mistaken), and it would significantly reduce storage needs on a node.
>>
>> It would also allow for a retroactive --max-size flag, where it will prune until it is at the specified size, and continue to prune over time, while keeping to the sections defined by the network.
>>
>> What sort of side effects or network vulnerabilities would this introduce? I know some said it wouldn't be Sybil resistant, but how would this be less so than a fully pruned node?
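The "choose the least-occurring options" step in the quoted proposal could be sketched roughly as below. This is only an illustration under stated assumptions: the chain is split into a fixed number of equally sized sections, each peer's claimed sections are available as a list of indices, and ties are broken by the lower section index. The function name choose_sections and its parameters are hypothetical. Note that the sketch takes peers' claims at face value, which other replies in this thread argue is open to attack.

```python
import math
from collections import Counter


def choose_sections(peer_sections, num_sections, fraction):
    """Pick the least-covered sections until `fraction` of the chain is held.

    peer_sections: iterable of lists, one per peer, of section indices that
                   the peer claims to store (hypothetical input format).
    num_sections:  how many equal sections the chain is divided into.
    fraction:      share of the chain the user asked to store, e.g. 0.5.
    """
    counts = Counter()
    for sections in peer_sections:
        # Count each peer at most once per section.
        counts.update(set(sections))
    want = math.ceil(num_sections * fraction)
    # Least-covered first; ties go to the lower section index.
    ranked = sorted(range(num_sections), key=lambda s: (counts[s], s))
    return sorted(ranked[:want])
```

For three peers claiming sections [0, 1], [0, 1, 2] and [0, 3] out of 4 sections, a node asked to store 50% would pick sections 2 and 3, the two with the fewest claimed copies.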
Re: [Bitcoin-development] Proposed additional options for pruned nodes
On Tue, May 12, 2015 at 09:05:44AM -0700, Jeff Garzik wrote:
> A general assumption is that you will have a few archive nodes with the full blockchain, and a majority of nodes are pruned, able to serve only the tail of the chains.

Hmm? Lots of people are tossing around ideas for partial archival nodes that would store a subset of blocks, such that collectively the whole blockchain would be available even if no one node had the entire chain.

--
'peter'[:-1]@petertodd.org
156d2069eeebb3309455f526cfe50efbf8a85ec630df7f7c
Re: [Bitcoin-development] Proposed additional options for pruned nodes
A general assumption is that you will have a few archive nodes with the full blockchain, and a majority of nodes are pruned, able to serve only the tail of the chains.

On Tue, May 12, 2015 at 8:26 AM, gabe appleton gapplet...@gmail.com wrote:
> [...]

--
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc. https://bitpay.com/
Re: [Bitcoin-development] Proposed additional options for pruned nodes
True. Part of the issue rests on the block sync horizon/cliff. There is a value X which is the average number of blocks the 90th percentile of nodes need in order to sync. It is sufficient for the [semi-]pruned nodes to keep X blocks, after which nodes must fall back to archive nodes for older data.

There is simply far, far more demand for recent blocks, and the demand for old blocks very rapidly falls off.

There was even a more radical suggestion years ago - refuse to sync if too old (2 weeks?), and force the user to download ancient data via torrent.

On Tue, May 12, 2015 at 1:02 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
> On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik jgar...@bitpay.com wrote:
>> One general problem is that security is weakened when an attacker can DoS a small part of the chain by DoS'ing a small number of nodes - yet the impact is a network-wide DoS because nobody can complete a sync.
>
> It might be more interesting to think of that attack as a bandwidth exhaustion DOS attack on the archive nodes... if you can't get a copy without them, that's where you'll go.
>
> So the question arises: does the option make some nodes that would have been archive not be? Probably some-- but would it do so much that it would offset the gain of additional copies of the data when those attacks are not going on? I suspect not.
>
> It's also useful to give people incremental ways to participate even when they can't swallow the whole pill; or choose to provide the resource that's cheap for them to provide.
>
> In particular, if there are only two kinds of full nodes-- archive and pruned; then the archive nodes take both a huge disk and bandwidth cost; whereas if there are fractional nodes then archives take low(er) bandwidth unless the fractionals get DOS attacked.

--
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc. https://bitpay.com/
Re: [Bitcoin-development] Proposed additional options for pruned nodes
Yet this holds true in our current assumptions of the network as well: that it will become a collection of pruned nodes with a few storage nodes.

A hybrid option makes this better, because it spreads the risk, rather than concentrating it in full nodes.

On May 12, 2015 3:38 PM, Jeff Garzik jgar...@bitpay.com wrote:
> One general problem is that security is weakened when an attacker can DoS a small part of the chain by DoS'ing a small number of nodes - yet the impact is a network-wide DoS because nobody can complete a sync.
>
> On Tue, May 12, 2015 at 12:24 PM, gabe appleton gapplet...@gmail.com wrote:
>> 0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e., give the signed (by sender) hash of the first and last block in your range. This is less data dense than the idea above, but it might work better.
>>
>> That said, this is likely a less secure way to do it. To improve upon that, a node could request a block of random height within that range and verify it, but that violates point 2. And the scheme in itself definitely violates point 7.
>>
>> On May 12, 2015 3:07 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
>>> [...]
Re: [Bitcoin-development] Proposed additional options for pruned nodes
On Tue, May 12, 2015 at 6:16 PM, Peter Todd p...@petertodd.org wrote:
> Lots of people are tossing around ideas for partial archival nodes that would store a subset of blocks, such that collectively the whole blockchain would be available even if no one node had the entire chain.

A compact way to describe which blocks are stored helps to mitigate against fingerprint attacks.

It also means that a node could compactly indicate which blocks it stores with service bits.

The node could pick two numbers:

W = window = a power of 2
P = position = random value less than W

The node would store all blocks whose height mod W equals P. The block hash could be used too.

This has the nice feature that the node can throw away half of its data and still represent what is stored.

W_new = W * 2
P_new = (random_bool()) ? P + W_new/2 : P;

Half of the stored blocks would match P_new mod W_new and the other half could be deleted. This means that the store would use up between 50% and 100% of the allocated size.

Another benefit is that it increases the probability that at least someone has every block.

If N nodes each store a random 1% of the blocks, then the odds of a given block not being stored by any of them are pow(0.99, N). For 1000 nodes, that gives odds of 1 in 23,164 that a block will be missing. That means that around 13 out of 300,000 blocks would be missing. There would likely be more nodes than that, and also storage nodes, so it is not a major risk.

If everyone is storing 1% of blocks, then they would set W to 128. As long as all of the 128 buckets are covered by some nodes, then all blocks are stored. With 1000 nodes, that gives odds of 0.6% that at least one bucket will be missed. That is better than around 13 blocks being missing.

Nodes could inform peers of their W and P parameters on connection. The version message could be amended or a getparams message of some kind could be added.

W could be encoded with 4 bits and P could be encoded with 16 bits, for 20 in total. W = 1 << bits[19:16] and P = bits[14:0]. That gives a maximum W of 32768, which is likely too many bits for P.

Initial download would be harder, since new nodes would have to connect to at least 100 different nodes.

They could download from random nodes, and just download the ones they are missing from storage nodes. Even storage nodes could have a range of W values.
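To make the window/position idea in the message above concrete, here is a small Python sketch, offered as an illustration rather than as anything proposed in the thread. It shows the storage test (height mod W equals P), the doubling step that keeps half of the stored blocks, and the pow(0.99, N) estimate. The function names stores, double_window and odds_block_missing are made up for the example, and the doubling step adds the old W (that is, W_new/2) to P so that the kept blocks are a subset of the old ones.

```python
import random


def stores(height: int, w: int, p: int) -> bool:
    """A node with window W and position P keeps blocks with height mod W == P."""
    return height % w == p


def double_window(w: int, p: int, rng: random.Random):
    """Halve the stored data: W_new = 2 * W, P_new is P or P + W (chosen at random).

    Either choice keeps exactly the stored heights that also match the new
    parameters, so nothing has to be refetched.
    """
    w_new = w * 2
    p_new = p + w if rng.random() < 0.5 else p
    return w_new, p_new


def odds_block_missing(fraction: float, nodes: int) -> float:
    """Chance that none of `nodes` independent nodes holds a given block,
    when each node stores a random `fraction` of all blocks."""
    return (1.0 - fraction) ** nodes
```

For example, a node with W = 128 and P = 5 that calls double_window ends up with W = 256 and P equal to either 5 or 133, and every block it still stores was already on disk before the change.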
Re: [Bitcoin-development] Proposed additional options for pruned nodes
It's a little frustrating to see this just repeated without even paying attention to the desirable characteristics from the prior discussions.

Summarizing from memory:

(0) Block coverage should have locality; historical blocks are (almost) always needed in contiguous ranges. Having random peers with totally random blocks would be horrific for performance; as you'd have to hunt down a working peer and make a connection for each block with high probability.

(1) Block storage on nodes with a fraction of the history should not depend on believing random peers; because listening to peers can easily create attacks (e.g. someone could break the network; by convincing nodes to become unbalanced) and is not useful-- it's not like the blockchain is substantially different for anyone; if you're to the point of needing to know coverage to fill then something is wrong. Gaps would be handled by archive nodes, so there is no reason to increase vulnerability by doing anything but behaving uniformly.

(2) The decision to contact a node should need O(1) communications, not just because of the delay of chasing around just to find who has someone; but because that chasing process usually makes the process _highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g. not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has, given that expression, should be computationally efficient.

(5) The communication about what blocks a node has should be compact.

(6) The coverage created by the network should be uniform, and should remain uniform as the blockchain grows; ideally you shouldn't need to update your state to know what blocks a peer will store in the future, assuming that it doesn't change the amount of data it's planning to use. (What Tier Nolan proposes sounds like it fails this point)

(7) Growth of the blockchain shouldn't cause much (or any) need to refetch old blocks.

I've previously proposed schemes which come close but fail one of the above.

(e.g. a scheme based on reservoir sampling that gives uniform selection of contiguous ranges, communicating only 64 bits of data to know what blocks a node claims to have, remaining totally uniform as the chain grows, without any need to refetch -- but needs O(height) work to figure out what blocks a peer has from the data it communicated; or another scheme based on consistent hashes that has log(height) computation; but sometimes may result in a node needing to go refetch an old block range it previously didn't store-- creating re-balancing traffic.)

So far something that meets all those criteria (and/or whatever ones I'm not remembering) has not been discovered; but I don't really think much time has been spent on it. I think it's very likely possible.
Re: [Bitcoin-development] Proposed additional options for pruned nodes
One general problem is that security is weakened when an attacker can DoS a small part of the chain by DoS'ing a small number of nodes - yet the impact is a network-wide DoS because nobody can complete a sync.

On Tue, May 12, 2015 at 12:24 PM, gabe appleton gapplet...@gmail.com wrote:
> 0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e., give the signed (by sender) hash of the first and last block in your range. This is less data dense than the idea above, but it might work better.
>
> That said, this is likely a less secure way to do it. To improve upon that, a node could request a block of random height within that range and verify it, but that violates point 2. And the scheme in itself definitely violates point 7.
>
> On May 12, 2015 3:07 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
>> [...]

--
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc.
Re: [Bitcoin-development] Proposed additional options for pruned nodes
0, 1, 3, 4, 5, 6 can be solved by looking at chunks chronologically. I.e., give the signed (by sender) hash of the first and last block in your range. This is less data dense than the idea above, but it might work better.

That said, this is likely a less secure way to do it. To improve upon that, a node could request a block of random height within that range and verify it, but that violates point 2. And the scheme in itself definitely violates point 7.

On May 12, 2015 3:07 PM, Gregory Maxwell gmaxw...@gmail.com wrote:
> [...]
Re: [Bitcoin-development] Proposed additional options for pruned nodes
I suppose this raises two questions:

1) Why not have a partial archive store the most recent X% of the blockchain by default?
2) Why not include some sort of torrent in QT, to mitigate this risk?

I don't think this is necessarily a good idea, but I'd like to hear the reasoning.

On May 12, 2015 4:11 PM, Jeff Garzik jgar...@bitpay.com wrote:

True. Part of the issue rests on the block sync horizon/cliff. There is a value X which is the average number of blocks the 90th percentile of nodes need in order to sync. It is sufficient for the [semi-]pruned nodes to keep X blocks, after which nodes must fall back to archive nodes for older data.

There is simply far, far more demand for recent blocks, and the demand for old blocks very rapidly falls off.

There was even a more radical suggestion years ago - refuse to sync if too old (2 weeks?), and force the user to download ancient data via torrent.

On Tue, May 12, 2015 at 1:02 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

On Tue, May 12, 2015 at 7:38 PM, Jeff Garzik jgar...@bitpay.com wrote:

One general problem is that security is weakened when an attacker can DoS a small part of the chain by DoS'ing a small number of nodes - yet the impact is a network-wide DoS because nobody can complete a sync.

It might be more interesting to think of that attack as a bandwidth exhaustion DoS attack on the archive nodes... if you can't get a copy without them, that's where you'll go.

So the question arises: does the option make some nodes that would have been archive not be? Probably some -- but would it do so much that it would offset the gain of additional copies of the data when those attacks are not going on? I suspect not.

It's also useful to give people incremental ways to participate even when they can't swallow the whole pill; or choose to provide the resource that's cheap for them to provide.
In particular, if there are only two kinds of full nodes -- archive and pruned -- then the archive nodes take both a huge disk and bandwidth cost; whereas if there are fractional nodes, then archives take low(er) bandwidth unless the fractionals get DoS attacked.

--
Jeff Garzik
Bitcoin core developer and open source evangelist
BitPay, Inc. https://bitpay.com/
Re: [Bitcoin-development] Proposed additional options for pruned nodes
On Tue, May 12, 2015 at 8:10 PM, Jeff Garzik jgar...@bitpay.com wrote:

True. Part of the issue rests on the block sync horizon/cliff. There is a value X which is the average number of blocks the 90th percentile of nodes need in order to sync. It is sufficient for the [semi-]pruned nodes to keep X blocks, after which nodes must fall back to archive nodes for older data.

Prior discussion had things like: the definition of "pruned" means you have and will serve at least the last 288 blocks from your tip (which is what I put in the pruned service BIP text); and another flag for "I have at least the last 2016".

(2016 should be reevaluated -- it was just a round number near where sipa's old data showed the fetch probability flatlined. That data was old, but what it showed was that the probability of a block being fetched vs. depth looked like an exponential drop-off (I think with a 50% at 3-ish days), plus a constant low probability. Which is probably what we should have expected.)

There was even a more radical suggestion years ago - refuse to sync if too old (2 weeks?), and force the user to download ancient data via torrent.

I'm not fond of this; it makes the system dependent on centralized services (e.g. trackers and sources of torrents). A torrent also cannot very efficiently handle fractional copies, and cannot efficiently grow over time. Bitcoin should be complete -- plus, many nodes already have the data.
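As a rough illustration of the fetch-probability shape described in this message (an exponential drop-off plus a constant low floor), here is a small Python sketch. Everything numeric in it is an assumption made for illustration: it reads "50% at 3-ish days" as a half-life of about 3 days, converts days to blocks at roughly 144 blocks per day, normalizes the demand to 1 at the tip, and uses a hypothetical floor of 1%. None of these values come from sipa's data.

```python
# Hypothetical model of relative fetch demand vs. block depth.
# All constants below are illustrative assumptions, not measured values.

BLOCKS_PER_DAY = 144                    # ~10 minute target block spacing
HALF_LIFE_BLOCKS = 3 * BLOCKS_PER_DAY   # "50% at 3-ish days", read as a half-life
FLOOR = 0.01                            # assumed constant low probability


def fetch_probability(depth, floor=FLOOR, half_life=HALF_LIFE_BLOCKS):
    """Relative demand for a block `depth` blocks below the tip.

    Exponential decay with the given half-life, on top of a constant floor.
    Normalized so that depth 0 gives 1.0.
    """
    return floor + (1.0 - floor) * 0.5 ** (depth / half_life)


if __name__ == "__main__":
    # Compare the two depths mentioned in the thread (288 and 2016).
    for depth in (0, 288, 2016, 20000):
        print(depth, round(fetch_probability(depth), 4))
```

Under these assumed parameters the demand at depth 2016 is already close to the floor, which is consistent with the remark that 2016 sat near where the curve flatlined; different parameters would move that point.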
Re: [Bitcoin-development] Proposed additional options for pruned nodes
On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

(0) Block coverage should have locality; historical blocks are (almost) always needed in contiguous ranges. Having random peers with totally random blocks would be horrific for performance; as you'd have to hunt down a working peer and make a connection for each block with high probability.

(1) Block storage on nodes with a fraction of the history should not depend on believing random peers; because listening to peers can easily create attacks (e.g. someone could break the network; by convincing nodes to become unbalanced) and is not useful -- it's not like the blockchain is substantially different for anyone; if you're to the point of needing to know coverage to fill then something is wrong. Gaps would be handled by archive nodes, so there is no reason to increase vulnerability by doing anything but behaving uniformly.

(2) The decision to contact a node should need O(1) communications, not just because of the delay of chasing around just to find who has someone; but because that chasing process usually makes the process _highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g. not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has given should be computationally efficient.

(5) The communication about what blocks a node has should be compact.

(6) The coverage created by the network should be uniform, and should remain uniform as the blockchain grows; ideally you shouldn't need to update your state to know what blocks a peer will store in the future, assuming that it doesn't change the amount of data it's planning to use. (What Tier Nolan proposes sounds like it fails this point)

(7) Growth of the blockchain shouldn't cause much (or any) need to refetch old blocks.

M = 1,000,000
N = number of starts
S(0) = hash(seed) mod M
...
S(n) = hash(S(n-1)) mod M

This generates a sequence of start points.
If the start point is less than the block height, then it counts as a hit. The node stores the 50MB of data starting at the block at height S(n).

As the blockchain increases in size, new starts will be less than the block height. This means some other runs would be deleted.

A weakness is that it is random with regard to block heights. Tiny blocks have the same priority as larger blocks.

0) Blocks are local, in 50MB runs.
1) Agreed, nodes should download headers-first (or some other compact way of finding the highest POW chain).
2) M could be fixed; N and the seed are all that is required. The seed doesn't have to be that large. If 1% of the blockchain is stored, then 16 bits should be sufficient so that every block is covered by seeds.
3) N is likely to be less than 2 bytes and the seed can be 2 bytes.
4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run. That is 10 hashes. They don't even necessarily need to be cryptographic hashes.
5) Isn't this the same as 3?
6) Every block has the same odds of being included. There inherently needs to be an update when a node deletes some info due to exceeding its cap. N can be dropped one run at a time.
7) When new starts drop below the tip height, N can be decremented and that one run is deleted.

There would need to be a special rule to ensure the low height blocks are covered. Nodes should keep the first 50MB of blocks with some probability (10%?)
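For illustration, the start-point sequence S(0), S(1), ... described in this message can be sketched in Python. This is only a rough sketch under stated assumptions, not part of the original proposal: the choice of SHA-256 truncated to 64 bits as hash(), the 8-byte big-endian encoding of the input, and the function names start_points and hits are all made up here. The message itself notes that the hashes need not even be cryptographic.

```python
import hashlib

M = 1_000_000  # modulus for start points, as given in the message


def _hash_to_int(value):
    """Assumed stand-in for hash(): SHA-256 of an 8-byte encoding, truncated to 64 bits."""
    digest = hashlib.sha256(value.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[:8], "big")


def start_points(seed, n, m=M):
    """Return [S(0), ..., S(n-1)] with S(0) = hash(seed) mod m, S(i) = hash(S(i-1)) mod m."""
    starts = []
    s = _hash_to_int(seed) % m
    for _ in range(n):
        starts.append(s)
        s = _hash_to_int(s) % m
    return starts


def hits(seed, n, tip_height, m=M):
    """Start points that fall below the current block height (the 'hits')."""
    return [s for s in start_points(seed, n, m) if s < tip_height]


if __name__ == "__main__":
    # A peer that learns (seed, N) can recompute the same list locally.
    print(hits(seed=0x1234, n=10, tip_height=356_000))
```

Because the sequence depends only on the seed, a start point that is a hit at one height remains a hit at every greater height; what changes as the chain grows is which additional starts become hits.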
Re: [Bitcoin-development] Proposed additional options for pruned nodes
This is exactly the sort of solution I was hoping for. It seems this is the minimal modification to make it work, and, if someone were willing to work with me, I would love to help implement this.

My only concern would be that if the --max-size flag is not included, then this delivers significantly less benefit to the end user. Still a good chunk, but possibly not enough.

On May 12, 2015 6:03 PM, Tier Nolan tier.no...@gmail.com wrote:

On Tue, May 12, 2015 at 8:03 PM, Gregory Maxwell gmaxw...@gmail.com wrote:

(0) Block coverage should have locality; historical blocks are (almost) always needed in contiguous ranges. Having random peers with totally random blocks would be horrific for performance; as you'd have to hunt down a working peer and make a connection for each block with high probability.

(1) Block storage on nodes with a fraction of the history should not depend on believing random peers; because listening to peers can easily create attacks (e.g. someone could break the network; by convincing nodes to become unbalanced) and is not useful -- it's not like the blockchain is substantially different for anyone; if you're to the point of needing to know coverage to fill then something is wrong. Gaps would be handled by archive nodes, so there is no reason to increase vulnerability by doing anything but behaving uniformly.

(2) The decision to contact a node should need O(1) communications, not just because of the delay of chasing around just to find who has someone; but because that chasing process usually makes the process _highly_ sybil vulnerable.

(3) The expression of what blocks a node has should be compact (e.g. not a dense list of blocks) so it can be rumored efficiently.

(4) Figuring out what block (ranges) a peer has given should be computationally efficient.

(5) The communication about what blocks a node has should be compact.
(6) The coverage created by the network should be uniform, and should remain uniform as the blockchain grows; ideally you shouldn't need to update your state to know what blocks a peer will store in the future, assuming that it doesn't change the amount of data it's planning to use. (What Tier Nolan proposes sounds like it fails this point)

(7) Growth of the blockchain shouldn't cause much (or any) need to refetch old blocks.

M = 1,000,000
N = number of starts
S(0) = hash(seed) mod M
...
S(n) = hash(S(n-1)) mod M

This generates a sequence of start points. If the start point is less than the block height, then it counts as a hit. The node stores the 50MB of data starting at the block at height S(n).

As the blockchain increases in size, new starts will be less than the block height. This means some other runs would be deleted.

A weakness is that it is random with regard to block heights. Tiny blocks have the same priority as larger blocks.

0) Blocks are local, in 50MB runs.
1) Agreed, nodes should download headers-first (or some other compact way of finding the highest POW chain).
2) M could be fixed; N and the seed are all that is required. The seed doesn't have to be that large. If 1% of the blockchain is stored, then 16 bits should be sufficient so that every block is covered by seeds.
3) N is likely to be less than 2 bytes and the seed can be 2 bytes.
4) A 1% cover of 50GB of blockchain would have 10 starts @ 50MB per run. That is 10 hashes. They don't even necessarily need to be cryptographic hashes.
5) Isn't this the same as 3?
6) Every block has the same odds of being included. There inherently needs to be an update when a node deletes some info due to exceeding its cap. N can be dropped one run at a time.
7) When new starts drop below the tip height, N can be decremented and that one run is deleted.

There would need to be a special rule to ensure the low height blocks are covered. Nodes should keep the first 50MB of blocks with some probability (10%?)
Re: [Bitcoin-development] Proposed additional options for pruned nodes
Hi all!

On 2015-05-12 21:03, Gregory Maxwell wrote:

Summarizing from memory:

In the context of this discussion, let me also restate an idea I've proposed in Bitcointalk for this. It is probably not perfect and could surely be adapted (I'm interested in that), but I think it meets most/all of the criteria stated below. It is similar to the idea with start points, but gives O(log height) instead of O(height) for determining which blocks a node has.

Let me for simplicity assume that the node wants to store 50% of all blocks. It is straightforward to extend the scheme so that this is configurable:

1) Create some kind of seed that can be compact and will be sent to other peers to define which blocks the node has. Use it to initialise a PRNG of some sort.

2) Divide the range of all blocks into intervals with exponentially growing size, i.e., something like this: 1, 1, 2, 2, 4, 4, 8, 8, 16, 16, ... With this, only O(log height) intervals are necessary to cover height blocks.

3) Using the PRNG, *one* of the two intervals of each length is selected. The node stores these blocks and discards the others. (Possibly keeping the last 200 or 2,016 or whatever blocks additionally.)

(0) Block coverage should have locality; historical blocks are (almost) always needed in contiguous ranges. Having random peers with totally random blocks would be horrific for performance; as you'd have to hunt down a working peer and make a connection for each block with high probability.

You get contiguous block ranges (with at most O(log height) breaks). Also ranges of newer blocks are longer, which may be an advantage if those blocks are needed more often.

(1) Block storage on nodes with a fraction of the history should not depend on believing random peers; because listening to peers can easily create attacks (e.g.
someone could break the network; by convincing nodes to become unbalanced) and is not useful -- it's not like the blockchain is substantially different for anyone; if you're to the point of needing to know coverage to fill then something is wrong. Gaps would be handled by archive nodes, so there is no reason to increase vulnerability by doing anything but behaving uniformly.

With my proposal, each node determines randomly and on its own which blocks to store. No believing anyone.

(2) The decision to contact a node should need O(1) communications, not just because of the delay of chasing around just to find who has someone; but because that chasing process usually makes the process _highly_ sybil vulnerable.

Not exactly sure what you mean by that, but I think that's fulfilled. You can (locally) compute in O(log height) from a node's seed whether or not it has the blocks you need. This needs only communication about the node's seed.

(3) The expression of what blocks a node has should be compact (e.g. not a dense list of blocks) so it can be rumored efficiently.

See above.

(4) Figuring out what block (ranges) a peer has given should be computationally efficient.

O(log height). Not O(1), but that's probably not a big issue.

(5) The communication about what blocks a node has should be compact.

See above.

(6) The coverage created by the network should be uniform, and should remain uniform as the blockchain grows; ideally you shouldn't need to update your state to know what blocks a peer will store in the future, assuming that it doesn't change the amount of data it's planning to use. (What Tier Nolan proposes sounds like it fails this point)

Coverage will be uniform if the seed is created randomly and the PRNG has good properties. No need to update the seed if the other node's fraction is unchanged. (Not sure if you suggest for nodes to define a fraction or rather an absolute size.)

(7) Growth of the blockchain shouldn't cause much (or any) need to refetch old blocks.
No need to do that with the scheme.

What do you think about this idea? Some random thoughts from myself:

*) I need to formulate it in a more general way so that the fraction can be arbitrary and not just 50%. This should be easy to do, and I can do it if there's interest.

*) It is O(log height) and not O(1), but that should not be too different for the heights that are relevant.

*) Maybe it would be better / easier to not use the PRNG at all; just decide to *always* use the first or the second interval with a given size. Not sure about that.

*) With the proposed scheme, the node's actual fraction of stored blocks will vary between 1/2 and 2/3 (if I got the mathematics right, it is still early) as the blocks come in. Not sure if that's a problem. I can do a precise analysis of this property for an extended scheme if you are interested in it.

Yours,
Daniel

--
http://www.domob.eu/
OpenPGP: 1142 850E 6DFF 65BA 63D6 88A8 B249 2AC4 A733 0737
Namecoin: id/domob - https://nameid.org/?name=domob
--
Done: Arc-Bar-Cav-Hea-Kni-Ran-Rog-Sam-Tou-Val-Wiz
To go: Mon-Pri
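The interval scheme described in this message (paired intervals of sizes 1, 1, 2, 2, 4, 4, ..., with one of each pair kept) can be sketched in Python for the 50% case. This is a minimal sketch under stated assumptions rather than the author's implementation: it uses Python's random.Random as the PRNG, an integer seed, one PRNG bit per pair, and the hypothetical names pair_index, stores_block and stored_fraction. It ignores the optional "keep the last N blocks" rule.

```python
import random


def pair_index(height):
    """Index k of the interval pair containing `height`.

    Pair k consists of two intervals of size 2**k and starts at
    height 2**(k+1) - 2, so pairs cover [0,2), [2,6), [6,14), ...
    """
    return (height + 2).bit_length() - 2


def stores_block(seed, height):
    """True if a node with this seed keeps the block at `height`.

    One PRNG bit is drawn per pair, in order, so the lookup costs
    O(log height) draws.
    """
    k = pair_index(height)
    rng = random.Random(seed)       # assumed PRNG; any seeded generator would do
    choice = 0
    for _ in range(k + 1):          # advance to the bit belonging to pair k
        choice = rng.getrandbits(1)
    pair_start = (1 << (k + 1)) - 2
    which = (height - pair_start) // (1 << k)   # 0 = first interval, 1 = second
    return which == choice


def stored_fraction(seed, tip_height):
    """Fraction of blocks in [0, tip_height) that the node keeps."""
    kept = sum(stores_block(seed, h) for h in range(tip_height))
    return kept / tip_height


if __name__ == "__main__":
    for tip in (14, 20, 30):
        print(tip, stored_fraction(seed=42, tip_height=tip))
```

At the end of each completed pair (tip heights 2, 6, 14, 30, ...) the stored fraction is exactly 1/2 regardless of the seed. Partway through a pair it deviates from 1/2 in a direction that depends on which interval of the current pair was selected.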