Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes

2017-04-21 Thread Gregory Maxwell via bitcoin-dev
On Mon, Apr 17, 2017 at 6:54 AM, David Vorick via bitcoin-dev wrote:
> Rationale:
>
> A node that stores the full blockchain (I will use the term archival node)
> requires over 100GB of disk space, which I believe is one of the most
> significant barriers to more people running full nodes. And I believe the
> ecosystem would benefit substantially if more users were running full nodes.

Hi David,

I've been thinking about this subject for a long time (Tier Nolan
linked some of the threads; see also the coloration part of
https://en.bitcoin.it/wiki/User:Gmaxwell/block_network_coding), and
about using FEC with the history for the last year or so. (We use FEC
in practice in FIBRE for relay at the tip now, and that has a design
history going back to 2013.)

As Jonas points out, we're all set to also offer the 'recent blocks'
separately, so that is obviously going to happen and will help. The
free design parameter is the definition of recent, but we have good
measurements for setting that now. But about history...

I think I might have designed myself into a corner and perhaps you've
shown a way out-- I think there are some serious limits in your
proposal but perhaps we can fix them.  Let me give you what I had been
thinking about, then hand you at least a couple improvements to your
thinking:

As you hopefully now know (per Tier Nolan's post): I'd previously been
thinking about nodes keeping a deterministic, independently selected
random subset of the history which is self-leveling based on a small
seed.  The connection overhead can be mitigated by working with chunks
of blocks rather than single blocks.
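
Roughly what I have in mind, in throwaway Python (the chunk size,
fraction, and names are illustrative assumptions, not a spec):

    import hashlib
    import os

    CHUNK_SIZE = 1000        # blocks per chunk; chunk i covers blocks [i*CHUNK_SIZE, (i+1)*CHUNK_SIZE)
    KEEP_FRACTION = 0.20     # expected fraction of history each node keeps

    def new_node_seed() -> bytes:
        # Small random seed, generated once and kept for the node's lifetime.
        return os.urandom(8)

    def keeps_chunk(seed: bytes, chunk_index: int,
                    fraction: float = KEEP_FRACTION) -> bool:
        # Hash(seed || chunk_index) is uniform, so each node independently
        # keeps an expected `fraction` of all chunks, and the selection
        # self-levels as new chunks appear at the tip.
        h = hashlib.sha256(seed + chunk_index.to_bytes(8, "big")).digest()
        return int.from_bytes(h[:8], "big") < int(fraction * 2**64)

    # Example: which chunks of the first ~460k blocks would this node keep?
    seed = new_node_seed()
    kept = [i for i in range(460000 // CHUNK_SIZE) if keeps_chunk(seed, i)]

A peer that advertises its seed lets anyone recompute exactly which
chunks it is supposed to be storing.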

But as you've observed, the failure probabilities are rather high,
especially if an active attacker targets nodes carrying less commonly
available blocks.  So I thought, okay we can use FEC to help improve
the recovery odds.

So I considered using a sparse random code (e.g. an LDPC erasure code
like in RFC 5170). The advantage of these codes is that they are very
fast to decode and they support enormous codewords, so you can have a
high probability of every peer holding a unique codeword ID, meaning
there would effectively never need to be any duplication.
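
For intuition only, a toy fountain-style encoder in the same spirit
(this is not RFC 5170's actual construction; the fixed degree and the
names are made up):

    import hashlib
    import random

    def encode_symbol(peer_id: bytes, chunks: list, degree: int = 8):
        # The peer's ID seeds a PRNG that picks a sparse subset of source
        # chunks; the coded symbol is their XOR. Real LT/LDPC codes draw
        # the degree from a distribution rather than fixing it, but the
        # point is the same: the ID space is enormous, so two peers ending
        # up with the same symbol is vanishingly unlikely.
        rng = random.Random(hashlib.sha256(peer_id).digest())
        indices = sorted(rng.sample(range(len(chunks)), min(degree, len(chunks))))
        symbol = bytes(len(chunks[0]))
        for i in indices:
            symbol = bytes(a ^ b for a, b in zip(symbol, chunks[i]))
        return indices, symbol

Decoding (peeling or Gaussian elimination over enough symbols) is where
the real work is, and where the bad-data problem below bites.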

But then I ran into the problem that I had no reasonable way to
recover from bad data, short of pulling a single group from an archive
then banning whatever peers gave you bad chunks.

So that is kind of where I got stuck.

I hadn't even considered the advantage of only needing a few peers in
total, as I'd still assumed it would be doing the random-groups thing
as well. That's a great point.

So on your design:

Having a hard lower bound of 20% seems like a serious limitation to
me: it would be fine today, but the chain will quite possibly be twice
its current size in a year, and in four years we're back to where we
are now. What I'd been thinking about would be able to scale
arbitrarily low.

20% is small, but is it small enough? Today that would get us back to
common SSDs being able to hold the whole chain, but that property will
probably be lost again in a couple of years. I think with any fixed
fraction we'll constantly be fighting the opportunity cost of
upgrading storage, if not the cost of the storage itself. (And I agree
storage is the vast majority of the response from actual users; sync
time and ongoing bandwidth usage seem to tie for second. The latter
can be mitigated in other ways, but for the former see my remarks at
the end.)
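
Back-of-the-envelope, with chain sizes as round numbers I'm assuming
for illustration (not measurements):

    # Storage burden under a fixed 20% fraction as the chain grows.
    for label, chain_gb in [("today", 110), ("doubled", 220), ("~4 years out", 550)]:
        print(label, "-> a 20% node stores about", round(0.20 * chain_gb), "GB")
    # today ~22 GB, doubled ~44 GB, ~4 years out ~110 GB:
    # roughly today's full-chain burden all over again.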

The fact that needing only a few blocks lets you brute-force the
decode is a great point. I hadn't considered that, and the schemes I'd
been considering are not communications-efficient with only a few
blocks required: e.g. they sometimes need a couple of extra blocks to
successfully decode, which is a lot of overhead if you're only
splitting 5 ways.
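
Roughly, the brute force as I understand it (try_decode and is_valid
are placeholders for the erasure decoder and the check against known
block hashes):

    from itertools import combinations

    def brute_force_decode(pieces, k, try_decode, is_valid):
        # Try k-sized subsets of the received pieces until one reconstructs
        # data that validates (e.g. matches the known block hashes). With
        # only ~5 pieces required the combination count stays tiny. Once
        # valid data is recovered, the unused pieces can be checked by
        # re-encoding, so a bad peer can be identified and banned.
        for idx in combinations(range(len(pieces)), k):
            data = try_decode([pieces[i] for i in idx])
            if data is not None and is_valid(data):
                return data, list(idx)   # recovered data + indices that proved good
        return None, []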

> I also believe that alternative erasure coding schemes exist which actually
> are able to identify the bad pieces given sufficient good pieces, however I
> don't know if they have the same computational performance as the best
> Reed-Solomon coding implementations.

Actually RS codes are _the_ codes you want for decoding with errors,
but the 'computationally efficient' error-correcting decoding (which
is itself heinously slow) requires data + 2*errors blocks to decode.
Not so useful if you're only looking at 5 parts.
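
Spelled out for the 5-way case:

    def rs_pieces_needed(k: int, errors: int) -> int:
        # Erasure decoding (bad positions known) needs k pieces;
        # error correction (bad positions unknown) needs k + 2*errors.
        return k + 2 * errors

    print(rs_pieces_needed(5, 1))   # tolerating one corrupt piece already takes 7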

RS decoding is pretty slow generally, especially compared to binary
sparse codes.  We were unable to make RS coding make sense for use in
fast block relay for this reason: decoding time bottlenecked
reconstruction. The most highly optimized RS codes make a special
optimization which is not useful for your proposal: they're much
faster to decode if you're decoding from the first couple of
correction blocks.  Without these optimizations, speeds drop from
1GB/s to more like 100MB/s or worse.  Though I suppose with assumevalid
initial sync
now is pretty cpu-idle on most hardware.  (If 

Re: [bitcoin-dev] Small Nodes: A Better Alternative to Pruned Nodes

2017-04-21 Thread Leandro Coutinho via bitcoin-dev
Maybe it already exists ...

#9484 812714f: Introduce assumevalid setting to skip validation presumed valid scripts (gmaxwell)
https://github.com/bitcoin/bitcoin/pull/9484

..., but ...
It would be very interesting if a new node could decide to be a pruned node:
  - it would need to trust one or more peers for the initial blockchain
download, because the downloaded blocks would not be validated
  - it would choose a cutoff time from which to keep blocks, like a week back
  - once a day a routine would run that prunes blocks older than the
chosen time (rough sketch below)
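
Something like this rough sketch, maybe (purely hypothetical; the file
layout and names are made up, and this is not how the existing -prune
option works):

    import os
    import time

    KEEP_SECONDS = 7 * 24 * 3600   # "like a week back"

    def prune_old_blocks(block_files):
        # block_files: iterable of (path, newest_block_timestamp) pairs,
        # assuming a hypothetical layout where each file's newest block
        # time is known. Blocks older than the cutoff are simply deleted;
        # the UTXO set lives in a separate database, so new blocks can
        # still be validated.
        cutoff = time.time() - KEEP_SECONDS
        for path, newest_block_time in block_files:
            if newest_block_time < cutoff:
                os.remove(path)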

"

*The unspent transaction outputs (which is the only essential piece ofdata
necessary for validation) are already kept in a separate database,so
technically removing old blocks is perfectly possible.*" Pieter Wuille
https://bitcoin.stackexchange.com/questions/11170/why-is-pruning-not-considered-already-at-the-moment


On Fri, Apr 21, 2017 at 10:35 AM, David Kaufman via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> Hi Danny,
>
> On Mon, Apr 17, 2017 at 3:11 AM, Danny Thorpe wrote:
> >
> > 1TB HDD is now available for under $40 USD.  How is the 100GB storage
> > requirement preventing anyone from setting up full nodes?
>
> Yeah, but that's because most people (well, using myself as the
> "target market" anyway) are upgrading to SSDs for the faster boot and
> response times.  Modern consumer OSes run incredibly slowly on
> non-SSD drives!  And since the vast majority of consumer laptops sold
> today fall into the $400 to $700 range, a 200-500GB SSD is about the
> biggest storage upgrade people can afford.
>
> And so I think David's premise holds: having to devote only 30GB to
> running a full node instead of 100GB would remove a major obstacle
> that prevents many more people from running full bitcoin nodes.
>
> My only question is, does it scale?  I mean, if the bitcoin network
> volume grows exponentially and in 2 years the blockchain is 500GB, can
> the "small node" be adjusted down from one fifth of the blockchain to
> just one tenth, or one twentieth?  Can different smallnesses
> interoperate?  Can I choose to store a small node with 20-30% of the
> blockchain, while others choose to share just 5% or 10% of it?  Can I
> run a "less small" node today that's 50GB?
>
> Can the default install be a "small node" that requires about 30GB of
> storage (if that is indeed the sweet spot for enticing many more users to
> bringing nodes online), but allow the user at install time, to choose *how*
> small? To, say, drag a slider anywhere up and down the range from
> 10GB to 100GB?
>
> If not, then it will have to be revisited constantly as the blockchain
> grows, and disk storage prices drop.  I suspect the blockchain will
> grow in size, at some point in the not too distant future, much faster
> than storage prices drop, so making small, smaller and smallest nodes
> that can be configured to store more or less of it will be necessary
> to motivate most users to run nodes at all.  But when that happens,
> there are likely to be exponentially *more* people using bitcoin, too!
> So an exponentially growing number of users running (smaller and
> smaller) nodes would take up the slack.
>
> Then, the blockchain would begin to look a lot more like a bittorrent,
> right? ;-) but -- happily -- one that you never need to download fully.
>
> -dave