Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-11 Thread Peter Tschipper via bitcoin-dev
Here are the latest results on compression ratios for the first 295,000
blocks, compressionlevel=6.  I think there are more than enough
datapoints for statistical significance. 

Results are very similar to the previous test.  Next I'll work on
comparing the time saved (or lost) when syncing the blockchain
compressed vs. uncompressed.  Still, I think it's clear that serving up
compressed blocks, at least historical blocks, will benefit those that
have bandwidth caps on their internet connections.

The proposal, so far, is fairly simple:
1) Compress blocks with some compression library: currently zlib, but I
can investigate other possibilities.
2) As a fallback, advertise compression as a service.  That way we can
turn off compression AND decompression completely if needed.
3) Do the compression at the datastream level in the code.  CDataStream
is the obvious place.
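For illustration, a minimal sketch of the idea using Python's zlib
bindings (the PR itself is C++ against CDataStream; the function names
here are hypothetical):

```python
import zlib

def compress_payload(data: bytes, level: int = 6) -> bytes:
    """Compress a serialized message payload (e.g. a block) with zlib."""
    return zlib.compress(data, level)

def decompress_payload(blob: bytes) -> bytes:
    """Inverse of compress_payload; raises zlib.error on corrupt input."""
    return zlib.decompress(blob)

# Round-trip sanity check on some repetitive, block-like data.
payload = b"\x00" * 500 + bytes(range(256)) * 20
blob = compress_payload(payload)
assert decompress_payload(blob) == payload
print(f"{len(payload)} -> {len(blob)} bytes "
      f"({100 * (1 - len(blob) / len(payload)):.1f}% saved)")
```

A peer that does not advertise the compression service bit would simply
be sent the raw, uncompressed payload instead.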


Test Results:

range = block size range
ubytes = average size of uncompressed blocks
cbytes = average size of compressed blocks
ctime = average time to compress
dtime = average time to decompress
cmp_ratio% = compression ratio
datapoints = number of datapoints taken

range        ubytes   cbytes   ctime  dtime  cmp_ratio%  datapoints
0-250b          215      189   0.001  0.000       12.40       91280
250-500b        438      404   0.001  0.000        7.85       13217
500-1KB         761      701   0.001  0.000        7.86       11434
1KB-10KB       4149     3547   0.001  0.000       14.51       52180
10KB-100KB    41934    32604   0.005  0.001       22.25       82890
100KB-200KB  146303   108080   0.016  0.001       26.13       29886
200KB-300KB  243299   179281   0.025  0.002       26.31       25066
300KB-400KB  344636   266177   0.036  0.003       22.77        4956
400KB-500KB  463201   356862   0.046  0.004       22.96        3167
500KB-600KB  545123   429854   0.056  0.005       21.15         366
600KB-700KB  647736   510931   0.065  0.006       21.12         254
700KB-800KB  746540   587287   0.073  0.008       21.33         294
800KB-900KB  868121   682650   0.087  0.008       21.36         199
900KB-1MB    945747   726307   0.091  0.010       23.20         304

On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
> Comments:
>
> 1) cblock seems a reasonable way to extend the protocol.  Further
> wrapping should probably be done at the stream level.
>
> 2) zlib has a crappy security track record.
>
> 3) A fallback path to non-compressed is required, should compression
> fail or crash.
>
> 4) Most blocks and transactions have runs of zeroes and/or highly
> common bit-patterns, which contributes to useful compression even at
> smaller sizes.  Peter Ts's most recent numbers bear this out.  zlib
> has a dictionary (32K?) which works well with repeated patterns such
> as those you see with concatenated runs of transactions.
>
> 5) LZO should provide much better compression, at a cost of CPU
> performance and using a less-reviewed, less-field-tested library.
>
>
>
>
>
> On Tue, Nov 10, 2015 at 11:30 AM, Tier Nolan via bitcoin-dev
>  > wrote:
>
>
>
> On Tue, Nov 10, 2015 at 4:11 PM, Peter Tschipper
> > wrote:
>
> There are better ways of sending new blocks, that's certainly
> true, but for sending historical blocks and sending transactions
> I don't think so.  This PR is really designed to save
> bandwidth and is not intended to be a huge performance
> improvement in terms of time spent sending.
>
>
> If the main point is for historical data, then sticking to just
> blocks is the best plan.
>
> Since small blocks don't compress well, you could define a
> "cblocks" message that handles multiple blocks (just concatenate
> the block messages as payload before compression). 
>
> The sending peer could combine blocks so that each cblock is
> compressing at least 10kB of block data (or whatever is optimal). 
> It is probably worth specifying a maximum size for network buffer
> reasons (either 1MB or 1 block maximum).
>
> Similarly, transactions could be combined together and compressed
> "ctxs".  The inv messages could be modified so that you can
> request groups of 10-20 transactions.  That would depend on how
> much of an improvement compressed transactions would represent.
>
> More generally, you could define a message which is a compressed
> message holder.  That is probably too complex to be worth the
> effort though.
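The batching suggestion above can be sketched roughly as follows (all
names are hypothetical, and real message framing is ignored; the point
is only that many small blocks compress far better as one payload):

```python
import zlib

def make_cblocks(blocks, min_batch=10_240, max_batch=1_000_000):
    """Greedily batch serialized blocks so each compressed payload covers
    at least ~10 kB of block data, capped at ~1 MB per batch."""
    batches, current, size = [], [], 0
    for blk in blocks:
        if size and size + len(blk) > max_batch:
            batches.append(zlib.compress(b"".join(current), 6))
            current, size = [], 0
        current.append(blk)
        size += len(blk)
        if size >= min_batch:
            batches.append(zlib.compress(b"".join(current), 6))
            current, size = [], 0
    if current:
        batches.append(zlib.compress(b"".join(current), 6))
    return batches

# 100 small "blocks": batched compression beats per-block compression.
small_blocks = [bytes(range(200)) for _ in range(100)]
batched = sum(len(b) for b in make_cblocks(small_blocks))
single = sum(len(zlib.compress(b, 6)) for b in small_blocks)
print(batched, "<", single)
```

A real implementation would also need length prefixes (or the existing
block message headers) so the receiver can split the decompressed batch
back into individual blocks.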
>

Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-11 Thread Peter Tschipper via bitcoin-dev
If that were true then we wouldn't need to gzip large files before
sending them over the internet.  Data compression generally helps
transmission speed as long as the compression ratio is high enough and
the time it takes is low enough to make it worthwhile.  On a corporate
LAN it's generally not worthwhile unless you're dealing with very large
files, but over a corporate WAN or the internet, where network latency
can be high, it is IMO a worthwhile endeavor.



On 11/11/2015 10:49 AM, Marco Pontello wrote:
> A random thought: isn't most communication over a data link already
> compressed at some point?
> When I used a modem, we had the V.42bis protocol. Now nearly all ADSL
> connections using PPPoE surely are. And so on.
> I'm not sure another level of generic, data-agnostic compression will
> really give us some real-life practical advantage over that.
>
> Something that could take advantage of special knowledge of the
> specific data, instead, would be an entirely different matter.
>
> Just my 2c.
>

Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-11 Thread Jonathan Toomim via bitcoin-dev
Data compression adds latency and reduces predictability, so engineers
have generally chosen to leave compression to the application layer
rather than the transport layer or lower, letting the application
designer decide what tradeoffs to make.

On Nov 11, 2015, at 10:49 AM, Marco Pontello via bitcoin-dev 
 wrote:

> A random thought: isn't most communication over a data link already
> compressed at some point?
> When I used a modem, we had the V.42bis protocol. Now nearly all ADSL
> connections using PPPoE surely are. And so on.
> I'm not sure another level of generic, data-agnostic compression will
> really give us some real-life practical advantage over that.
> 
> Something that could take advantage of special knowledge of the
> specific data, instead, would be an entirely different matter.
> 
> Just my 2c.



___
bitcoin-dev mailing list
bitcoin-dev@lists.linuxfoundation.org
https://lists.linuxfoundation.org/mailman/listinfo/bitcoin-dev


Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-10 Thread Peter Tschipper via bitcoin-dev
On 10/11/2015 8:11 AM, Peter Tschipper wrote:
> On 10/11/2015 1:44 AM, Tier Nolan via bitcoin-dev wrote:
>> The network protocol is not quite consensus critical, but it is
>> important.
>>
>> Two implementations of the decompressor might not be bug for bug
>> compatible.  This (potentially) means that a block could be designed
>> that won't decode properly for some version of the client but would
>> work for another.  This would fork the network.
>>
>> A "raw" network library is unlikely to have the same problem.
>>
>> Rather than just compress the stream, you could compress only block
>> messages.  A new "cblock" message could be created that is a
>> compressed block.  This shouldn't reduce efficiency by much.
>>
> I chose the more generic datastream compression so that in the future
> we could possibly apply it to transactions as well, but currently all
> that is planned is to compress blocks; that was really my only original
> intent until I saw that there might be some bandwidth savings for
> transactions too.
>
> The compression, however, could be applied to any datastream but is not
> *forced*.  Basically it would just be a method call in CDataStream, so
> we could do ss.compress and ss.decompress and apply that to blocks, and
> possibly transactions if worthwhile, and only IF compression is turned
> on.  But there is no intent to apply this to every type of message,
> since most would be too small to benefit from compression.
>
> Here are some results of using the code in the PR to
> compress/decompress blocks using zlib compression level = 6.  This
> data was taken from the first 275K blocks in the mainnet blockchain. 
> Clearly once we get past 10KB we get pretty decent compression but
> even below that there is some benefit.  I'm still collecting data and
> will get the same for the whole blockchain.
>
> range = block size range
> ubytes = average size of uncompressed blocks
> cbytes = average size of compressed blocks
> ctime = average time to compress
> dtime = average time to decompress
> cmp_ratio% = compression ratio
> datapoints = number of datapoints taken
>
> range        ubytes   cbytes   ctime  dtime  cmp_ratio%  datapoints
> 0-250b          215      189   0.001  0.000       12.41       79498
> 250-500b        440      405   0.001  0.000        7.82       11903
> 500-1KB         762      702   0.001  0.000        7.83       10448
> 1KB-10KB       4166     3561   0.001  0.000       14.51       50572
> 10KB-100KB    40820    31597   0.005  0.001       22.59           7
> 100KB-200KB  146238   106320   0.015  0.001       27.30       25024
> 200KB-300KB  242913   175482   0.025  0.002       27.76       20450
> 300KB-400KB  343430   251760   0.034  0.003       26.69        2069
> 400KB-500KB  457448   343495   0.045  0.004       24.91        1889
> 500KB-600KB  540736   424255   0.056  0.007       21.54          90
> 600KB-700KB  647851   506888   0.063  0.007       21.76          59
> 700KB-800KB  749513   586551   0.073  0.007       21.74          48
> 800KB-900KB  859439   652166   0.086  0.008       24.12          39
> 900KB-1MB    952333   725191   0.089  0.009       23.85          78
>
>> If a client fails to decode a cblock, then it can ask for the block
>> to be re-sent as a standard "block" message. 
> interesting idea.
>>
>> This means that it is a pure performance improvement.  If problems
>> occur, then the client can just switch back to uncompressed mode for
>> that block.
>>
>> You should look into the block relay system.  This gives a larger
>> improvement than simply compressing the stream.  The main benefit is
>> latency but it means that actual blocks don't have to be sent, so
>> gives a potential 50% compression ratio.  Normally, a node receives
>> all the transactions and then those transactions are included later
>> in the block.
>>
> There are better ways of sending new blocks, that's certainly true, but
> for sending historical blocks and sending transactions I don't think
> so.  This PR is really designed to save bandwidth and is not intended to
> be a huge performance improvement in terms of time spent sending.
>>
>> On Tue, Nov 10, 2015 at 5:40 AM, Johnathan Corgan via bitcoin-dev
>>  wrote:
>>
>> On Mon, Nov 9, 2015 at 5:58 PM, gladoscc via bitcoin-dev
>> > > wrote:
>>  
>>
>> I think 25% bandwidth savings is certainly considerable,
>> especially for people running full nodes in countries like
>> Australia where internet bandwidth is lower and there are
>> data caps.
>>
>>
>> This reinforces the idea that such trade-off decisions should be
>> local and negotiated between peers, not a required feature of
>> the network P2P.
>>  
>>
>> -- 
>> Johnathan Corgan
>> Corgan Labs - SDR Training and Development Services
>> 

Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-10 Thread Peter Tschipper via bitcoin-dev
On 10/11/2015 8:46 AM, Jeff Garzik via bitcoin-dev wrote:
> Comments:
>
> 1) cblock seems a reasonable way to extend the protocol.  Further
> wrapping should probably be done at the stream level.
agreed.
>
> 2) zlib has a crappy security track record.
>
Zlib had a bad buffer overflow bug, but that was in 2005 and it got a lot
of press at the time.  It was fixed in version 1.2.3; we're on 1.2.8
now.  I'm not aware of any other current issues with zlib.  Do you have a
citation?

> 3) A fallback path to non-compressed is required, should compression
> fail or crash.
agreed.
>
> 4) Most blocks and transactions have runs of zeroes and/or highly
> common bit-patterns, which contributes to useful compression even at
> smaller sizes.  Peter Ts's most recent numbers bear this out.  zlib
> has a dictionary (32K?) which works well with repeated patterns such
> as those you see with concatenated runs of transactions.
>
> 5) LZO should provide much better compression, at a cost of CPU
> performance and using a less-reviewed, less-field-tested library.
I don't think LZO will give as good compression here but I will do some
benchmarking when I can.
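As a starting point for that benchmarking, a template along these lines
could be used (LZO has no Python stdlib binding, so zlib, bzip2, and
lzma stand in here; an LZO binding could be swapped into the table):

```python
import bz2, lzma, time, zlib

CODECS = {
    "zlib-6": lambda d: zlib.compress(d, 6),
    "bzip2":  bz2.compress,
    "lzma":   lzma.compress,
}

def benchmark(data: bytes):
    """Print compressed size, percent saved, and timing for each codec."""
    for name, fn in CODECS.items():
        t0 = time.perf_counter()
        out = fn(data)
        dt = time.perf_counter() - t0
        saved = 100 * (1 - len(out) / len(data))
        print(f"{name:7s} {len(out):8d} bytes  {saved:5.1f}% saved  {dt:.4f}s")

benchmark(bytes(range(256)) * 400)  # ~100 kB of sample data
```

Running this over a corpus of real serialized blocks, bucketed by size
as in the tables above, would make the zlib-vs-LZO comparison concrete.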



Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-09 Thread gladoscc via bitcoin-dev
I think 25% bandwidth savings is certainly considerable, especially for
people running full nodes in countries like Australia where internet
bandwidth is lower and there are data caps.

I absolutely would not dismiss 25% compression. gzip and bzip2 compression
is relatively standard, and I'd put the implementation-complexity
tradeoff point somewhere around 5-10%.

On Tue, Nov 10, 2015 at 8:04 AM, Bob McElrath via bitcoin-dev <
bitcoin-dev@lists.linuxfoundation.org> wrote:

> I would expect that since a block contains mostly hashes and crypto
> signatures,
> it would be almost totally incompressible.  I just calculated compression
> ratios:
>
> zlib   -15%  (file is LARGER)
> gzip    28%
> bzip2   25%
>
> So zlib compression is right out.  How much is ~25% bandwidth savings
> worth to
> people?  This seems not worth it to me.  :-/
>


Re: [bitcoin-dev] request BIP number for: "Support for Datastream Compression"

2015-11-09 Thread Bob McElrath via bitcoin-dev
I would expect that since a block contains mostly hashes and crypto signatures,
it would be almost totally incompressible.  I just calculated compression 
ratios:

zlib   -15%  (file is LARGER)
gzip    28%
bzip2   25%

So zlib compression is right out.  How much is ~25% bandwidth savings worth to
people?  This seems not worth it to me.  :-/
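For reference, a small script along these lines reproduces this kind of
measurement (the exact file used above isn't specified, so random bytes
stand in here to show the incompressibility intuition; a negative
percentage means the output grew):

```python
import bz2, gzip, os, zlib

def ratios(data: bytes):
    """Percent saved by each codec; negative means the file got LARGER."""
    codecs = (("zlib", lambda d: zlib.compress(d, 9)),
              ("gzip", lambda d: gzip.compress(d, compresslevel=9)),
              ("bzip2", bz2.compress))
    return {name: 100 * (1 - len(fn(data)) / len(data))
            for name, fn in codecs}

# Hash- and signature-heavy data looks random and barely compresses;
# the thread's measurements show real blocks still save ~20-25%.
print(ratios(os.urandom(100_000)))
```

On truly random input all three codecs hover near 0% (or slightly
negative), so the 20-25% seen on real blocks comes from the structured
parts of transactions, not the hashes and signatures.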

Peter Tschipper via bitcoin-dev [bitcoin-dev@lists.linuxfoundation.org] wrote:
> This is my first time through this process so please bear with me. 
> 
> I opened a PR #6973 this morning for Zlib Block Compression for block
> relay and at the request of @sipa  this should have a BIP associated
> with it.   The idea is simple, to compress the datastream before
> sending, initially for blocks only but it could theoretically be done
> for transactions as well.  Initial results show an average of 20% block
> compression and taking 90 milliseconds for a full block (on a very slow
> laptop) to compress.  The savings will be mostly in terms of less
> bandwidth used, but I would expect there to be a small performance gain
> during the transmission of the blocks particularly where network latency
> is higher. 
> 
> I think the BIP title, if accepted should be the more generic, "Support
> for Datastream Compression"  rather than the PR title of "Zlib
> Compression for block relay" since it could also be used for
> transactions as well at a later time.
> 
> Thanks for your time...
--
Cheers, Bob McElrath

"For every complex problem, there is a solution that is simple, neat, and 
wrong."
-- H. L. Mencken 
