Hi,

It's worth adding lzop to the list of compressors to test, as it's built
specifically to have a low CPU overhead, at the cost of some compression ratio.

http://www.lzop.org/
 S++ 


On Thursday, March 26, 2015 11:29 PM, John Camara <john.m.cam...@gmail.com> wrote:

Hi Fijal,

To recap and continue the discussion from IRC.

We already discussed that the stack ids are based on a counter, which is good,
but I also want to confirm that the ids have locality with respect to the code.
That is, similar areas of the code should have similar ids. Just make sure they
are not random with respect to the code, otherwise compression will not be
helpful. If the ids are random, that would need to be corrected first.

Right now the stack traces are written to the file by repeating the following
sequence:

MARKER_STACKTRACE
count
depth
stack stack ... stack

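For reference, a reader for the current layout might look something like this
in Python (the marker value, byte order, and word size here are placeholders
for the sketch, not the real on-disk format):

import struct

MARKER_STACKTRACE = b'\x01'   # hypothetical marker value
WORD = struct.Struct('<q')    # assuming 64-bit little-endian words

def read_stacktraces(f):
    # Yield (count, depth, stack_ids) tuples from the current format,
    # under the assumptions noted above.
    while True:
        marker = f.read(1)
        if marker != MARKER_STACKTRACE:
            return
        (count,) = WORD.unpack(f.read(WORD.size))
        (depth,) = WORD.unpack(f.read(WORD.size))
        stack_ids = [WORD.unpack(f.read(WORD.size))[0] for _ in range(depth)]
        yield count, depth, stack_ids
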
In order to get a high compression ratio it would be better to combine multiple
stack traces and rearrange the data as follows:

MARKER_COMPRESSED_STACKTRACES
counts_compressed_length
counts_compressed
depths_compressed_length
depths_compressed
stacks_compressed_length
stacks_compressed

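Writing one such record is then just a marker plus three length-prefixed blobs.
A rough sketch (again, the marker value and the 4-byte length prefix are
placeholders):

import struct

MARKER_COMPRESSED_STACKTRACES = b'\x02'  # hypothetical marker value

def write_compressed_block(f, counts_blob, depths_blob, stacks_blob):
    # Each compressed buffer is preceded by its byte length so the
    # reader knows how many bytes to hand to the decompressor.
    f.write(MARKER_COMPRESSED_STACKTRACES)
    for blob in (counts_blob, depths_blob, stacks_blob):
        f.write(struct.pack('<I', len(blob)))
        f.write(blob)
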
In order to build the compressed data you will want 3 pairs of buffers: a pair
for counts, a pair for depths, and a pair for stacks. Your profiler would be
writing to one set of buffers while another thread is responsible for
compressing buffers that are full and writing them to the file. Once a set of
buffers is full, the profiler starts filling up the other set.

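In Python the hand-off could look roughly like this; a real implementation
would flip between two fixed buffers rather than allocating new ones, and the
compressor thread (not shown) would consume full_sets:

import queue

class BufferSet(object):
    # One set of three buffers: counts, depths, stack ids.
    def __init__(self):
        self.counts = []
        self.depths = []
        self.stacks = []

full_sets = queue.Queue()   # hands full sets to the compressor thread
active = BufferSet()

def record_stacktrace(count, depth, stack_ids, limit=8000):
    # Called by the profiler; swaps buffer sets when the active one is full.
    global active
    active.counts.append(count)
    active.depths.append(depth)
    active.stacks.extend(stack_ids)
    if len(active.counts) >= limit:
        full_sets.put(active)
        active = BufferSet()   # in C, flip to the second fixed buffer set
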
For each set of buffers you need a variable to hold the previous count, depth,
and stack id, initialized to 0 before any data is written to an empty buffer.
Instead of writing the actual count value into the counts buffer, you write the
difference between the current count and the previous count. The reason for
doing this is that the delta values will mostly be around 0, which
significantly improves the compression ratio without adding much overhead. Of
course you would do the same for depths and stack ids.

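A minimal sketch of the delta encoding:

def delta_encode(values):
    # Replace each value with its difference from the previous one;
    # runs of nearby values turn into runs of near-zero deltas.
    out = []
    prev = 0
    for v in values:
        out.append(v - prev)
        prev = v
    return out

# e.g. delta_encode([500, 501, 501, 503]) -> [500, 1, 0, 2]
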
When you compress the data you compress each buffer individually, to make sure
like data is being compressed together. Like data compresses better than unlike
data, and by saving deltas very few bits are required to represent the data, so
you are likely to have long strings of 0s and 1s.

I'm sure now you can see why I don't want the stack ids to be random. If they
are random, the deltas will be all over the place, so you won't end up with
long strings of 0s and 1s, and random data itself does not compress.

To test this out I wouldn't bother modifying the C code but instead try it out
in Python, to first make sure the compression is providing huge gains and to
figure out how to tune the algorithm, without having to mess with the signal
handlers, write the code for the separate thread, and deal with issues such as
making sure you don't start writing to a buffer before the thread has finished
writing its data to the file, etc. I would just read an existing profile file
and rewrite it to a different file by rearranging the data and compressing the
deltas as I described. You can get away with one set of buffers since you
wouldn't be profiling at the same time.

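A rough outline of that offline rewrite, reusing delta_encode and
MARKER_COMPRESSED_STACKTRACES from the sketches above ('records' would come
from a parser like the reader sketch earlier):

import lzma
import struct

def rewrite(records, out, block=8000):
    # 'records' yields (count, depth, stack_ids) tuples from an
    # existing profile file.
    counts, depths, stacks = [], [], []
    for count, depth, stack_ids in records:
        counts.append(count)
        depths.append(depth)
        stacks.extend(stack_ids)
        if len(counts) >= block:
            flush(out, counts, depths, stacks)
            counts, depths, stacks = [], [], []
    if counts:
        flush(out, counts, depths, stacks)

def flush(out, counts, depths, stacks):
    # Delta-encode each stream, compress it on its own, and write it
    # with a length prefix (one set of buffers is enough offline).
    out.write(MARKER_COMPRESSED_STACKTRACES)
    for stream in (counts, depths, stacks):
        raw = struct.pack('<%dq' % len(stream), *delta_encode(stream))
        blob = lzma.compress(raw, preset=0)   # fastest preset
        out.write(struct.pack('<I', len(blob)))
        out.write(blob)
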
To tune this process you will need to determine the appropriate number of stack
traces per block: small enough to keep memory use down but large enough that
the overhead associated with compression is small. Maybe start off with about
8000 stack traces. I would try gzip, bz2, and lzma and look at their
compression ratios and times. Gzip is generally faster than bz2, and lzma is
the slowest. On the other hand lzma provides the best compression and gzip the
worst. Since you will be compressing deltas you can most likely get away with
using the fastest compression options under each compressor without hurting the
compression ratio. But I would test to verify this, as whether it holds depends
on the data being compressed. Also, one option available in lzma is the ability
to set the width of the data to look at when looking for patterns. Since you
are saving 32 or 64 bit ints depending on the platform, you can set the option
to either 4 or 8 bytes based on the platform. I don't believe gzip or bz2 have
this option. By setting this option in lzma you will likely improve the
compression ratio.

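Something like this would do for the comparison, using Python's zlib, bz2, and
lzma modules with their fastest settings. The closest match I know of for the
width option in Python's lzma module is the delta filter's dist parameter, so
that is what the last entry uses (an assumed mapping of the option described
above):

import bz2
import lzma
import struct
import time
import zlib

def compare(values):
    raw = struct.pack('<%dq' % len(values), *values)
    candidates = [
        ('gzip/zlib level 1', lambda d: zlib.compress(d, 1)),
        ('bz2 level 1',       lambda d: bz2.compress(d, 1)),
        ('lzma preset 0',     lambda d: lzma.compress(d, preset=0)),
        # Tell lzma the data is 8-byte aligned via the delta filter.
        ('lzma delta dist=8', lambda d: lzma.compress(d, filters=[
            {'id': lzma.FILTER_DELTA, 'dist': 8},
            {'id': lzma.FILTER_LZMA2, 'preset': 0},
        ])),
    ]
    for name, fn in candidates:
        t0 = time.time()
        blob = fn(raw)
        print('%-20s ratio %6.1fx in %.3fs'
              % (name, len(raw) / float(len(blob)), time.time() - t0))
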
You may find that counts and depths compress similarly across the 3
compressors, in which case just use the fastest, which will likely be gzip. On
the other hand, maybe the stack ids will be better off with lzma. This is
another reason to separate out like data: it gives you the option of using the
fastest compressor for some data types while using others to get better
compression.

I would not be surprised if this approach achieves a compression ratio better
than 100x, but that will be heavily dependent on how local the stack ids are.
Also don't forget about simple things like not using 64 bit ints when you can
get away with smaller ones.

Also, a slight variation on the above: if you find most of your deltas are
< 127, you could write them out as 1 byte, and when a delta is greater than
127, write it out as a 4 byte int with the high bit set. If you do this, don't
set the lzma option to 4 or 8 byte boundaries, as your data is now a mixture of
1 and 4 byte values. This can sometimes provide huge reductions in compression
time without much effect on the overall compression ratio.

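A sketch of that encoding (this assumes the deltas have already been mapped to
non-negative values, e.g. with zig-zag encoding, and that they fit in 31 bits):

import struct

def encode_delta(d):
    # Values under 128 take one byte with the high bit clear; anything
    # larger takes 4 big-endian bytes with the high bit set as an escape.
    if d < 0x80:
        return bytes([d])
    return struct.pack('>I', 0x80000000 | d)

def decode_delta(buf, pos):
    # Returns (value, new position).
    b = buf[pos]
    if b < 0x80:
        return b, pos + 1
    (v,) = struct.unpack_from('>I', buf, pos)
    return v & 0x7fffffff, pos + 4
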
Hope you find this helpful.
John
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
https://mail.python.org/mailman/listinfo/pypy-dev
