On Wednesday, September 26, 2018 at 8:21:44 AM UTC-4, Nadav Har'El wrote:
>
> Thanks. Looks good to me - I only have a few minor comment and whitespace 
> nitpicks below.
> You have my blessing to commit the next version yourself :-)
>
>
> On Sun, Sep 9, 2018 at 8:06 AM Waldemar Kozaczuk <jwkoz...@gmail.com> wrote:
>
>> This patch implements almost-in-place decompression of the kernel as
>> described by #985, with the caveat that the segments are decompressed
>> starting with the last one. A more detailed explanation of how
>> this new decompression strategy works can be found in the extensive
>> comments added to the fastlz source code.
>>
>> This new decompression strategy has the following benefits:
>> - it minimizes the amount of RAM needed to execute OSv images (the new
>> minimum RAM to execute the native example with ROFS is 27.5 MB instead
>> of 31 MB); the difference is almost twofold when executing ramfs images
>> with java-example.
>> - one no longer needs to manually adjust lzkernel_base to
>> build bigger ramfs images.
>>
>
> Nice. I assume there is no (significant) slowdown or larger size with the 
> new piecewise compression, right?
>
Nope. Actually, given that this skips "uncompressible" 1MB chunks, it might
be slightly faster.

>
>> Fixes #985
>>
>> Signed-off-by: Waldemar Kozaczuk <jwkoz...@gmail.com>
>> ---
>>  Makefile                    | 12 ++-----
>>  arch/x64/lzloader.ld        |  1 +
>>  fastlz/fastlz.h             | 20 +++++++++++
>>  fastlz/lz.cc                | 69 ++++++++++++++++++++++++++++++-------
>>  fastlz/lzloader.cc          | 61 +++++++++++++++++++++++++++-----
>>  scripts/check-image-size.sh | 15 +++-----
>>  6 files changed, 137 insertions(+), 41 deletions(-)
>>
>> diff --git a/Makefile b/Makefile
>> index 7f0a06fa..0d9e94c2 100644
>> --- a/Makefile
>> +++ b/Makefile
>> @@ -419,14 +419,8 @@ ifeq ($(arch),x64)
>>
>>  # kernel_base is where the kernel will be loaded after uncompression.
>>  # lzkernel_base is where the compressed kernel is loaded from disk.
>> -# As issue #872 explains, lzkernel_base must be chosen high enough for
>> -# (lzkernel_base - kernel_base) to be bigger than the kernel's size.
>> -# On the other hand, don't increase lzkernel_base too much, because it 
>> puts
>> -# a lower limit on the VM's RAM size.
>> -# Below we verify that the compiled kernel isn't too big given the 
>> current
>> -# setting of these paramters; Otherwise we recommend to increase 
>> lzkernel_base.
>>  kernel_base := 0x200000
>> -lzkernel_base := 0x1800000
>> +lzkernel_base := 0x100000
>>
>>
>>  $(out)/boot.bin: arch/x64/boot16.ld $(out)/arch/x64/boot16.o
>> @@ -466,9 +460,9 @@ $(out)/fastlz/lzloader.o: fastlz/lzloader.cc | 
>> generated-headers
>>
>>  $(out)/lzloader.elf: $(out)/loader-stripped.elf.lz.o 
>> $(out)/fastlz/lzloader.o arch/x64/lzloader.ld \
>>         $(out)/fastlz/fastlz.o
>> -       $(call very-quiet, scripts/check-image-size.sh 
>> $(out)/loader-stripped.elf $(shell bash -c 'echo 
>> $$(($(lzkernel_base)-$(kernel_base)))'))
>> +       $(call very-quiet, scripts/check-image-size.sh 
>> $(out)/loader-stripped.elf)
>>         $(call quiet, $(LD) -o $@ 
>> --defsym=OSV_LZKERNEL_BASE=$(lzkernel_base) \
>> -               -Bdynamic --export-dynamic --eh-frame-hdr 
>> --enable-new-dtags \
>> +               -Bdynamic --export-dynamic --eh-frame-hdr 
>> --enable-new-dtags -z max-page-size=4096 \
>>
>
> I'm still not 100% sure why this "max-page-size" thing suddenly became 
> necessary to avoid 1MB of zeros,
> and wasn't necessary previously. Or why "lzloader.elf" is an ELF at all 
> ;-) But we can return to these questions later.
>  
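My best guess (I haven't verified this in depth): recent binutils default
-z max-page-size to 0x200000 (2 MB) on x86_64, so without the override the
linker aligns the loadable segments of lzloader.elf to 2 MB boundaries and
pads the gap with zeros. Forcing 4 KB pages keeps the segments packed
tightly together.
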
>
>>                 -T arch/x64/lzloader.ld \
>>                 $(filter %.o, $^), LINK lzloader.elf)
>>         $(call quiet, truncate -s %32768 $@, ALIGN lzloader.elf)
>> diff --git a/arch/x64/lzloader.ld b/arch/x64/lzloader.ld
>> index 992698b9..82d8ddcf 100644
>> --- a/arch/x64/lzloader.ld
>> +++ b/arch/x64/lzloader.ld
>> @@ -10,6 +10,7 @@ SECTIONS
>>  {
>>      . = OSV_LZKERNEL_BASE + 0x1000;
>>      .text : { *(.text) }
>> +    . = OSV_LZKERNEL_BASE + 0x3000;
>>
>
> I think next week nobody will remember why this 0x3000 is here.
> Maybe at least mention that MAX_COMPRESSED_SEGMENT_SIZE contains another 
> appearance of the same constant.
>  
>
>>      .data : { *(.data) }
>>      .bss : { *(.bss) }
>>      .edata = .;
>> diff --git a/fastlz/fastlz.h b/fastlz/fastlz.h
>> index 8ad3336a..77f8288b 100644
>> --- a/fastlz/fastlz.h
>> +++ b/fastlz/fastlz.h
>> @@ -17,6 +17,26 @@
>>
>>  #define FASTLZ_VERSION_STRING "0.1.0"
>>
>> +/**
>> + * 1 MB was chosen as the segment size to make sure that
>> + * a segment can fit into the 2nd MB of memory, just before the
>> + * uncompressed kernel located at 0x200000
>>
>
> maybe mention the kernel_base and lzkernel_base constants of the Makefile?
>
Updated the comments to make things clearer (I hope :-). 

>  
>
>> + */
>> +#define SEGMENT_SIZE (1024 * 1024)
>>
>> +/**
>> + * The maximum compressed segment size needs to be slightly less
>> + * than 1 MB so that in the worst case the first segment,
>> + * which is decompressed last, does not overlap with its
>> + * target decompressed area - the 2nd MB. It has to be smaller
>> + * than 1 MB by 4 bytes plus 3 pages, because the first 3 pages of
>> + * lzloader.elf are occupied by code and the next 4 bytes store the
>> + * offset of the segments info table. In case the first segment is
>> + * stored at its original size and overlaps slightly with its target
>> + * 2nd MB, its data is copied byte by byte going backwards from the
>> + * last byte towards the 1st one. All in all this scheme guarantees
>> + * that no data is overwritten, no matter what the scenario.
>> + */
>> +#define MAX_COMPRESSED_SEGMENT_SIZE (SEGMENT_SIZE - sizeof(int) - 0x3000)
>> +
>>  /**
>>    Compress a block of data in the input buffer and returns the size of
>>    compressed block. The size of input buffer is specified by length. The
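Just to spell out the arithmetic behind MAX_COMPRESSED_SEGMENT_SIZE (a
sanity check of mine, not part of the patch): 0x3000 is 3 pages = 12288
bytes, so the constant works out to 1048576 - 4 - 12288 = 1036284 bytes.
In C++ terms:

    // Hypothetical compile-time check mirroring the two macros above
    // (renamed with a trailing underscore to avoid clashing with them).
    #include <cstddef>
    constexpr std::size_t SEGMENT_SIZE_ = 1024 * 1024;
    constexpr std::size_t MAX_COMPRESSED_SEGMENT_SIZE_ =
            SEGMENT_SIZE_ - sizeof(int) - 0x3000;
    static_assert(MAX_COMPRESSED_SEGMENT_SIZE_ == 1036284,
                  "1 MB minus 4-byte table offset minus 3 pages of code");
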
>> diff --git a/fastlz/lz.cc b/fastlz/lz.cc
>> index 9cd4dfd8..63680d4c 100644
>> --- a/fastlz/lz.cc
>> +++ b/fastlz/lz.cc
>> @@ -11,11 +11,25 @@
>>
>>  using namespace std;
>>
>> -// Compress the Kernel using fastlz algorithm
>> +// Compress the kernel by splitting it into 1MB segments,
>> +// compressing each one using the fastlz algorithm and finally
>> +// combining them into a single output file
>> +//
>> +// The resulting output file has the following structure:
>> +// - 4 bytes (int) field - offset where segment info table is stored
>> +// - followed by each compressed segment content
>> +// - followed by segment info table comprised of:
>> +//    - 4 bytes (int) field - number of compressed segments
>> +//    - followed by 8 bytes (2 int-wide fields) for each segment:
>> +//       - uncompressed segment size in bytes
>> +//       - compressed segment size in bytes
>> +//
>> +// If a compressed segment size is greater than 
>> MAX_COMPRESSED_SEGMENT_SIZE (~99% compression ratio)
>> +// then the original segment is placed into the output.
>>  int main(int argc, char* argv[])
>>  {
>> -    size_t input_length, output_length;
>> -    char *input, *output;
>> +    size_t input_length;
>> +    char *input;
>>
>>      if (argc != 2) {
>>          cout << "usage: lz inputfile\n";
>> @@ -37,22 +51,51 @@ int main(int argc, char* argv[])
>>          return EXIT_FAILURE;
>>      }
>>
>> -    output = new char[(int)(input_length*2)];
>> -    output_length = fastlz_compress(input, input_length, output);
>> -
>>      ofstream output_file((std::string(argv[1]) + ".lz").c_str(), 
>> ios::out|ios::binary|ios::trunc);
>> -
>> -    if (output_file) {
>> -        output_file.write(output, output_length);
>> -        output_file.close();
>> -    }
>> -    else {
>> +    if (!output_file) {
>>          cout << "Error opening output file\n";
>>          return EXIT_FAILURE;
>>      }
>> +    // Leave space for offset of segment info table
>> +    output_file.seekp(sizeof(int),ios::beg);
>> +
>> +    int segments_count = input_length / SEGMENT_SIZE + 1;
>> +    int *segment_sizes = new int[segments_count * 2];
>> +    char *compressed_segment = new char[SEGMENT_SIZE * 2];
>> +    //
>> +    // Iterate over each 1MB chunk (or less for the last chunk) of input data
>> +    // and append it either compressed or as the original uncompressed data
>> +    for(auto segment = 0; segment < segments_count; segment ++) {
>>
>
> nitpick: weird whitespace.
>  
>
>> +        auto bytes_to_compress = (segment < segments_count - 1) ? 
>> SEGMENT_SIZE : input_length % SEGMENT_SIZE;
>> +        segment_sizes[segment * 2] = bytes_to_compress;
>> +
>> +        size_t compressed_segment_length =
>> +                fastlz_compress(input + segment * SEGMENT_SIZE, 
>> bytes_to_compress, compressed_segment);
>> +        //
>> +        // Check if we actually compressed anything
>> +        if (compressed_segment_length < bytes_to_compress && 
>> compressed_segment_length <= MAX_COMPRESSED_SEGMENT_SIZE) {
>> +            output_file.write(compressed_segment, 
>> compressed_segment_length);
>> +            segment_sizes[segment * 2 + 1] = compressed_segment_length;
>> +        }
>> +        // Otherwise write original uncompressed segment data 
>> (compression ratio would have been > 99%)
>> +        else {
>> +            output_file.write(input + segment * SEGMENT_SIZE, 
>> bytes_to_compress);
>> +            segment_sizes[segment * 2 + 1] = bytes_to_compress;
>> +        }
>> +    }
>> +    //
>> +    // Write the segments info table
>> +    int info_offset = output_file.tellp();
>> +    output_file.write(reinterpret_cast<const char*>(&segments_count), 
>> sizeof(segments_count));
>> +    output_file.write(reinterpret_cast<const char*>(segment_sizes), 
>> segments_count * 2 * sizeof(int));
>> +    //
>> +    // Write offset of the segments info table in the beginning of the 
>> file
>> +    output_file.seekp(0,ios::beg);
>> +    output_file.write(reinterpret_cast<const char*>(&info_offset), 
>> sizeof(info_offset));
>> +    output_file.close();
>>
>>      delete[] input;
>> -    delete[] output;
>> +    delete[] compressed_segment;
>>
>>      return 0;
>>  }
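By the way, a quick way to eyeball the resulting format: below is a small
standalone dump tool (a hypothetical helper of mine, not part of the patch)
that reads a .lz file produced by lz.cc - the leading 4-byte table offset,
then the segment info table - and prints the per-segment sizes. It assumes
the same native int layout the patch uses:

    // lzdump.cc - print the segment info table of a .lz file (sketch).
    #include <fstream>
    #include <iostream>
    #include <vector>

    int main(int argc, char* argv[])
    {
        if (argc != 2) {
            std::cout << "usage: lzdump inputfile.lz\n";
            return 1;
        }
        std::ifstream in(argv[1], std::ios::in | std::ios::binary);
        if (!in) {
            std::cout << "Error opening input file\n";
            return 1;
        }
        // First 4 bytes hold the offset of the segment info table
        int info_offset = 0;
        in.read(reinterpret_cast<char*>(&info_offset), sizeof(info_offset));
        in.seekg(info_offset, std::ios::beg);
        // The table starts with the segment count, followed by
        // (uncompressed size, compressed size) int pairs
        int segments_count = 0;
        in.read(reinterpret_cast<char*>(&segments_count), sizeof(segments_count));
        std::vector<int> sizes(segments_count * 2);
        in.read(reinterpret_cast<char*>(sizes.data()), sizes.size() * sizeof(int));
        for (int s = 0; s < segments_count; s++) {
            std::cout << "segment " << s << ": uncompressed=" << sizes[s * 2]
                      << " compressed=" << sizes[s * 2 + 1] << "\n";
        }
        return 0;
    }
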
>> diff --git a/fastlz/lzloader.cc b/fastlz/lzloader.cc
>> index e05387e7..1d12b5ae 100644
>> --- a/fastlz/lzloader.cc
>> +++ b/fastlz/lzloader.cc
>> @@ -1,5 +1,6 @@
>>  /*
>>   * Copyright (C) 2014 Eduardo Piva
>> + * Copyright (C) 2018 Waldemar Kozaczuk
>>   *
>>   * This work is open source software, licensed under the terms of the
>>   * BSD license as described in the LICENSE file in the top-level 
>> directory.
>> @@ -12,6 +13,10 @@
>>
>>  #define BUFFER_OUT (char *)OSV_KERNEL_BASE
>>
>> +#define INT_AT(byte_array,offset) (*(reinterpret_cast<int*>(byte_array + 
>> offset)))
>> +#define SEGMENT_INFO(info_array,segment) 
>> (reinterpret_cast<int*>(info_array + (1 + 2 * segment) * sizeof(int)))
>> +#define COPIED_SEGMENT_INFO(info_array,segment) 
>> (reinterpret_cast<int*>(info_array + 2 * segment * sizeof(int)))
>> +
>>  extern char _binary_loader_stripped_elf_lz_start;
>>  extern char _binary_loader_stripped_elf_lz_end;
>>  extern char _binary_loader_stripped_elf_lz_size;
>> @@ -24,13 +29,51 @@ extern "C" void *memset(void *s, int c, size_t n)
>>
>>  extern "C" void uncompress_loader()
>>  {
>> -    // We do not know the exact uncompressed size, so don't really want 
>> to
>> -    // pass a the last (maxout) parameter of fastlz_decompress. Let it
>> -    // uncompress as much as it has input. The Makefile already verifies
>> -    // that the uncompressed kernel doesn't overwrite this uncompression 
>> code.
>> -    // Sadly, "INT_MAX" is the largest number we can pass. If we ever 
>> need
>> -    // more than 2GB here, it won't work.
>> -    fastlz_decompress(&_binary_loader_stripped_elf_lz_start,
>> -            (size_t) &_binary_loader_stripped_elf_lz_size,
>> -            BUFFER_OUT, INT_MAX);
>> +    char *compressed_input = &_binary_loader_stripped_elf_lz_start;
>> +    //
>> +    // Read 1st four bytes to identify offset of the segments info table
>> +    int info_offset = INT_AT(compressed_input,0);
>> +    char *segments_info_array = compressed_input + info_offset;
>> +    int segments_count = INT_AT(compressed_input,info_offset);
>> +    //
>> +    // Calculate source offset and destination offset for last segment
>> +    size_t src_offset = sizeof(int), dst_offset = 0;
>> +    for (auto segment = 0; segment < segments_count; segment++) {
>> +        int *segment_info = SEGMENT_INFO(segments_info_array, segment);
>> +        auto decomp_segment_size = *segment_info;
>> +        auto comp_segment_size = *(segment_info + 1);
>> +        src_offset += comp_segment_size;
>> +        dst_offset += decomp_segment_size;
>> +    }
>> +    auto uncompressed_size = dst_offset;
>> +    //
>> +    // Copy the info table to an area above the decompression target of
>> +    // the kernel, otherwise it would get overwritten while decompressing
>> +    // the segments
>> +    char *copied_info_array = BUFFER_OUT + (uncompressed_size + 10);
>> +    for (auto segment = 0; segment < segments_count; segment++) {
>> +        int *segment_info = SEGMENT_INFO(segments_info_array, segment);
>> +        int *target_segment_info = 
>> COPIED_SEGMENT_INFO(copied_info_array, segment);
>> +        *target_segment_info = *segment_info;
>> +        *(target_segment_info + 1) = *(segment_info + 1);
>> +    }
>> +    //
>> +    // Iterate over segments and decompress them starting with the last 
>> one
>> +    for (auto segment = segments_count - 1; segment >= 0; segment --) {
>> +        int *segment_info = COPIED_SEGMENT_INFO(copied_info_array, 
>> segment);
>> +        src_offset -= *(segment_info + 1);
>> +        dst_offset -= *segment_info;
>> +        if (*(segment_info + 1) < *segment_info) {
>> +            fastlz_decompress(compressed_input + src_offset,
>> +                              *(segment_info + 1),
>> +                              BUFFER_OUT + dst_offset,
>> +                              INT_MAX);
>> +        }
>> +        else {
>> +            //
>> +            // Copy from last byte to the first
>> +            for (auto off = *segment_info - 1; off >= 0 ; off-- )
>> +                *(BUFFER_OUT + dst_offset + off) = *(compressed_input + 
>> src_offset + off);
>> +        }
>> +    }
>>  }
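To spell out why the uncompressed-segment branch above copies backwards
(just an illustration of the idea, not code from the patch): the compressed
image sits below the decompression target, so whenever a stored-as-is
segment overlaps its destination, the destination range is the higher one,
and copying from the last byte to the first reads every source byte before
it can be clobbered:

    // Hypothetical sketch: overlap-safe copy to a higher, overlapping
    // destination - equivalent to the else-branch above.
    #include <cstddef>

    static void copy_backwards(char *dst, const char *src, std::size_t n)
    {
        // Safe for dst > src even when the ranges overlap: writing dst[i]
        // can only clobber source bytes at indexes above i, and those were
        // already copied by the earlier (higher-index) iterations.
        for (std::size_t i = n; i-- > 0; )
            dst[i] = src[i];
    }
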
>> diff --git a/scripts/check-image-size.sh b/scripts/check-image-size.sh
>> index 0f972a7a..8e9fe2db 100755
>> --- a/scripts/check-image-size.sh
>> +++ b/scripts/check-image-size.sh
>> @@ -5,21 +5,16 @@
>>  # This work is open source software, licensed under the terms of the
>>  # BSD license as described in the LICENSE file in the top-level 
>> directory.
>>  #
>> -if [ "$#" -ne 2 ]; then
>> -    echo "usage: $0 file maxsize"
>> +if [ "$#" -ne 1 ]; then
>> +    echo "usage: $0 file"
>>      exit 1
>>  fi
>>
>>  size=$(ls -l "$1" | cut -d " " -f 5)
>>
>> -if [ $size -ge 2147483647 ]; then
>> -    echo "$1 is bigger than INT_MAX limit of uncompressing kernel."
>> -    echo "If you really need such a huge kernel, fix fastlz/lzloader.cc"
>> -    exit 1
>> -fi
>> -if [ $size -ge "$2" ]; then
>> -    echo "$1 is bigger than $2 available for uncompressing kernel."
>> -    echo "Increase 'lzkernel_base' in Makefile to raise this limit."
>> +if [ $size -ge 4294967295 ]; then
>> +    echo "$1 is bigger than the UINT_MAX limit of the uncompressing kernel."
>> +    echo "If you really need such a huge kernel, change the offsets to use 
>> 64-bit numbers."
>>      exit 1
>>  fi
>>  exit 0
>> -- 
>> 2.17.1
>>
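One note on the check-image-size.sh change above: as I understand it, with
the piecewise scheme the uncompressed kernel no longer has to fit between
lzkernel_base and kernel_base, so the old (lzkernel_base - kernel_base)
limit is gone; the only remaining bound comes from the segment info table
offset and sizes being stored as 32-bit ints.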
>
