Re: Experiment in generating multi-layer Docker images with guix pack

2020-03-29 Thread Ludovic Courtès
Hi Chris,

Christopher Baines  skribis:

[...]

>> I think a layering algorithm like Graham Christensen’s above requires
>> knowledge of the reference graph, meaning that layering can only be
>> computed on the build side, using #:references-graphs.  In that case, it
>> could be that you can’t have a host-side  record.
>
> As I understand it, you only have to do the computation on the build
> side if you're restricted to doing a single set of builds. If you first
> build the store items you want to put in the image, then look at their
> references and compute the derivation for building the image, then you
> could do this kind of computation on the client side.
>
> But yeah, this is important to work out: how image generation should
> work, and what behaviours we want, should define the structure of the
> code.
>
> I went with records to represent layers partly because I'm familiar
> with them, but also because they allow easier manipulation of layers on
> the client side. Representing different layers as different derivations
> also allows them to potentially be built in parallel, although I'm not
> sure how beneficial this might be.

That’s a good point, it could help.  We could also use a “dynamic
dependency” like for grafts so we can compute things on the host side
anyway (tempting, but we should probably not start using that
everywhere!).

> Related to this, at the moment Docker V1 images can be generated; it
> would be good in the future to also support Docker V2 images and OCI
> images. All three container formats use a layered approach to managing
> the files, but they are all different (as far as I'm aware).

Oh, I thought these formats were all the same.  I suppose it’d be enough
to support OCI, right?

> In my mind there are three architectural approaches:
>
>  - Image generation entirely on the build side
>
>    - The layers and the image are constructed through one derivation
>    - The code for building images is in a module available at build time
>    - Different approaches for layering are implemented in the module
>      available at build time, and parameters are passed in as
>      data/gexpressions
>
>  - Image generation entirely on the client side
>
>    - Each layer is a derivation, and the image is an additional
>      derivation that takes the layers as an input
>    - The code for building images is inside gexp compilers for the
>      record types representing the images and layers
>    - Different approaches for layering manipulate the layer records on
>      the client side
>
>  - Image generation can be done on both the build and client side
>
>    - Depending on the parameters, the layers and image can be a single
>      derivation, or one for each layer, and another for the image
>    - The code for building images is in a module available at build
>      time, and this is also used by gexp compilers
>    - Different approaches for layering have the option of either being
>      on the build side, or the client side
>
> What are people's thoughts?

From a pragmatic standpoint, perhaps we can first integrate what you
propose (option #2), and later adjust the code towards #1 or #3 as we
see fit.

WDYT?

Thanks,
Ludo’.



Re: Experiment in generating multi-layer Docker images with guix pack

2020-03-26 Thread Christopher Baines

Ludovic Courtès  writes:

> Christopher Baines  skribis:
>
>> I think it could be useful to support multiple different strategies for
>> generating layers for Docker images, with different trade-offs. This approach
>> using two layers should make the resulting images more efficient to use in
>> cases like the guile example above, where the packages you run guix pack
>> with have exactly matching inputs.
>
> Did you read ?
> They came up with a pretty smart algorithm that would be worth copying.

I'm aware of it, but I haven't read it in detail yet.

>> As well as these behaviour changes, these patches also modify the
>> implementation. Rather than having some build side code that's used in the
>> pack and vm module gexpressions, these patches introduce two new record types:
>>  and . This at least structures the
>> derivations so that each layer is represented by a derivation, and then
>> there's a derivation for the image itself, which is a little more efficient in
>> terms of computation.
>
> Nice.
>
> I think a layering algorithm like Graham Christensen’s above requires
> knowledge of the reference graph, meaning that layering can only be
> computed on the build side, using #:references-graphs.  In that case, it
> could be that you can’t have a host-side  record.

As I understand it, you only have to do the computation on the build
side if you're restricted to doing a single set of builds. If you first
build the store items you want to put in the image, then look at their
references and compute the derivation for building the image, then you
could do this kind of computation on the client side.

But yeah, this is important to work out: how image generation should
work, and what behaviours we want, should define the structure of the
code.

I went with records to represent layers partly because I'm familiar
with them, but also because they allow easier manipulation of layers on
the client side. Representing different layers as different derivations
also allows them to potentially be built in parallel, although I'm not
sure how beneficial this might be.

Related to this, at the moment Docker V1 images can be generated; it
would be good in the future to also support Docker V2 images and OCI
images. All three container formats use a layered approach to managing
the files, but they are all different (as far as I'm aware).

In my mind there are three architectural approaches:

 - Image generation entirely on the build side

   - The layers and the image are constructed through one derivation
   - The code for building images is in a module available at build time
   - Different approaches for layering are implemented in the module
     available at build time, and parameters are passed in as
     data/gexpressions

 - Image generation entirely on the client side

   - Each layer is a derivation, and the image is an additional
     derivation that takes the layers as an input
   - The code for building images is inside gexp compilers for the
     record types representing the images and layers
   - Different approaches for layering manipulate the layer records on
     the client side

 - Image generation can be done on both the build and client side

   - Depending on the parameters, the layers and image can be a single
     derivation, or one for each layer, and another for the image
   - The code for building images is in a module available at build
     time, and this is also used by gexp compilers
   - Different approaches for layering have the option of either being
     on the build side, or the client side
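The difference between the first two approaches is mostly the shape of the derivation graph. The following Python sketch is a toy model under stated assumptions to make that concrete: `plan_image` and its `mode` values are hypothetical, layers are plain lists of store item names, and each returned step is a (derivation name, inputs) pair. The third approach would roughly amount to exposing both modes behind a parameter while sharing the layering code.

```python
def plan_image(layers, mode):
    """Sketch the derivation graph for an image whose contents are LAYERS.

    LAYERS is a list of lists of store item names, bottom layer first.
    Returns a list of (derivation_name, inputs) pairs. Hypothetical model,
    not Guix code.
    """
    if mode == "build-side":
        # Approach 1: a single derivation takes every store item as input
        # and produces all the layers and the image in one build.
        return [("image", sorted(item for layer in layers for item in layer))]
    if mode == "client-side":
        # Approach 2: one derivation per layer, plus an image derivation
        # whose only inputs are the layer derivations; independent layer
        # derivations could be built in parallel.
        steps = [(f"layer-{i}", sorted(layer)) for i, layer in enumerate(layers)]
        steps.append(("image", [name for name, _ in steps]))
        return steps
    raise ValueError(f"unknown mode: {mode!r}")
```

With two layers, `"build-side"` yields a single step, whereas `"client-side"` yields three: two layer derivations that do not depend on each other, and an image derivation that depends on both.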

What are people's thoughts?

Thanks,

Chris




Re: Experiment in generating multi-layer Docker images with guix pack

2020-03-26 Thread Ludovic Courtès
Hello Chris,

Christopher Baines  skribis:

> These patches are very rough, and not ready, but do at least work in some
> limited capacity. I've been testing with the following commands:
>
>   guix pack --format=docker guile@2.2.6
>   guix pack --format=docker guile@2.2.7
>
> With the previous Docker image generation implementation, two different ~130MB
> images would be generated. These patches mean that each .tar.gz file generated
> by guix pack contains a ~53MB layer which contains the profile and directly
> referenced store items, and then a ~77MB layer with all the other store items
> which is identical for both the 2.2.6 and 2.2.7 pack files.

Nice!

> I think it could be useful to support multiple different strategies for
> generating layers for Docker images, with different trade-offs. This approach
> using two layers should make the resulting images more efficient to use in
> cases like the guile example above, where the packages you run guix pack
> with have exactly matching inputs.

Did you read ?
They came up with a pretty smart algorithm that would be worth copying.

> This could often be the case if you're developing an application, packaging it
> with Guix, then using guix pack to generate a Docker image which you
> deploy. With the single layer approach, if you change the application code,
> you'll get an entirely different image. I haven't tried this out, but my hope
> is that by generating a common base layer, if you change the application code
> only the top layer of the Docker image will change, meaning you'll only have
> to deploy that, rather than having to deploy the entire image. If you're
> deploying the images across a network, having less data to send around can
> save time, and reduce the amount of space required to store the images.

Definitely.

> As well as these behaviour changes, these patches also modify the
> implementation. Rather than having some build side code that's used in the
> pack and vm module gexpressions, these patches introduce two new record types:
>  and . This at least structures the
> derivations so that each layer is represented by a derivation, and then
> there's a derivation for the image itself, which is a little more efficient in
> terms of computation.

Nice.

I think a layering algorithm like Graham Christensen’s above requires
knowledge of the reference graph, meaning that layering can only be
computed on the build side, using #:references-graphs.  In that case, it
could be that you can’t have a host-side  record.
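To illustrate why such an algorithm needs the reference graph, here is a minimal Python sketch of a popularity-based layering heuristic. It is written under assumptions, not taken from Graham Christensen's work or from Guix: the graph is a plain dict mapping each (hypothetical) store item to the items it references; an item's popularity is the number of other items in the closure that depend on it, directly or transitively; the most popular items each get a layer of their own, up to a hypothetical `max_layers` budget; and everything else shares the last layer.

```python
def closure(graph, roots):
    """All store items reachable from ROOTS, including ROOTS themselves."""
    seen, stack = set(), list(roots)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(graph.get(item, ()))
    return seen


def popularity(graph, roots):
    """Map each item in the closure to how many *other* items depend on it,
    directly or transitively (quadratic, which is fine for a sketch)."""
    items = closure(graph, roots)
    return {
        item: sum(1 for other in items
                  if other != item and item in closure(graph, [other]))
        for item in items
    }


def popularity_layers(graph, roots, max_layers):
    """Give the most popular items one layer each; the rest share the last one."""
    if max_layers < 1:
        raise ValueError("max_layers must be at least 1")
    pop = popularity(graph, roots)
    # Most depended-upon items first; ties broken by name for determinism.
    ranked = sorted(pop, key=lambda item: (-pop[item], item))
    singles, rest = ranked[:max_layers - 1], ranked[max_layers - 1:]
    layers = [[item] for item in singles]
    if rest:
        layers.append(sorted(rest))
    return layers
```

For instance, with a graph where `profile` references `app`, `app` references `guile` and `glibc`, `guile` references `libgc` and `glibc`, and `libgc` references `glibc`, calling `popularity_layers(graph, ["profile"], 3)` puts `glibc` and `libgc` in layers of their own and the remaining three items in a third layer. Every step walks the graph, which is why this kind of layering has to run wherever the reference graph is available.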

> What do people think about generating multi-layer images, and using record
> types to represent the layers and image?

I think multi-layering is something we should definitely have, and
records for at least the image are a good idea.  :-)

Thanks for looking into this!

Ludo’.



Experiment in generating multi-layer Docker images with guix pack

2020-03-21 Thread Christopher Baines
These patches are very rough, and not ready, but do at least work in some
limited capacity. I've been testing with the following commands:

  guix pack --format=docker guile@2.2.6
  guix pack --format=docker guile@2.2.7

With the previous Docker image generation implementation, two different ~130MB
images would be generated. These patches mean that each .tar.gz file generated
by guix pack contains a ~53MB layer which contains the profile and directly
referenced store items, and then a ~77MB layer with all the other store items
which is identical for both the 2.2.6 and 2.2.7 pack files.
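The two-layer split described above can be sketched as follows. This is a minimal Python illustration under assumptions, not the code from the patches: the reference graph is a plain dict from a store item to the items it references directly, and the store item names are hypothetical and shortened (no hashes).

```python
def closure(graph, roots):
    """All store items reachable from ROOTS, including ROOTS themselves."""
    seen, stack = set(), list(roots)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            stack.extend(graph.get(item, ()))
    return seen


def two_layers(graph, profile):
    """Split the closure of PROFILE into (top, base) layers.

    top:  the profile itself plus the store items it references directly
    base: every other store item in the closure
    """
    top = {profile, *graph.get(profile, ())}
    base = closure(graph, [profile]) - top
    return top, base
```

Because a `guile-2.2.6` profile and a `guile-2.2.7` profile share their dependencies, the `base` set comes out identical for both; only `top` differs. On the client side, the graph would have to come from querying the store after building, for instance with `guix gc --references` (direct references) or `guix gc --requisites` (the whole closure).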

I think it could be useful to support multiple different strategies for
generating layers for Docker images, with different trade-offs. This approach
using two layers should make the resulting images more efficient to use in
cases like the guile example above, where the packages you run guix pack
with have exactly matching inputs.

This could often be the case if you're developing an application, packaging it
with Guix, then using guix pack to generate a Docker image which you
deploy. With the single layer approach, if you change the application code,
you'll get an entirely different image. I haven't tried this out, but my hope
is that by generating a common base layer, if you change the application code
only the top layer of the Docker image will change, meaning you'll only have
to deploy that, rather than having to deploy the entire image. If you're
deploying the images across a network, having less data to send around can
save time, and reduce the amount of space required to store the images.
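An unchanged base layer need not be sent again because layers are content-addressed: an OCI image configuration lists, in `rootfs.diff_ids`, the sha256 digest of each uncompressed layer tarball, and registries likewise address layer blobs by the digest of their (usually compressed) bytes, so a client can skip any blob the other side already has. The minimal Python sketch below is an illustration, not Guix code: it computes such a digest for a layer given as a hypothetical mapping from file name to contents. The tarball is made deterministic by sorting entries and fixing timestamps and ownership, which is what makes equal contents produce equal digests.

```python
import hashlib
import io
import tarfile


def layer_diff_id(files):
    """Return "sha256:<hex>" for an uncompressed, deterministic tarball of FILES.

    FILES maps file names inside the layer to their contents (bytes).
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):  # fixed entry order
            data = files[name]
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp
            info.mode = 0o444
            info.uid = info.gid = 0  # fixed ownership
            info.uname = info.gname = "root"
            tar.addfile(info, io.BytesIO(data))
    return "sha256:" + hashlib.sha256(buf.getvalue()).hexdigest()
```

Reordering the entries of `files` leaves the digest unchanged, while changing any file's contents changes it; in the scenario above, editing the application would change only the top layer's digest.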

As well as these behaviour changes, these patches also modify the
implementation. Rather than having some build side code that's used in the
pack and vm module gexpressions, these patches introduce two new record types:
 and . This at least structures the
derivations so that each layer is represented by a derivation, and then
there's a derivation for the image itself, which is a little more efficient in
terms of computation.
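One way to see why one derivation per layer can save computation: derivations are identified by their inputs, so a layer whose inputs have not changed can be reused from the store instead of being rebuilt. The minimal Python sketch below models that with memoization; `Layer`, `Image`, `build_layer` and `BUILD_LOG` are hypothetical stand-ins, not the record types or procedures from the patches.

```python
from dataclasses import dataclass
from functools import lru_cache


# Hypothetical stand-ins for layer/image records; not the patch's record types.
@dataclass(frozen=True)
class Layer:
    store_items: frozenset  # store items whose files go into this layer


@dataclass(frozen=True)
class Image:
    layers: tuple  # Layer values, bottom layer first


BUILD_LOG = []  # records every layer that actually gets "built"


@lru_cache(maxsize=None)
def build_layer(layer):
    """Pretend to build LAYER; the cache plays the role of the store,
    so an identical layer is only built once."""
    BUILD_LOG.append(layer)
    return "layer:" + ",".join(sorted(layer.store_items))


def build_image(image):
    """Build (or reuse) each layer; a real image step would then combine them."""
    return [build_layer(layer) for layer in image.layers]
```

Building two images that share a base layer but have different top layers (say, `app-1` and `app-2`) calls the body of `build_layer` three times, not four: the shared base is built once.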

What do people think about generating multi-layer images, and using record
types to represent the layers and image?

Thanks,

Chris

[PATCH 1/3] Rename (guix docker) to (guix build docker)
[PATCH 2/3] Make guix pack work with the new docker image
[PATCH 3/3] Generate two layers for docker images in guix pack