Re: Shallow git update in bootstrap

2023-08-06 Thread Carles Pina i Estany


Hi,

On 06 Aug 2023 at 16:56:45, Bruno Haible wrote:
> Carles Pina i Estany wrote:

> > When I say "long time" (and data transmission) in my case it's 9
> > minutes:
> > -
> > carles@pinux:[master]~/git/wget2$ time git submodule update --init 
> > Cloning into '/home/carles/git/wget2/gnulib'...
> > Submodule path 'gnulib': checked out 
> > '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'
> > 
> > real  9m1,135s
> > user  6m24,309s
> > sys   0m5,020s
> > -

above was with a 4G+ connection and a laptop

> I can reproduce that the --depth option has a big impact on the
> 'git clone' execution time:
> 
>   - no --depth:   50 sec.
>   - --depth=2000: 16 sec.
>   - --depth=1:    5 sec.

On my day-to-day VPS server (low spec):
- no --depth:   6 minutes (most of the time spent in "resolving deltas";
  152 MB in the cloned directory)
- --depth=2000: 1 min 30 sec (100 MB)
- --depth=1:    7 seconds (88 MB)

> However, --depth=1 has the problem that it may/will cause trouble to the
> developer later, if they use more than "git pull". Namely,
>   - In 'git log' the history will be truncated,
>   - 'git bisect' may not work,
>   - 'git annotate' will show a wrong author for many lines of code.
> 
> '--depth=2000' would be a middle ground, but it still has the 'git annotate'
> problem.

agree with above

> These troubles are probably not worth the saved 'git clone' time upfront.
> 
> However, when doing automated builds, such as continuous integration,
> --depth=1 saves a lot of time, and is not problematic, since the build
> directory is getting deleted anyway 10 minutes later.
> 
> How about adding to 'bootstrap' an option '--for-build' that has the
> effect that all submodule clones will be fetched with --depth=1 ?

From my initial point of view (slower connections, metered connections),
and also for saving CI build time and bandwidth (including
git.savannah.gnu.org bandwidth), an option '--for-build' seems very
useful.

Thanks for considering it,

-- 
Carles Pina i Estany
https://carles.pina.cat || Wiktionary translations: https://kamus.pina.cat



Re: Shallow git update in bootstrap

2023-08-06 Thread Bruno Haible
> However, --depth=1 has the problem ...

Another problem with --depth=1 is that it may fail with git versions older than 2.8:
https://jira.mariadb.org/browse/MDEV-28032?workflowName=MariaDB+v4=1
https://github.com/git/git/commit/fb43e31f2b43076e7a30c9cd00d024

Therefore, it's not OK to use --depth=1 by default. But it would be OK
to do it based on a command-line option or environment variable.
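A rough sketch of such an opt-in, in POSIX sh like bootstrap itself. The variable name GNULIB_SHALLOW_CLONE and the helper functions are hypothetical, not existing bootstrap features, and the 2.8 cut-off is taken from the reports linked above. The idea: add --depth=1 only when the user asked for it and the installed git is new enough.

```shell
#!/bin/sh
# Sketch: use a shallow clone only if the user opts in AND git >= 2.8.
# GNULIB_SHALLOW_CLONE is a hypothetical variable name.

# Succeeds if version string $1 (e.g. "2.39.2") is at least $2.$3.
version_at_least ()
{
  _v_major=${1%%.*}
  _v_rest=${1#*.}
  _v_minor=${_v_rest%%.*}
  test "$_v_major" -gt "$2" ||
    { test "$_v_major" -eq "$2" && test "$_v_minor" -ge "$3"; }
}

# Prints the numeric part of "git version X.Y.Z ..." (empty if git is missing).
git_version ()
{
  git --version 2>/dev/null | sed -n 's/^git version \([0-9.]*\).*/\1/p'
}

depth_opt=
if test "${GNULIB_SHALLOW_CLONE-}" = yes; then
  _ver=$(git_version)
  if test -n "$_ver" && version_at_least "$_ver" 2 8; then
    depth_opt=--depth=1
  fi
fi
```

$depth_opt would then be passed unquoted to the clone or submodule update command, so that an empty value disappears instead of becoming an empty argument.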

Bruno






Re: Shallow git update in bootstrap

2023-08-06 Thread Bruno Haible
Carles Pina i Estany wrote:
> Actually, the first time I wondered if the connection or something else
> failed.

The latter is a user mistake, since there was a message
"Cloning into '/home/carles/git/wget2/gnulib'..." and the "..." indicates
that it may take some time.

> When I say "long time" (and data transmission) in my case it's 9
> minutes:
> -
> carles@pinux:[master]~/git/wget2$ time git submodule update --init 
> Cloning into '/home/carles/git/wget2/gnulib'...
> Submodule path 'gnulib': checked out 
> '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'
> 
> real  9m1,135s
> user  6m24,309s
> sys   0m5,020s
> -

I can reproduce that the --depth option has a big impact on the
'git clone' execution time:

  - no --depth:   50 sec.
  - --depth=2000: 16 sec.
  - --depth=1:    5 sec.

However, --depth=1 has the problem that it may/will cause trouble to the
developer later, if they use more than "git pull". Namely,
  - In 'git log' the history will be truncated,
  - 'git bisect' may not work,
  - 'git annotate' will show a wrong author for many lines of code.

'--depth=2000' would be a middle ground, but it still has the 'git annotate'
problem.

These troubles are probably not worth the saved 'git clone' time upfront.

However, when doing automated builds, such as continuous integration,
--depth=1 saves a lot of time, and is not problematic, since the build
directory is getting deleted anyway 10 minutes later.

How about adding to 'bootstrap' an option '--for-build' that has the
effect that all submodule clones will be fetched with --depth=1 ?
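One possible shape for the proposed option, as a sketch only: '--for-build' is the name suggested here, while parse_build_opts and submodule_depth_opt are made-up names. The real bootstrap has its own option loop, so this would have to be merged into it.

```shell
#!/bin/sh
# Sketch of the proposed '--for-build' option. Only the option name comes
# from the discussion; parse_build_opts and submodule_depth_opt are made up.
parse_build_opts ()
{
  submodule_depth_opt=
  for option
  do
    case $option in
      --for-build) submodule_depth_opt=--depth=1 ;;
    esac
  done
}

parse_build_opts "$@"

# The existing call in bootstrap would then become (deliberately unquoted,
# so an empty value expands to no argument at all):
#   git submodule update $submodule_depth_opt -- "$gnulib_path" || exit $?
```

Keeping the default empty means developers running plain ./bootstrap still get full history, and only CI jobs that pass --for-build get the shallow clone.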

Tim Rühsen wrote:
> To speed things up in container CI environments:
> If containers are only used once, git clone gnulib at image creation 
> time and do "rmdir gnulib && mv /gnulib . && git submodule update 
> gnulib" in the container.

Nice trick. Let's see how it competes with a --depth=1 option.
I would expect that if you use the same image for a year, the
'git submodule update gnulib' step gets slower and slower over that
year, until you create a new image.

Bruno






Re: Shallow git update in bootstrap

2023-08-06 Thread Tim Rühsen



On 8/6/23 00:25, Carles Pina i Estany wrote:

> When I say "long time" (and data transmission) in my case it's 9
> minutes:
> -
> carles@pinux:[master]~/git/wget2$ time git submodule update --init
> Cloning into '/home/carles/git/wget2/gnulib'...
> Submodule path 'gnulib': checked out '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'
> 
> real  9m1,135s
> user  6m24,309s
> sys   0m5,020s
> -


Not answering your question, but may be helpful:

If you regularly build wget2 from git, you only have to download
gnulib once. When the gnulib submodule is updated by the project
(which happens from time to time), git downloads only the missing parts,
which should be fast even on slow network connections.


Another option is to git clone gnulib into a separate directory outside 
the project directory and set the env variable GNULIB_REFDIR to this 
directory (e.g. "export GNULIB_REFDIR=/home/carles/git/gnulib").
When needed (or eventually), use "git pull" from inside the gnulib 
directory to update it.
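The GNULIB_REFDIR setup described here can be scripted roughly as follows. The helper name update_gnulib_refdir and the example path are illustrative assumptions; GNULIB_REFDIR itself is the variable bootstrap looks at.

```shell
#!/bin/sh
# Sketch of the GNULIB_REFDIR workflow. The helper name and the example
# path are illustrative; only GNULIB_REFDIR is read by bootstrap.
gnulib_url=https://git.savannah.gnu.org/git/gnulib.git

# Clone gnulib into directory $1 the first time; afterwards just update it.
update_gnulib_refdir ()
{
  if test -d "$1/.git"; then
    git -C "$1" pull --ff-only
  else
    git clone "$gnulib_url" "$1"
  fi
}

# Example (commented out so that loading this file has no side effects):
#   update_gnulib_refdir "$HOME/git/gnulib"
#   export GNULIB_REFDIR="$HOME/git/gnulib"
#   ./bootstrap
```

Using --ff-only is a cautious choice for a reference checkout that should never carry local changes; a plain "git pull" works as well.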


The `./bootstrap` script in the wget2 project then fetches the needed
gnulib commits from $GNULIB_REFDIR.


To speed things up in container CI environments:
If containers are only used once, git clone gnulib at image creation 
time and do "rmdir gnulib && mv /gnulib . && git submodule update 
gnulib" in the container. This is still experimental, just started using 
it yesterday without experiencing any downsides so far.
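The container trick, written out as two small functions. This is a sketch under assumptions: the function names are made up, the image-side clone path defaults to /gnulib as in the description, and --init is added because "git submodule update" skips submodules that have not been initialized yet.

```shell
#!/bin/sh
# Sketch of the container trick. Function names are made up; the /gnulib
# default mirrors the description, and --init is an added assumption.

# At image build time (e.g. a Dockerfile RUN step): clone gnulib once.
prepare_image_gnulib ()
{
  dest=${1-/gnulib}
  git clone https://git.savannah.gnu.org/git/gnulib.git "$dest"
}

# Inside a fresh container, from the top of the project checkout.
# The unpopulated submodule directory is empty, so rmdir succeeds; the
# prebuilt clone (whose directory must be named "gnulib") takes its place,
# and the submodule update should only need to fetch what is missing.
use_image_gnulib ()
{
  src=${1-/gnulib}
  rmdir gnulib && mv "$src" . && git submodule update --init gnulib
}
```

The in-container update has to fetch everything committed to gnulib since the image was built, so it gets slower as the image ages.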


Regards, Tim




Shallow git update in bootstrap

2023-08-05 Thread Carles Pina i Estany


Hi,

This is a wishlist / question regarding using "--depth 2" in "git
submodule init --" in the bootstrap file.

I was building a project that uses gnulib (with bootstrap).

./bootstrap does:

"""
if git_modules_config submodule.gnulib.url >/dev/null; then
  echo "$0: getting gnulib files..."
  git submodule init -- "$gnulib_path" || exit $?
  git submodule update -- "$gnulib_path" || exit $?
"""

The "git submodule update" takes a long time. Would it be possible to
use "--depth 1" there (and in the other "git submodule update" calls)?

A few lines below, it checks whether "git clone -h 2>&1" mentions the
--depth option and uses it if possible. Perhaps the same approach could be
used for the "git submodule update" calls? (In my case the default code
path uses "git submodule update", not "git clone" with --depth 2.)

I wonder if there is any reason not to use the --depth 2 for the update.
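The existing probe can be factored into a tiny helper, sketched below. The helper name is hypothetical; the "git clone -h 2>&1 | grep" pattern and the '--depth 2' value follow the description above. Whether "git submodule update -h" lists --depth in the same way depends on the git version, so reusing the probe for submodules is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the help text on stdin mentions the
# option given as $1 (a fixed string, not a regular expression).
help_mentions_option ()
{
  grep -F -e "$1" >/dev/null
}

# The same probe as described above, applied to 'git clone'.
shallow=
if git clone -h 2>&1 | help_mentions_option --depth; then
  shallow='--depth 2'
fi
```

$shallow is meant to be used unquoted, so '--depth 2' splits into the two arguments git expects.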

When I say "long time" (and data transmission) in my case it's 9
minutes:
-
carles@pinux:[master]~/git/wget2$ time git submodule update --init 
Cloning into '/home/carles/git/wget2/gnulib'...
Submodule path 'gnulib': checked out '2ae6faf9c78384dc6a2674b62dd56ff153cd51f6'

real  9m1,135s
user  6m24,309s
sys   0m5,020s
-

Actually, the first time I wondered if the connection or something else
failed.

Thank you very much,

-- 
Carles Pina i Estany
https://carles.pina.cat || Wiktionary translations: https://kamus.pina.cat