Re: git clone or fetch via https stuck?

2026-01-05 Thread Joseph Myers via Gcc
On Mon, 5 Jan 2026, Mark Wielaard wrote:

> In general gcc.git is just really, really big. Which makes all this
> just slightly awkward (it doesn't help that git-http-backend seems to
> try to create an optimal pack for each fetch instead of having
> something generically cached). At 2.5G it is a couple of factors

Maybe we need to repack it (in particular, there have been a few occasions 
when mistakenly created branches were deleted or moved, which isn't 
optimal with the delta islands configuration).

According to my notes, a full repack should be

  git repack --window=1250 --depth=250 -b -AdFfi

(expect that to use more than 128 GB of memory and take over an hour of 
wall clock time even using all cores) and there was also a recommendation 
from Richard Earnshaw for weekly packing with

  git repack --window-memory=500m --window=250 --depth=50 -b -A -d -i

for optimal efficiency.
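
For readers decoding the bundled short options: per the git-repack
documentation, -b is --write-bitmap-index, -A repacks everything into one
pack while turning unreachable objects loose (when combined with -d), -d
removes the packs made redundant, -F and -f pass --no-reuse-object and
--no-reuse-delta to git-pack-objects, and -i is --delta-islands. Below is a
minimal sketch of the weekly variant with those options spelled out. The
function name weekly_repack and its repository-path argument are
hypothetical and not part of the thread.

```shell
# Hypothetical wrapper around the weekly repack quoted above.
# $1: path to the repository to repack (an assumption for illustration).
weekly_repack() {
    git -C "$1" repack \
        --window-memory=500m --window=250 --depth=50 \
        --write-bitmap-index -A -d --delta-islands
}
```

It would be invoked from a scheduled job as, for example,

  weekly_repack /path/to/gcc.git

where the path is a placeholder.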

-- 
Joseph S. Myers
[email protected]



Re: git clone or fetch via https stuck?

2026-01-05 Thread Mark Wielaard
Hi Jonathan,

On Sun, Jan 04, 2026 at 09:45:18PM, Jonathan Wakely wrote:
> On Sun, 4 Jan 2026, 11:54 Mark Wielaard wrote:
> > We switched off the "smart protocol" for http(s) and enabled the "dumb
> > protocol". The dumb git protocol works, but is somewhat inefficient and
> > for really big repos like gcc.git requires thousands of fetches (and
> > if you start fresh any failure like you reported will result in you
> > having to start from scratch again).
> >
> > The git:// or ssh:// protocols still use the smart protocol and so
> > should be more reliable.
> >
> > The mirrors on https://forge.sourceware.org/gcc/gcc-mirror and
> > https://git.sr.ht/~sourceware/gcc are up to date and could also be used.
> >
> > It seems the bots have lost interest so maybe we can reduce the anubis
> > paranoia and re-enable the smart protocol again. I'll try, but if the
> > bots return we might have to disable it again, so using a different
> > protocol or a mirror if you have to use https might be a good idea for
> > now.
> 
> Maybe we should update the web page about git access to advise against
> fetching over https, since it's slow and inefficient (and might not work at
> all).

I hope it isn't that bad. The new year seems to have less aggressive AI
scraper bots so we dialed down the anubis "protections" (no more
javascript needed) and reenabled the "smart protocol":
https://inbox.sourceware.org/[email protected]
https://fosstodon.org/@sourceware

> The download pages request people to use mirrors to reduce load on the main
> server, so it makes sense to give similar advice for obtaining the sources
> over git. We could give clear instructions for fetching from a different
> source and then switching the remote to point to gcc.gnu.org after the
> initial fetch.

But that isn't a bad idea in general. The
https://forge.sourceware.org/gcc/gcc-mirror and
https://git.sr.ht/~sourceware/gcc mirrors are normally monitored and
should be up to date with ~10 minutes delay.

In general gcc.git is just really, really big. Which makes all this
just slightly awkward (it doesn't help that git-http-backend seems to
try to create an optimal pack for each fetch instead of having
something generically cached). At 2.5G it is a couple of factors
bigger than anything else out there. binutils-gdb.git is 725M,
glibc.git is 330M and most others are < 100M.

We might want to explore offering something that just contains the
history from gcc-5 plus all release branches since then. Which should
be doable in ~750M. Or even just the supported release branches (from
gcc-13 up) which gets a normal development git down to ~250M?

Cheers,

Mark


RE: git clone or fetch via https stuck?

2026-01-04 Thread Jiang, Haochen via Gcc
I see. Thanks for the help!

Let me update my bisect script to switch everything over.

Thx,
Haochen

From: Jonathan Wakely 
Sent: Monday, January 5, 2026 5:45 AM

On Sun, 4 Jan 2026, 11:54 Mark Wielaard wrote:
Hi Haochen,

On Sun, Jan 04, 2026 at 02:45:17AM, Jiang, Haochen via Gcc wrote:
> Recently I got an issue on git fetch via https, seems starting from 12/23
> or around, blocking my bisect script for a while.
> [...]
> I have also tried on other machines still similar.
>
> Is there anything strict applied to git clone via https leading to this issue?

Yes, AI scraper bots again :{
See https://inbox.sourceware.org/[email protected]
and notices posted at https://fosstodon.org/@sourceware

We switched off the "smart protocol" for http(s) and enabled the "dumb
protocol". The dumb git protocol works, but is somewhat inefficient and
for really big repos like gcc.git requires thousands of fetches (and
if you start fresh any failure like you reported will result in you
having to start from scratch again).

The git:// or ssh:// protocols still use the smart protocol and so
should be more reliable.

The mirrors on https://forge.sourceware.org/gcc/gcc-mirror and
https://git.sr.ht/~sourceware/gcc are up to date and could also be used.

It seems the bots have lost interest so maybe we can reduce the anubis
paranoia and re-enable the smart protocol again. I'll try, but if the
bots return we might have to disable it again, so using a different
protocol or a mirror if you have to use https might be a good idea for
now.


Maybe we should update the web page about git access to advise against fetching 
over https, since it's slow and inefficient (and might not work at all).

The download pages request people to use mirrors to reduce load on the main
server, so it makes sense to give similar advice for obtaining the sources over
git. We could give clear instructions for fetching from a different source and 
then switching the remote to point to gcc.gnu.org after the 
initial fetch.


Re: git clone or fetch via https stuck?

2026-01-04 Thread Jonathan Wakely via Gcc
On Sun, 4 Jan 2026, 11:54 Mark Wielaard wrote:

> Hi Haochen,
>
> On Sun, Jan 04, 2026 at 02:45:17AM, Jiang, Haochen via Gcc wrote:
> > Recently I got an issue on git fetch via https, seems starting from
> > 12/23 or around, blocking my bisect script for a while.
> > [...]
> > I have also tried on other machines still similar.
> >
> > Is there anything strict applied to git clone via https leading to this
> > issue?
>
> Yes, AI scraper bots again :{
> See https://inbox.sourceware.org/[email protected]
> and notices posted at https://fosstodon.org/@sourceware
>
> We switched off the "smart protocol" for http(s) and enabled the "dumb
> protocol". The dumb git protocol works, but is somewhat inefficient and
> for really big repos like gcc.git requires thousands of fetches (and
> if you start fresh any failure like you reported will result in you
> having to start from scratch again).
>
> The git:// or ssh:// protocols still use the smart protocol and so
> should be more reliable.
>
> The mirrors on https://forge.sourceware.org/gcc/gcc-mirror and
> https://git.sr.ht/~sourceware/gcc are up to date and could also be used.
>
> It seems the bots have lost interest so maybe we can reduce the anubis
> paranoia and re-enable the smart protocol again. I'll try, but if the
> bots return we might have to disable it again, so using a different
> protocol or a mirror if you have to use https might be a good idea for
> now.
>


Maybe we should update the web page about git access to advise against
fetching over https, since it's slow and inefficient (and might not work at
all).

The download pages request people to use mirrors to reduce load on the main
server, so it makes sense to give similar advice for obtaining the sources
over git. We could give clear instructions for fetching from a different
source and then switching the remote to point to gcc.gnu.org after the
initial fetch.
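
A minimal sketch of what such instructions could look like: clone from a
mirror, then repoint the remote at the main server so later fetches only
transfer what the mirror did not have. The function name clone_via_mirror is
hypothetical, and the upstream URL used in the usage note
(git://gcc.gnu.org/git/gcc.git) is an assumption, since the thread does not
spell it out. Ref layouts on a mirror may also differ from gcc.gnu.org, so
treat this as a starting point rather than tested instructions.

```shell
# clone_via_mirror MIRROR_URL UPSTREAM_URL DIR   (hypothetical helper)
# Clone the bulk of the history from the mirror, then make "origin"
# point at the upstream server for all subsequent fetches.
clone_via_mirror() {
    git clone "$1" "$3" &&
    git -C "$3" remote set-url origin "$2"
}
```

With one of the mirrors named in this thread it might be used as

  clone_via_mirror https://forge.sourceware.org/gcc/gcc-mirror \
      git://gcc.gnu.org/git/gcc.git gcc
  git -C gcc fetch origin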



Re: git clone or fetch via https stuck?

2026-01-04 Thread Mark Wielaard
Hi Haochen,

On Sun, Jan 04, 2026 at 02:45:17AM, Jiang, Haochen via Gcc wrote:
> Recently I got an issue on git fetch via https, seems starting from 12/23
> or around, blocking my bisect script for a while.
> [...]
> I have also tried on other machines still similar.
> 
> Is there anything strict applied to git clone via https leading to this issue?

Yes, AI scraper bots again :{
See https://inbox.sourceware.org/[email protected]
and notices posted at https://fosstodon.org/@sourceware

We switched off the "smart protocol" for http(s) and enabled the "dumb
protocol". The dumb git protocol works, but is somewhat inefficient and
for really big repos like gcc.git requires thousands of fetches (and
if you start fresh any failure like you reported will result in you
having to start from scratch again).

The git:// or ssh:// protocols still use the smart protocol and so
should be more reliable.
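
For an existing clone or a script with hard-coded https URLs, one way to
switch protocols without editing every URL is git's url.<base>.insteadOf
rewrite. A minimal sketch, assuming the https and git:// URLs share the
prefix gcc.gnu.org/git/ (an assumption, not stated in this thread); the
function name prefer_git_protocol is hypothetical.

```shell
# prefer_git_protocol REPO   (hypothetical helper)
# Rewrite https://gcc.gnu.org/git/... to git://gcc.gnu.org/git/... for
# this repository only; the stored remote URL itself is left unchanged.
prefer_git_protocol() {
    git -C "$1" config 'url.git://gcc.gnu.org/git/.insteadOf' \
        'https://gcc.gnu.org/git/'
}
```

The rewrite can be dropped again with
"git config --unset url.git://gcc.gnu.org/git/.insteadOf" once https works
well enough.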

The mirrors on https://forge.sourceware.org/gcc/gcc-mirror and
https://git.sr.ht/~sourceware/gcc are up to date and could also be used.

It seems the bots have lost interest so maybe we can reduce the anubis
paranoia and re-enable the smart protocol again. I'll try, but if the
bots return we might have to disable it again, so using a different
protocol or a mirror if you have to use https might be a good idea for
now.

Cheers,

Mark


Re: git clone or fetch via https stuck?

2026-01-03 Thread Ben Boeckel via Gcc
On Sun, Jan 04, 2026 at 02:45:17, Jiang, Haochen via Gcc wrote:
> I have also tried on other machines still similar.
> 
> Is there anything strict applied to git clone via https leading to this
> issue? Since I saw this in yearly newsletter under sourceware:
> 
> "Include recent change to git over https"
> 
> If not, I have to diagnose my network on those machines even further :(

I believe that some of the mitigations for scraping have affected this
(but I may be completely wrong about this). You can fetch from the
https://github.com/gcc-mirror/gcc mirror to get the objects and then
update the refs from `gcc.gnu.org` without having to negotiate so many
objects (which is what seems to trigger the network sadness).
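
A minimal sketch of that approach for an existing clone whose origin already
points at `gcc.gnu.org`: add the mirror as a second remote, fetch from it
first so most objects are already local, then fetch origin to update the
refs. The function name prefetch_from_mirror and the remote name "mirror"
are hypothetical.

```shell
# prefetch_from_mirror REPO MIRROR_URL   (hypothetical helper)
# Pull objects from the mirror first, then update refs from origin.
prefetch_from_mirror() {
    # Add the extra remote only if it is not configured yet.
    git -C "$1" remote get-url mirror >/dev/null 2>&1 ||
        git -C "$1" remote add mirror "$2"
    git -C "$1" fetch mirror &&
    git -C "$1" fetch origin
}
```

For example:

  prefetch_from_mirror gcc https://github.com/gcc-mirror/gcc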

There was a thread in the past week or so asking about this as well.

--Ben