Hi,

Thank you for working on this!

On Fri, 5 Jun 2026 at 23:09, Andres Freund <[email protected]> wrote:
>
> I noticed that a handfull of CI runs already lead to exceeding the available
> cache space.  One can pay for more cache space, but I think the problem is
> more that what we currently do doesn't work well.
>
> With cirrus-ci all branches shared one cache, but that's not the case with
> github actions. Except for being able to read caches from the default branch
> (master in our case), other branches have completely separate cache
> namespaces.  That's probably the right call, safety wise, but makes our ccache
> approach .. not great.
>
> We should only upload a new cache when the ccache cache hit ratio of the
> existing cache entry has gotten low.

This makes sense.


> We also chose the cache key unfortunately, so that if a branch name started
> with the name of the default branch, followed by a -, we'd always end up using
> the main branches cache.
>
>
> The attached patch fixes these, and a few other problems.  See commit message
> for details.  With it I see a lot less cache churn and therefore also a higher
> hit rate once one has more than 2-3 branches.
>
>
> I'm not entirely happy with the amount of per job repetition this has. While
> staying within the confines of a single .yml file, I couldn't find a better
> way to deal with that.  We could move a fair bit of that complexity into a
> separate file, using so called "composite actions".  But that's a bit of
> additional github actions specific stuff that one would be exposed to, so I'm
> not sure we should go that way?


I think it looks okay, no need to use composite actions for this.

--------------------

I tested the patch and I confirm that it works as mentioned. Here is my review:

All the points you explained in the commit message are nice improvements!

Typo in commit message:

+ In my testing this utilizes the available cache space (10GB for personal
+ accounts) much more effictively than before.

Typo at 'effictively' in the commit message.


diff --git a/.github/workflows/pg-ci.yml b/.github/workflows/pg-ci.yml
index 8560e9389f6..86dc47de8db 100644
--- a/.github/workflows/pg-ci.yml
+++ b/.github/workflows/pg-ci.yml

+      - &ccache_decide_save_step
+        name: "ccache: Decide if cache should be uploaded"
+        id: ccache-pre-save
+        # [Decide to] store the cache whenever the cache was set up, so that
+        # incrementally addressing compiler errors/warnings doesn't have to
+        # start from scratch.
+        if: |
+          always() &&
+          steps.ccache-restore-branch.conclusion == 'success'
+        run: python3 src/tools/ci/gha_ccache_decide.py

Isn't the conclusion always true unless GitHub has some self errors?

Also, we are directly running this script with the 'python3' command
but it might not be available on the PATH. I had some problems with
this on BSD images when we were using Cirrus.  I am not sure we would
have such problems with GitHub Actions but I just wanted to mention
it.


diff --git a/src/tools/ci/gha_ccache_decide.py
b/src/tools/ci/gha_ccache_decide.py
new file mode 100644
index 00000000000..920f7bf9685
--- /dev/null
+++ b/src/tools/ci/gha_ccache_decide.py

+def main():
+    on_default_branch = os.environ["ON_DEFAULT_BRANCH"] == "true"
+    ccache_dir = os.environ["CCACHE_DIR"]

ccache_dir isn't used.


+    # compute cache hit ratio
+    hits, misses = parse_ccache_stats()
+    total = hits + misses
+    hit_pct = int(( hits / total) * 100) if total > 0 else 100

Extra space in '( hits'.


+    # If there were either barely any misses, or the cache hit ratio was high,
+    # there no point in generating a new cache entry. We have limited cache
+    # space.
+    should_save = misses > 10 and hit_pct < target_rate

We consider misses here but we don't mention it, we only mention hit
rate and target rate. I think this is not very important since we
can't possibly have a case that misses < 10 and hit_pct < target_rate.
If that is not the case, then I think we can remove misses from the
should_save calculation.


+    # Don't store ccache stats , otherwise we'd need to reset the cache access

Extra space before comma.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft


Reply via email to