Re: Current caching

2018-02-04 Thread Thomas De Schampheleire
2018-02-04 14:25 GMT+01:00 Thomas De Schampheleire :
> 2018-02-04 13:28 GMT+01:00 Dominik Ruf :
>>
>>
>> Thomas De Schampheleire wrote on Sun, Feb 4, 2018
>> at 13:04:
>>>
>>>
>>>
>>> On Feb 4, 2018 12:32, "Dominik Ruf"  wrote:
>>>
>>>
>>>
>>> On Sun, Feb 4, 2018, 12:03 Mads Kiilerich  wrote:

>>>> On 02/03/2018 10:12 PM, Dominik Ruf wrote:
>>>>
>>>>
>>>> Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:
>>>>>
>>>>> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
>>>>> > Hi all,
>>>>> >
>>>>> > I'm currently looking at the caching Kallithea does. And I'm a
>>>>> > bit...baffled.
>>>>>
>>>>> Yes, it is quite baffling and not very efficient.
>>>>>
>>>>> I think it very rarely gives any real benefit - it might even make
>>>>> things slower. In real world setups with multiple repositories served by
>>>>> each worker process, cache hits are quite rare.
>>>>
>>>> For large git repositories we certainly do need caching.
>>>>
>>>>
>>>> Perhaps. But it is probably a very different kind of caching we need.
>>>
>>> Displaying a changelog page with 100 changesets currently takes 54 seconds
>>> even with the current caching for the Linux Kernel.
>>> I'm about to make some changes that will get this down to about 2.5
>>> seconds. But of course this also depends on caching.
>>>
>>>
>>> This is with a hot cache, ie not just invalidated?
>>
>> Yes with 'hot cache'. Without cache it is about 70 seconds.
>>>
>>>
>>> Do you see the same for a similar mercurial repository, or is it specific
>>> to git? What makes the difference?
>>
>> I don't know any hg repositories with 90+ revisions.
>
> Our own review repo is 35+ revisions.
> The public mozilla-central is 40+ revisions. Perhaps this one
> could be useful for you to compare with hg performance? If need be you
> could strip the kernel repo to the same amount of revisions to get
> meaningful comparisons.
> https://hg.mozilla.org/mozilla-central/
>
>>>
>>>
>>> Is this on an SSD disk or spinning disk?
>>
>> SSD
>>>
>>>
>>> In our old installation we had performance problems with a large
>>> (mercurial) repo with many heads (review repo where nothing ever gets
>>> merged), particularly after invalidating of the cache. After upgrading
>>> Kallithea we were able to bump mercurial (which was supposed to have
>>> improvements to dealing with many heads), and additionally convert our repos
>>> to the generaldelta storage format. Now performance is relatively fine.
>>> Loading a changelog page definitely does not take 54 seconds, with both cold
>>> and hot cache.
>>
>> Here is what I did so far.
>> https://kallithea.dominikruf.com/kallithea/kallithea-domruf/changeset/89934e26b1fe4e4c5ef36416e1afbb22299233e7
>>
>
> Is this working without other changes? I see you added max_revisions
> as argument to get_changesets, but that method does not accept that
> parameter in latest tip.

Ok, I see that now in your other PR...
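
To illustrate why a max_revisions limit can matter for a changelog page
that only shows 100 changesets, here is a minimal sketch. It is not the
code from the linked changeset: take_changesets and walk_history are
hypothetical stand-ins, and only the parameter name max_revisions is
borrowed from the discussion above.

```python
from itertools import islice


def take_changesets(walk, max_revisions=None):
    """Yield changesets lazily, stopping after max_revisions if given.

    Hypothetical helper; `walk` is any iterator over changesets.
    """
    # islice(walk, None) yields everything; islice(walk, n) stops after n.
    return islice(walk, max_revisions)


def walk_history(total, visited):
    """Stand-in for an expensive history walk (assumption, not Kallithea code).

    Records every revision it actually touches in `visited`.
    """
    for rev in range(total):
        visited.append(rev)
        yield rev


visited = []
page = list(take_changesets(walk_history(900000, visited), max_revisions=100))
print(len(page), len(visited))
```

Because the walk is a generator, only the revisions needed for the page
are visited, instead of the full history.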
___
kallithea-general mailing list
kallithea-general@sfconservancy.org
https://lists.sfconservancy.org/mailman/listinfo/kallithea-general


Re: Current caching

2018-02-04 Thread Thomas De Schampheleire
2018-02-04 13:28 GMT+01:00 Dominik Ruf :
>
>
> Thomas De Schampheleire wrote on Sun, Feb 4, 2018
> at 13:04:
>>
>>
>>
>> On Feb 4, 2018 12:32, "Dominik Ruf"  wrote:
>>
>>
>>
>> On Sun, Feb 4, 2018, 12:03 Mads Kiilerich  wrote:
>>>
>>> On 02/03/2018 10:12 PM, Dominik Ruf wrote:
>>>
>>>
>>> Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:
>>>>
>>>> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
>>>> > Hi all,
>>>> >
>>>> > I'm currently looking at the caching Kallithea does. And I'm a
>>>> > bit...baffled.
>>>>
>>>> Yes, it is quite baffling and not very efficient.
>>>>
>>>> I think it very rarely gives any real benefit - it might even make
>>>> things slower. In real world setups with multiple repositories served by
>>>> each worker process, cache hits are quite rare.
>>>
>>> For large git repositories we certainly do need caching.
>>>
>>>
>>> Perhaps. But it is probably a very different kind of caching we need.
>>
>> Displaying a changelog page with 100 changesets currently takes 54 seconds
>> even with the current caching for the Linux Kernel.
>> I'm about to make some changes that will get this down to about 2.5
>> seconds. But of course this also depends on caching.
>>
>>
>> This is with a hot cache, ie not just invalidated?
>
> Yes with 'hot cache'. Without cache it is about 70 seconds.
>>
>>
>> Do you see the same for a similar mercurial repository, or is it specific
>> to git? What makes the difference?
>
> I don't know any hg repositories with 90+ revisions.

Our own review repo is 35+ revisions.
The public mozilla-central is 40+ revisions. Perhaps this one
could be useful for you to compare with hg performance? If need be you
could strip the kernel repo to the same amount of revisions to get
meaningful comparisons.
https://hg.mozilla.org/mozilla-central/

>>
>>
>> Is this on an SSD disk or spinning disk?
>
> SSD
>>
>>
>> In our old installation we had performance problems with a large
>> (mercurial) repo with many heads (review repo where nothing ever gets
>> merged), particularly after invalidating of the cache. After upgrading
>> Kallithea we were able to bump mercurial (which was supposed to have
>> improvements to dealing with many heads), and additionally convert our repos
>> to the generaldelta storage format. Now performance is relatively fine.
>> Loading a changelog page definitely does not take 54 seconds, with both cold
>> and hot cache.
>
> Here is what I did so far.
> https://kallithea.dominikruf.com/kallithea/kallithea-domruf/changeset/89934e26b1fe4e4c5ef36416e1afbb22299233e7
>

Is this working without other changes? I see you added max_revisions
as argument to get_changesets, but that method does not accept that
parameter in latest tip.


Re: Current caching

2018-02-04 Thread Dominik Ruf
Thomas De Schampheleire wrote on Sun, Feb 4, 2018 at 13:04:

>
>
> On Feb 4, 2018 12:32, "Dominik Ruf"  wrote:
>
>
>
> On Sun, Feb 4, 2018, 12:03 Mads Kiilerich  wrote:
>
>> On 02/03/2018 10:12 PM, Dominik Ruf wrote:
>>
>>
>> Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:
>>
>>> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
>>> > Hi all,
>>> >
>>> > I'm currently looking at the caching Kallithea does. And I'm a
>>> > bit...baffled.
>>>
>>> Yes, it is quite baffling and not very efficient.
>>>
>>> I think it very rarely gives any real benefit - it might even make
>>> things slower. In real world setups with multiple repositories served by
>>> each worker process, cache hits are quite rare.
>>
>> For large git repositories we certainly do need caching.
>>
>>
>> Perhaps. But it is probably a very different kind of caching we need.
>>
> Displaying a changelog page with 100 changesets currently takes 54 seconds
> even with the current caching for the Linux Kernel.
> I'm about to make some changes that will get this down to about 2.5
> seconds. But of course this also depends on caching.
>
>
> This is with a hot cache, ie not just invalidated?
>
Yes with 'hot cache'. Without cache it is about 70 seconds.

>
> Do you see the same for a similar mercurial repository, or is it specific
> to git? What makes the difference?
>
I don't know any hg repositories with 90+ revisions.

>
> Is this on an SSD disk or spinning disk?
>
SSD

>
> In our old installation we had performance problems with a large
> (mercurial) repo with many heads (review repo where nothing ever gets
> merged), particularly after invalidating of the cache. After upgrading
> Kallithea we were able to bump mercurial (which was supposed to have
> improvements to dealing with many heads), and additionally convert our
> repos to the generaldelta storage format. Now performance is relatively
> fine. Loading a changelog page definitely does not take 54 seconds, with
> both cold and hot cache.
>
Here is what I did so far.
https://kallithea.dominikruf.com/kallithea/kallithea-domruf/changeset/89934e26b1fe4e4c5ef36416e1afbb22299233e7


Re: Current caching

2018-02-04 Thread Thomas De Schampheleire
On Feb 4, 2018 12:32, "Dominik Ruf"  wrote:



On Sun, Feb 4, 2018, 12:03 Mads Kiilerich  wrote:

> On 02/03/2018 10:12 PM, Dominik Ruf wrote:
>
>
> Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:
>
>> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
>> > Hi all,
>> >
>> > I'm currently looking at the caching Kallithea does. And I'm a
>> > bit...baffled.
>>
>> Yes, it is quite baffling and not very efficient.
>>
>> I think it very rarely gives any real benefit - it might even make
>> things slower. In real world setups with multiple repositories served by
>> each worker process, cache hits are quite rare.
>
> For large git repositories we certainly do need caching.
>
>
> Perhaps. But it is probably a very different kind of caching we need.
>
Displaying a changelog page with 100 changesets currently takes 54 seconds
even with the current caching for the Linux Kernel.
I'm about to make some changes that will get this down to about 2.5
seconds. But of course this also depends on caching.


This is with a hot cache, ie not just invalidated?

Do you see the same for a similar mercurial repository, or is it specific
to git? What makes the difference?

Is this on an SSD disk or spinning disk?

In our old installation we had performance problems with a large
(mercurial) repo with many heads (review repo where nothing ever gets
merged), particularly after invalidating of the cache. After upgrading
Kallithea we were able to bump mercurial (which was supposed to have
improvements to dealing with many heads), and additionally convert our
repos to the generaldelta storage format. Now performance is relatively
fine. Loading a changelog page definitely does not take 54 seconds, with
both cold and hot cache.


Re: Current caching

2018-02-04 Thread Dominik Ruf
On Sun, Feb 4, 2018, 12:03 Mads Kiilerich  wrote:

> On 02/03/2018 10:12 PM, Dominik Ruf wrote:
>
>
> Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:
>
>> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
>> > Hi all,
>> >
>> > I'm currently looking at the caching Kallithea does. And I'm a
>> > bit...baffled.
>>
>> Yes, it is quite baffling and not very efficient.
>>
>> I think it very rarely gives any real benefit - it might even make
>> things slower. In real world setups with multiple repositories served by
>> each worker process, cache hits are quite rare.
>
> For large git repositories we certainly do need caching.
>
>
> Perhaps. But it is probably a very different kind of caching we need.
>
Displaying a changelog page with 100 changesets currently takes 54 seconds
even with the current caching for the Linux Kernel.
I'm about to make some changes that will get this down to about 2.5
seconds. But of course this also depends on caching.

>
>
> /Mads
>


Re: Current caching

2018-02-04 Thread Mads Kiilerich

On 02/03/2018 10:12 PM, Dominik Ruf wrote:


> Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:
>
>
>> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
>> > Hi all,
>> >
>> > I'm currently looking at the caching Kallithea does. And I'm a
>> > bit...baffled.
>>
>> Yes, it is quite baffling and not very efficient.
>>
>> I think it very rarely gives any real benefit - it might even make
>> things slower. In real world setups with multiple repositories served by
>> each worker process, cache hits are quite rare.
>
> For large git repositories we certainly do need caching.


Perhaps. But it is probably a very different kind of caching we need.

/Mads


Re: Current caching

2018-02-03 Thread Dominik Ruf
Mads Kiilerich wrote on Sat, Feb 3, 2018 at 19:32:

> On 02/01/2018 12:50 AM, Dominik Ruf wrote:
> > Hi all,
> >
> > I'm currently looking at the caching Kallithea does. And I'm a
> > bit...baffled.
>
> Yes, it is quite baffling and not very efficient.
>
> I think it very rarely gives any real benefit - it might even make
> things slower. In real world setups with multiple repositories served by
> each worker process, cache hits are quite rare.

For large git repositories we certainly do need caching.
In fact, we need more of it, because showing the changelog for large
repositories like the Linux kernel currently takes ages.
That is the reason why I'm looking at the caching.

>
> > The way I understand it is that first an entry is made to
> > CacheInvalidation to mark a cache invalid,
>
> That is to register the in-memory cache in the database.
>
> > and later that entry is checked to decide if that cache should be
> > invalidated.
>
> That is to see if other processes have invalidated the cache.
>
> /Mads
>


Re: Current caching

2018-02-03 Thread Mads Kiilerich

On 02/01/2018 12:50 AM, Dominik Ruf wrote:

> Hi all,
>
> I'm currently looking at the caching Kallithea does. And I'm a
> bit...baffled.

Yes, it is quite baffling and not very efficient.

I think it very rarely gives any real benefit - it might even make
things slower. In real world setups with multiple repositories served by
each worker process, cache hits are quite rare.

> The way I understand it is that first an entry is made to
> CacheInvalidation to mark a cache invalid,

That is to register the in-memory cache in the database.

> and later that entry is checked to decide if that cache should be
> invalidated.

That is to see if other processes have invalidated the cache.

/Mads


Re: Current caching

2018-02-01 Thread Dominik Ruf
Thanks for still helping the project.
That helped.

On Thu, Feb 1, 2018, 09:04 Marcin Kuzminski  wrote:

> Hi Dominik,
>
> The database table is used to invalidate caches across multiple processes.
> E.g. if you run 3 workers via gunicorn, each of them will register an entry
> in the cache invalidation table. This way, if a push occurs in one process,
> the others will invalidate their caches.
>
> This is mainly driven by the problem that some caches need to use the
> in-memory cache type, which cannot be shared between processes because of
> serialization problems.
>
> Best,
>
> --
> Marcin Kuzminski
> RhodeCode, Inc.
>
>
> On Thu, Feb 1, 2018 at 0:50, Dominik Ruf  wrote:
>
> Hi all,
>
> I'm currently looking at the caching Kallithea does. And I'm a
> bit...baffled.
> The way I understand it is that first an entry is made to
> CacheInvalidation to mark a cache invalid,
> and later that entry is checked to decide if that cache should be
> invalidated.
> But why this detour? Why not invalidate the cache directly?
> Can somebody explain to me why the invalidation is done via the
> CacheInvalidation table?
>
> cheers
> Dominik
>
>
>
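
The mechanism described in this exchange (per-process in-memory caches,
coordinated through a shared database table) can be sketched roughly as
follows. This is a simplified illustration, not Kallithea's actual code:
the Worker class, the invalidate function, and the table and column
names are assumptions made for the example, and the worker processes
are simulated by plain objects sharing one SQLite connection.

```python
import sqlite3


class Worker:
    """Simulates one worker process with a private in-memory cache."""

    def __init__(self, name, db):
        self.name = name
        self.db = db
        self.cache = {}  # repo name -> cached value, not shared with others

    def _key(self, repo):
        # One table row per (worker, repo) pair.
        return f"{self.name}:{repo}"

    def get(self, repo, compute):
        row = self.db.execute(
            "SELECT active FROM cache_invalidation WHERE cache_key = ?",
            (self._key(repo),),
        ).fetchone()
        if row is None or not row[0] or repo not in self.cache:
            # Not registered yet, or flagged invalid by some process:
            # recompute and (re)register this worker's entry as valid.
            self.cache[repo] = compute(repo)
            self.db.execute(
                "INSERT OR REPLACE INTO cache_invalidation VALUES (?, ?, 1)",
                (self._key(repo), repo),
            )
        return self.cache[repo]


def invalidate(db, repo):
    """Called after a push: flag every worker's entry for this repo."""
    db.execute(
        "UPDATE cache_invalidation SET active = 0 WHERE repo = ?", (repo,)
    )


db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE cache_invalidation ("
    "cache_key TEXT PRIMARY KEY, repo TEXT, active INTEGER)"
)
```

The point of the detour through the table is that a process cannot reach
into another process's memory: the process handling the push can only
flag the rows, and each other worker drops its own cached value the next
time it checks its row.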