Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-09 Thread Alec Warner
On Thu, Jul 9, 2020 at 2:06 PM Chun-Yu Shei  wrote:

> Hmm, that's strange... it seems to have made it to the list archives:
> https://archives.gentoo.org/gentoo-portage-dev/message/a4db905a64e3c1f6d88c4876e8291a65
>
> (but it is entirely possible that I used "git send-email" incorrectly)
>

Ahhh, it's visible there; I'll blame Gmail ;)

-A




Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-09 Thread Chun-Yu Shei
Hmm, that's strange... it seems to have made it to the list archives:
https://archives.gentoo.org/gentoo-portage-dev/message/a4db905a64e3c1f6d88c4876e8291a65

(but it is entirely possible that I used "git send-email" incorrectly)



Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-09 Thread Alec Warner

I don't see a patch attached; can you link to it?

-A




Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-09 Thread Chun-Yu Shei
Awesome!  Here's a patch that adds @lru_cache to use_reduce, vercmp, and
catpkgsplit.  use_reduce was split into 2 functions, with the outer one
converting lists/sets to tuples so they can be hashed and creating a
copy of the returned list (since the caller seems to modify it
sometimes).  I tried to select cache sizes that minimized memory use increase,
while still providing about the same speedup compared to a cache with
unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases
from 44.32s -> 29.94s -- a 48% speedup, while the maximum value of the
RES column in htop increases from 280 MB -> 290 MB.

"emerge -ep @world" time decreases slightly from 18.77s -> 17.93s, while
max observed RES value actually decreases from 228 MB -> 214 MB (similar
values observed across a few before/after runs).

Here are the cache hit stats, max observed RES memory, and runtime in
seconds for various cache sizes in the update case.  Caching for each
function was tested independently (only one function had caching enabled
at a time):

catpkgsplit:
CacheInfo(hits=133, misses=21419, maxsize=None, currsize=21419)
270 MB
39.217

CacheInfo(hits=1218900, misses=24905, maxsize=1, currsize=1)
271 MB
39.112

CacheInfo(hits=1212675, misses=31022, maxsize=5000, currsize=5000)
271 MB
39.217

CacheInfo(hits=1207879, misses=35878, maxsize=2500, currsize=2500)
269 MB
39.438

CacheInfo(hits=1199402, misses=44250, maxsize=1000, currsize=1000)
271 MB
39.348

CacheInfo(hits=1149150, misses=94610, maxsize=100, currsize=100)
271 MB
39.487


use_reduce:
CacheInfo(hits=45326, misses=18660, maxsize=None, currsize=18561)
407 MB
35.77

CacheInfo(hits=45186, misses=18800, maxsize=1, currsize=1)
353 MB
35.52

CacheInfo(hits=44977, misses=19009, maxsize=5000, currsize=5000)
335 MB
35.31

CacheInfo(hits=44691, misses=19295, maxsize=2500, currsize=2500)
318 MB
35.85

CacheInfo(hits=44178, misses=19808, maxsize=1000, currsize=1000)
301 MB
36.39

CacheInfo(hits=41211, misses=22775, maxsize=100, currsize=100)
299 MB
37.175


I didn't bother collecting detailed stats for vercmp, since the
inputs/outputs are quite small and don't cause much memory increase.
Please let me know if there are any other suggestions/improvements (and
thanks Sid for the lru_cache suggestion!).

Thanks,
Chun-Yu





Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-06 Thread Zac Medico

We've dropped Python 2.7, so now the minimum version is Python 3.6.
-- 
Thanks,
Zac





Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-06 Thread Zac Medico
On 7/6/20 10:30 AM, Chun-Yu Shei wrote:
> I finally got a chance to try Sid's lru_cache suggestion, and the
> results were really good.  Simply adding it on catpkgsplit and moving
> the body of use_reduce into a separate function (that accepts tuples
> instead of unhashable lists/sets) and decorating it with lru_cache
> gets a similar 40% overall speedup for the upgrade case I tested.  It
> seems like even a relatively small cache size (1000 entries) gives
> quite a speedup, even though in the use_reduce case, the cache size
> eventually reaches almost 20,000 entries if no limit is set.  With
> these two changes, adding caching to match_from_list didn't seem to
> make much/any difference.

That's great!

> The catch is that lru_cache is only available in Python 3.2, so would
> it make sense to add a dummy lru_cache implementation for Python < 3.2
> that does nothing?  There is also a backports-functools-lru-cache
> package that's already available in the Portage tree, but that would
> add an additional external dependency.
> 
> I agree that refactoring could yield an even bigger gain, but
> hopefully this can be implemented as an interim solution to speed up
> the common emerge case of resolving upgrades.  I'm happy to submit new
> patches for this, if someone can suggest how to best handle the Python
> < 3.2 case. :)
> 
> Thanks,
> Chun-Yu

We can safely drop support for < Python 3.6 at this point. Alternatively
we could add a compatibility shim for Python 2.7 that does not perform
any caching, but I really don't think it's worth the trouble to support
it any longer.
-- 
Thanks,
Zac





Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-06 Thread Chun-Yu Shei
I finally got a chance to try Sid's lru_cache suggestion, and the
results were really good.  Simply adding it on catpkgsplit and moving
the body of use_reduce into a separate function (that accepts tuples
instead of unhashable lists/sets) and decorating it with lru_cache
gets a similar 40% overall speedup for the upgrade case I tested.  It
seems like even a relatively small cache size (1000 entries) gives
quite a speedup, even though in the use_reduce case, the cache size
eventually reaches almost 20,000 entries if no limit is set.  With
these two changes, adding caching to match_from_list didn't seem to
make much/any difference.

The catch is that lru_cache is only available in Python 3.2, so would
it make sense to add a dummy lru_cache implementation for Python < 3.2
that does nothing?  There is also a backports-functools-lru-cache
package that's already available in the Portage tree, but that would
add an additional external dependency.

I agree that refactoring could yield an even bigger gain, but
hopefully this can be implemented as an interim solution to speed up
the common emerge case of resolving upgrades.  I'm happy to submit new
patches for this, if someone can suggest how to best handle the Python
< 3.2 case. :)

Thanks,
Chun-Yu
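[Editor's note] The dummy-decorator idea floated above can be sketched like this; a hypothetical shim that keeps the same decorator call shape as functools.lru_cache but performs no caching on old interpreters (moot now that the minimum is Python 3.6, per the follow-up below):

```python
try:
    from functools import lru_cache
except ImportError:
    # Python < 3.2 fallback: same call shape, no caching performed.
    def lru_cache(maxsize=128, typed=False):
        def decorator(func):
            return func
        return decorator

@lru_cache(maxsize=1000)
def add_one(x):
    return x + 1
```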





Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-06 Thread Francesco Riosa

Il 06/07/20 17:50, Michael 'veremitz' Everitt ha scritto:

On 06/07/20 16:26, Francesco Riosa wrote:

Il 29/06/20 03:58, Sid Spry ha scritto:

There are libraries that provide decorators, etc, for caching and
memoization.
Have you evaluated any of those? One is available in the standard library:
https://docs.python.org/dev/library/functools.html#functools.lru_cache

I comment as this would increase code clarity.


I think portage developers try hard to avoid external dependencies,
and I hope they do



I think the key word here is 'external' - anything which is part of the
python standard library is game for inclusion in portage, and has/does
provide much needed optimisation. Many of the issues in portage are
so-called "solved problems" in computing terms, and as such, we should take
advantage of these to improve performance at every available opportunity.
Of course, there are presently only one, two or three key developers able
to make/test these changes (indeed at scale) so progress is often slower
than desirable in current circumstances...

[sent direct due to posting restrictions...]
Yes, I replied too fast and didn't notice Sid was referring to
_standard_ libraries (not even recent additions)


sorry for the noise

- Francesco




Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-06 Thread Francesco Riosa



Il 29/06/20 03:58, Sid Spry ha scritto:

There are libraries that provide decorators, etc, for caching and memoization.
Have you evaluated any of those? One is available in the standard library:
https://docs.python.org/dev/library/functools.html#functools.lru_cache

I comment as this would increase code clarity.


I think portage developers try hard to avoid external dependencies,
and I hope they do




Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-06-28 Thread Sid Spry

There are libraries that provide decorators, etc, for caching and memoization.
Have you evaluated any of those? One is available in the standard library:
https://docs.python.org/dev/library/functools.html#functools.lru_cache

I comment as this would increase code clarity.
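[Editor's note] A minimal sketch of the suggestion above, using functools.lru_cache from the standard library. The split logic here is illustrative only, not Portage's real catpkgsplit:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def catpkgsplit_demo(cpv):
    # Illustrative only: split "cat/pkg-ver" into (cat, pkg, ver).
    cat, _, rest = cpv.partition("/")
    pkg, _, ver = rest.rpartition("-")
    return (cat, pkg, ver)

catpkgsplit_demo("sys-apps/portage-2.3.99")
catpkgsplit_demo("sys-apps/portage-2.3.99")  # second call served from cache
print(catpkgsplit_demo.cache_info())         # reports hits/misses/currsize
```

The cache_info() counters are what produced the CacheInfo lines quoted throughout this thread.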



Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-06-27 Thread Michał Górny
On June 27, 2020 6:34:13 AM UTC, Chun-Yu Shei wrote:
>According to cProfile, catpkgsplit is called up to 1-5.5 million times
>during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
>results reduces the time for this command from 43.53 -> 41.53 seconds --
>a 4.8% speedup.


Not saying caching is wrong as an interim solution, but this is the kind of
function where refactoring may yield even more gain.




--
Best regards, 
Michał Górny



[gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-06-27 Thread Chun-Yu Shei
According to cProfile, catpkgsplit is called up to 1-5.5 million times
during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
results reduces the time for this command from 43.53 -> 41.53 seconds --
a 4.8% speedup.
---
 lib/portage/versions.py | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/lib/portage/versions.py b/lib/portage/versions.py
index 0c21373cc..ffec316ce 100644
--- a/lib/portage/versions.py
+++ b/lib/portage/versions.py
@@ -312,6 +312,7 @@ def _pkgsplit(mypkg, eapi=None):
 
 _cat_re = re.compile('^%s$' % _cat, re.UNICODE)
 _missing_cat = 'null'
+_catpkgsplit_cache = {}
 
 def catpkgsplit(mydata, silent=1, eapi=None):
"""
@@ -331,6 +332,11 @@ def catpkgsplit(mydata, silent=1, eapi=None):
return mydata.cpv_split
except AttributeError:
pass
+
+   cache_entry = _catpkgsplit_cache.get(mydata)
+   if cache_entry is not None:
+   return cache_entry
+
mysplit = mydata.split('/', 1)
p_split = None
if len(mysplit) == 1:
@@ -343,6 +349,7 @@ def catpkgsplit(mydata, silent=1, eapi=None):
if not p_split:
return None
retval = (cat, p_split[0], p_split[1], p_split[2])
+   _catpkgsplit_cache[mydata] = retval
return retval
 
 class _pkg_str(_unicode):
-- 
2.27.0.212.ge8ba1cc988-goog
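[Editor's note] The hand-rolled module-level dict in the patch above never evicts entries; the functools.lru_cache approach discussed elsewhere in the thread bounds memory and exposes hit/miss counters. A hedged sketch (not the applied patch; the parsing below is a crude stand-in for _pkgsplit and assumes an explicit "-rN" revision suffix):

```python
from functools import lru_cache

@lru_cache(maxsize=10000)  # illustrative size, not a tuned value
def catpkgsplit_cached(mydata):
    # Crude stand-in for the real parser: expects "cat/name-ver-rN".
    mysplit = mydata.split("/", 1)
    if len(mysplit) != 2:
        return None
    cat, pkg = mysplit
    p_split = pkg.rsplit("-", 2)
    if len(p_split) != 3:
        return None
    # Same 4-tuple shape as the patch: (category, name, version, revision).
    return (cat, p_split[0], p_split[1], p_split[2])
```

Unlike the unbounded dict, lru_cache discards least-recently-used entries once maxsize is reached, which is what the CacheInfo measurements earlier in the thread were sizing.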