Re: [gentoo-portage-dev] [PATCH] Add caching to use_reduce, vercmp, and catpkgsplit

2020-07-13 Thread Chun-Yu Shei
Ah, I wasn't aware that I should have added that... I'm happy to say "Signed-off-by: Chun-Yu Shei " somewhere if necessary. The patch has gone through Google's open source patching approval process and I'm able to agree to any CLA required. On Mon, Jul 13, 2020 at 11:54 AM Ulrich Muel

[gentoo-portage-dev] [PATCH] Add caching to use_reduce, vercmp, and catpkgsplit

2020-07-13 Thread Chun-Yu Shei
Each of these functions is called repeatedly with the same arguments many times. Cache sizes were selected to minimize memory use increase, while still providing about the same speedup compared to a cache with unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases from 44.32s ->

Re: [gentoo-portage-dev] [PATCH] Add caching to use_reduce,

2020-07-13 Thread Chun-Yu Shei
Sounds good, here's an updated patch that uses frozenset instead. I also moved the lru_cache imports up with the other system imports and renamed use_reduce_cached to _use_reduce_cached to fit with the existing convention. Memory usage and performance are essentially unchanged due to the tuple

Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-09 Thread Chun-Yu Shei
t; > > On Thu, Jul 9, 2020 at 12:03 AM Chun-Yu Shei wrote: > >> Awesome! Here's a patch that adds @lru_cache to use_reduce, vercmp, and >> catpkgsplit. use_reduce was split into 2 functions, with the outer one >> converting lists/sets to tuples so they can be hashed and

[gentoo-portage-dev] [PATCH] Add caching to use_reduce, vercmp, and catpkgsplit

2020-07-09 Thread Chun-Yu Shei
Each of these functions is called repeatedly with the same arguments many times. Cache sizes were selected to minimize memory use increase, while still providing about the same speedup compared to a cache with unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases from 44.32s ->

Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-09 Thread Chun-Yu Shei
Awesome! Here's a patch that adds @lru_cache to use_reduce, vercmp, and catpkgsplit. use_reduce was split into 2 functions, with the outer one converting lists/sets to tuples so they can be hashed and creating a copy of the returned list (since the caller seems to modify it sometimes). I tried

Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-07-06 Thread Chun-Yu Shei
I finally got a chance to try Sid's lru_cache suggestion, and the results were really good. Simply adding it on catpkgsplit and moving the body of use_reduce into a separate function (that accepts tuples instead of unhashable lists/sets) and decorating it with lru_cache gets a similar 40% overall

[gentoo-portage-dev] [PATCH 2/3] Add caching to use_reduce function

2020-06-27 Thread Chun-Yu Shei
This function is called extremely frequently with similar arguments, so this optimization reduces "emerge -uDvpU --with-bdeps=y @world" runtime from 43.5 -> 34.5s -- a 25.8% speedup. --- lib/portage/dep/__init__.py | 26 ++ 1 file changed, 26 insertions(+) diff --git

[gentoo-portage-dev] Add caching to a few commonly used functions

2020-06-27 Thread Chun-Yu Shei
Hi, I was recently interested in whether portage could be speed up, since dependency resolution can sometimes take a while on slower machines. After generating some flame graphs with cProfile and vmprof, I found 3 functions which seem to be called extremely frequently with the same arguments:

[gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-06-27 Thread Chun-Yu Shei
According to cProfile, catpkgsplit is called up to 1-5.5 million times during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its results reduces the time for this command from 43.53 -> 41.53 seconds -- a 4.8% speedup. --- lib/portage/versions.py | 7 +++ 1 file changed, 7

[gentoo-portage-dev] [PATCH 3/3] Add partial caching to match_from_list

2020-06-27 Thread Chun-Yu Shei
This function is called frequently with similar arguments, so cache as much of the partial results as possible. It seems that "match_from_list" must return a list containing actual entries from the "candidate_list" argument, so we store frozensets in "_match_from_list_cache" and test items from

Re: [gentoo-portage-dev] Add caching to a few commonly used functions

2020-06-27 Thread Chun-Yu Shei
n Sat, Jun 27, 2020, 12:35 AM Fabian Groffen wrote: > > Hi Chun-Yu, > > On 26-06-2020 23:34:12 -0700, Chun-Yu Shei wrote: > > Hi, > > > > I was recently interested in whether portage could be speed up, since > > dependency resolution can sometimes take a while