Re: [gentoo-portage-dev] [PATCH] ecompress: optimize docompress -x precompressed comparison
On Sun, Jun 28, 2020 at 12:54:56PM -0700, Zac Medico wrote:
> Use sort and comm with temporary files in order to compare lists
> of docompress -x and precompressed files, since the file lists
> can be extremely large. Also strip ${D%/} from paths in order to
> reduce length.

+1 looks much better.

--
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136
Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function
On Sat, Jun 27, 2020, at 1:34 AM, Chun-Yu Shei wrote:
> According to cProfile, catpkgsplit is called up to 1-5.5 million times
> during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
> results reduces the time for this command from 43.53 -> 41.53 seconds --
> a 4.8% speedup.
> ---
>  lib/portage/versions.py | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/lib/portage/versions.py b/lib/portage/versions.py
> index 0c21373cc..ffec316ce 100644
> --- a/lib/portage/versions.py
> +++ b/lib/portage/versions.py
> @@ -312,6 +312,7 @@ def _pkgsplit(mypkg, eapi=None):
>
>  _cat_re = re.compile('^%s$' % _cat, re.UNICODE)
>  _missing_cat = 'null'
> +_catpkgsplit_cache = {}
>
>  def catpkgsplit(mydata, silent=1, eapi=None):
>  	"""
> @@ -331,6 +332,11 @@ def catpkgsplit(mydata, silent=1, eapi=None):
>  		return mydata.cpv_split
>  	except AttributeError:
>  		pass
> +
> +	cache_entry = _catpkgsplit_cache.get(mydata)
> +	if cache_entry is not None:
> +		return cache_entry
> +
>  	mysplit = mydata.split('/', 1)
>  	p_split = None
>  	if len(mysplit) == 1:
> @@ -343,6 +349,7 @@ def catpkgsplit(mydata, silent=1, eapi=None):
>  	if not p_split:
>  		return None
>  	retval = (cat, p_split[0], p_split[1], p_split[2])
> +	_catpkgsplit_cache[mydata] = retval
>  	return retval
>
>  class _pkg_str(_unicode):
> --
> 2.27.0.212.ge8ba1cc988-goog
>

There are libraries that provide decorators and similar helpers for
caching and memoization. Have you evaluated any of those? One is
available in the standard library:

https://docs.python.org/dev/library/functools.html#functools.lru_cache

I mention it because using one would make the code clearer.
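As a minimal sketch of the suggestion above, the hand-written dict cache could be expressed with functools.lru_cache roughly as follows. The function split_cpv and its naive parsing are hypothetical stand-ins for illustration only; this is not portage's catpkgsplit, and it ignores the silent and eapi parameters.

```python
import functools


# maxsize=None makes the cache unbounded, similar to the plain dict
# used in the patch.
@functools.lru_cache(maxsize=None)
def split_cpv(cpv):
    """Hypothetical stand-in for catpkgsplit: naively split 'cat/pkg-ver'."""
    cat, _, rest = cpv.partition('/')
    pkg, _, ver = rest.rpartition('-')
    return (cat, pkg, ver)


first = split_cpv('sys-apps/portage-3.0.0')
second = split_cpv('sys-apps/portage-3.0.0')  # answered from the cache
info = split_cpv.cache_info()  # hit/miss counters kept by lru_cache
```

Two differences from the patch would need checking before adopting the decorator: lru_cache builds its key from every argument (so silent and eapi would become part of the key, and all arguments must be hashable), and it also caches None results, which the dict version does not.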
[gentoo-portage-dev] Re: [PATCH] ecompress: optimize docompress -x precompressed comparison
On 6/28/20 12:54 PM, Zac Medico wrote:
> +			LC_COLLATE=C sort -zu "${T}/.ecompress_skip_files" > "${T}/.ecompress_skip_files_sorted"|| die
> +			LC_COLLATE=C sort -zu "${T}/.ecompress_had_precompressed" > "${T}/.ecompress_had_precompressed_sorted" || die
> +			LC_COLLATE=C comm -z13 "${T}/.ecompress_skip_files_sorted" "${T}/.ecompress_had_precompressed_sorted" > "${T}/.ecompress_had_precompressed" || die

I've updated my branch to use \n separators, since POSIX comm does not
support the -z option.
--
Thanks,
Zac
[gentoo-portage-dev] [PATCH] ecompress: optimize docompress -x precompressed comparison
Use sort and comm with temporary files in order to compare lists
of docompress -x and precompressed files, since the file lists
can be extremely large. Also strip ${D%/} from paths in order to
reduce length.

Bug: https://bugs.gentoo.org/721516
Suggested-by: Robin H. Johnson
Signed-off-by: Zac Medico
---
 bin/ecompress                                 | 29 ++-
 .../tests/resolver/ResolverPlayground.py      |  1 +
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/bin/ecompress b/bin/ecompress
index 60b083834..983a4d1f7 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -19,29 +19,30 @@ while [[ $# -gt 0 ]] ; do
 		shift

 		skip_dirs=()
-		skip_files=()
+		> "${T}/.ecompress_skip_files" || die
 		for skip; do
 			if [[ -d ${ED%/}/${skip#/} ]]; then
 				skip_dirs+=( "${ED%/}/${skip#/}" )
 			else
 				rm -f "${ED%/}/${skip#/}.ecompress" || die
-				skip_files+=("${ED%/}/${skip#/}")
+				printf '%s\0' "${EPREFIX}/${skip#/}" >> "${T}/.ecompress_skip_files"
 			fi
 		done

 		if [[ ${#skip_dirs[@]} -gt 0 ]]; then
-			while read -r -d ''; do
-				skip_files+=("${REPLY%.ecompress}")
+			while read -r -d '' skip; do
+				skip=${skip%.ecompress}
+				printf '%s\0' "${skip#${D%/}}" >> "${T}/.ecompress_skip_files"
 			done < <(find "${skip_dirs[@]}" -name '*.ecompress' -print0 -delete || die)
 		fi

-		if [[ ${#skip_files[@]} -gt 0 && -s ${T}/.ecompress_had_precompressed ]]; then
-			sed_args=()
-			for f in "${skip_files[@]}"; do
-				sed_args+=("s|^${f}\$||;")
-			done
-			sed_args+=('/^$/d')
-			sed -f - -i "${T}/.ecompress_had_precompressed" <<< "${sed_args[@]}" || die
+		if [[ -s ${T}/.ecompress_skip_files && -s ${T}/.ecompress_had_precompressed ]]; then
+			# Filter skipped files from ${T}/.ecompress_had_precompressed,
+			# using temporary files since these lists can be extremely large.
+			LC_COLLATE=C sort -zu "${T}/.ecompress_skip_files" > "${T}/.ecompress_skip_files_sorted"|| die
+			LC_COLLATE=C sort -zu "${T}/.ecompress_had_precompressed" > "${T}/.ecompress_had_precompressed_sorted" || die
+			LC_COLLATE=C comm -z13 "${T}/.ecompress_skip_files_sorted" "${T}/.ecompress_had_precompressed_sorted" > "${T}/.ecompress_had_precompressed" || die
+			rm -f "${T}/.ecompress_had_precompressed_sorted" "${T}/.ecompress_skip_files"{,_sorted}
 		fi

 		exit 0
@@ -81,7 +82,7 @@ while [[ $# -gt 0 ]] ; do
 						continue 2
 					fi
 				done
-				echo "${path}" >> "${T}"/.ecompress_had_precompressed
+				printf '%s\0' "${path#${D%/}}" >> "${T}"/.ecompress_had_precompressed || die
 				;;
 			esac

@@ -195,8 +196,8 @@ if [[ -s ${T}/.ecompress_had_precompressed ]]; then
 	eqawarn "(manpages, documentation) when automatic compression is used:"
 	eqawarn
 	n=0
-	while read -r f; do
-		eqawarn " ${f#${D%/}}"
+	while read -r -d '' f; do
+		eqawarn " ${f}"
 		if [[ $(( n++ )) -eq 10 ]]; then
 			eqawarn " ..."
 			break
diff --git a/lib/portage/tests/resolver/ResolverPlayground.py b/lib/portage/tests/resolver/ResolverPlayground.py
index de80a0cc1..ec2e31ae9 100644
--- a/lib/portage/tests/resolver/ResolverPlayground.py
+++ b/lib/portage/tests/resolver/ResolverPlayground.py
@@ -91,6 +91,7 @@ class ResolverPlayground(object):
 			"chgrp",
 			"chmod",
 			"chown",
+			"comm",
 			"cp",
 			"egrep",
 			"env",
--
2.25.3
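To illustrate what the sort/comm step in the patch computes, here is a rough Python sketch of the "comm -13" semantics: given two sorted, de-duplicated lists, keep only the entries that appear in the second list and not in the first. The function name only_in_second and the sample paths are hypothetical; the patch itself does this with external tools on temporary files, not in Python.

```python
def only_in_second(first_sorted, second_sorted):
    """Yield items of second_sorted that are absent from first_sorted.

    Both inputs are assumed to be sorted and free of duplicates, which is
    what 'sort -u' provides in the patch. Python compares strings by code
    point, which matches LC_COLLATE=C byte order for ASCII paths.
    """
    it = iter(first_sorted)
    cur = next(it, None)
    for item in second_sorted:
        # Advance the first list until it catches up with the current item.
        while cur is not None and cur < item:
            cur = next(it, None)
        if cur is None or cur != item:
            yield item


# Hypothetical sample data standing in for the two temporary files.
skip_files = sorted({'/usr/share/doc/foo/a.gz', '/usr/share/man/man1/b.1.gz'})
had_precompressed = sorted({
    '/usr/share/doc/foo/a.gz',
    '/usr/share/doc/foo/c.bz2',
    '/usr/share/man/man1/b.1.gz',
})
remaining = list(only_in_second(skip_files, had_precompressed))
```

Because both lists are walked once in step, the work stays linear in their combined length after sorting, whereas the replaced code built one sed expression per skipped file.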