Re: [gentoo-portage-dev] [PATCH] ecompress: optimize docompress -x precompressed comparison

2020-06-28 Thread Robin H. Johnson
On Sun, Jun 28, 2020 at 12:54:56PM -0700, Zac Medico wrote:
> Use sort and comm with temporary files in order to compare lists
> of docompress -x and precompressed files, since the file lists
> can be extremely large. Also strip ${D%/} from paths in order to
> reduce length.
+1 looks much better.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Treasurer
E-Mail   : robb...@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136




Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function

2020-06-28 Thread Sid Spry
On Sat, Jun 27, 2020, at 1:34 AM, Chun-Yu Shei wrote:
> According to cProfile, catpkgsplit is called up to 1-5.5 million times
> during "emerge -uDvpU --with-bdeps=y @world". Adding a dict to cache its
> results reduces the time for this command from 43.53 -> 41.53 seconds --
> a 4.8% speedup.
> ---
>  lib/portage/versions.py | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/lib/portage/versions.py b/lib/portage/versions.py
> index 0c21373cc..ffec316ce 100644
> --- a/lib/portage/versions.py
> +++ b/lib/portage/versions.py
> @@ -312,6 +312,7 @@ def _pkgsplit(mypkg, eapi=None):
>  
>  _cat_re = re.compile('^%s$' % _cat, re.UNICODE)
>  _missing_cat = 'null'
> +_catpkgsplit_cache = {}
>  
>  def catpkgsplit(mydata, silent=1, eapi=None):
>  	"""
> @@ -331,6 +332,11 @@ def catpkgsplit(mydata, silent=1, eapi=None):
>  		return mydata.cpv_split
>  	except AttributeError:
>  		pass
> +
> +	cache_entry = _catpkgsplit_cache.get(mydata)
> +	if cache_entry is not None:
> +		return cache_entry
> +
>  	mysplit = mydata.split('/', 1)
>  	p_split = None
>  	if len(mysplit) == 1:
> @@ -343,6 +349,7 @@ def catpkgsplit(mydata, silent=1, eapi=None):
>  	if not p_split:
>  		return None
>  	retval = (cat, p_split[0], p_split[1], p_split[2])
> +	_catpkgsplit_cache[mydata] = retval
>  	return retval
>  
>  class _pkg_str(_unicode):
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 

There are libraries that provide decorators and similar helpers for caching
and memoization. Have you evaluated any of those? One is available in the
standard library:
https://docs.python.org/dev/library/functools.html#functools.lru_cache

I bring this up because using one would improve code clarity.
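
As a rough illustration of the decorator approach, here is a minimal sketch.
split_cpv is a hypothetical toy stand-in, not portage's catpkgsplit, and
maxsize=None is my assumption (an unbounded cache, like the plain dict in the
patch):

```python
import functools


@functools.lru_cache(maxsize=None)  # unbounded, comparable to a plain dict
def split_cpv(cpv):
    """Hypothetical stand-in: split 'cat/pkg-ver' into (cat, 'pkg-ver')."""
    cat, sep, rest = cpv.partition("/")
    if not sep:
        # No category given; mirror the 'null' placeholder idea.
        return ("null", cat)
    return (cat, rest)


first = split_cpv("sys-apps/portage-2.3.103")
second = split_cpv("sys-apps/portage-2.3.103")  # served from the cache
info = split_cpv.cache_info()  # hit/miss counters kept by lru_cache
```

Two points that might matter here: lru_cache builds its key from all
arguments, so a call with a different eapi would get its own entry, and it
requires every argument to be hashable.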



[gentoo-portage-dev] Re: [PATCH] ecompress: optimize docompress -x precompressed comparison

2020-06-28 Thread Zac Medico
On 6/28/20 12:54 PM, Zac Medico wrote:
> + LC_COLLATE=C sort -zu "${T}/.ecompress_skip_files" > "${T}/.ecompress_skip_files_sorted"|| die
> + LC_COLLATE=C sort -zu "${T}/.ecompress_had_precompressed" > "${T}/.ecompress_had_precompressed_sorted" || die
> + LC_COLLATE=C comm -z13 "${T}/.ecompress_skip_files_sorted" "${T}/.ecompress_had_precompressed_sorted" > "${T}/.ecompress_had_precompressed" || die

I've updated my branch to use \n separators, since POSIX comm does not
support the -z option.
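
For anyone unfamiliar with comm -13: given two sorted lists, it prints only
the entries unique to the second one. The sketch below models that in Python
purely as an illustration. The function name and sample paths are made up,
and it assumes one path per line and plain ASCII paths (so Python's string
ordering matches LC_COLLATE=C):

```python
def comm_13(skip_lines, precompressed_lines):
    """Model of 'sort -u' on both inputs followed by 'comm -13'.

    Returns the sorted, de-duplicated entries that appear only in the
    second list.
    """
    skip = set(skip_lines)
    return sorted(set(precompressed_lines) - skip)


# Hypothetical sample data, already stripped of the ${D%/} prefix.
skip_files = ["/usr/share/doc/foo/a.gz", "/usr/share/doc/foo/b.gz"]
had_precompressed = [
    "/usr/share/man/man1/foo.1.gz",
    "/usr/share/doc/foo/b.gz",
]
remaining = comm_13(skip_files, had_precompressed)
```

One trade-off of \n separators is that a file name containing a newline
would be split into two entries.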
-- 
Thanks,
Zac





[gentoo-portage-dev] [PATCH] ecompress: optimize docompress -x precompressed comparison

2020-06-28 Thread Zac Medico
Use sort and comm with temporary files in order to compare lists
of docompress -x and precompressed files, since the file lists
can be extremely large. Also strip ${D%/} from paths in order to
reduce length.

Bug: https://bugs.gentoo.org/721516
Suggested-by: Robin H. Johnson 
Signed-off-by: Zac Medico 
---
 bin/ecompress | 29 ++-
 .../tests/resolver/ResolverPlayground.py  |  1 +
 2 files changed, 16 insertions(+), 14 deletions(-)

diff --git a/bin/ecompress b/bin/ecompress
index 60b083834..983a4d1f7 100755
--- a/bin/ecompress
+++ b/bin/ecompress
@@ -19,29 +19,30 @@ while [[ $# -gt 0 ]] ; do
shift
 
skip_dirs=()
-   skip_files=()
+   > "${T}/.ecompress_skip_files" || die
for skip; do
if [[ -d ${ED%/}/${skip#/} ]]; then
skip_dirs+=( "${ED%/}/${skip#/}" )
else
rm -f "${ED%/}/${skip#/}.ecompress" || die
-   skip_files+=("${ED%/}/${skip#/}")
+   printf '%s\0' "${EPREFIX}/${skip#/}" >> "${T}/.ecompress_skip_files"
fi
done
 
if [[ ${#skip_dirs[@]} -gt 0 ]]; then
-   while read -r -d ''; do
-   skip_files+=("${REPLY%.ecompress}")
+   while read -r -d '' skip; do
+   skip=${skip%.ecompress}
+   printf '%s\0' "${skip#${D%/}}" >> "${T}/.ecompress_skip_files"
done < <(find "${skip_dirs[@]}" -name '*.ecompress' -print0 -delete || die)
fi
 
-   if [[ ${#skip_files[@]} -gt 0 && -s ${T}/.ecompress_had_precompressed ]]; then
-   sed_args=()
-   for f in "${skip_files[@]}"; do
-   sed_args+=("s|^${f}\$||;")
-   done
-   sed_args+=('/^$/d')
-   sed -f - -i "${T}/.ecompress_had_precompressed" <<< "${sed_args[@]}" || die
+   if [[ -s ${T}/.ecompress_skip_files && -s ${T}/.ecompress_had_precompressed ]]; then
+   # Filter skipped files from ${T}/.ecompress_had_precompressed,
+   # using temporary files since these lists can be extremely large.
+   LC_COLLATE=C sort -zu "${T}/.ecompress_skip_files" > "${T}/.ecompress_skip_files_sorted"|| die
+   LC_COLLATE=C sort -zu "${T}/.ecompress_had_precompressed" > "${T}/.ecompress_had_precompressed_sorted" || die
+   LC_COLLATE=C comm -z13 "${T}/.ecompress_skip_files_sorted" "${T}/.ecompress_had_precompressed_sorted" > "${T}/.ecompress_had_precompressed" || die
+   rm -f "${T}/.ecompress_had_precompressed_sorted" "${T}/.ecompress_skip_files"{,_sorted}
fi
 
exit 0
@@ -81,7 +82,7 @@ while [[ $# -gt 0 ]] ; do
continue 2
fi
done
-   echo "${path}" >> "${T}"/.ecompress_had_precompressed
+   printf '%s\0' "${path#${D%/}}" >> "${T}"/.ecompress_had_precompressed || die
;;
esac
 
@@ -195,8 +196,8 @@ if [[ -s ${T}/.ecompress_had_precompressed ]]; then
eqawarn "(manpages, documentation) when automatic compression is used:"
eqawarn
n=0
-   while read -r f; do
-   eqawarn "  ${f#${D%/}}"
+   while read -r -d '' f; do
+   eqawarn "  ${f}"
if [[ $(( n++ )) -eq 10 ]]; then
eqawarn "  ..."
break
diff --git a/lib/portage/tests/resolver/ResolverPlayground.py b/lib/portage/tests/resolver/ResolverPlayground.py
index de80a0cc1..ec2e31ae9 100644
--- a/lib/portage/tests/resolver/ResolverPlayground.py
+++ b/lib/portage/tests/resolver/ResolverPlayground.py
@@ -91,6 +91,7 @@ class ResolverPlayground(object):
"chgrp",
"chmod",
"chown",
+   "comm",
"cp",
"egrep",
"env",
-- 
2.25.3