[gentoo-dev] Last-rites: dev-perl/gnome2-perl
# Kent Fredric (2020-07-10)
# No reverse dependencies, and Gtk2 support is becoming
# obsolete in Gentoo.
# Removal in 30 days
Re: [gentoo-dev] euses(1) Reimplementation
Hi Fabian, cheers for your response.

On Thu, Jul 09, 2020 at 08:39:30AM +0200, Fabian Groffen wrote:
> Sounds like you've put some work into this. You could compare against
> `quse -D ` (from portage-utils) as well to get another point of
> measure.

quse is about half as fast as my tool; that is understandable, as it works
primarily from ebuild scripts rather than USE-flag descriptors. The two tools
yield exactly the same results, provided that `-s` is passed to ash-euses (its
default behaviour is to include flag descriptions in the search; `-s`
instructs it to display only matches which appear as a flag). The disadvantage
of my tool is that it knows nothing about the packages themselves, so it
cannot offer command-line options such as "only display results related to
installed packages".

> I don't know what you did measure euses against though, it seems fairly
> fast to me (env PORTDIR=`q -e PORTDIR` euses -v libressl), is there a
> specific case you're focussing on?

It is very fast, but it could be faster. I ran it through callgrind and
kcachegrind and found that it spends over 56% of its execution time in strncpy
calls; the string construction is extremely inefficient. My reimplementation
also aims for cleaner, more maintainable code (for example, the original tool
declares 23 nondescriptly named local variables at the top of main(), and more
throughout the function). Regardless, the obvious main advantage is that it is
fully compliant with the repos.conf syntax, but also works on legacy PORTDIR
systems.

As an aside, my version also uses the strcasestr(3) function to perform the
case-insensitive search. Unfortunately, this forces _GNU_SOURCE to be defined
for the inclusion of `string.h`; however, it is far faster than running
tolower(3) on every character of the query and buffer, as the canonicalisation
(in this case, converting the needle and haystack to lower-case) is done as
part of the standard string-searching function call
(`two_way_{long,short}_needle`) [1]. As discussed in my previous e-mail, I'm
working on reimplementing this with the Two-Way algorithm (and shift tables for
small needles) to avoid the non-standard dependency, although it might take a
few days.

    Ashley.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/str-two-way.h;h=de247fbc98b83a6e1653288e4161751710d026ce;hb=HEAD#l35

-- 
Ashley Dixon
suugaku.co.uk

2A9A 4117 DA96 D18A 8A7B B0D2 A30E BF25 F290 A8AA
Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function
On Thu, Jul 9, 2020 at 2:06 PM Chun-Yu Shei wrote:

> Hmm, that's strange... it seems to have made it to the list archives:
> https://archives.gentoo.org/gentoo-portage-dev/message/a4db905a64e3c1f6d88c4876e8291a65
>
> (but it is entirely possible that I used "git send-email" incorrectly)

Ahhh it's visible there; I'll blame gMail ;)

-A
Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function
Hmm, that's strange... it seems to have made it to the list archives:
https://archives.gentoo.org/gentoo-portage-dev/message/a4db905a64e3c1f6d88c4876e8291a65

(but it is entirely possible that I used "git send-email" incorrectly)

On Thu, Jul 9, 2020 at 2:04 PM Alec Warner wrote:

> I don't see a patch attached; can you link to it?
>
> -A
Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function
On Thu, Jul 9, 2020 at 12:03 AM Chun-Yu Shei wrote:

> Awesome! Here's a patch that adds @lru_cache to use_reduce, vercmp, and
> catpkgsplit.

I don't see a patch attached; can you link to it?

-A
[gentoo-dev] Last-rites: kde-apps/kblog
# Andreas Sturmlechner (2020-07-09)
# Dead upstream, no reverse dependencies.
# Removal in 30 days.
kde-apps/kblog
[gentoo-portage-dev] [PATCH] Add caching to use_reduce, vercmp, and catpkgsplit
Each of these functions is called repeatedly with the same arguments
many times. Cache sizes were selected to minimize memory use increase,
while still providing about the same speedup compared to a cache with
unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime decreases
from 44.32s -> 29.94s -- a 48% speedup, while the maximum value of the
RES column in htop increases from 280 MB -> 290 MB.

"emerge -ep @world" time slightly decreases from 18.77s -> 17.93, while
max observed RES value actually decreases from 228 MB -> 214 MB (similar
values observed across a few before/after runs).
---
 lib/portage/dep/__init__.py | 106 +---
 lib/portage/versions.py     |   3 +
 2 files changed, 66 insertions(+), 43 deletions(-)

diff --git a/lib/portage/dep/__init__.py b/lib/portage/dep/__init__.py
index 72988357a..4d91a411a 100644
--- a/lib/portage/dep/__init__.py
+++ b/lib/portage/dep/__init__.py
@@ -23,6 +23,7 @@ portage.proxy.lazyimport.lazyimport(globals(),
 	'portage.util:cmp_sort_key,writemsg',
 )
 
+from functools import lru_cache
 from portage import _encodings, _unicode_decode, _unicode_encode
 from portage.eapi import _get_eapi_attrs
 from portage.exception import InvalidAtom, InvalidData, InvalidDependString
@@ -404,49 +405,9 @@ def paren_enclose(mylist, unevaluated_atom=False, opconvert=False):
 			mystrparts.append(x)
 	return " ".join(mystrparts)
 
-def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), is_src_uri=False, \
-	eapi=None, opconvert=False, flat=False, is_valid_flag=None, token_class=None, matchnone=False,
-	subset=None):
-	"""
-	Takes a dep string and reduces the use? conditionals out, leaving an array
-	with subarrays. All redundant brackets are removed.
-
-	@param depstr: depstring
-	@type depstr: String
-	@param uselist: Sequence of use enabled flags
-	@type uselist: Sequence
-	@param masklist: Sequence of masked flags (always treated as disabled)
-	@type masklist: Sequence
-	@param matchall: Treat all conditionals as active. Used by repoman.
-	@type matchall: Bool
-	@param excludeall: Sequence of flags for which negated conditionals are always treated as inactive.
-	@type excludeall: Sequence
-	@param is_src_uri: Indicates if depstr represents a SRC_URI
-	@type is_src_uri: Bool
-	@param eapi: Indicates the EAPI the dep string has to comply to
-	@type eapi: String
-	@param opconvert: Put every operator as first element into it's argument list
-	@type opconvert: Bool
-	@param flat: Create a flat list of all tokens
-	@type flat: Bool
-	@param is_valid_flag: Function that decides if a given use flag might be used in use conditionals
-	@type is_valid_flag: Function
-	@param token_class: Convert all non operator tokens into this class
-	@type token_class: Class
-	@param matchnone: Treat all conditionals as inactive. Used by digestgen().
-	@type matchnone: Bool
-	@param subset: Select a subset of dependencies conditional on the given flags
-	@type subset: Sequence
-	@rtype: List
-	@return: The use reduced depend array
-	"""
-	if isinstance(depstr, list):
-		if portage._internal_caller:
-			warnings.warn(_("Passing paren_reduced dep arrays to %s is deprecated. " + \
-				"Pass the original dep string instead.") % \
-				('portage.dep.use_reduce',), DeprecationWarning, stacklevel=2)
-		depstr = paren_enclose(depstr)
-
+@lru_cache(1024)
+def use_reduce_cached(depstr, uselist, masklist, matchall, excludeall, is_src_uri, eapi, \
+	opconvert, flat, is_valid_flag, token_class, matchnone, subset):
 	if opconvert and flat:
 		raise ValueError("portage.dep.use_reduce: 'opconvert' and 'flat' are mutually exclusive")
 
@@ -769,6 +730,65 @@ def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), i
 	return stack[0]
 
+def use_reduce(depstr, uselist=(), masklist=(), matchall=False, excludeall=(), is_src_uri=False, \
+	eapi=None, opconvert=False, flat=False, is_valid_flag=None, token_class=None, matchnone=False,
+	subset=None):
+	"""
+	Takes a dep string and reduces the use? conditionals out, leaving an array
+	with subarrays. All redundant brackets are removed.
+
+	@param depstr: depstring
+	@type depstr: String
+	@param uselist: Sequence of use enabled flags
+	@type uselist: Sequence
+	@param masklist: Sequence of masked flags (always treated as disabled)
+	@type masklist: Sequence
+	@param matchall: Treat all conditionals as active. Used by repoman.
+	@type matchall: Bool
+	@param excludeall: Sequence of flags for which negated conditionals are
[gentoo-dev] Python 3.8 is now stable-ready
Hi, everyone.

I'd like to announce that thanks to the hard work of our arch testers,
Python 3.8 target is now available on stable systems on some of our
architectures, notably amd64, arm and arm64. Hopefully, it will also
become available on other architectures as arch teams proceed.

Package maintainers, please take this as a cue to start enabling Python
3.8 in your development systems and testing your packages with it.
Additional testing on 3.9 is also recommended. If you need help
porting, please don't hesitate to contact the Python team or myself.

-- 
Best regards,
Michał Górny
Re: [gentoo-portage-dev] [PATCH 1/3] Add caching to catpkgsplit function
Awesome! Here's a patch that adds @lru_cache to use_reduce, vercmp, and
catpkgsplit. use_reduce was split into 2 functions, with the outer one
converting lists/sets to tuples so they can be hashed and creating a
copy of the returned list (since the caller seems to modify it
sometimes). I tried to select cache sizes that minimized memory use
increase, while still providing about the same speedup compared to a
cache with unbounded size. "emerge -uDvpU --with-bdeps=y @world" runtime
decreases from 44.32s -> 29.94s -- a 48% speedup, while the maximum value
of the RES column in htop increases from 280 MB -> 290 MB.

"emerge -ep @world" time slightly decreases from 18.77s -> 17.93, while
max observed RES value actually decreases from 228 MB -> 214 MB (similar
values observed across a few before/after runs).

Here are the cache hit stats, max observed RES memory, and runtime in
seconds for various sizes in the update case. Caching for each
function was tested independently (only 1 function with caching enabled
at a time):

catpkgsplit:
CacheInfo(hits=133, misses=21419, maxsize=None, currsize=21419)
270 MB
39.217

CacheInfo(hits=1218900, misses=24905, maxsize=1, currsize=1)
271 MB
39.112

CacheInfo(hits=1212675, misses=31022, maxsize=5000, currsize=5000)
271 MB
39.217

CacheInfo(hits=1207879, misses=35878, maxsize=2500, currsize=2500)
269 MB
39.438

CacheInfo(hits=1199402, misses=44250, maxsize=1000, currsize=1000)
271 MB
39.348

CacheInfo(hits=1149150, misses=94610, maxsize=100, currsize=100)
271 MB
39.487


use_reduce:
CacheInfo(hits=45326, misses=18660, maxsize=None, currsize=18561)
407 MB
35.77

CacheInfo(hits=45186, misses=18800, maxsize=1, currsize=1)
353 MB
35.52

CacheInfo(hits=44977, misses=19009, maxsize=5000, currsize=5000)
335 MB
35.31

CacheInfo(hits=44691, misses=19295, maxsize=2500, currsize=2500)
318 MB
35.85

CacheInfo(hits=44178, misses=19808, maxsize=1000, currsize=1000)
301 MB
36.39

CacheInfo(hits=41211, misses=22775, maxsize=100, currsize=100)
299 MB
37.175


I didn't bother collecting detailed stats for vercmp, since the
inputs/outputs are quite small and don't cause much memory increase.
Please let me know if there are any other suggestions/improvements (and
thanks Sid for the lru_cache suggestion!).

Thanks,
Chun-Yu
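The `CacheInfo(...)` lines above are what `functools.lru_cache` reports through `cache_info()`. A minimal sketch of comparing cache sizes this way follows. It assumes a hypothetical `split_name` function and a synthetic workload, so it does not reproduce Portage's real call pattern or the numbers in this message.

```python
from functools import lru_cache


def split_name(cpv):
    """Hypothetical stand-in for a pure parsing function worth caching."""
    category, _, package = cpv.partition("/")
    return category, package


def measure(maxsize, workload):
    """Run the workload through a fresh cache and return its CacheInfo."""
    cached = lru_cache(maxsize=maxsize)(split_name)
    for item in workload:
        cached(item)
    return cached.cache_info()


# Synthetic workload: 50 distinct keys, each requested 20 times in a row.
workload = [f"cat/pkg{i}" for i in range(50) for _ in range(20)]

for size in (None, 1, 10, 100):
    print(size, measure(size, workload))
```

When identical arguments arrive back to back, as in this synthetic workload, a one-entry cache scores the same number of hits as an unbounded one. That may be part of why the `maxsize=1` row for catpkgsplit above performs about as well as the larger caches, although the real access pattern was not measured here.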
Re: [gentoo-dev] euses(1) Reimplementation
Hi Ashley,

Sounds like you've put some work into this. You could compare against
`quse -D ` (from portage-utils) as well to get another point of
measure.

I don't know what you did measure euses against though, it seems fairly
fast to me (env PORTDIR=`q -e PORTDIR` euses -v libressl), is there a
specific case you're focussing on?

Thanks,
Fabian

On 09-07-2020 02:33:28 +0100, Ashley Dixon wrote:
> Hi, Gentoo-Dev.
>
> A while ago, I had a bit of a rant on Gentoo-User regarding the current
> issues with `app-portage/euses`. Specifically, the fact that it does not
> work on newer Gentoo-like systems which have moved away from PORTDIR and
> conform to the repos.conf/ syntax [1, 2, 3]. There are also some
> bugs/issues in the code, such as malloc(3)'ing without checking the
> result, et cetera.
>
> Over the past month or so, I've completed a ground-up rewrite which
> provides a similar interface and functionality, that remedies all of
> these issues, and adds a few useful features on top; it is also written
> in standard C with no dependencies other than the standard library. In
> addition to processing all the repositories described in the repos.conf
> directory, it is also written to be remarkably robust, optionally
> working from the PORTDIR make.conf key-value pair or environment
> variable for legacy systems. (As an initial user pointed out, make.conf
> cannot be used if it is a directory, and will only be touched at all if
> the legacy option is enabled and the $PORTDIR environment variable is
> unset or infeasible.)
>
> Almost all of the features from the original euses tool are present,
> with extras to facilitate multi-repo searching (in the rare event that a
> non-Gentoo.git repository has USE-description files). From my testing,
> it is equally, if not more, performant than the original tool, despite
> the extra work of traversing the meta-repository description files.
> A copy of the help page is included here, for convenience (run with the
> `-h` or `--help` option):
>
>     ash-euses command-line argument summary.
>     Syntax: ./ash-euses [options] substrings
>
>     --list-repos    -r  Prepend a list of located repositories (repos.conf/ only).
>     --repo-names    -n  Print repository names for each match.
>     --repo-paths    -p  Print repository details for each match (implies repo-names).
>     --help          -h  Print this help information and exit.
>     --version       -v  Prepend version and license information to the output.
>     --strict        -s  Search only in the flag field, excluding the description.
>     --portdir       -d  Attempt to use the PORTDIR value.
>     --quiet         -q  Do not complain about PORTDIR.
>     --no-case       -c  Perform a case-insensitive search across the files.
>     --print-needles -e  Prepend each match with the relevant needle substring.
>     --no-interrupt  -i  Do not interrupt the search results with warnings.
>     --package       -k  Restrict the search to category-package description files.
>     --colour        -o  Print the package, flag, and description in distinct colours.
>     --                  Consider all further arguments as substrings/queries.
>
> There's also a man page in the tree, providing deeper explanations for
> these command-line arguments: `ash-euses.1`.
>
> Off-line, I'm working on a strstr(3) (and strcasestr) reimplementation
> using the Two-Way string-matching algorithm [4] and shift tables, to
> remove the dependency on _GNU_SOURCE for the case-insensitive variant
> (it is very annoying that this is not a standard function, as it only
> defines CANON_ELEMENT to tolower(3) and calls glibc strstr [5]).
>
> For all my tests, the search yield is generally identical to euses(1).
> An ebuild is also included in the tree, however I am hardly experienced
> with writing them, so I'm not entirely sure if it respects the globally
> defined compiler flags.
>
> Regardless, I am posting here for anyone who is interested in
> using/testing this program, with the hope that it can provide an
> alternative for quick flag-lookup on newer, standards-conformant
> Gentoo-like systems.
>
> The source code is at [6], and a gzipped tarball of the latest release
> (v0.3) can be found at [7]. Thank you in advance to all interested
> parties.
>
> Cheers,
> Ashley.
>
> P.S. I really need a better name for this. A portmanteau of my first
> name, and the tool of which the program is a replica, doesn't seem very
> creative.
>
> [1] https://bugs.gentoo.org/546210
> [2] https://bugs.gentoo.org/378603
> [3] https://bugs.gentoo.org/663706#c4
> [4] https://dl.acm.org/doi/abs/10.1145/116825.116845
> [5] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcasestr.c;h=d2964c5548b9ea7a68fc5b18b25ddfe7ddd6835c;hb=HEAD#l45
> [6] http://git.suugaku.co.uk/ash-euses/tree/
> [7] http://git.suugaku.co.uk/ash-euses/snapshot/ash-euses-0.3.tar.gz
>
> --
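For readers unfamiliar with the two pieces discussed in this thread, locating repositories from repos.conf and searching USE-flag descriptions, here is a rough Python sketch of the idea. It is not how ash-euses works internally (that tool is written in C). The sample configuration, the sample description lines and the helper names `repo_locations` and `search_use_desc` are all made up for illustration, and the `flag - description` line layout is assumed.

```python
import configparser

# Illustrative sample data only; real systems will differ.
SAMPLE_REPOS_CONF = """
[DEFAULT]
main-repo = gentoo

[gentoo]
location = /var/db/repos/gentoo

[overlay]
location = /var/db/repos/overlay
"""

SAMPLE_USE_DESC = """
# Lines starting with '#' are comments.
libressl - Use dev-libs/libressl instead of dev-libs/openssl
ssl - Add support for SSL/TLS connections
"""


def repo_locations(conf_text):
    """Map each repository section name to its 'location' value."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    return {
        name: parser[name]["location"]
        for name in parser.sections()
        if parser.has_option(name, "location")
    }


def search_use_desc(desc_text, needle, strict=False):
    """Case-insensitive substring search over 'flag - description' lines.

    With strict=True only the flag field is searched, loosely mirroring the
    `-s` behaviour described in the thread.
    """
    needle = needle.lower()
    matches = []
    for line in desc_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        flag, separator, description = line.partition(" - ")
        if not separator:
            continue
        haystack = flag if strict else line
        if needle in haystack.lower():
            matches.append((flag, description))
    return matches


print(repo_locations(SAMPLE_REPOS_CONF))
print(search_use_desc(SAMPLE_USE_DESC, "tls"))
print(search_use_desc(SAMPLE_USE_DESC, "tls", strict=True))
```

Lower-casing both strings here is the simple approach the thread contrasts with `strcasestr(3)`; it is used in this sketch only because it is the obvious way to express the search in Python.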