Re: [gentoo-dev] euses(1) Reimplementation
Hi Fabian, cheers for your response.

On Thu, Jul 09, 2020 at 08:39:30AM +0200, Fabian Groffen wrote:
> Sounds like you've put some work into this. You could compare against
> `quse -D ` (from portage-utils) as well to get another point of
> measure.

quse is about half as fast as my tool; that is understandable, as it
works primarily from ebuild scripts as opposed to USE-flag descriptors.
The two tools yield exactly the same results, provided that `-s` is
passed to ash-euses (its default behaviour is to include flag
descriptions in the search; `-s` instructs it to display only matches
which appear in the flag name). The disadvantage of my tool is that it
knows nothing about the packages themselves, so it cannot offer
command-line options such as "only display results related to installed
packages".

> I don't know what you did measure euses against though, it seems fairly
> fast to me (env PORTDIR=`q -e PORTDIR` euses -v libressl), is there a
> specific case you're focussing on?

It is very fast, but it could be faster. I ran it through callgrind and
kcachegrind and found that it spends over 56% of its execution time in
strncpy calls; the string construction is extremely inefficient. My
reimplementation also aims to consist of cleaner, more maintainable code
(for example, the original tool declares 23 nondescriptly named local
variables at the top of main(), and more throughout the function).
Regardless, the obvious main advantage is that it is fully compliant
with the repos.conf syntax, while still working on legacy PORTDIR
systems.

As an irrelevant aside, my version also uses the strcasestr(3) function
to perform the case-insensitive search. Unfortunately, this forces
_GNU_SOURCE to be defined before the inclusion of `string.h`; however,
it is hugely faster than running tolower(3) on every character of the
query and buffer, as the canonicalisation (in this case, converting the
needle and haystack to lower-case) is done as part of the standard
string-searching function call (`two_way_{long,short}_needle`) [1]. As
discussed in my previous e-mail, I'm working on reimplementing this with
the Two-Way algorithm (and shift tables for small needles) to avoid the
non-standard dependency, although it might take a few days.

Ashley.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/str-two-way.h;h=de247fbc98b83a6e1653288e4161751710d026ce;hb=HEAD#l35

-- 
Ashley Dixon
suugaku.co.uk

2A9A 4117 DA96 D18A 8A7B B0D2 A30E BF25 F290 A8AA
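For readers without glibc, the case-insensitive search discussed above
can be approximated portably. The following is a minimal sketch of the
naive tolower(3)-per-character approach; it is neither the Two-Way
implementation mentioned above nor code from ash-euses, and the name
ci_strstr is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not taken from ash-euses: a naive, portable
 * case-insensitive substring search. It lower-cases each byte as it
 * compares, so it runs in O(n*m) time in the worst case, unlike the
 * Two-Way-based strcasestr(3) in glibc. Returns a pointer to the first
 * match, or NULL if there is none. */
const char *ci_strstr(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return haystack;        /* an empty needle matches at the start */

    for (; *haystack != '\0'; haystack++) {
        size_t i = 0;

        /* Cast to unsigned char: tolower(3) is undefined for negative
         * values other than EOF. The loop stops at the haystack's
         * terminator because '\0' never equals a non-NUL needle byte. */
        while (needle[i] != '\0' &&
               tolower((unsigned char)haystack[i]) ==
               tolower((unsigned char)needle[i]))
            i++;

        if (needle[i] == '\0')
            return haystack;    /* the whole needle matched here */
    }
    return NULL;
}
```

Calling ci_strstr("Enable OpenSSL", "ssl") returns a pointer to the
trailing "SSL". The cost of this simplicity is that every candidate
position re-runs tolower(3) on both strings, which is the overhead the
paragraph above describes.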
Re: [gentoo-dev] euses(1) Reimplementation
Hi Ashley,

Sounds like you've put some work into this. You could compare against
`quse -D ` (from portage-utils) as well to get another point of
measure.

I don't know what you did measure euses against though, it seems fairly
fast to me (env PORTDIR=`q -e PORTDIR` euses -v libressl), is there a
specific case you're focussing on?

Thanks,
Fabian

On 09-07-2020 02:33:28 +0100, Ashley Dixon wrote:
> Hi, Gentoo-Dev.
> [...]
[gentoo-dev] euses(1) Reimplementation
Hi, Gentoo-Dev.

A while ago, I had a bit of a rant on Gentoo-User regarding the current
issues with `app-portage/euses`; specifically, the fact that it does not
work on newer Gentoo-like systems which have moved away from PORTDIR and
conform to the repos.conf/ syntax [1, 2, 3]. There are also some
bugs/issues in the code, such as malloc(3)'ing without checking the
result, et cetera.

Over the past month or so, I've completed a ground-up rewrite which
provides a similar interface and functionality, remedies all of these
issues, and adds a few useful features on top; it is also written in
standard C with no dependencies other than the standard library. In
addition to processing all the repositories described in the repos.conf
directory, it is also written to be remarkably robust, optionally
working from the PORTDIR make.conf key-value pair or environment
variable for legacy systems. (As an initial user pointed out, make.conf
cannot be used if it is a directory, and will only be touched at all if
the legacy option is enabled and the $PORTDIR environment variable is
unset or infeasible.)

Almost all of the features from the original euses tool are present,
with extras to facilitate multi-repo searching (in the rare event that a
non-Gentoo.git repository has USE-description files). From my testing,
it is at least as performant as the original tool, despite the extra
work of traversing the meta-repository description files. A copy of the
help page is included here, for convenience (run with the `-h` or
`--help` option):

    ash-euses command-line argument summary.
    Syntax: ./ash-euses [options] substrings

    --list-repos     -r  Prepend a list of located repositories (repos.conf/ only).
    --repo-names     -n  Print repository names for each match.
    --repo-paths     -p  Print repository details for each match (implies repo-names).
    --help           -h  Print this help information and exit.
    --version        -v  Prepend version and license information to the output.
    --strict         -s  Search only in the flag field, excluding the description.
    --portdir        -d  Attempt to use the PORTDIR value.
    --quiet          -q  Do not complain about PORTDIR.
    --no-case        -c  Perform a case-insensitive search across the files.
    --print-needles  -e  Prepend each match with the relevant needle substring.
    --no-interrupt   -i  Do not interrupt the search results with warnings.
    --package        -k  Restrict the search to category-package description files.
    --colour         -o  Print the package, flag, and description in distinct colours.
    --                   Consider all further arguments as substrings/queries.

There's also a man page in the tree, providing deeper explanations for
these command-line arguments: `ash-euses.1`.

Off-line, I'm working on a strstr(3) (and strcasestr) reimplementation
using the Two-Way string-matching algorithm [4] and shift tables, to
remove the dependency on _GNU_SOURCE for the case-insensitive variant
(it is very annoying that this is not a standard function, as the glibc
implementation only defines CANON_ELEMENT to tolower(3) and calls glibc
strstr [5]).

In all my tests, the search yield is generally identical to euses(1). An
ebuild is also included in the tree; however, I am hardly experienced
with writing them, so I'm not entirely sure if it respects the globally
defined compiler flags. Regardless, I am posting here for anyone who is
interested in using/testing this program, with the hope that it can
provide an alternative for quick flag-lookup on newer,
standards-conformant Gentoo-like systems.

The source code is at [6], and a gzipped tarball of the latest release
(v0.3) can be found at [7]. Thank you in advance to all interested
parties.

Cheers,
Ashley.

P.S. I really need a better name for this. A portmanteau of my first
name and the tool of which the program is a replica doesn't seem very
creative.

[1] https://bugs.gentoo.org/546210
[2] https://bugs.gentoo.org/378603
[3] https://bugs.gentoo.org/663706#c4
[4] https://dl.acm.org/doi/abs/10.1145/116825.116845
[5] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcasestr.c;h=d2964c5548b9ea7a68fc5b18b25ddfe7ddd6835c;hb=HEAD#l45
[6] http://git.suugaku.co.uk/ash-euses/tree/
[7] http://git.suugaku.co.uk/ash-euses/snapshot/ash-euses-0.3.tar.gz

-- 
Ashley Dixon
suugaku.co.uk

2A9A 4117 DA96 D18A 8A7B B0D2 A30E BF25 F290 A8AA
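The difference between the default search and `--strict` (`-s`) in the
help text above can be illustrated with a short sketch. It assumes the
usual `flag - description` layout of a use.desc line and treats
everything before the first " - " as the flag field; it is not taken
from the ash-euses source, and line_matches is a hypothetical name.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not from ash-euses. Reports whether `needle`
 * occurs in `line`, which is assumed to look like "flag - description".
 * With strict == true, only a match lying wholly inside the flag field
 * (the text before the first " - ") counts. */
bool line_matches(const char *line, const char *needle, bool strict)
{
    const char *hit = strstr(line, needle);
    const char *sep;

    if (hit == NULL)
        return false;           /* needle appears nowhere in the line */
    if (!strict)
        return true;            /* default: flag or description may match */

    sep = strstr(line, " - ");
    if (sep == NULL)
        return false;           /* no separator: no recognisable flag field */

    /* strstr(3) returns the earliest occurrence, so if any occurrence
     * fits inside the flag field, this one does. */
    return hit + strlen(needle) <= sep;
}
```

With the line "ssl - Add support for SSL/TLS connections", the needle
"connections" matches in the default mode but not in strict mode, since
it appears only in the description.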