Re: [gentoo-dev] euses(1) Reimplementation

2020-07-09 Thread Ashley Dixon
Hi Fabian, cheers for your response.

On Thu, Jul 09, 2020 at 08:39:30AM +0200, Fabian Groffen wrote:
> Sounds like you've put some work into this.  You could compare against
> `quse -D ` (from portage-utils) as well to get another point of
> measure.

quse is about half as fast as my tool; that's understandable, as it works
primarily from ebuild scripts rather than USE-flag descriptors.  The two tools
yield exactly the same results, provided that `-s` is passed to ash-euses (its
default behaviour is to include flag descriptions in the search; `-s` instructs
it to only display matches which appear in the flag name itself).
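
To make the effect of `-s` concrete, here is a minimal sketch.  It is not taken
from the ash-euses source: the name `line_matches` is hypothetical, and it
assumes the `flag - description` line layout used by use.desc.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the ash-euses implementation.  Assumes a
 * use.desc-style line of the form "flag - description". */
static bool line_matches(const char *line, const char *needle, bool strict)
{
    const char *hit = strstr(line, needle);   /* earliest occurrence */
    if (hit == NULL)
        return false;
    if (!strict)
        return true;        /* default: flag or description may match */

    /* Strict mode: the match must end before the " - " separator. */
    const char *sep = strstr(line, " - ");
    if (sep == NULL)
        return true;        /* no description part: the line is the flag */
    return hit + strlen(needle) <= sep;
}
```

With `strict` set, a needle that only occurs in the description is rejected,
which is roughly the behaviour needed for the output to line up with quse.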

The disadvantage of my tool is its inability to understand  the  nature  of  the
packages, such that it cannot offer command-line options such as  "only  display
results related to installed packages".

> I don't know what you did measure euses against though, it seems fairly
> fast to me (env PORTDIR=`q -e PORTDIR` euses -v libressl), is there a
> specific case you're focussing on?

It is very fast, but it could be faster.  I ran it through callgrind and
kcachegrind and found that it spends over 56% of its execution time in strncpy
calls; the string construction is extremely inefficient.  My reimplementation
also aims for cleaner, more maintainable code (for example, the original tool
declares 23 nondescriptly named local variables at the top of main(), and more
throughout the function).  Regardless, the main advantage is that it fully
supports the repos.conf syntax while still working on legacy PORTDIR systems.

As an aside, my version also uses the strcasestr(3) function to perform the
case-insensitive search.  Unfortunately, this forces _GNU_SOURCE to be defined
before including `string.h`.  However, it is much faster than running
tolower(3) on every character of the query and buffer, as the canonicalisation
(in this case, converting the needle and haystack to lower-case) is done as
part of the standard string-searching function call
(`two_way_{long,short}_needle`) [1].  As discussed in my previous e-mail, I'm
working on reimplementing this with the Two-Way algorithm (and shift tables for
small needles) to avoid the non-standard dependency, although it might take a
few days.
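
For reference, a case-insensitive search can be written with nothing but ISO C,
at the cost of speed.  The sketch below is a naive O(n*m) fallback, not the
Two-Way algorithm and not glibc's strcasestr; the name `ci_strstr` is
hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Naive case-insensitive substring search using only ISO C.
 * Hypothetical name; neither glibc's strcasestr nor Two-Way. */
static const char *ci_strstr(const char *haystack, const char *needle)
{
    if (*needle == '\0')
        return haystack;            /* empty needle matches at the start */

    for (; *haystack != '\0'; haystack++) {
        const char *h = haystack;
        const char *n = needle;

        /* Compare character by character, folding case on the fly. */
        while (*n != '\0' &&
               tolower((unsigned char)*h) == tolower((unsigned char)*n)) {
            h++;
            n++;
        }
        if (*n == '\0')
            return haystack;        /* whole needle matched here */
    }
    return NULL;
}
```

It needs no feature-test macro, but it re-reads haystack characters on every
failed attempt, which is the cost a proper Two-Way implementation avoids.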

Ashley.

[1] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/str-two-way.h;h=de247fbc98b83a6e1653288e4161751710d026ce;hb=HEAD#l35

-- 

Ashley Dixon
suugaku.co.uk

2A9A 4117
DA96 D18A
8A7B B0D2
A30E BF25
F290 A8AA





Re: [gentoo-dev] euses(1) Reimplementation

2020-07-09 Thread Fabian Groffen
Hi Ashley,

Sounds like you've put some work into this.  You could compare against
`quse -D ` (from portage-utils) as well to get another point of
measure.

I don't know what you did measure euses against though, it seems fairly
fast to me (env PORTDIR=`q -e PORTDIR` euses -v libressl), is there a
specific case you're focussing on?

Thanks,
Fabian



[gentoo-dev] euses(1) Reimplementation

2020-07-08 Thread Ashley Dixon
Hi, Gentoo-Dev.

A while ago, I had a bit of a rant on Gentoo-User regarding the  current  issues
with `app-portage/euses`.  Specifically, the fact that it does not work on newer
Gentoo-like systems which have moved  away  from  PORTDIR  and  conform  to  the
repos.conf/ syntax [1, 2, 3].  There are also some bugs/issues in the code, such
as malloc(3)'ing without checking the result, et cetera.
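
As an illustration of the unchecked-allocation issue, a checked wrapper might
look like the sketch below.  The name `xmalloc` is a common convention, not a
function taken from either euses or ash-euses.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrapper: fail loudly instead of dereferencing NULL later. */
static void *xmalloc(size_t size)
{
    void *p = malloc(size);

    /* malloc(0) may legitimately return NULL, so only treat NULL as an
     * error when memory was actually requested. */
    if (p == NULL && size != 0) {
        fprintf(stderr, "out of memory (%zu bytes requested)\n", size);
        exit(EXIT_FAILURE);
    }
    return p;
}
```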

Over the past month or so, I've completed a ground-up rewrite which  provides  a
similar interface and functionality, that remedies all of these issues, and adds
a few useful features on  top;  it  is  also  written  in  standard  C  with  no
dependencies other than the standard library.  In addition to processing all the
repositories described in the repos.conf directory, it is  also  written  to  be
remarkably robust, optionally working from the PORTDIR make.conf key-value  pair
or environment variable for legacy systems.  (As an initial  user  pointed  out,
make.conf cannot be used if it is a directory, and will only be touched at all if
the legacy option is enabled and the $PORTDIR environment variable is  unset  or
infeasible.)
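
The fallback order described in that parenthetical could be sketched roughly as
follows.  This is an illustration only: the names are hypothetical, it uses
POSIX stat(2) rather than plain standard C, and it assumes "infeasible" means
"does not name an existing directory".

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical names; not taken from the ash-euses source. */
enum portdir_source { SRC_NONE, SRC_ENV, SRC_MAKE_CONF };

static bool is_directory(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}

static bool is_regular_file(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* env_portdir is the result of getenv("PORTDIR"), possibly NULL. */
static enum portdir_source pick_portdir_source(bool legacy,
                                               const char *env_portdir,
                                               const char *make_conf)
{
    if (!legacy)
        return SRC_NONE;            /* legacy option off: repos.conf only */
    if (env_portdir != NULL && is_directory(env_portdir))
        return SRC_ENV;             /* usable $PORTDIR wins */
    if (is_regular_file(make_conf))
        return SRC_MAKE_CONF;       /* make.conf as a file, not a directory */
    return SRC_NONE;
}
```

A caller would pass `getenv("PORTDIR")` as the second argument and the path of
make.conf as the third.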

Almost all of the features from the original euses tool are present, with extras
to facilitate multi-repo searching (in the  rare  event  that  a  non-Gentoo.git
repository has USE-description files).  From my testing, it is equally,  if  not
more, performant than the original tool, despite the extra  work  of  traversing
the meta-repository description files. A copy of the help page is included here,
for convenience (run with the `-h` or `--help` option): 

ash-euses command-line argument summary.
Syntax: ./ash-euses [options] substrings

--list-repos    -r  Prepend a list of located repositories (repos.conf/ only).
--repo-names    -n  Print repository names for each match.
--repo-paths    -p  Print repository details for each match (implies repo-names).
--help          -h  Print this help information and exit.
--version       -v  Prepend version and license information to the output.
--strict        -s  Search only in the flag field, excluding the description.
--portdir       -d  Attempt to use the PORTDIR value.
--quiet         -q  Do not complain about PORTDIR.
--no-case       -c  Perform a case-insensitive search across the files.
--print-needles -e  Prepend each match with the relevant needle substring.
--no-interrupt  -i  Do not interrupt the search results with warnings.
--package       -k  Restrict the search to category-package description files.
--colour        -o  Print the package, flag, and description in distinct colours.
--                  Consider all further arguments as substrings/queries.

There's also a man page in the tree, providing  deeper  explanations  for  these
command-line arguments: `ash-euses.1`.

Off-line, I'm working on a strstr(3) (and strcasestr) reimplementation using the
Two-Way string-matching algorithm [4] and shift tables, to remove the dependency
on _GNU_SOURCE for the case-insensitive variant (it is very annoying  that  this
is not a standard function, as it only defines CANON_ELEMENT to  tolower(3)  and
calls glibc strstr [5]).
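
To give an idea of what a shift table buys, here is a minimal sketch of a
case-insensitive Boyer-Moore-Horspool search in ISO C.  It is deliberately the
simpler Horspool variant, not the Two-Way algorithm from [4], and the name
`horspool_ci` is hypothetical.

```c
#include <ctype.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive Horspool search; hypothetical name, ISO C only. */
static const char *horspool_ci(const char *hay, const char *needle)
{
    size_t n = strlen(hay), m = strlen(needle);
    size_t shift[UCHAR_MAX + 1];

    if (m == 0)
        return hay;
    if (m > n)
        return NULL;

    /* Default shift is the needle length; characters occurring in the
     * needle (except its last position) get a shorter shift. */
    for (size_t c = 0; c <= UCHAR_MAX; c++)
        shift[c] = m;
    for (size_t i = 0; i + 1 < m; i++)
        shift[(unsigned char)tolower((unsigned char)needle[i])] = m - 1 - i;

    for (size_t pos = 0; pos + m <= n; ) {
        size_t i = m;

        /* Compare the window right to left, folding case as we go. */
        while (i > 0 && tolower((unsigned char)hay[pos + i - 1]) ==
                        tolower((unsigned char)needle[i - 1]))
            i--;
        if (i == 0)
            return hay + pos;

        /* Skip ahead based on the last character of the current window. */
        pos += shift[(unsigned char)tolower((unsigned char)hay[pos + m - 1])];
    }
    return NULL;
}
```

The table is indexed by the lower-cased character, so the case folding costs one
tolower(3) call per comparison rather than a separate pass over the buffer.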

In all my tests, the search yield is generally identical to euses(1).  An ebuild
is also included in the tree; however, I am hardly experienced with writing them,
so I'm not entirely sure if it respects the globally defined compiler flags.
Regardless, I am posting here for anyone who is interested in using/testing this
program, with the hope that it can provide an alternative for quick  flag-lookup
on newer, standards-conformant Gentoo-like systems.

The source code is at [6], and a gzipped tarball of the  latest  release  (v0.3)
can be found at [7]. Thank you in advance to all interested parties.

Cheers,
Ashley.

P.S.  I really need a better name for this.  A portmanteau of my first name, and
the tool of which the program is a replica, doesn't seem very creative.

[1] https://bugs.gentoo.org/546210
[2] https://bugs.gentoo.org/378603
[3] https://bugs.gentoo.org/663706#c4
[4] https://dl.acm.org/doi/abs/10.1145/116825.116845
[5] https://sourceware.org/git/?p=glibc.git;a=blob;f=string/strcasestr.c;h=d2964c5548b9ea7a68fc5b18b25ddfe7ddd6835c;hb=HEAD#l45
[6] http://git.suugaku.co.uk/ash-euses/tree/
[7] http://git.suugaku.co.uk/ash-euses/snapshot/ash-euses-0.3.tar.gz

-- 

Ashley Dixon
suugaku.co.uk

2A9A 4117
DA96 D18A
8A7B B0D2
A30E BF25
F290 A8AA


