Hi,
As we all know, emerge --search/--searchdesc actions are embarrassingly
slow (from most users' perspectives, anyway), especially in comparison
to external tools like eix and esearch.
Wouldn't it be nice if the performance of emerge's search functionality
were more competitive with other offerings? Then external search tools
might not be seen as an absolute necessity.
In order to solve this problem, I suggest that we add support for a
package description index file format. For example, the attached script
will generate a suitable index formatted as a series of lines like this:
sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack
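To make the line format concrete, here is a minimal Python sketch of a parser for one index line. It assumes that the text before the first ": " is a comma-separated list whose first item is a full category/package-version and whose remaining items are bare versions. The function name parse_index_line is hypothetical, and the version regex is a deliberate simplification of Portage's real version syntax.

```python
import re

# Simplified version matcher (an assumption, not Portage's parser): a version
# starts with a digit and may end in a "-rN" revision suffix.
_CPV_RE = re.compile(r"^(?P<cp>.+?)-(?P<ver>\d[^-]*(?:-r\d+)?)$")


def parse_index_line(line):
    """Split one index line into (cp, [versions], description)."""
    entries, sep, desc = line.rstrip("\n").partition(": ")
    if not sep:
        raise ValueError("malformed index line: %r" % line)
    first, *more = entries.split(",")
    m = _CPV_RE.match(first)
    if m is None:
        raise ValueError("cannot split version from %r" % first)
    # The first entry carries the full cpv; the rest are bare versions.
    return m.group("cp"), [m.group("ver")] + more, desc


cp, versions, desc = parse_index_line(
    "sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack")
print(cp)        # sys-apps/sandbox
print(versions)  # ['1.6-r2', '2.3-r1', '2.4', '2.5', '2.6-r1']
print(desc)      # sandbox'd LD_PRELOAD hack
```

The anchored regex lets package names that contain hyphen-digit runs (for example a hypothetical dev-libs/foo-2-bar-1.0) still split at the last valid version boundary.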
Using this format, the index file for the entire gentoo-x86 repository
consumes approximately 1.5 MB. The file can be searched quickly as
a stream (it need not be held in memory all at once), yielding emerge
--search/--searchdesc performance that will be competitive with
app-portage/esearch.
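A rough sketch of what a streaming search over such an index might look like, assuming the line format shown earlier. search_index is a hypothetical name; it matches the pattern case-insensitively against category/package (and, for a --searchdesc-style query, the description too), and emerge's exact matching rules are not reproduced. Because it iterates over the file object line by line, only one line is held in memory at a time.

```python
import io
import re

# Simplified cpv splitter (an assumption): strip a trailing "-<version>[-rN]".
_CPV_RE = re.compile(r"^(?P<cp>.+?)-\d[^-]*(?:-r\d+)?$")


def search_index(stream, pattern, searchdesc=False):
    """Yield (cp, description) for each index line matching pattern."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in stream:  # one line at a time; the file is never slurped
        entries, sep, desc = line.rstrip("\n").partition(": ")
        if not sep:
            continue  # skip malformed lines
        m = _CPV_RE.match(entries.split(",", 1)[0])
        if m is None:
            continue
        cp = m.group("cp")
        # Match the package name; with searchdesc, also the description.
        if regex.search(cp) or (searchdesc and regex.search(desc)):
            yield cp, desc


# Hypothetical sample index; a real one would be an open file object.
SAMPLE = (
    "app-misc/example-1.0: hypothetical preload demo\n"
    "sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack\n"
)
print(list(search_index(io.StringIO(SAMPLE), "sandbox")))
# [('sys-apps/sandbox', "sandbox'd LD_PRELOAD hack")]
print(len(list(search_index(io.StringIO(SAMPLE), "preload", searchdesc=True))))
# 2
```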
The index can either be generated on the server side by egencache, or on
the client side by a post emerge --sync hook. It makes sense to support
both modes of operation, so that server-side generation is purely optional.
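Whichever side builds it, a post-sync hook may regenerate the index while another emerge process is reading it. One common way to avoid exposing a half-written file, sketched here as an assumption rather than anything egencache is known to do, is to write to a temporary file in the same directory and rename it into place. write_index_atomically and any index path passed to it are hypothetical.

```python
import os
import tempfile


def write_index_atomically(path, lines):
    """Write index lines to path so readers never see a partial file.

    Hypothetical helper: data goes to a temporary file in the same
    directory, which is then renamed over the target with os.replace().
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".index-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            for line in lines:
                f.write(line)
        os.chmod(tmp_path, 0o644)  # mkstemp creates files as 0600
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # don't leave temporary debris behind
        raise
```

Keeping the temporary file in the target directory matters: a rename is only atomic within one filesystem.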
What do others think about this proposal?
--
Thanks,
Zac
#!/usr/bin/env python
import os
import sys
import portage
from portage.versions import _pkg_str
usage = "usage: %s <repo_name>\n" % os.path.basename(sys.argv[0])
def main(args):
    if len(args) != 1:
        sys.stderr.write(usage)
        return 1

    repo_name = args[0]
    repo_info = portage.settings.repositories.prepos.get(repo_name)
    if repo_info is None:
        sys.stderr.write("unknown repo: %s\n" % repo_name)
        return 1

    # Restrict the porttree dbapi to the requested repository only.
    portdb = portage.db[portage.root]["porttree"].dbapi
    portdb.porttrees = [repo_info.location]

    # Output is written as UTF-8 bytes; on Python 3 that requires the
    # binary buffer underlying stdout.
    f = sys.stdout
    if sys.hexversion >= 0x3000000:
        f = f.buffer

    # Mutable state for grouping: consecutive versions of the same
    # package that share a DESCRIPTION are emitted on a single line.
    class duplicates(object):
        cp = None
        desc = None
        pkgs = []

    def flush_duplicates():
        if duplicates.pkgs:
            if len(duplicates.pkgs) == 1:
                output = "%s: %s\n" % (duplicates.pkgs[0],
                    duplicates.desc)
            else:
                # The first entry keeps its full cpv; the rest
                # contribute only their version strings.
                output = "%s,%s: %s\n" % (duplicates.pkgs[0],
                    ",".join(pkg.version for pkg in duplicates.pkgs[1:]),
                    duplicates.desc)
            f.write(output.encode('utf_8'))
            del duplicates.pkgs[:]

    for cp in portdb.cp_all():
        for cpv in portdb.cp_list(cp):
            desc, = portdb.aux_get(cpv, ["DESCRIPTION"])
            if duplicates.cp != cp or duplicates.desc != desc:
                flush_duplicates()
                duplicates.cp = cp
                duplicates.desc = desc
            duplicates.pkgs.append(_pkg_str(cpv))
    flush_duplicates()
    return os.EX_OK

if __name__ == '__main__':
    sys.exit(main(sys.argv[1:]))
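The heart of the script is the grouping done by flush_duplicates(). The sketch below restates that logic with itertools.groupby over hard-coded (cp, version, description) tuples instead of portdb, so it runs without Portage. format_index is a hypothetical name, and the app-misc/example entries are made-up sample data.

```python
from itertools import groupby


def format_index(entries):
    """Collapse consecutive (cp, version, description) tuples into lines.

    Adjacent versions of one package sharing a description are merged:
    the first keeps its full cpv, later ones add only their version.
    """
    lines = []
    for (cp, desc), group in groupby(entries, key=lambda e: (e[0], e[2])):
        versions = [version for _, version, _ in group]
        head = "%s-%s" % (cp, versions[0])
        tail = "".join("," + v for v in versions[1:])
        lines.append("%s%s: %s\n" % (head, tail, desc))
    return lines


DESC = "sandbox'd LD_PRELOAD hack"
# Hypothetical package whose description changed between versions.
entries = [("app-misc/example", "1.0", "old text"),
           ("app-misc/example", "2.0", "new text")]
entries += [("sys-apps/sandbox", v, DESC)
            for v in ("1.6-r2", "2.3-r1", "2.4", "2.5", "2.6-r1")]
for line in format_index(entries):
    print(line, end="")
# app-misc/example-1.0: old text
# app-misc/example-2.0: new text
# sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack
```

Like the script, this only merges adjacent entries, which is sufficient because cp_list() returns a package's versions together.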