Re: [gentoo-portage-dev] [RFC] Package description index file format for faster emerge search actions

2014-10-17 Thread Zac Medico
On 10/15/2014 10:26 AM, Paul Varner wrote:
> Please do this.  Once this is in place, I will probably deprecate
> esearch and point users to emerge for basic searching and eix for
> advanced searching.

Okay, it's done [1]. If you want to test it out, note that any
un-indexed repositories will slow down the search. When I was testing
the patch, I only generated the index for the gentoo repo, and disabled
my overlays when running searches:

PORTAGE_REPOSITORIES=$'[gentoo]\nlocation = /usr/portage\n' emerge -S search-string
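
For anyone scripting the same test, here is a minimal Python sketch of what that shell one-liner sets up. The helper name single_repo_env is hypothetical. The only thing taken from the message above is the PORTAGE_REPOSITORIES value, which is the two-line section that bash's $'...' quoting expands to.

```python
import os


def single_repo_env(name, location, base_env=None):
    """Return a copy of the environment restricted to one repository."""
    env = dict(os.environ if base_env is None else base_env)
    # Same two-line repos.conf-style section that $'...' expands to in bash.
    env["PORTAGE_REPOSITORIES"] = "[%s]\nlocation = %s\n" % (name, location)
    return env


env = single_repo_env("gentoo", "/usr/portage")
command = ["emerge", "-S", "search-string"]
# To actually run the search (requires Portage to be installed):
# import subprocess; subprocess.call(command, env=env)
```

Only the copied environment is modified, so the overlays stay configured for every other command.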

[1] https://bugs.gentoo.org/show_bug.cgi?id=525718
-- 
Thanks,
Zac



Re: [gentoo-portage-dev] [RFC] Package description index file format for faster emerge search actions

2014-10-15 Thread Paul Varner
On 10/14/14 02:40, Zac Medico wrote:
> Hi,
>
> As we all know, emerge --search/--searchdesc actions are embarrassingly
> slow (from most users' perspectives, anyway), especially in comparison
> to external tools like eix and esearch.
>
> Wouldn't it be nice if the performance of emerge's search functionality
> was more competitive with other offerings? Then, external search tools
> might not be seen as an absolute necessity.
>
> In order to solve this problem, I suggest that we add support for a
> package description index file format. For example, the attached script
> will generate a suitable index formatted as a series of lines like this:
>
> sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack
>
> Using this format, the index file for the entire gentoo-x86 repository
> consumes approximately 1.5 MB. The whole file can be quickly searched as
> a stream (the whole file need not be in memory at once), yielding emerge
> --search/--searchdesc performance that will be competitive with
> app-portage/esearch.
>
> The index can either be generated on the server side by egencache, or on
> the client side by a post emerge --sync hook. It makes sense to support
> both modes of operation, so that server side generation is purely optional.
>
> What do others think about this proposal?

Please do this.  Once this is in place, I will probably deprecate
esearch and point users to emerge for basic searching and eix for
advanced searching.

Regards,
Paul



[gentoo-portage-dev] [RFC] Package description index file format for faster emerge search actions

2014-10-14 Thread Zac Medico
Hi,

As we all know, emerge --search/--searchdesc actions are embarrassingly
slow (from most users' perspectives, anyway), especially in comparison
to external tools like eix and esearch.

Wouldn't it be nice if the performance of emerge's search functionality
was more competitive with other offerings? Then, external search tools
might not be seen as an absolute necessity.

In order to solve this problem, I suggest that we add support for a
package description index file format. For example, the attached script
will generate a suitable index formatted as a series of lines like this:

sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack

Using this format, the index file for the entire gentoo-x86 repository
consumes approximately 1.5 MB. The whole file can be quickly searched as
a stream (the whole file need not be in memory at once), yielding emerge
--search/--searchdesc performance that will be competitive with
app-portage/esearch.
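
As a rough illustration of the streaming search described above, here is a minimal sketch that parses one index line at a time and yields the matches. The function names, the case-insensitive matching, and the simplified version pattern are all assumptions made for illustration. Portage's real version grammar is richer, and this is not how emerge itself implements the search.

```python
import io
import re

# Simplified assumption: a version starts with a digit and may end in "-rN".
# Portage's real version grammar is richer than this pattern.
_CPV_RE = re.compile(r"^(?P<cp>.+?)-(?P<version>\d[^-]*(?:-r\d+)?)$")


def parse_index_line(line):
    """Split one index line into (cp, versions, description)."""
    pkg_part, _, description = line.rstrip("\n").partition(": ")
    first, _, rest = pkg_part.partition(",")
    match = _CPV_RE.match(first)
    if match is None:
        raise ValueError("unrecognized index line: %r" % line)
    versions = [match.group("version")]
    if rest:
        versions.extend(rest.split(","))
    return match.group("cp"), versions, description


def search_index(lines, pattern, search_desc=False):
    """Yield parsed entries matching pattern, reading one line at a time."""
    regex = re.compile(pattern, re.IGNORECASE)
    for line in lines:
        if not line.strip():
            continue
        cp, versions, description = parse_index_line(line)
        # Names are always searched; descriptions only when requested.
        if regex.search(cp) or (search_desc and regex.search(description)):
            yield cp, versions, description


sample = io.StringIO(
    u"sys-apps/sandbox-1.6-r2,2.3-r1,2.4,2.5,2.6-r1: sandbox'd LD_PRELOAD hack\n"
)
results = list(search_index(sample, "preload", search_desc=True))
```

Because search_index is a generator over any iterable of lines, an open file object can be passed in directly and only the current line is held in memory.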

The index can either be generated on the server side by egencache, or on
the client side by a post emerge --sync hook. It makes sense to support
both modes of operation, so that server side generation is purely optional.

What do others think about this proposal?
-- 
Thanks,
Zac
#!/usr/bin/env python

import os
import sys

import portage
from portage.versions import _pkg_str

usage = "usage: %s REPO_NAME\n" % os.path.basename(sys.argv[0])

def main(args):
	if len(args) != 1:
		sys.stderr.write(usage)
		return 1

	repo_name = args[0]
	repo_info = portage.settings.repositories.prepos.get(repo_name)

	if repo_info is None:
		sys.stderr.write("unknown repo: %s\n" % repo_name)
		return 1

	portdb = portage.db[portage.root]["porttree"].dbapi
	portdb.porttrees = [repo_info.location]

	f = sys.stdout
	# On Python 3, write bytes through the underlying binary buffer.
	if sys.hexversion >= 0x3000000:
		f = f.buffer

	class duplicates(object):
		cp = None
		desc = None
		pkgs = []

	def flush_duplicates():
		if duplicates.pkgs:
			if len(duplicates.pkgs) == 1:
				output = "%s: %s\n" % (duplicates.pkgs[0],
					duplicates.desc)
			else:
				output = "%s,%s: %s\n" % (duplicates.pkgs[0],
					",".join(pkg.version
					for pkg in duplicates.pkgs[1:]), duplicates.desc)
			f.write(output.encode('utf_8'))
			del duplicates.pkgs[:]

	for cp in portdb.cp_all():
		for cpv in portdb.cp_list(cp):
			desc, = portdb.aux_get(cpv, ["DESCRIPTION"])
			if duplicates.cp != cp or duplicates.desc != desc:
				flush_duplicates()
			duplicates.cp = cp
			duplicates.desc = desc
			duplicates.pkgs.append(_pkg_str(cpv))

	flush_duplicates()

	return os.EX_OK

if __name__ == '__main__':
	sys.exit(main(sys.argv[1:]))
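
The grouping step in the script above can also be tried without Portage installed. The following is a minimal sketch that uses itertools.groupby in place of the script's duplicates/flush_duplicates bookkeeping. The function name format_index and the app-misc/example entries are hypothetical, made up for illustration.

```python
from itertools import groupby


def format_index(entries):
    """Yield index lines from (cp, version, description) tuples.

    Consecutive entries sharing the same cp and description collapse
    into one line, mirroring the flush_duplicates() logic.
    """
    for (cp, desc), group in groupby(entries, key=lambda e: (e[0], e[2])):
        versions = [version for _, version, _ in group]
        yield "%s-%s: %s" % (cp, ",".join(versions), desc)


# Hypothetical sample data; the app-misc/example entries are invented.
entries = [
    ("sys-apps/sandbox", "1.6-r2", "sandbox'd LD_PRELOAD hack"),
    ("sys-apps/sandbox", "2.3-r1", "sandbox'd LD_PRELOAD hack"),
    ("sys-apps/sandbox", "2.4", "sandbox'd LD_PRELOAD hack"),
    ("app-misc/example", "1.0", "old description"),
    ("app-misc/example", "2.0", "new description"),
]
lines = list(format_index(entries))
```

As in the script, a package whose description changes between versions produces more than one line, since only adjacent versions with identical descriptions are merged.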