Re: [gentoo-dev] Re: debug/release builds extensions/clarification proposal

2008-12-02 Thread Alec Warner
On Mon, Dec 1, 2008 at 3:14 PM, Maciej Mrozowski [EMAIL PROTECTED] wrote:
 On Monday 01 of December 2008 22:51:57 Ciaran McCreesh wrote:

 Experience, manpower, the ability to try out potential enhancements
 rapidly, a long track record of getting it right and the growing
 recognition that most people doing package manager work for Gentoo
 aren't doing it with Portage.

 While of course I agree that any input from 'outside' is welcome and valuable,
 to get things done, in my opinion the final decision should not be blocked
 by any alternative package manager, and some policies should be enforced.

 But on topic, what's a counter proposal for my idea then?

mean hat

You asked, so the counter proposal is to *do nothing*.

very mean generic rant hat on

Ideas (even good ones) don't always get implemented.

Sometimes that just isn't the direction the maintainers want to take
the project.
Sometimes it is harder to implement than most people realize.
Sometimes suggested implementations are just a hack and a bad idea all around.

I think starting with an implementation may have been a bad starting move.

Start with what you want to accomplish:
  - Get feedback on whether this is useful or not.
  - Get feedback on other features that may be available.
  - Get feedback on how some folks would accomplish this.

"I want to be able to turn debug builds on or off on a per-package
basis.  Debug builds entail debugging symbols, split-debug, debug
CFLAGS and debug LDFLAGS."

Is that a fair summary of your request?

I am unsure how much you actually care about how each package manager
implements this feature (or if anyone implements it but portage, or
paludis, or whatever the majority of the KDE users are using).

I'm also unsure how useful this is when say, some part of KDE links
against libfoo and KDE is built with debug symbols but libfoo is not.
Is that really useful?  Are users actually asking for this proposed
feature or do you just think they want it?  Do you have any data to
back up why someone should implement this feature (mailing list posts,
forum threads, etc.)?

Certainly for portage per-package features are possible with a minor
patch (to read the custom settings from your config and to inject the
FEATURES variables into the per-package config when necessary).  The
problem that has been stated in the past is that FEATURES were not
designed to be used in that manner (per-package).
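
To make the "minor patch" idea concrete, here is a rough sketch of reading
per-package settings and merging them into the global FEATURES list. The
file format (one "category/package feature ..." entry per line) and both
function names are assumptions made up for illustration; this is not how
portage is actually structured.

```python
def parse_package_features(text):
    """Parse a hypothetical per-package file: 'cat/pkg feature [feature ...]'."""
    per_package = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        atom, *features = line.split()
        per_package.setdefault(atom, []).extend(features)
    return per_package


def features_for(package, global_features, per_package):
    """Return the global FEATURES with the package's own entries applied on top."""
    result = list(global_features)
    for feature in per_package.get(package, []):
        if feature.startswith("-"):
            # A leading '-' switches a globally enabled feature off.
            result = [f for f in result if f != feature[1:]]
        elif feature not in result:
            result.append(feature)
    return result
```

With an entry such as "kde-base/kdelibs splitdebug nostrip", only that
package would get the two extra features; every other package keeps the
global list unchanged.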

We could design a separate system that lets you define per-package
'things' and use these 'things' to trigger debug builds (completely
outside of FEATURES, leaving them alone).  FEATURES are in fact
specific features of portage that you want 'on' or 'off'
(metadata-transfer, parallel-fetch, userfetch, unmerge-orphans,
etc.).  These are examples of things you would not turn off
per-ebuild.

But the question is always 'is it really worth it' and can you get
someone to do it.
Sometimes, doing nothing is better than doing something badly.

endrant

-Alec

 Quick search in archives gave me some results I don't particularly like, like
 the idea with /etc/portage/packages.cflags and /etc/portage/package.env, and
 they have been dropped for similar reasons - as the former needs special
 parsing instead just sourcing the script (the problem is that someone needs to
 implement this - this is usually the problem, especially in pure volunteer
 projects like Gentoo), the latter looks a bit messy to me. /etc/portage/env
 would be the best approach when made officially supported (recently it looks
 like /etc/portage/env is sourced multiple times and that should be fixed, for
 convenience, just in case user wants to put:
 CFLAGS="-O0 -ggdb"
 CXXFLAGS="${CFLAGS}"
 FEATURES="${FEATURES} nostrip"
 (or even USE="${USE} debug")
 actually /etc/portage/env could easily replace package.keywords and
 package.use as well and introduce a replacement for the maybe-proposed-sometime
 package.features - I wonder whether it's been discussed already.

Not without causing a bunch of pain in figuring out the inheritance
order of stacked USE variables.
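
The ordering problem can be illustrated with a simplified model of how an
incremental variable such as USE is stacked. The function below is only a
sketch under that assumption (later layers win, "-flag" removes, "-*"
clears); the layer contents are hypothetical and portage's real stacking
has more cases.

```python
def stack_use(*layers):
    """Apply USE-style layers in order; later layers override earlier ones."""
    enabled = []
    for layer in layers:
        for token in layer.split():
            if token == "-*":
                enabled.clear()              # reset everything seen so far
            elif token.startswith("-"):
                if token[1:] in enabled:
                    enabled.remove(token[1:])
            elif token not in enabled:
                enabled.append(token)
    return enabled


# The same two settings give different results depending on which is read last.
env_then_package_use = stack_use("debug", "-debug")
package_use_then_env = stack_use("-debug", "debug")
```

Sourcing a bash file that sets USE itself would add one more layer to this
stack, and its position relative to make.conf and package.use would have to
be defined somewhere.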


 --
 regards
 MM




Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future

2008-12-02 Thread James Cloos
 Jan == Jan Kundrát [EMAIL PROTECTED] writes:

 - less data in metadata cache;

Jan Isn't it in the cache for some reason? Really, I'm just asking.

If for nothing else, so that update-eix can get it to allow searching on
homepage.  And, yes, that is an important feature.  And, no, opening
every metadata.xml file during update-eix is in no way acceptable.

For eix above, of course, read your favourite query tool.

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6



Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future

2008-12-02 Thread James Cloos
 Diego == Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] writes:

 But also the need to replicate http://www.kde.org/ to metadata.xml of
 all KDE split ebuilds -- right now, this is set by an eclass.

Diego The usefulness of this is IMHO debatable; why not just write it in one
Diego package (say kde-base/kde or kde-meta) and just there? Having each
Diego mini-package express itself as having that as its homepage is not very
Diego useful to me, but I guess it's debatable.

Searching is an important reason for every package to specify its homepage.

-JimC
-- 
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6



[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future

2008-12-02 Thread Diego 'Flameeyes' Pettenò
James Cloos [EMAIL PROTECTED] writes:

 Searching is an important reason for every package to specify its homepage.

And?

metadata.xml already contains data that eix and other software should be
able to search in (like longdescriptions), and having each package in
kde-base report http://www.kde.org/ as its homepage is kinda pointless
if you think about search, since that's not data, it's noise.

Which only adds to my point.

-- 
Diego Flameeyes Pettenò
http://blog.flameeyes.eu/




[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future

2008-12-02 Thread Ryan Hill
On Mon, 01 Dec 2008 10:00:33 +0100
[EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

 Alec Warner [EMAIL PROTECTED] writes:
 
  That being said I still don't see the usefulness here.
 
  You seem to think that using the existing APIs for this data is
  wrong, and I think the opposite, so I guess we will agree to
  disagree on this matter.
 
 Yeah, I still think that there is no point in requiring use of a
 specific API when the same data can easily be made available in a format
 that is more or less easy to parse in any modern (or not so modern)
 programming language.
 
 Besides, I find expanding the HOMEPAGE syntax to allow more than one
 link a bit ... overkill, if the same thing can be achieved in
 metadata.xml...

I find moving HOMEPAGE out of ebuilds to be a bit overkill.


-- 
gcc-porting,  by design, by neglect
treecleaner,  for a fact or just for effect
wxwidgets @ gentoo EFFD 380E 047A 4B51 D2BD C64F 8AA8 8346 F9A4 0662




Re: [gentoo-dev] [RFC] Moving HOMEPAGE out of ebuilds for the future

2008-12-02 Thread Robin H. Johnson
While the KDE eclass doesn't set specific homepages per package, a
number of other eclasses do:

eclass/horde.eclass:HOMEPAGE="http://www.horde.org/${HORDE_PN}"
eclass/java-pkg-2.eclass:   HOMEPAGE="http://commons.apache.org/${PN#commons-}/"
eclass/kernel-2.eclass:HOMEPAGE="http://www.kernel.org/ http://www.gentoo.org/ ${HOMEPAGE}"
eclass/perl-module.eclass:  HOMEPAGE="http://search.cpan.org/search?query=${MY_PN:-${PN}}&mode=dist"
eclass/php-ext-pecl-r1.eclass:HOMEPAGE="http://pecl.php.net/${PECL_PKG}"
eclass/php-pear-r1.eclass:[[ -z ${HOMEPAGE} ]] && HOMEPAGE="http://pear.php.net/${PHP_PEAR_PKG_NAME}"
eclass/ruby.eclass:HOMEPAGE="http://raa.ruby-lang.org/list.rhtml?name=${PN}"
eclass/xfce44.eclass:   HOMEPAGE="http://thunar.xfce.org/pwiki/projects/${MY_PN}"

Additionally, some of the above eclasses are used by other eclasses: ant-tasks,
java-gnome, perl-app, perl-post, php-ext-pecl, php-ezc, php-pear, gems

A quick scan of the tree shows 15% of the ebuilds do not set the HOMEPAGE
variable in the ebuild itself. And a LOT more qualify, esp. in dev-ruby and
dev-perl. Some quick scanning on groups of packages that I'm aware of puts the
figure beyond 20% of the tree qualifying (converting any dev-perl/perl-core
package that comes from CPAN).

As another major pain, for ebuilds where the homepage changes every version in
some predictable pattern, you have now increased the maintenance burden. Before
we could just copy the ebuild if we had a suitable variable expression in the
HOMEPAGE variable, but now we'd have to edit it into metadata.xml as well.
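
As a rough illustration of what a "suitable variable expression" buys, the
sketch below mimics the java-pkg-2 pattern (bash ${PN#commons-}) and a
version-derived URL. The helper names and the example.org pattern are
hypothetical; only the commons.apache.org form comes from the list above.

```python
def strip_prefix(value, prefix):
    """Mimic bash ${value#prefix} for a literal (non-glob) prefix."""
    return value[len(prefix):] if value.startswith(prefix) else value


def commons_homepage(pn):
    """Homepage as computed by the eclass expression quoted above."""
    return "http://commons.apache.org/%s/" % strip_prefix(pn, "commons-")


def versioned_homepage(pv):
    """Hypothetical per-version pattern: the URL follows major.minor."""
    major_minor = ".".join(pv.split(".")[:2])
    return "http://example.org/releases/%s/" % major_minor
```

Because the URL is derived from PN or PV, copying the ebuild to a new
version needs no manual edit; a static per-version entry in metadata.xml
would.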

For all the rest of the ebuilds where it does remain static, I don't see
any actual advantage to removing it from the ebuilds.

To be very clear however, I've got _zero_ objections to adding the extra
new fields into the metadata.xml, provided they are version independent.

-- 
Robin Hugh Johnson
Gentoo Linux Developer  Infra Guy
E-Mail : [EMAIL PROTECTED]
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85




Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future

2008-12-02 Thread Marius Mauch
On Wed, 03 Dec 2008 02:05:31 +0100
[EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

 metadata.xml already contains data that eix and other software should
 be able to search in (like longdescriptions), and having each package
 in kde-base report http://www.kde.org/ as its homepage is kinda
 pointless if you think about search, since that's not data, it's
 noise.

So you're saying if I'm interested in a url to look for information
about kalarm, I should search for it in metadata.xml of random kde
packages? Sorry, but that doesn't make any sense to me.

While I'm not necessarily against your primary goal here, your
argumentation is very subjective to say the least (e.g. just because
you find xml easier to read/parse than ebuilds doesn't mean the same
holds true for everyone else, ignoring the whole cache issue). It
feels a bit like you're looking for problems to justify your solution
rather than the other way round.

Marius



Re: [gentoo-dev] Looking for help with kernel maintenance

2008-12-02 Thread Nicolas Sebrecht

On Tue, Dec 02, 2008 at 12:59:51PM +0800, Cheng Renquan wrote:

 1. I have written several patches for vanilla kernel since 2.6.21,
 mostly very simple,
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=search;st=author;[EMAIL
  PROTECTED]

I would like to deepen my understanding of Linux kernel internals.
Could someone tell me where to find good free documentation to start with?

Most of the documentation I've found covers old kernel versions (the 2.4
series).

-- 
Nicolas Sebrecht




Re: [gentoo-dev] Re: debug/release builds extensions/clarification proposal

2008-12-02 Thread Maciej Mrozowski
On Tuesday 02 of December 2008 10:40:19 Alec Warner wrote:

 mean hat
 You asked, so the counter proposal is to *do nothing*.
 very mean generic rant hat on
 Ideas (even good ones) don't always get implemented.

 Sometimes that just isn't the direction the maintainers want to take
 the project.
 Sometimes it is harder to implement than most people realize.
 Sometimes suggested implementations are just a hack and a bad idea all
 around.

 I think starting with an implementation may have been a bad starting move.

 Start with what you want to accomplish:
   - Get feedback on whether this is useful or not.
   - Get feedback on other features that may be available.
   - Get feedback on how some folks would accomplish this.

 I want to be able to turn debug builds on or off on a per-package
 basis.  Debug builds entail both debugging symbols, split-debug, debug
 CFLAGS and debug LDFLAGS.

 Is that a fair summary of your request?

Yes, precisely. But forget about this proposal; as I stated already, it's just
a workaround for the inability to set CFLAGS/LDFLAGS and those two FEATURES on a
per-package basis in an *official* way.

 I am unsure how much you actually care about how each package manager
 implements this feature (or if anyone implements it but portage, or
 paludis, or whatever the majority of the KDE users are using).

 I'm also unsure how useful this is when say, some part of KDE links
 against libfoo and KDE is built with debug symbols but libfoo is not.
 Is that really useful?  Are users actually asking for this proposed
 feature or do you just think they want it?  Do you have any data to
 back up why someone should implement this feature (mailing list posts,
 forums threads, etc..)

No, and I'm afraid I cannot provide a single piece of evidence that users
actually need features like:
- per package cflags/ldflags/features
- per category use flags, accept_keywords, cflags
- or tag clouds instead of hard coded categories
- user-defined packages sets (official)
- multiple portage configurations support to ease building binaries for
several targets on the same host
- dynamic libraries tracking for safe package upgrade or removal
- real backwards dependencies
- maybe git driven Portage
- automatic kernel modules rebuilding
- mysql split ebuilds

Actually, I'm perfectly certain that users are way more interested in
critically important aspects of their system, like whether HOMEPAGE should
be set in ebuilds or in metadata.xml :D

Please let me solve your little problem with HOMEPAGE for you...
A package's homepage obviously may be, and actually is, ${PN}-${PV} specific.
That being said, it *would* need to be specified either in every ebuild or, as
someone proposed, in metadata.xml in a versioned/tagged way.
And no matter how many searches you run, it is easy to predict that, due
to laziness (less probable) or just to avoid copy/paste (copy/paste is bad,
everyone knows that), some developers put HOMEPAGE in eclasses,
because it may be used in a postinst message for some reason; that being
said, it needs to be in the ebuild domain in the current implementation.
Mixing XML and bash (ebuild) in general isn't a bad idea, but using both of
them seems inconsistent; still, some trade-off needs to be made sometimes.
When duplicating HOMEPAGE is such a pain for a developer (as he needs to type it
all over again; I agree, it is a pain, especially when one needs to put some
things in only to please repoman), why not invest some time and develop tools
that could make it easier, like meta-ebuilds (or ebuild generators) and
ebuild templates? I've done something like this to autogenerate plasma applet
live ebuilds from KDE playground on their SVN (it's not yet committed to the
overlay, as the eclass is not yet ready to fetch/unpack and build packages from
this location and I haven't had time yet to patch it).
If declaring HOMEPAGE in eclasses troubles you because you need BASH to process it
properly (it may be a pain for non-BASH search tools), and XML may be problematic
to parse for bash tools, why not create such an ebuild generator or 'compiler'
that could generate the ebuild, for example as a complete BASH script (no need
to inherit anything), and use eclasses ONLY as a 'development library'?
This way - every ebuild could be:
- eclass-breakage free (overwriting eclasses don't take place so you are 
certain that user's emerge-problem is not him messing with eclasses - like 
mixing those from other overlays)
- every defined variable is there (no need for 'inherit' lookup) - so that one 
can easily find HOMEPAGE= using every kind of tool (unless it is enclosed with 
some condition - why would anyone need to do that btw?)
- much larger disk space requirements for Portage tree - but that could be 
compensated by for example gzipping every ebuild.
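
A very small sketch of the "ebuild generator" idea, assuming a template
where every variable is written out literally so that no inherit lookup is
needed. The template text, function name and values are all hypothetical;
this is not the generator mentioned above.

```python
from string import Template

# Hypothetical template: the generated file carries its variables literally.
EBUILD_TEMPLATE = Template(
    '# Generated file: every variable is spelled out, nothing is inherited.\n'
    'DESCRIPTION="$description"\n'
    'HOMEPAGE="$homepage"\n'
    'SRC_URI="$src_uri"\n'
    'SLOT="0"\n'
)


def render_ebuild(description, homepage, src_uri):
    """Fill the template; raises KeyError if a placeholder is left unset."""
    return EBUILD_TEMPLATE.substitute(
        description=description, homepage=homepage, src_uri=src_uri
    )
```

A plain-text search tool can then find HOMEPAGE= directly in the generated
file without having to evaluate any bash.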
Of course, every problem with the ebuild vs metadata dichotomy could be solved by
some new Portage backend, better suited for queries and storage (maybe some
relational database).
But so far - 

Re: [gentoo-dev] Re: debug/release builds extensions/clarification proposal

2008-12-02 Thread Marius Mauch
On Wed, 3 Dec 2008 08:19:18 +0100
Maciej Mrozowski [EMAIL PROTECTED] wrote:

 No, and I'm afraid I cannot provide any single evidence that users
 actually need features like:
 - per package cflags/ldflags/features
 - per category use flags, accept_keywords, cflags
 - or tag clouds instead of hard coded categories
 - user-defined packages sets (official)
 - multiple portage configurations support to ease building binaries
 for several targets on a same host
 - dynamic libraries tracking for safe package upgrade or removal
 - real backwards dependencies
 - maybe git driven Portage
 - automatic kernel modules rebuilding
 - mysql split ebuilds

Assuming that's a list of feature requests, you know that half of them
are already available, right? (not counting the non-feature in there)

Marius




Re: [gentoo-portage-dev] Re: search functionality in emerge

2008-12-02 Thread Tambet
About zipping: the default settings might not be a good idea - I think
the fastest setting might be even better. Considering that the portage tree
contains the same words again and again (like "applications"), it needs only a
pretty small dictionary to make it much smaller. Decompressing does not have to
mean reading from disc, decompressing and writing back to disc, as probably
happened in your case - try decompressing to a memory drive and you might get
better numbers.

I have personally used compression in one C++ application and, with optimum
settings, it made things much faster - those were files where I had, for
example, 65536 16-byte integers, which could be zeros and mostly were; I
didn't care about creating a better file format, but just compressed the whole
thing.

I suggest you compress the esearch db, then decompress it to a memory drive and
give us those numbers - it might be considerably faster.

http://www.python.org/doc/2.5.2/lib/module-gzip.html - Python gzip support.
Try its open() and the normal open() on the esearch db; also compress with the
same lib to get the right kind of file.

Anyway - maybe this compression should be added later and be optional.

Tambet - technique evolves to art, art evolves to magic, magic evolves to
just doing.


2008/12/2 Alec Warner [EMAIL PROTECTED]

 On Mon, Dec 1, 2008 at 4:20 PM, Tambet [EMAIL PROTECTED] wrote:
  2008/12/2 Emma Strubell [EMAIL PROTECTED]
 
  True, true. Like I said, I don't really use overlays, so excuse my
  ignorance.
 
  Do you know an order of doing things:
 
  Rules of Optimization:
 
  Rule 1: Don't do it.
  Rule 2 (for experts only): Don't do it yet.
 
  What this actually means - functionality comes first. Readability comes
  next. Optimization comes last. Unless you are creating a fancy 3D engine
 for
  kung fu game.
 
  If you are going to exclude overlays, you are removing functionality - and,
  indeed, absolutely has-to-be-there functionality, because no one would
  intuitively expect a search function to search only one subset of packages,
  however reasonable this subset would be. So, you can't, just can't, add this
  package into portage base - you could write just another external search
  package for portage.
 
  I looked this code a bit and:
  Portage's __init__.py contains comment # search functionality. After
  this comment, there is a nice and simple search class.
  It also contains method def action_sync(...), which contains
  synchronization stuff.
 
  Now, search class will be initialized by setting up 3 databases -
 porttree,
  bintree and vartree, whatever those are. Those will be in self._dbs array
  and porttree will be in self._portdb.
 
  It contains some more methods:
  _findname(...) will return result of self._portdb.findname(...) with same
  parameters or None if it does not exist.
  Other methods will do similar things - map one or another method.
  execute will do the real search...
  Now - for package in self.portdb.cp_all() is important here ...it
  currently loops over whole portage tree. All kinds of matching will be
 done
  inside.
  self.portdb obviously points to porttree.py (unless it points to fake
 tree).
  cp_all will take all porttrees and do simple file search inside. This
 method
  should contain optional index search.
 
   self.porttrees = [self.porttree_root] + \
       [os.path.realpath(t) for t in
        self.mysettings["PORTDIR_OVERLAY"].split()]
 
  So, self.porttrees contains list of trees - first of them is root, others
  are overlays.
 
  Now, what you have to do will not be harder just because of having
 overlay
  search, too.
 
  You have to create method def cp_index(self), which will return
 dictionary
  containing package names as keys. For oroot... will be
 self.porttrees[1:],
  not self.porttrees - this will only search overlays. d = {} will be
  replaced with d = self.cp_index(). If index is not there, old version
 will
  be used (thus, you have to make internal porttrees variable, which
 contains
  all or all except first).
 
  Other methods used by search are xmatch and aux_get - first used several
  times and last one used to get description. You have to cache results of
  those specific queries and make them use your cache - as you can see,
 those
  parts of portage are already able to use overlays. Thus, you have to put
  your code again in beginning of those functions - create index_xmatch and
  index_aux_get methods, then make those methods use them and return their
  results unless those are None (or something else in case None is already a
  legal result) - if they return None, the old code will be run and do its job.
  If index is not created, result is None. In index_** methods, just check
 if
  query is what you can answer and if it is, then answer it.
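
The index-with-fallback pattern described above can be sketched roughly as
follows. The class, the dict-based index and the two-argument aux_get are
simplifications made up for illustration; they do not match portage's real
method signatures.

```python
class IndexedLookup:
    """Answer queries from an optional index, else fall back to the slow path."""

    def __init__(self, slow_aux_get, index=None):
        self._slow_aux_get = slow_aux_get  # the existing tree-walking lookup
        self._index = index                # {(cpv, key): value}, or None if not built

    def index_aux_get(self, cpv, key):
        """Return the indexed value, or None when the index cannot answer."""
        if self._index is None:
            return None
        return self._index.get((cpv, key))

    def aux_get(self, cpv, key):
        cached = self.index_aux_get(cpv, key)
        if cached is not None:
            return cached
        # Index missing or incomplete: the old code path still does its job.
        return self._slow_aux_get(cpv, key)
```

Since a missing index simply yields None, overlays that were never indexed
keep working through the original code path.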
 
  Obviously, the simplest way to create your index is to delete the index, then
  use those same methods to query for all necessary information - and the
  fastest way would be to add index updating directly into sync, which you
  could do later.
 
  Please, also, make 

[gentoo-portage-dev] About boosting sync

2008-12-02 Thread Tambet
Has anyone ever noticed that the portage tree contains a lot of md5 hashes,
which are not at all important for using it? I think reliability and
functionality would not suffer one bit if those all stayed on the sync
servers - and syncing would go much faster and the tree would be smaller. What
about removing all those md5 hashes and downloading them only when they're
needed?

Tambet - technique evolves to art, art evolves to magic, magic evolves to
just doing.


Re: [gentoo-portage-dev] Re: search functionality in emerge

2008-12-02 Thread Alec Warner
On Tue, Dec 2, 2008 at 4:42 AM, Tambet [EMAIL PROTECTED] wrote:
 About zipping.. Default settings might not really be good idea - i think
 that fastest might be even better. Considering that portage tree contains
 same word again and again (like applications) it needs pretty small
 dictionary to make it much smaller. Decompressing will not be reading from
 disc, decompressing and writing back to disc as in your case probably - try
 decompression to memory drive and you might get better numbers.

I ran gzip -d -c file.gz > /dev/null, which should not write to disk.

I tried again with gzip -1 and it still takes 29ms to decompress,
where a bare read takes 26ms.  (I have a 2.6GHz X2, which
is probably relevant to gzip decompression speed.)


 I have personally used compression in one c++ application and with optimum
 settings, it made things much faster - those were files, where i had for
 example 65536 16-byte integers, which could be zeros and mostly were; I
 didnt care about creating better file format, but just compressed the whole
 thing.

I'm not saying compression won't make the index smaller.  I'm saying
making the index smaller does not improve performance.  If you have a
10 meg file and you make it 1 meg, you do not increase performance,
because (on average) the time you save reading the smaller file is
paid back decompressing it later.


 I suggest you to compress esearch db, then decompress it to memory drive and
 give us those numbers - might be considerably faster.

gzip -d -c esearchdb.py.gz > /dev/null (compressed with gzip -1) takes
on average (6 trials, dropped caches between trials) 35.1666ms.

cat esearchdb.py > /dev/null (uncompressed) takes on average, over 6 trials, 24ms.

The point is you use compression when you need to save space (sending
data over the network, or storing large amounts of data or a lot of
something).  The index isn't going to be big (if it is bigger than 20
or 30 meg I'll be surprised), the index isn't going over the network
and there is only 1 index, not say a million indexes (where
compression might actually be useful for some kind of LRU subset of
indexes to meet disk requirements).
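
For anyone who wants to repeat the comparison with the Python gzip module
mentioned earlier in the thread, a rough sketch follows. The file names and
the compare() helper are made up for illustration, and unlike the shell
runs above it does not drop the page cache between trials, so absolute
numbers will differ.

```python
import gzip
import os
import tempfile
import time


def time_read(path, opener, trials=6):
    """Return (average seconds per full read, number of bytes read)."""
    total = 0.0
    data = b""
    for _ in range(trials):
        start = time.perf_counter()
        with opener(path, "rb") as handle:
            data = handle.read()
        total += time.perf_counter() - start
    return total / trials, len(data)


def compare(payload):
    """Write payload plain and gzip -1 compressed, then time reading both."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "index.db")
        packed = plain + ".gz"
        with open(plain, "wb") as handle:
            handle.write(payload)
        with gzip.open(packed, "wb", compresslevel=1) as handle:
            handle.write(payload)
        plain_avg, plain_bytes = time_read(plain, open)
        packed_avg, unpacked_bytes = time_read(packed, gzip.open)
        return {
            "plain_avg": plain_avg,
            "packed_avg": packed_avg,
            "plain_bytes": plain_bytes,
            "unpacked_bytes": unpacked_bytes,
            "plain_size": os.path.getsize(plain),
            "packed_size": os.path.getsize(packed),
        }
```

Comparing plain_avg with packed_avg on a real index file is the
measurement being argued about; the size fields only confirm that the
compressed copy really is smaller.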

Anyway this is all moot since as you stated so well earlier,
optimization comes last, so stop trying to do it now ;)

-Alec



Re: [gentoo-portage-dev] About boosting sync

2008-12-02 Thread Ned Ludd

On Tue, 2008-12-02 at 19:46 +0200, Tambet wrote:
 Has anyone ever noticed that portage tree contains a lot of md5
 hashes, which are not at all important for using it? I think that it
 does not make reliability or functionality smaller any bit if those
 would all stay in sync servers - anyway, syncing would go much faster
 and this tree smaller. What about removing all those md5 hashes and
 downloading them only when they're needed?

To build a deptree, portage needs to source the ebuild in the depend
phase, so portage needs to know that a file is safe to source before it
loads it. Given that FEATURES='strict' is enabled by default in all
profiles, it's rather vital that things remain the way they are now.
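
The "check before sourcing" step amounts to something like the sketch
below: hash the file and compare against the recorded value before doing
anything else with it. The function names and the choice of SHA256 are
assumptions for illustration, not portage's actual code.

```python
import hashlib


def file_digest(path, algorithm="sha256"):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def safe_to_source(path, expected_hex, algorithm="sha256"):
    """True only if the file on disk matches the recorded checksum."""
    return file_digest(path, algorithm) == expected_hex.lower()
```

If the hashes lived only on the sync servers, this check would need a
network round trip for every ebuild sourced during dependency calculation.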


-- 
Ned Ludd [EMAIL PROTECTED]
Gentoo Linux




Re: [gentoo-portage-dev] Re: search functionality in emerge

2008-12-02 Thread Tambet
It might be that your hard drive is not that much slower than memory, then,
but I really doubt it ...or it could mean that reading the gzip binary is
much slower than reading cat - and that is highly probable. I mean the file
size of gzip itself.

Actually it's elementary logic that decompressing is faster than just
loading. What I personally used was *much* faster than without
compression, but that was also a C++ application having this zip always in
memory, and the data was very inefficiently stored to begin with.

I suggest a test to see what I mean - make a file and write the
character "a" to it about 10 000 000 times to make it that big, then try the
same thing on that file. I think it's likely that the resulting file will be
really fast to decompress.
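
The suggested experiment is easy to run in memory; the sketch below is one
way to do it, with the function name and default values chosen here for
illustration. It only shows how well such maximally repetitive data
compresses and how long decompression takes, not how a real index behaves.

```python
import gzip
import time


def compress_repetitive(count=10_000_000, level=1):
    """Compress `count` copies of the byte 'a' and time the decompression."""
    data = b"a" * count
    packed = gzip.compress(data, compresslevel=level)
    start = time.perf_counter()
    restored = gzip.decompress(packed)
    elapsed = time.perf_counter() - start
    return len(data), len(packed), elapsed, restored == data
```

Calling compress_repetitive() returns the raw size, the compressed size,
the decompression time in seconds and a round-trip check.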

Anyway, you have still convinced me that, at first, no zip should be used :)
- just because your tests brought in several new variables, like the speed of
reading the decompression utility from disk.

Tambet - technique evolves to art, art evolves to magic, magic evolves to
just doing.



Re: [gentoo-portage-dev] About boosting sync

2008-12-02 Thread Robin H. Johnson
On Tue, Dec 02, 2008 at 07:46:13PM +0200, Tambet wrote:
 Has anyone ever noticed that portage tree contains a lot of md5 hashes,
 which are not at all important for using it? I think that it does not make
 reliability or functionality smaller any bit if those would all stay in sync
 servers - anyway, syncing would go much faster and this tree smaller. What
 about removing all those md5 hashes and downloading them only when they're
 needed?
Umm, what are you on? There are no more MD5s in Manifest2. It should be
only RMD160, SHA1, SHA256. If you DO find a Manifest with an MD5, I'd
REALLY like to know about it.

As for the importance of Manifests and the hashes, I'd like to offer the
following as suggested reading:
http://www.cs.arizona.edu/people/justin/packagemanagersecurity/
Specifically, see the papers page, and find the paper from CCS 2008 [1].
He DID solicit input from me on how Gentoo deals with the issue, and
gave it fair coverage in my opinion. It's CRITICALLY important that the
checksums go with the content, and that the checksums are later verified
themselves against a known up to date source.
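
To check a tree for leftover MD5 entries, something along these lines could
be used. It assumes the Manifest2 line layout "TYPE name size HASH value
[HASH value ...]"; the function names are made up for illustration and the
hash values in any example input are placeholders.

```python
def parse_manifest2_line(line):
    """Split one Manifest2 entry into (type, name, size, {hash_name: value})."""
    fields = line.split()
    if len(fields) < 5 or (len(fields) - 3) % 2:
        raise ValueError("not a Manifest2 entry: %r" % line)
    kind, name, size = fields[0], fields[1], int(fields[2])
    hashes = dict(zip(fields[3::2], fields[4::2]))
    return kind, name, size, hashes


def entries_with_md5(lines):
    """Return the names of entries that still carry an MD5 checksum."""
    found = []
    for line in lines:
        if not line.strip():
            continue
        _kind, name, _size, hashes = parse_manifest2_line(line)
        if "MD5" in hashes:
            found.append(name)
    return found
```

An empty result from entries_with_md5() over every Manifest in a tree would
match the statement that Manifest2 no longer carries MD5.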

If you're interested in the Gentoo side of it, specifically how it ties
into tree-signing, read my gleps:
http://www.gentoo.org/proj/en/glep/glep-0057.html
http://www.gentoo.org/proj/en/glep/glep-0058.html
http://www.gentoo.org/proj/en/glep/glep-0059.html
http://www.gentoo.org/proj/en/glep/glep-0060.html
http://www.gentoo.org/proj/en/glep/glep-0061.html

[1] Cappos, J. et al. A Look In the Mirror: Attacks on Package
Managers. (2008). Published in the proceedings of ACM CCS 2008.

-- 
Robin Hugh Johnson
Gentoo Linux Developer  Infra Guy
E-Mail : [EMAIL PROTECTED]
GnuPG FP   : 11AC BA4F 4778 E3F6 E4ED  F38E B27B 944E 3488 4E85

