Re: [gentoo-dev] Re: debug/release builds extensions/clarification proposal
On Mon, Dec 1, 2008 at 3:14 PM, Maciej Mrozowski [EMAIL PROTECTED] wrote:

On Monday 01 of December 2008 22:51:57 Ciaran McCreesh wrote:

Experience, manpower, the ability to try out potential enhancements rapidly, a long track record of getting it right and the growing recognition that most people doing package manager work for Gentoo aren't doing it with Portage.

While of course I agree that any input from 'outside' is welcome and valuable, in my opinion, to get things done, the final decision should not be blocked by any alternative package manager, and some policies should be enforced. But on topic, what's the counter-proposal for my idea then?

mean hat
You asked, so the counter-proposal is to *do nothing*.

very mean generic rant hat on
Ideas (even good ones) don't always get implemented. Sometimes that just isn't the direction the maintainers want to take the project. Sometimes it is harder to implement than most people realize. Sometimes suggested implementations are just a hack and a bad idea all around.

I think starting with an implementation may have been a bad first move. Start with what you want to accomplish:
- Get feedback on whether this is useful or not.
- Get feedback on other features that may be available.
- Get feedback on how some folks would accomplish this.

I want to be able to turn debug builds on or off on a per-package basis. Debug builds entail debugging symbols, split-debug, debug CFLAGS and debug LDFLAGS.

Is that a fair summary of your request? I am unsure how much you actually care about how each package manager implements this feature (or whether anyone implements it other than portage, or paludis, or whatever the majority of KDE users are using). I'm also unsure how useful this is when, say, some part of KDE links against libfoo and KDE is built with debug symbols but libfoo is not. Is that really useful? Are users actually asking for this proposed feature, or do you just think they want it?
Do you have any data to back up why someone should implement this feature (mailing list posts, forum threads, etc.)?

Certainly for portage, per-package features are possible with a minor patch (to read the custom settings from your config and to inject the FEATURES variable into the per-package config when necessary). The problem that has been stated in the past is that FEATURES were not designed to be used in that manner (per-package). We could design a separate system that lets you define per-package 'things' and use these 'things' to trigger debug builds (completely outside of FEATURES, leaving them alone). FEATURES were in fact specific features of portage that you want 'on' or 'off' (metadata-transfer, parallel-fetch, userfetch, unmerge-orphans, etc.). These are examples of things you would not turn off per-ebuild. But the question is always 'is it really worth it?' and 'can you get someone to do it?'. Sometimes, doing nothing is better than doing something badly.

endrant
-Alec

A quick search in the archives gave me some results I don't particularly like, such as the ideas of /etc/portage/packages.cflags and /etc/portage/package.env, and they have been dropped for similar reasons: the former needs special parsing instead of just sourcing the script (the problem is that someone needs to implement this, which is usually the problem, especially in purely volunteer projects like Gentoo), and the latter looks a bit messy to me. /etc/portage/env would be the best approach if it were made officially supported (recently it looks like /etc/portage/env is sourced multiple times, and that should be fixed), for convenience, in case the user wants to put:

CFLAGS="-O0 -ggdb"
CXXFLAGS="${CFLAGS}"
FEATURES="${FEATURES} nostrip"

(or even USE="${USE} debug"). Actually, /etc/portage/env could easily replace package.keywords and package.use as well, and introduce a replacement for the maybe-proposed-sometime package.features. I wonder whether it's been discussed already.
Not without causing a bunch of pain in figuring out the inheritance order of stacked USE variables.

--
regards
MM
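USE is an incremental variable in Portage: a later configuration layer can add flags, remove them with a leading '-', and (in this sketch) clear everything earlier with '-*'. That is why the order in which a per-package file such as /etc/portage/env would be applied matters. The following is a minimal Python sketch of that stacking, assuming a plain list of layers ordered from lowest to highest precedence; the function name and the layer contents are hypothetical, and this is not Portage's own implementation.

```python
from typing import Iterable, List


def stack_use(layers: Iterable[str]) -> List[str]:
    """Apply incremental USE layers, lowest precedence first.

    A bare flag enables it, a leading '-' disables it, and '-*'
    discards everything accumulated from the earlier layers.
    """
    enabled: List[str] = []
    for layer in layers:
        for token in layer.split():
            if token == "-*":
                enabled.clear()
            elif token.startswith("-"):
                flag = token[1:]
                if flag in enabled:
                    enabled.remove(flag)
            elif token not in enabled:
                enabled.append(token)
    return enabled


# Hypothetical layers: profile defaults, make.conf and a per-package override.
profile = "X alsa kde"
make_conf = "-alsa qt4"
per_package = "debug -qt4"

print(stack_use([profile, make_conf, per_package]))  # ['X', 'kde', 'debug']
print(stack_use([profile, per_package, make_conf]))  # ['X', 'kde', 'debug', 'qt4']
```

Swapping the last two layers decides whether qt4 survives, which is the kind of ordering question that would have to be settled before one per-package file could replace package.use.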
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Jan == Jan Kundrát [EMAIL PROTECTED] writes:

- less data in metadata cache;

Jan Isn't it in the cache for some reason? Really, I'm just asking.

If for nothing else, so that update-eix can get it to allow searching on homepage. And, yes, that is an important feature. And, no, opening every metadata.xml file during update-eix is in no way acceptable.

For eix above, of course, read your favourite query tool.

-JimC
--
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
Diego == Diego 'Flameeyes' Pettenò [EMAIL PROTECTED] writes:

But also the need to replicate http://www.kde.org/ to metadata.xml of all KDE split ebuilds -- right now, this is set by an eclass.

Diego The usefulness of this is IMHO debatable; why not just write it in one package (say kde-base/kde or kde-meta) and only there? Having each mini-package express itself as having that as its homepage is not very useful to me, but I guess it's debatable.

Searching is an important reason for every package to specify its homepage.

-JimC
--
James Cloos [EMAIL PROTECTED] OpenPGP: 1024D/ED7DAEA6
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
James Cloos [EMAIL PROTECTED] writes:

Searching is an important reason for every package to specify its homepage.

And? metadata.xml already contains data that eix and other software should be able to search in (like longdescriptions), and having each package in kde-base report http://www.kde.org/ as its homepage is kinda pointless if you think about search, since that's not data, it's noise. Which only adds to my point.

--
Diego Flameeyes Pettenò
http://blog.flameeyes.eu/
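Diego's argument is that metadata.xml is plain XML that any language can search without running bash. A minimal Python sketch of that, using the standard library's xml.etree.ElementTree to search the <longdescription> elements; the sample document is made up for illustration, and a real tool would have to loop over every metadata.xml in the tree, which is the per-file cost James objects to for update-eix.

```python
import xml.etree.ElementTree as ET
from typing import List

# Made-up metadata.xml content for illustration only.
SAMPLE_METADATA = """<pkgmetadata>
  <herd>kde</herd>
  <longdescription lang="en">
    An example alarm and reminder scheduler
    for the KDE desktop.
  </longdescription>
</pkgmetadata>
"""


def long_descriptions(xml_text: str) -> List[str]:
    """Return the whitespace-normalised text of every <longdescription>."""
    root = ET.fromstring(xml_text)
    return [" ".join((el.text or "").split()) for el in root.iter("longdescription")]


def matches(xml_text: str, term: str) -> bool:
    """Case-insensitive substring search over all long descriptions."""
    needle = term.lower()
    return any(needle in text.lower() for text in long_descriptions(xml_text))


print(long_descriptions(SAMPLE_METADATA))
# ['An example alarm and reminder scheduler for the KDE desktop.']
print(matches(SAMPLE_METADATA, "scheduler"))  # True
```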
[gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Mon, 01 Dec 2008 10:00:33 +0100 [EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

Alec Warner [EMAIL PROTECTED] writes:

That being said I still don't see the usefulness here. You seem to think that using the existing APIs for this data is wrong, and I think the opposite, so I guess we will agree to disagree on this matter.

Yeah, I still think that there is no point in requiring the use of a specific API when the same data can easily be made available in a format that is more or less easily parsable in any modern (and not-so-modern) programming language.

Besides, I find expanding the HOMEPAGE syntax to allow more than one link a bit ... overkill, if the same thing can be achieved in metadata.xml...

I find moving HOMEPAGE out of ebuilds to be a bit overkill.

--
gcc-porting, by design, by neglect
treecleaner, for a fact or just for effect
wxwidgets @ gentoo EFFD 380E 047A 4B51 D2BD C64F 8AA8 8346 F9A4 0662
Re: [gentoo-dev] [RFC] Moving HOMEPAGE out of ebuilds for the future
While the KDE eclass doesn't set specific homepages per package, a number of other eclasses do:

eclass/horde.eclass:HOMEPAGE="http://www.horde.org/${HORDE_PN}"
eclass/java-pkg-2.eclass: HOMEPAGE="http://commons.apache.org/${PN#commons-}/"
eclass/kernel-2.eclass:HOMEPAGE="http://www.kernel.org/ http://www.gentoo.org/ ${HOMEPAGE}"
eclass/perl-module.eclass: HOMEPAGE="http://search.cpan.org/search?query=${MY_PN:-${PN}}&mode=dist"
eclass/php-ext-pecl-r1.eclass:HOMEPAGE="http://pecl.php.net/${PECL_PKG}"
eclass/php-pear-r1.eclass:[[ -z ${HOMEPAGE} ]] && HOMEPAGE="http://pear.php.net/${PHP_PEAR_PKG_NAME}"
eclass/ruby.eclass:HOMEPAGE="http://raa.ruby-lang.org/list.rhtml?name=${PN}"
eclass/xfce44.eclass: HOMEPAGE="http://thunar.xfce.org/pwiki/projects/${MY_PN}"

Additionally, some of the above eclasses are used by other eclasses: ant-tasks, java-gnome, perl-app, perl-post, php-ext-pecl, php-ezc, php-pear, gems.

A quick scan of the tree shows that 15% of the ebuilds do not set the HOMEPAGE variable in the ebuild itself. And a LOT more qualify, esp. in dev-ruby and dev-perl. Some quick scanning on groups of packages that I'm aware of puts the figure beyond 20% of the tree qualifying (converting any dev-perl/perl-core package that comes from CPAN).

As another major pain, for ebuilds where the homepage changes every version in some predictable pattern, you have now increased the maintenance burden. Before, we could just copy the ebuild if we had a suitable variable expression in the HOMEPAGE variable, but now we'd have to edit it into metadata.xml as well.

For all the rest of the ebuilds where it does remain static, I don't see any actual advantage to removing it from the ebuilds.

To be very clear, however, I've got _zero_ objections to adding the extra new fields into the metadata.xml, provided they are version independent.
--
Robin Hugh Johnson
Gentoo Linux Developer Infra Guy
E-Mail : [EMAIL PROTECTED]
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85
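The 15% figure comes from scanning the tree. A rough Python sketch of that kind of scan, assuming a Portage-style layout where ebuilds are *.ebuild files somewhere under the tree root, and treating an ebuild as "setting HOMEPAGE itself" if some line starts with HOMEPAGE=. This is a heuristic for illustration, not the script that produced the number.

```python
import os
import re
from typing import Tuple

# Heuristic: a HOMEPAGE assignment at the start of a line (spaces/tabs allowed).
HOMEPAGE_RE = re.compile(r"^[ \t]*HOMEPAGE=", re.MULTILINE)


def count_missing_homepage(tree_root: str) -> Tuple[int, int]:
    """Return (ebuilds with no HOMEPAGE= line, total ebuilds) under tree_root."""
    missing = total = 0
    for dirpath, _dirnames, filenames in os.walk(tree_root):
        for name in filenames:
            if not name.endswith(".ebuild"):
                continue
            total += 1
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                if HOMEPAGE_RE.search(handle.read()) is None:
                    missing += 1
    return missing, total
```

Called on the tree root (for example "/usr/portage", an assumed location), the pair gives the percentage directly. Ebuilds that only get HOMEPAGE from an inherited eclass, such as the perl-module consumers mentioned above, land in the first count.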
Re: [gentoo-dev] Re: [RFC] Moving HOMEPAGE out of ebuilds for the future
On Wed, 03 Dec 2008 02:05:31 +0100 [EMAIL PROTECTED] (Diego 'Flameeyes' Pettenò) wrote:

metadata.xml already contains data that eix and other software should be able to search in (like longdescriptions), and having each package in kde-base report http://www.kde.org/ as its homepage is kinda pointless if you think about search, since that's not data, it's noise.

So you're saying that if I'm interested in a URL to look for information about kalarm, I should search for it in the metadata.xml of random KDE packages? Sorry, but that doesn't make any sense to me.

While I'm not necessarily against your primary goal here, your argumentation is very subjective, to say the least (e.g. just because you find XML easier to read/parse than ebuilds doesn't mean the same holds true for everyone else, ignoring the whole cache issue). It feels a bit like you're looking for problems to justify your solution rather than the other way round.

Marius
Re: [gentoo-dev] Looking for help with kernel maintenance
On Tue, Dec 02, 2008 at 12:59:51PM +0800, Cheng Renquan wrote:

1. I have written several patches for the vanilla kernel since 2.6.21, mostly very simple, http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=search;st=author;[EMAIL PROTECTED]

I would like to go further in understanding the internals of the Linux kernel. Could someone tell me where to find good free documentation to start with? Most of the documentation I've found is about old kernel versions (2.4 series).

--
Nicolas Sebrecht
Re: [gentoo-dev] Re: debug/release builds extensions/clarification proposal
On Tuesday 02 of December 2008 10:40:19 Alec Warner wrote:

mean hat
You asked, so the counter-proposal is to *do nothing*.

very mean generic rant hat on
Ideas (even good ones) don't always get implemented. Sometimes that just isn't the direction the maintainers want to take the project. Sometimes it is harder to implement than most people realize. Sometimes suggested implementations are just a hack and a bad idea all around.

I think starting with an implementation may have been a bad first move. Start with what you want to accomplish:
- Get feedback on whether this is useful or not.
- Get feedback on other features that may be available.
- Get feedback on how some folks would accomplish this.

I want to be able to turn debug builds on or off on a per-package basis. Debug builds entail debugging symbols, split-debug, debug CFLAGS and debug LDFLAGS.

Is that a fair summary of your request?

Yes, precisely. But forget about this proposal; as I stated already, it's just a workaround for the inability to set CFLAGS/LDFLAGS and those two FEATURES on a per-package basis in an *official* way.

I am unsure how much you actually care about how each package manager implements this feature (or whether anyone implements it other than portage, or paludis, or whatever the majority of KDE users are using). I'm also unsure how useful this is when, say, some part of KDE links against libfoo and KDE is built with debug symbols but libfoo is not. Is that really useful? Are users actually asking for this proposed feature, or do you just think they want it? Do you have any data to back up why someone should implement this feature (mailing list posts, forum threads, etc.)?
No, and I'm afraid I cannot provide a single piece of evidence that users actually need features like:
- per-package cflags/ldflags/features
- per-category use flags, accept_keywords, cflags
- or tag clouds instead of hard-coded categories
- user-defined package sets (official)
- multiple portage configuration support to ease building binaries for several targets on the same host
- dynamic library tracking for safe package upgrade or removal
- real backwards dependencies
- maybe git-driven Portage
- automatic kernel module rebuilding
- mysql split ebuilds

Actually, I'm perfectly certain that users are way more interested in critically important aspects of their system, like whether HOMEPAGE should be set in ebuilds or in metadata.xml :D

Please let me solve your little problem with HOMEPAGE for you... A package's homepage obviously may be, and actually is, ${PN}-${PV} specific. That being said, it *would* need to be specified either in every ebuild or, as someone proposed, in metadata.xml in a versioned/tagged way. And no matter how many searches you run, it is easy to predict that, due to laziness (less probable) or just to avoid copy/paste (copy/paste is bad, everyone knows that), some developers put HOMEPAGE in eclasses. And because it may be used in a postinst message for some reason, it needs to be in the ebuild domain in the current implementation. Mixing XML and bash (ebuilds) in general isn't a bad idea, but using both of them seems inconsistent; still, some trade-offs have to be made sometimes.

When duplicating HOMEPAGE is such a pain for a developer (as he needs to type it all over again; I agree, it is a pain, especially when one needs to put some things in only to please repoman), why not invest some time and develop tools that could make it easier, like meta-ebuilds (or ebuild generators) and ebuild templates?
I've done something like this to autogenerate plasma applet live ebuilds from KDE playground in their SVN (it's not yet committed to the overlay, as the eclass is not yet ready to fetch/unpack and build packages from this location and I haven't had time to patch it yet).

If declaring HOMEPAGE in eclasses troubles you because you need bash to process it properly (it may be a pain for non-bash search tools), and XML may be problematic to parse for bash tools, why not create such an ebuild generator or 'compiler' that could generate the ebuild? For example, as a complete bash script (no need to inherit anything), using eclasses ONLY as a 'development library'. This way, every ebuild could be:
- eclass-breakage free (overwriting eclasses doesn't take place, so you are certain that a user's emerge problem is not him messing with eclasses, like mixing those from other overlays)
- every defined variable is there (no need for 'inherit' lookup), so that one can easily find HOMEPAGE= using every kind of tool (unless it is enclosed in some condition; why would anyone need to do that, btw?)
- much larger disk space requirements for the Portage tree, but that could be compensated for by, for example, gzipping every ebuild.

Of course, all the problems with the ebuild vs. metadata dichotomy could be solved by some new Portage backend better suited for queries and storage (maybe some relational database). But so far -
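The 'ebuild compiler' idea can be sketched in a few lines of Python with string.Template: values an eclass would normally compute while the ebuild is sourced are substituted when the file is generated, so a plain text lookup finds HOMEPAGE without running bash. The template, its placeholders and the example values are hypothetical, and a real generator would also have to flatten the eclass functions themselves, which this sketch does not attempt.

```python
from string import Template
from typing import Dict, Optional

# Hypothetical template. Values an eclass would compute at source time are
# filled in at generation time, so every variable appears literally.
EBUILD_TEMPLATE = Template('''# Generated file, do not edit by hand.
DESCRIPTION="${description}"
HOMEPAGE="${homepage}"
LICENSE="${license}"
SLOT="0"
KEYWORDS="${keywords}"
''')


def generate_ebuild(values: Dict[str, str]) -> str:
    """Fill in the template; raises KeyError if a placeholder has no value."""
    return EBUILD_TEMPLATE.substitute(values)


def homepage_of(ebuild_text: str) -> Optional[str]:
    """A trivial non-bash lookup: read the literal HOMEPAGE= line."""
    for line in ebuild_text.splitlines():
        if line.startswith("HOMEPAGE="):
            return line[len("HOMEPAGE="):].strip('"')
    return None


ebuild = generate_ebuild({
    "description": "Example plasma applet",
    "homepage": "http://www.kde.org/",
    "license": "GPL-2",
    "keywords": "~amd64 ~x86",
})
print(homepage_of(ebuild))  # http://www.kde.org/
```

Using substitute() rather than safe_substitute() makes a forgotten value fail loudly at generation time instead of leaving a placeholder in the generated ebuild.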
Re: [gentoo-dev] Re: debug/release builds extensions/clarification proposal
On Wed, 3 Dec 2008 08:19:18 +0100 Maciej Mrozowski [EMAIL PROTECTED] wrote:

No, and I'm afraid I cannot provide a single piece of evidence that users actually need features like:
- per-package cflags/ldflags/features
- per-category use flags, accept_keywords, cflags
- or tag clouds instead of hard-coded categories
- user-defined package sets (official)
- multiple portage configuration support to ease building binaries for several targets on the same host
- dynamic library tracking for safe package upgrade or removal
- real backwards dependencies
- maybe git-driven Portage
- automatic kernel module rebuilding
- mysql split ebuilds

Assuming that's a list of feature requests, you know that half of them are already available, right? (not counting the non-feature in there)

Marius
Re: [gentoo-portage-dev] Re: search functionality in emerge
About zipping... Default settings might not really be a good idea; I think that "fastest" might be even better. Considering that the portage tree contains the same words again and again (like "applications"), it needs a pretty small dictionary to make it much smaller. Decompressing will not mean reading from disk, decompressing and writing back to disk, as it probably did in your case; try decompressing to a memory drive and you might get better numbers.

I have personally used compression in one C++ application and, with optimum settings, it made things much faster. Those were files where I had, for example, 65536 16-byte integers, which could be zeros and mostly were; I didn't care about creating a better file format, but just compressed the whole thing.

I suggest you compress the esearch db, then decompress it to a memory drive and give us those numbers; it might be considerably faster.

http://www.python.org/doc/2.5.2/lib/module-gzip.html - Python gzip support. Try opening the esearch db with that and with a normal open; also compress it with the same lib to get the right kind of file.

Anyway, maybe this compression should be added later, as an option.

Tambet - technique evolves to art, art evolves to magic, magic evolves to just doing.

2008/12/2 Alec Warner [EMAIL PROTECTED]

On Mon, Dec 1, 2008 at 4:20 PM, Tambet [EMAIL PROTECTED] wrote:

2008/12/2 Emma Strubell [EMAIL PROTECTED]

True, true. Like I said, I don't really use overlays, so excuse my ignorance.

Do you know the order of doing things:

Rules of Optimization:
Rule 1: Don't do it.
Rule 2 (for experts only): Don't do it yet.

What this actually means: functionality comes first. Readability comes next. Optimization comes last. Unless you are creating a fancy 3D engine for a kung fu game.

If you are going to exclude overlays, you are removing functionality, and indeed absolutely has-to-be-there functionality, because no one would intuitively expect a search function to search only one subset of packages, however reasonable this subset would be.
So you can't, just can't, add this package into portage base; you could write just another external search package for portage.

I looked at this code a bit, and:

Portage's __init__.py contains the comment "# search functionality". After this comment, there is a nice and simple search class. It also contains the method def action_sync(...), which contains the synchronization stuff.

Now, the search class will be initialized by setting up 3 databases: porttree, bintree and vartree, whatever those are. Those will be in the self._dbs array and porttree will be in self._portdb.

It contains some more methods: _findname(...) will return the result of self._portdb.findname(...) with the same parameters, or None if it does not exist. Other methods will do similar things: map one or another method.

execute will do the real search... Now, "for package in self.portdb.cp_all()" is important here... it currently loops over the whole portage tree. All kinds of matching will be done inside. self.portdb obviously points to porttree.py (unless it points to a fake tree).

cp_all will take all porttrees and do a simple file search inside. This method should contain an optional index search.

    self.porttrees = [self.porttree_root] + \
        [os.path.realpath(t) for t in self.mysettings["PORTDIR_OVERLAY"].split()]

So self.porttrees contains a list of trees: the first of them is the root, the others are overlays.

Now, what you have to do will not be harder just because of having overlay search, too.

You have to create a method def cp_index(self), which will return a dictionary containing package names as keys. For oroot... will be self.porttrees[1:], not self.porttrees; this will only search overlays. d = {} will be replaced with d = self.cp_index(). If the index is not there, the old version will be used (thus, you have to make an internal porttrees variable, which contains all trees or all except the first).

Other methods used by search are xmatch and aux_get; the first is used several times and the last one is used to get the description.
You have to cache the results of those specific queries and make them use your cache; as you can see, those parts of portage are already able to use overlays. Thus, you have to put your code, again, at the beginning of those functions: create index_xmatch and index_aux_get methods, then make those methods use them and return their results unless they are None (or something else, in case None is already a legal result). If they return None, the old code will be run and do its job. If the index is not created, the result is None. In the index_** methods, just check whether the query is one you can answer and, if it is, answer it.

Obviously, the simplest way to create your index is to delete the index, then use those same methods to query for all the necessary information; the fastest way would be to add index updating directly into sync, which you could do later.

Please, also, make
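The index_xmatch/index_aux_get idea is a cache-first lookup with a fallback: the index answers when it can, and otherwise the original code path runs unchanged. A minimal, self-contained Python sketch of that pattern; IndexedSearch, slow_match and the package list are hypothetical stand-ins for Portage's search and porttree code, not its real API. A sentinel object replaces None as the "cannot answer" signal, which is one way to handle the case, raised in the message, where None is already a legal result.

```python
from typing import Callable, Dict, List, Optional

_MISS = object()  # sentinel meaning "the index cannot answer this query"


class IndexedSearch:
    """Answer from a prebuilt index when possible, else run the old code."""

    def __init__(self, slow_match: Callable[[str], List[str]]) -> None:
        self._slow_match = slow_match
        self._index: Optional[Dict[str, List[str]]] = None

    def build_index(self, queries: List[str]) -> None:
        # Simplest approach from the message: delete the index, then use the
        # same lookup method (which now always takes the slow path) to fill it.
        self._index = None
        fresh = {q: self.match(q) for q in queries}
        self._index = fresh

    def _index_match(self, query: str):
        if self._index is None or query not in self._index:
            return _MISS
        return self._index[query]

    def match(self, query: str) -> List[str]:
        cached = self._index_match(query)
        if cached is not _MISS:
            return cached
        return self._slow_match(query)  # the old code path still does its job


PACKAGES = ["app-editors/vim", "app-editors/emacs", "dev-lang/python"]
slow_calls: List[str] = []


def slow_match(query: str) -> List[str]:
    slow_calls.append(query)  # record every time the slow path runs
    return [cp for cp in PACKAGES if query in cp]


search = IndexedSearch(slow_match)
search.build_index(["editors"])
print(search.match("editors"))  # ['app-editors/vim', 'app-editors/emacs'] (from the index)
print(search.match("python"))   # ['dev-lang/python'] (not indexed, falls back)
print(slow_calls)               # ['editors', 'python']
```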
[gentoo-portage-dev] About boosting sync
Has anyone ever noticed that the portage tree contains a lot of md5 hashes, which are not at all important for using it? I think it would not reduce reliability or functionality in any way if those all stayed on the sync servers; syncing would go much faster and the tree would be smaller. What about removing all those md5 hashes and downloading them only when they're needed?

Tambet - technique evolves to art, art evolves to magic, magic evolves to just doing.
Re: [gentoo-portage-dev] Re: search functionality in emerge
On Tue, Dec 2, 2008 at 4:42 AM, Tambet [EMAIL PROTECTED] wrote:

About zipping... Default settings might not really be a good idea; I think that "fastest" might be even better. Considering that the portage tree contains the same words again and again (like "applications"), it needs a pretty small dictionary to make it much smaller. Decompressing will not mean reading from disk, decompressing and writing back to disk, as it probably did in your case; try decompressing to a memory drive and you might get better numbers.

I ran gzip -d -c file.gz > /dev/null, which should not write to disk. I tried again with gzip -1 and it still takes 29ms to decompress, where a bare read takes 26ms. (I have a 2.6GHz X2, which is probably relevant to gzip decompression speed.)

I have personally used compression in one C++ application and, with optimum settings, it made things much faster. Those were files where I had, for example, 65536 16-byte integers, which could be zeros and mostly were; I didn't care about creating a better file format, but just compressed the whole thing.

I'm not saying compression won't make the index smaller. I'm saying making the index smaller does not improve performance. If you have a 10 meg file and you make it 1 meg, you do not increase performance, because you (on average) are not saving enough time reading the smaller file, since you pay for it in decompressing the smaller file later.

I suggest you compress the esearch db, then decompress it to a memory drive and give us those numbers; it might be considerably faster.

gzip -d -c esearchdb.py.gz > /dev/null (compressed with gzip -1) takes 35.1666ms on average (6 trials, caches dropped between trials).

cat esearchdb.py > /dev/null (uncompressed) takes 24ms on average over 6 trials.

The point is you use compression when you need to save space (sending data over the network, or storing large amounts of data or a lot of something).
The index isn't going to be big (if it is bigger than 20 or 30 meg I'll be surprised), the index isn't going over the network, and there is only 1 index, not, say, a million indexes (where compression might actually be useful for some kind of LRU subset of indexes to meet disk requirements).

Anyway, this is all moot since, as you stated so well earlier, optimization comes last, so stop trying to do it now ;)

-Alec
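Tambet's suggestion was to compare Python's gzip module against a normal open(); Alec measured with the gzip and cat tools instead. A rough Python sketch of the same comparison, assuming synthetic data written to a temporary directory and compresslevel=1 as the "fastest" setting. The file names and sample content are made up, and whatever numbers it prints describe only the machine it runs on.

```python
import gzip
import os
import tempfile
import time
from typing import Callable, Dict


def mean_time(read: Callable[[], bytes], trials: int = 6) -> float:
    """Average wall-clock seconds per call over several trials."""
    total = 0.0
    for _ in range(trials):
        start = time.perf_counter()
        read()
        total += time.perf_counter() - start
    return total / trials


def compare_plain_vs_gzip(data: bytes) -> Dict[str, float]:
    """Write data plain and gzip'd (level 1), then time reading each back.

    Unlike Alec's measurements, this does not drop the OS page cache between
    trials, so it mostly measures CPU cost rather than disk reads.
    """
    with tempfile.TemporaryDirectory() as tmp:
        plain = os.path.join(tmp, "index.db")
        packed = plain + ".gz"
        with open(plain, "wb") as fh:
            fh.write(data)
        with gzip.open(packed, "wb", compresslevel=1) as fh:
            fh.write(data)

        def read_plain() -> bytes:
            with open(plain, "rb") as fh:
                return fh.read()

        def read_packed() -> bytes:
            with gzip.open(packed, "rb") as fh:
                return fh.read()

        assert read_packed() == data  # round-trip sanity check
        return {
            "plain_bytes": os.path.getsize(plain),
            "gzip_bytes": os.path.getsize(packed),
            "plain_s": mean_time(read_plain),
            "gzip_s": mean_time(read_packed),
        }


# Synthetic, repetitive stand-in for a search index (not the real esearch db).
sample = b"".join(b"app-misc/pkg%d - example description\n" % i for i in range(50000))
print(compare_plain_vs_gzip(sample))
```

Which side wins depends on disk speed, caching and CPU, which is exactly what the two posters disagree about; the sketch only makes the measurement repeatable.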
Re: [gentoo-portage-dev] About boosting sync
On Tue, 2008-12-02 at 19:46 +0200, Tambet wrote:

Has anyone ever noticed that the portage tree contains a lot of md5 hashes, which are not at all important for using it? I think it would not reduce reliability or functionality in any way if those all stayed on the sync servers; syncing would go much faster and the tree would be smaller. What about removing all those md5 hashes and downloading them only when they're needed?

To build a deptree, portage needs to source the ebuild in the depend phase, so portage needs to know that a file is safe to source before it loads it. Given that FEATURES='strict' is enabled by default in all profiles, it's rather vital that things remain the way they are now.

--
Ned Ludd [EMAIL PROTECTED]
Gentoo Linux
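The point above is that an ebuild has to be checked against its Manifest before Portage sources it. A minimal Python sketch of such a check for one Manifest2-style line (type, file name, size, then hash-name/hash pairs). The hash-name mapping, the helper names and the choice to skip digests the local hashlib cannot compute (ripemd160 is missing from some OpenSSL builds) are assumptions of this sketch, not Portage's verification code.

```python
import hashlib
import os
import tempfile
from typing import Dict

# Assumed mapping from Manifest2 hash names to hashlib algorithm names.
HASHLIB_NAMES: Dict[str, str] = {"RMD160": "ripemd160", "SHA1": "sha1", "SHA256": "sha256"}


def verify_entry(line: str, directory: str) -> bool:
    """Check the size and every supported digest of one Manifest2-style line.

    Expected shape: TYPE FILENAME SIZE HASHNAME HASH [HASHNAME HASH ...]
    """
    fields = line.split()
    if len(fields) < 5 or (len(fields) - 3) % 2:
        raise ValueError("malformed Manifest entry: %r" % line)
    filename, size = fields[1], int(fields[2])
    with open(os.path.join(directory, filename), "rb") as fh:
        data = fh.read()
    if len(data) != size:
        return False
    checked = 0
    pairs = fields[3:]
    for name, expected in zip(pairs[0::2], pairs[1::2]):
        algo = HASHLIB_NAMES.get(name)
        if algo is None:
            continue  # hash type unknown to this sketch
        try:
            digest = hashlib.new(algo, data).hexdigest()
        except ValueError:
            continue  # algorithm unavailable in this Python/OpenSSL build
        if digest != expected.lower():
            return False
        checked += 1
    return checked > 0  # never trust an entry none of whose hashes were checked


def make_entry(kind: str, path: str) -> str:
    """Build a SHA1/SHA256-only entry for a local file (for the example)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return "%s %s %d SHA1 %s SHA256 %s" % (
        kind, os.path.basename(path), len(data),
        hashlib.sha1(data).hexdigest(), hashlib.sha256(data).hexdigest(),
    )


with tempfile.TemporaryDirectory() as tmp:
    ebuild = os.path.join(tmp, "foo-1.0.ebuild")
    with open(ebuild, "w") as fh:
        fh.write('DESCRIPTION="example"\n')
    entry = make_entry("EBUILD", ebuild)
    print(verify_entry(entry, tmp))  # True
    with open(ebuild, "a") as fh:
        fh.write("echo injected\n")  # tampering after the Manifest was written
    print(verify_entry(entry, tmp))  # False
```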
Re: [gentoo-portage-dev] Re: search functionality in emerge
It might be that your hard drive is not that much slower than memory, then, but I really doubt this one... or it could mean that reading gzip out is much slower than reading cat, and this one is highly probable. I mean, the file size of gzip. Actually, it's elementary logic that decompressing is faster than just loading. What I personally used was *much* faster than without compression, but that was also a C++ application keeping this zip always in memory, and it was also highly inefficiently stored data at first.

I suggest a test like this to understand me: make some file and write the character 'a' into it about 10 000 000 times to make it that big, then try the same thing on that file. I think it's probable that the resulting file will be really fast to decompress.

Anyway, you have still made me think that at first no zip should be used :) just because your tests brought in several new variables, like the speed of reading the decompression utility from disk.

Tambet - technique evolves to art, art evolves to magic, magic evolves to just doing.
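Tambet's suggested experiment, a file made of the character 'a' repeated about 10 000 000 times, can be run entirely in memory with Python's gzip module, which also removes the "loading the gzip binary from disk" variable he mentions. A small sketch; extremely redundant data like this compresses far better than a real index would, so the result shows a best case, not what an esearch-style database would get.

```python
import gzip
import time
from typing import Dict


def redundant_data_test(n: int = 10_000_000, level: int = 1) -> Dict[str, float]:
    """Compress n copies of b'a' in memory and time the decompression."""
    data = b"a" * n
    packed = gzip.compress(data, compresslevel=level)
    start = time.perf_counter()
    unpacked = gzip.decompress(packed)
    elapsed = time.perf_counter() - start
    assert unpacked == data  # the round trip must be lossless
    return {"original_bytes": n, "compressed_bytes": len(packed), "decompress_s": elapsed}


print(redundant_data_test())
```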
Re: [gentoo-portage-dev] About boosting sync
On Tue, Dec 02, 2008 at 07:46:13PM +0200, Tambet wrote:

Has anyone ever noticed that the portage tree contains a lot of md5 hashes, which are not at all important for using it? I think it would not reduce reliability or functionality in any way if those all stayed on the sync servers; syncing would go much faster and the tree would be smaller. What about removing all those md5 hashes and downloading them only when they're needed?

Umm, what are you on? There are no more MD5s in Manifest2. It should be only RMD160, SHA1, SHA256. If you DO find a Manifest with an MD5, I'd REALLY like to know about it.

As for the importance of Manifests and the hashes, I'd like to offer the following as suggested reading:
http://www.cs.arizona.edu/people/justin/packagemanagersecurity/

Specifically, see the papers page, and find the paper from CCS 2008 [1]. He DID solicit input from me on how Gentoo deals with the issue, and gave it fair coverage in my opinion.

It's CRITICALLY important that the checksums go with the content, and that the checksums are later verified themselves against a known up-to-date source.

If you're interested in the Gentoo side of it, specifically how it ties into tree-signing, read my GLEPs:
http://www.gentoo.org/proj/en/glep/glep-0057.html
http://www.gentoo.org/proj/en/glep/glep-0058.html
http://www.gentoo.org/proj/en/glep/glep-0059.html
http://www.gentoo.org/proj/en/glep/glep-0060.html
http://www.gentoo.org/proj/en/glep/glep-0061.html

[1] Cappos, J. et al. A Look In the Mirror: Attacks on Package Managers. (2008). Published in the proceedings of ACM CCS 2008.

--
Robin Hugh Johnson
Gentoo Linux Developer Infra Guy
E-Mail : [EMAIL PROTECTED]
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85