Re: [gentoo-dev] multiple categories for a package (was: [gentoo-dev] Re: New category proposal)
On Mon, May 16, 2005 at 10:28:48PM +0200, David Klaftenegger wrote:
> Georgi Georgiev wrote:
> > Would it be inappropriate to start bitching (again) about a flat tree where each package can go in multiple categories?
>
> So now that I've read all messages in this thread, I needed a point to start at. I guess my approach isn't a way to go, but I can't find the reason for it being bad, so: why not just create a symlink to the package in the category it *also* should be in?
>
> For example, net-mail/mutt could be a symlink to ../mail-client/mutt, allowing it to be found in both categories.
>
> Ok, portage would have to do extra work, as it would have to check whether a package is a symlink or not, ignore "symlink-packages" when it comes to ambiguous naming, count them as already installed if the package it points to is already installed and so on... quite some work, but from my point of view less than some other solutions.
>
> So I hope you understand what I mean, you may now hang me for this proposal, but if you do please tell me why it is not a good way to allow multiple categories per package ;-)

It's a better approach than tagging it into the metadata imo, since it still forces a unique cat/package. Won't play nice if the tree's fs doesn't like symlinks though (fat)... Also doesn't seem incredibly useful to me, although keep in mind I'm the lazy bugger who thinks what's there currently suffices :)
~brian
--
gentoo-dev@gentoo.org mailing list
[gentoo-dev] multiple categories for a package (was: [gentoo-dev] Re: New category proposal)
Georgi Georgiev wrote:
> Would it be inappropriate to start bitching (again) about a flat tree where each package can go in multiple categories?

So now that I've read all messages in this thread, I needed a point to start at. I guess my approach isn't a way to go, but I can't find the reason for it being bad, so: why not just create a symlink to the package in the category it *also* should be in?

For example, net-mail/mutt could be a symlink to ../mail-client/mutt, allowing it to be found in both categories.

Ok, portage would have to do extra work, as it would have to check whether a package is a symlink or not, ignore "symlink-packages" when it comes to ambiguous naming, count them as already installed if the package it points to is already installed and so on... quite some work, but from my point of view less than some other solutions.

So I hope you understand what I mean, you may now hang me for this proposal, but if you do please tell me why it is not a good way to allow multiple categories per package ;-)

Greetings,
David
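The extra work David describes (spot a "symlink-package", map it back to the real one, count it as installed if its target is) can be sketched roughly as below. This is only an illustration under assumptions: the helper names `canonical_cp`, `is_installed` and `build_demo_tree` are made up for the example, the tree is a throwaway temp directory, and nothing here reflects how portage itself handles symlinks. It also assumes a filesystem that supports symlinks at all, which is exactly the FAT caveat raised in the reply.

```python
import os
import tempfile


def canonical_cp(tree, cat_pkg):
    """Return the category/package that cat_pkg really refers to.

    A "symlink-package" resolves to the directory it points at; a normal
    package resolves to itself.
    """
    real = os.path.realpath(os.path.join(tree, cat_pkg))
    return os.path.relpath(real, os.path.realpath(tree)).replace(os.sep, "/")


def is_installed(tree, cat_pkg, installed):
    """Treat an alias as installed when the package it points to is installed."""
    return canonical_cp(tree, cat_pkg) in installed


def build_demo_tree():
    """Create a temp tree: mail-client/mutt plus a net-mail/mutt symlink alias."""
    tree = tempfile.mkdtemp()
    os.makedirs(os.path.join(tree, "mail-client", "mutt"))
    os.makedirs(os.path.join(tree, "net-mail"))
    # Relative link, as in the proposal: net-mail/mutt -> ../mail-client/mutt
    os.symlink(os.path.join("..", "mail-client", "mutt"),
               os.path.join(tree, "net-mail", "mutt"))
    return tree


demo_tree = build_demo_tree()
alias_target = canonical_cp(demo_tree, "net-mail/mutt")
alias_installed = is_installed(demo_tree, "net-mail/mutt", {"mail-client/mutt"})
print(alias_target, alias_installed)
```

Because every alias collapses to one canonical cat/package, the uniqueness Brian cares about is preserved; the cost is one extra path resolution per lookup.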
Re: [gentoo-dev] Re: New category proposal
Brian Harring wrote:
> > The layout on disk and the semantics of categories do not need to be related.
> Yes and no. You're assuming that people don't use the layout on disk for digging around without calling portage. Personally, I do.

Sometimes I do the same; but other times I find the layout a barrier. Many's the time I've done:

  $ ls -d /usr/portage/*/

to find a package, for example - that indicates the categories are actually hindering searches in this case. Incidentally it also treats the tree as if it were a flat namespace.

However, ideally I wouldn't be searching the tree directly like that at all, I'd be searching metadata based on various criteria. Indeed package names as they stand are frequently uninformative; if you decide you need something for a particular function, you can have a look in what you think may be the relevant categories, only to find a list of mostly meaningless package names. Then you start grepping the DESCRIPTIONs, and so on, finally trying equery. This whole process is rather unsatisfactory, and in my experience often fruitless. Many times I've gone the other way; google for something to find candidate package names, then 'ls -d /usr/portage/*/' to see if some kind soul has already added an ebuild to the tree.

For me, the whole point of a flat namespace is to _remove_ categories from the atom. Obviously this has far-reaching disruptive consequences as you describe, and in practice is not workable in the short to medium term at least.

I'd like to be able to ask questions like, "what app-text packages exist for ?". At the moment, listing app-text, grepping app-text/*/*ebuild may get somewhere, but what about packages placed in different categories for reasons like name clash, other functionality and so on?

Ciaran McCreesh wrote:
> So we end up not using upstream naming, leading to major hassle with tarballs, major user confusion and inconsistent naming (why are some vim things vim- and others not?). Bad!
> Now that portage *tells* you when you need to be more specific, there's no problem with name matches.

I agree maintaining upstream naming is very important. However obviously upstream names can and do clash. That raises the question of how such clashes should be resolved. Categories are a rather arbitrary way of doing that - it's quite possible that a clash could occur between two packages that naturally fall into the same category - in the current system that means one of the packages gets dumped in a second-choice category.

Talking atoms, one could handle clashes by differentiating occurrences with an extension to the name. To take the sudo example, sudo could be the normal sudo, sudo:vim (or perhaps sudo__vim to be acceptable to more filesystems) could be the vim extension sudo.

Brian Harring wrote:
> Re-asserting that the fs layout *does* matter, how is that more intuitive when trying to track down the ebuild for dev-util/diffball ? How many directories deep would I have to go before I reached the ebuild?

  $ ls -d /usr/portage/*/

becomes

  $ find /usr/portage -type d -name -print

and for quick&dirty things like

  $ grep -l /usr/portage/*//*ebuild

instead do:

  $ find /usr/portage -type d -name \
      -exec grep -l \{\}/*ebuild \;

or somesuch.

An interesting possibility is that the portage mirrors and clients can have different layouts depending what is most suitable. Those with reiserfs could sensibly choose the very wide layout. Others on ext2 could choose a s/u/sudo approach to avoid problems with very wide directories. Obviously this means modifying the sync process somewhat to deal with this, but it's quite possible, in a scalable efficient manner.

Brian Harring wrote:
> > The key here is to separate the category (metadata) and filesystem [snip]
> This also locks out several possibilities, like relying on dir structure to limit the searches.
> You force category classification to be metadata, you need an additional db to do searching, and basic atom lookup. That's 19000+ keys in a db. No db, and you force a tree wide search, which _will_ be as fast as emerge -S is.

That holds if you retain the category in the atom; for me there's no point flattening the namespace without removing the category completely from the atom. Where at the moment you perhaps want to do:

  $ grep /usr/portage/app-text/*/*ebuild

then yes, an additional db of some kind is necessary, or perhaps a more efficient way of searching the metadata.xml files. However I disagree with the 19000+ keys. Portage could for example maintain a simple category->package name mapping - it only needs to be updated when packages are added/removed from the tree or metadata is changed, and can be trivial. For example, it could be a simple shell script with entries like:

  PC_=

at which point you only need to do:

  $ source
  $ for pkg in ${PC_}; do ... ; done

Brian Harring wrote:
> cpvs can't conflict, pure and simple
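The category->package mapping described in this message might look something like the following sketch. It is written in Python rather than as the sourced shell file the message suggests, purely for illustration; the input dictionary is hypothetical (package names are taken from this thread, the category assignments are invented for the example), and `build_category_index` is not an existing portage function.

```python
from collections import defaultdict


def build_category_index(package_categories):
    """Invert {package: [categories]} into {category: sorted [packages]}."""
    index = defaultdict(list)
    for package, categories in package_categories.items():
        for category in categories:
            index[category].append(package)
    return {category: sorted(pkgs) for category, pkgs in index.items()}


# Hypothetical per-package category metadata, e.g. as it might be read from
# each package's metadata.xml. The assignments are illustrative only.
package_categories = {
    "mutt": ["mail-client", "net-mail"],
    "diffball": ["dev-util", "app-arch"],
    "sudo": ["app-admin"],
}

category_index = build_category_index(package_categories)

# Equivalent of the "for pkg in ${PC_...}" loop from the message.
for pkg in category_index["dev-util"]:
    print(pkg)
```

The index has one key per category rather than one per package, which is the point being made against the "19000+ keys" estimate; it would need regenerating only when packages or their metadata change.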
Re: [gentoo-dev] Re: New category proposal
maillog: 11/05/2005-10:33:23(-0500): Brian Harring types
> The original request was having a package turn up in multiple categories for searching, right?

Actually, that was a side effect. The original request was to stop moving packages around, which is the most annoying part and is also the part that consumes a lot of effort. After all, this started as a result of a discussion about the new name of a category. There were also talks about whether some package should be in here or there...

Having a flat tree is one way to solve the discussed problem, and it would also allow me to find some package quickly. The flat tree request I, at least personally, am willing to drop if you offer an alternative solution to the keep-them-package-atoms-fixed-once-and-for-all problem. Ahem, by an "atom" in this sentence I was referring to the CP part of CPV.

--
\/   Georgi Georgiev   \/ When all other means of communication \/
/\   [EMAIL PROTECTED]  /\ fail, try words.                      /\
\/   +81(90)2877-8845  \/                                        \/
Re: [gentoo-dev] Re: New category proposal
On Wed, May 11, 2005 at 11:11:02AM -0400, Alec Warner wrote:
> >>>Yes and no. You're assuming that people don't use the layout on disk for digging around without calling portage. Personally, I do.
>
> Not need to be related, but shouldn't be related. In essence this allows people to put the tree wherever, as long as that storage mechanism supports the database operations required ( stuffing everything in a SQL db fex ). I don't know why someone would... but hey ;)

Not a valid argument for dropping categories however, since I'm playing with a sqlite based repository module atm locally :) (don't ask for it, it's not even remotely ready for any use beyond destroying things locally on my box at the moment)

Category is just a bit of info used for looking up a node (category="xyz" and package="abc"). "Shouldn't" isn't applicable here; in this case, category *is* required due to our atoms, unless people manage to push flattening the namespace through. :)

> >>The fact that the directory where diffball is is easily deducible by its name. As it is, I'd be a bit lost if I had to guess whether diffball is in app-arch or dev-util. Even if I remembered it was something dev-related I'd still be inclined to look in sys-devel.
> >
> > dev-util is accurate (it's a compressor, but a specialized variant, same as patch is). From its current fs location/layout, we get thus: quick lookup on the atom, and inference of its intentions. This is why we have xml at the category level, for example.
> >
> > One thing that's being unstated also- it's implicitly stated that this directory structure is somehow easier to look up a package. If you know the _exact_ package name, maybe. Otherwise, you're falling back to a search tool (which defeats to some degree the whole argument for flattened namespace).
> >
> > Some quicky python, grouping by first char of the package name, and you wind up with (top 8)-
> > 421, 521, 571, 582, 657, 663, 664, 746
>
> Assuming the directories are ordered by letter, ( a,b,c,d ) and subdirectories if present as well, a bsearch wouldn't be bad. Both are ordered sets and you can quickly determine the location of a package in log(n) time. I don't see a big deal here.

Dodging my point though. I was pointing out that the categories approach decreases the number of directories/packages within each grouping, while the first letter approach jacks up the # of dirs w/in dirs (in some cases, of course). Basically, a category fs layout is easier on the human who is digging in if they don't know what they're exactly hunting for, point still stands. :)

Regarding bsearch, ehh. listdir returns a list (not an iterable over the (open|read|close)dir syscall), so you'd have to either resort to a linear search, or sort then bsearch it. Like I said, the issue isn't how we code things to make it speedy, my concern outlined above is purely the human factor in dealing with the proposed tree structure.

> > I know the path as sys-apps/portage already though. Doesn't that count for something? :)
> >
> > Or, a bit more likely case, I know the type of the package, the category, but don't recall its exact name. What y'all are proposing forces the user to use a tool, rather than just a quicky ls.
>
> *tongue in cheek* and what is ls but another tool for listing the contents of a directory :)

ls does no good if you're trying to see all packages of a category, under this proposal, which is what I'm driving at. It forces the user to use scripts/tools to do querying.

> >>I personally want to see the category part *disappear* from the *DEPEND which is one of the reasons to advocate a flat tree.
> >>If the category (or part of it) goes in the package name, so be it, but at least there will be no package moves to break older ebuilds or outdated overlays.
> >
> > Frankly, you need to give a *really* damn good reason for why this is better. I don't think it is, convince me otherwise.
> >
> > What do we gain from a flat namespace?
> > Right now, I can infer an atom out of a DEPEND string's purpose to some degree, based upon its category. To head off the "well you don't need to know the category, you should know the package's intentions if you're modifying the ebuild", that dodges the point; via the category portion of an atom, I can infer at least -intention- of a package.
>
> The CPV thing... is a big fix :(
>
> > Ignoring changes required (have stated them already, no point in sniping ya over it), what _exactly_ do we gain from the change?

So... what do we gain? Like I said, ignore for a second the massive overhaul required; if work is required to gain something, and what's gained is worth it over the work, sure. I'm not seeing the gain though :)

The original request was having a package turn up in multiple categories for searching, right? I don't like it
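The "sort then bsearch" option mentioned for a listdir result can be sketched as follows, assuming only the Python standard library. `has_package` and the demo directory contents are made up for the example.

```python
import bisect
import os
import tempfile


def has_package(directory, name):
    """Sort the os.listdir() result, then binary-search it for name."""
    entries = sorted(os.listdir(directory))  # listdir order is arbitrary
    pos = bisect.bisect_left(entries, name)
    return pos < len(entries) and entries[pos] == name


# Throwaway directory standing in for a first-letter directory such as "d/".
demo_dir = tempfile.mkdtemp()
for pkg_name in ("distcc", "diffball", "dialog"):
    os.mkdir(os.path.join(demo_dir, pkg_name))

found = has_package(demo_dir, "diffball")
missing = has_package(demo_dir, "dia")
print(found, missing)
```

Note that the sort costs O(n log n), so for a single lookup a plain linear scan (`name in os.listdir(directory)`) does less work; the binary search only pays off if the sorted list is kept around for many lookups. Either way this is the machine side of the argument, not the human-browsing side that the message is really about.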
Re: [gentoo-dev] Re: New category proposal
Brian Harring wrote:
> On Wed, May 11, 2005 at 06:01:17PM +0900, Georgi Georgiev wrote:
>
>>maillog: 11/05/2005-03:40:04(-0500): Brian Harring types
>> On Wed, May 11, 2005 at 09:46:03AM +0200, Kevin F. Quinn wrote:
>>>>Here's my suggestion, for what it's worth :)
>>>>
>>>>The layout on disk and the semantics of categories do not need to be related.
>>>
>>>Yes and no. You're assuming that people don't use the layout on disk for digging around without calling portage. Personally, I do.

Not need to be related, but shouldn't be related. In essence this allows people to put the tree wherever, as long as that storage mechanism supports the database operations required ( stuffing everything in a SQL db fex ). I don't know why someone would... but hey ;)

>>>>I like the idea of using the first character of a package name as the sub-directory name. This can be extended more deeply as and when necessary to avoid over-large directories which cause problems on some filesystems. e.g. for sudo you get "s/sudo" and vim-sudo "v/vim-sudo". This is architecture-neutral, rsyncable, scalable, and not too difficult for users to parse manually (see later for searching through categories). If the algorithm portage would use to locate a package is such that it doesn't mandate the depth (i.e. tries "package", "p/package" if "p/" exists, "p/a/package" if "p/a/" exists) then overlays can have a different depth to the rsync tree; if you only have a few packages in overlay then they need not be in subdirectories at all.
>>>
>>>Re-asserting that the fs layout *does* matter, how is that more intuitive when trying to track down the ebuild for dev-util/diffball ?
>>
>>The fact that the directory where diffball is is easily deducible by its name. As it is, I'd be a bit lost if I had to guess whether diffball is in app-arch or dev-util. Even if I remembered it was something dev-related I'd still be inclined to look in sys-devel.
>
> dev-util is accurate (it's a compressor, but a specialized variant, same as patch is). From its current fs location/layout, we get thus: quick lookup on the atom, and inference of its intentions. This is why we have xml at the category level, for example.
>
> One thing that's being unstated also- it's implicitly stated that this directory structure is somehow easier to look up a package. If you know the _exact_ package name, maybe. Otherwise, you're falling back to a search tool (which defeats to some degree the whole argument for flattened namespace).
> Some quicky python, grouping by first char of the package name, and you wind up with (top 8)-
> 421, 521, 571, 582, 657, 663, 664, 746

Assuming the directories are ordered by letter, ( a,b,c,d ) and subdirectories if present as well, a bsearch wouldn't be bad. Both are ordered sets and you can quickly determine the location of a package in log(n) time. I don't see a big deal here.

> Separate directories within an individual directory. Say 'd' for example, and we'll pretend 746 is the count of packages that start with 'd'. That's a butload of directories to go digging in.
>
> The response would be, "well then extend it to the first two chars after the first dir". You narrow it down, but add another layer of dirs, again, for what gain?
>
> See, the thing I find odd about this thread/request is that essentially breaking it down to first letter grouping, is being argued as being _easier_ for people, while allowing multi cats, or just flat out dropping the category aspect. The example above, imo, proves otherwise.
>
> Keep in mind at this point, the discussion is what's easiest for _humans_. What's easiest for code/comp is another matter, and (within limits) can work with anything that's thrown at it.
>
>>>How many directories deep would I have to go before I reached the ebuild?
>>
>>Does it matter? You know the path exactly. It's p/portage. It's not ... "was it sys-apps/portage or app-portage/portage"?
>
> I know the path as sys-apps/portage already though. Doesn't that count for something? :)
>
> Or, a bit more likely case, I know the type of the package, the category, but don't recall its exact name. What y'all are proposing forces the user to use a tool, rather than just a quicky ls.

*tongue in cheek* and what is ls but another tool for listing the contents of a directory :)

>>>>Having said that, some things could be done now. If a flat package namespace is desirable, the existing package name clashes could be resolved by renaming the few packages that clash.
>>>
>>>74 packages, roughly, out of 9429 roughly.
>>
>>76/9295, which is not that bad, considering about half of them are emacs/xemacs related.
>
> 'cept either you, or someone else was proposing a totall
Re: [gentoo-dev] Re: New category proposal
On Wed, May 11, 2005 at 06:01:17PM +0900, Georgi Georgiev wrote:
> maillog: 11/05/2005-03:40:04(-0500): Brian Harring types
> > > On Wed, May 11, 2005 at 09:46:03AM +0200, Kevin F. Quinn wrote:
> > > Here's my suggestion, for what it's worth :)
> > >
> > > The layout on disk and the semantics of categories do not need to be related.
> > Yes and no. You're assuming that people don't use the layout on disk for digging around without calling portage. Personally, I do.
> >
> > > I like the idea of using the first character of a package name as the sub-directory name. This can be extended more deeply as and when necessary to avoid over-large directories which cause problems on some filesystems. e.g. for sudo you get "s/sudo" and vim-sudo "v/vim-sudo". This is architecture-neutral, rsyncable, scalable, and not too difficult for users to parse manually (see later for searching through categories). If the algorithm portage would use to locate a package is such that it doesn't mandate the depth (i.e. tries "package", "p/package" if "p/" exists, "p/a/package" if "p/a/" exists) then overlays can have a different depth to the rsync tree; if you only have a few packages in overlay then they need not be in subdirectories at all.
> > Re-asserting that the fs layout *does* matter, how is that more intuitive when trying to track down the ebuild for dev-util/diffball ?
>
> The fact that the directory where diffball is is easily deducible by its name. As it is, I'd be a bit lost if I had to guess whether diffball is in app-arch or dev-util. Even if I remembered it was something dev-related I'd still be inclined to look in sys-devel.

dev-util is accurate (it's a compressor, but a specialized variant, same as patch is). From its current fs location/layout, we get thus: quick lookup on the atom, and inference of its intentions.
This is why we have xml at the category level, for example.

One thing that's being unstated also- it's implicitly stated that this directory structure is somehow easier to look up a package. If you know the _exact_ package name, maybe. Otherwise, you're falling back to a search tool (which defeats to some degree the whole argument for flattened namespace).

Some quicky python, grouping by first char of the package name, and you wind up with (top 8)-
421, 521, 571, 582, 657, 663, 664, 746
Separate directories within an individual directory. Say 'd' for example, and we'll pretend 746 is the count of packages that start with 'd'. That's a butload of directories to go digging in.

The response would be, "well then extend it to the first two chars after the first dir". You narrow it down, but add another layer of dirs, again, for what gain?

See, the thing I find odd about this thread/request is that essentially breaking it down to first letter grouping, is being argued as being _easier_ for people, while allowing multi cats, or just flat out dropping the category aspect. The example above, imo, proves otherwise.

Keep in mind at this point, the discussion is what's easiest for _humans_. What's easiest for code/comp is another matter, and (within limits) can work with anything that's thrown at it.

> > How many directories deep would I have to go before I reached the ebuild?
>
> Does it matter? You know the path exactly. It's p/portage. It's not ... "was it sys-apps/portage or app-portage/portage"?

I know the path as sys-apps/portage already though. Doesn't that count for something? :)

Or, a bit more likely case, I know the type of the package, the category, but don't recall its exact name. What y'all are proposing forces the user to use a tool, rather than just a quicky ls.

> > > Having said that, some things could be done now.
> > > If a flat package namespace is desirable, the existing package name clashes could be resolved by renaming the few packages that clash.
> > 74 packages, roughly, out of 9429 roughly.
>
> 76/9295, which is not that bad, considering about half of them are emacs/xemacs related.

'cept either you, or someone else was proposing a totally flat namespace, no cats in the atoms. That means the count of changes (the 76 above is just # of conflicting packages) is around 19000, plus a fairly large amount of portage modifications.

> > > Category could be added as a field in metadata.xml, so that a package could "belong" to multiple categories. The query/search tools could be enhanced to scan this metadata (perhaps including the current category directory as an implied entry in the metadata.xml).
> > If that's the goal of the "belong to N categories" thread, strictly searching, sure, although I don't like it. It can't become an atom for *DEPEND due to the cpv nonconflicting bit.
>
> I personally want to see
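The "quicky python" that produced the first-character counts is not shown in the thread. A minimal sketch of that kind of grouping, assuming the input is simply a list of category/package strings (the sample list below is illustrative, not real tree data), might be:

```python
from collections import Counter


def first_char_counts(cat_pkgs):
    """Count packages per first character of the package name (category ignored)."""
    names = (cp.split("/", 1)[1] for cp in cat_pkgs)
    return Counter(name[0] for name in names)


# Illustrative sample only; the real tree had roughly 9300-9400 packages.
sample = [
    "dev-util/diffball",
    "sys-devel/distcc",
    "app-admin/sudo",
    "app-vim/sudo",
    "mail-client/mutt",
]

counts = first_char_counts(sample)
for char, count in sorted(counts.items()):
    print(char, count)
```

Run over the whole tree, the largest buckets are what the "top 8" figures above refer to: each is the number of package directories that would sit side by side under one first-letter directory.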
Re: [gentoo-dev] Re: New category proposal
maillog: 11/05/2005-03:40:04(-0500): Brian Harring types
> > On Wed, May 11, 2005 at 09:46:03AM +0200, Kevin F. Quinn wrote:
> > Here's my suggestion, for what it's worth :)
> >
> > The layout on disk and the semantics of categories do not need to be related.
> Yes and no. You're assuming that people don't use the layout on disk for digging around without calling portage. Personally, I do.
>
> > I like the idea of using the first character of a package name as the sub-directory name. This can be extended more deeply as and when necessary to avoid over-large directories which cause problems on some filesystems. e.g. for sudo you get "s/sudo" and vim-sudo "v/vim-sudo". This is architecture-neutral, rsyncable, scalable, and not too difficult for users to parse manually (see later for searching through categories). If the algorithm portage would use to locate a package is such that it doesn't mandate the depth (i.e. tries "package", "p/package" if "p/" exists, "p/a/package" if "p/a/" exists) then overlays can have a different depth to the rsync tree; if you only have a few packages in overlay then they need not be in subdirectories at all.
> Re-asserting that the fs layout *does* matter, how is that more intuitive when trying to track down the ebuild for dev-util/diffball ?

The fact that the directory where diffball is is easily deducible by its name. As it is, I'd be a bit lost if I had to guess whether diffball is in app-arch or dev-util. Even if I remembered it was something dev-related I'd still be inclined to look in sys-devel. As it is, for those who don't use a given package on an everyday basis the categories are extremely obscure.

> How many directories deep would I have to go before I reached the ebuild?

Does it matter? You know the path exactly. It's p/portage. It's not ... "was it sys-apps/portage or app-portage/portage"?

... skip a bit ...

> > Having said that, some things could be done now. If a flat package namespace is desirable, the existing package name clashes could be resolved by renaming the few packages that clash.
> 74 packages, roughly, out of 9429 roughly.

76/9295, which is not that bad, considering about half of them are emacs/xemacs related.

> > Category could be added as a field in metadata.xml, so that a package could "belong" to multiple categories. The query/search tools could be enhanced to scan this metadata (perhaps including the current category directory as an implied entry in the metadata.xml).
> If that's the goal of the "belong to N categories" thread, strictly searching, sure, although I don't like it. It can't become an atom for *DEPEND due to the cpv nonconflicting bit.

I personally want to see the category part *disappear* from the *DEPEND which is one of the reasons to advocate a flat tree. If the category (or part of it) goes in the package name, so be it, but at least there will be no package moves to break older ebuilds or outdated overlays.

--
\    Georgi Georgiev   \  Taxes, n.: Of life's two certainties, the    \
 /   [EMAIL PROTECTED]  / only one for which you can get an extension.  /
\    +81(90)2877-8845  \                                               \
Re: [gentoo-dev] Re: New category proposal
> On Wed, May 11, 2005 at 09:46:03AM +0200, Kevin F. Quinn wrote:
> Here's my suggestion, for what it's worth :)
>
> The layout on disk and the semantics of categories do not need to be related.

Yes and no. You're assuming that people don't use the layout on disk for digging around without calling portage. Personally, I do.

> I like the idea of using the first character of a package name as the sub-directory name. This can be extended more deeply as and when necessary to avoid over-large directories which cause problems on some filesystems. e.g. for sudo you get "s/sudo" and vim-sudo "v/vim-sudo". This is architecture-neutral, rsyncable, scalable, and not too difficult for users to parse manually (see later for searching through categories). If the algorithm portage would use to locate a package is such that it doesn't mandate the depth (i.e. tries "package", "p/package" if "p/" exists, "p/a/package" if "p/a/" exists) then overlays can have a different depth to the rsync tree; if you only have a few packages in overlay then they need not be in subdirectories at all.

Re-asserting that the fs layout *does* matter, how is that more intuitive when trying to track down the ebuild for dev-util/diffball ? How many directories deep would I have to go before I reached the ebuild?

The changes I posit aren't any more friendly to devs doing ebuild work, and require a flat namespace- no conflicts, meaning that we have to choose alternate names for conflicts (or the category data winds up in the name). Like I said, I really dislike debian's flat namespace, even if we had a category component to it.

> The key here is to separate the category (metadata) and filesystem layout (implementation detail) from the concept of package name. This opens up all sorts of possibilities, for example different layouts in CVS, on mirrors and on clients (some kind of custom rsync would be necessary) - but that's going perhaps too far...
This also locks out several possibilities, like relying on dir structure to limit the searches. You force category classification to be metadata, you need an additional db to do searching, and basic atom lookup. That's 19000+ keys in a db. No db, and you force a tree wide search, which _will_ be as fast as emerge -S is.

> Categories become metadata, formally (this is the root of the problem - including the category in the package name is a pollution of the package name). Once they become properly understood and implemented as metadata, a package being a member of more than one category is a natural consequence.

Currently, the only conflicts that can occur in searches are package specific. Atoms, the basis of our depends system, require categories; as such conflicts *cannot* occur. Multiple categories per package allows for conflicts to occur in our deps. This is nasty, and again, requires pretty much a walk of the whole tree to verify no conflicts (mr_bones_, aka Michael Sterrett, would probably quietly curl up and die when his repoman runs, which are now under an hour, clear 2 hours again) :)

> Portage would essentially ignore categories. Some support would be necessary to allow the user to query categories (since 'ls /usr/portage/' would no longer work) - but searching for packages is already a function and would just need to be adapted (and perhaps optimised ;) ). Indeed just listing out portage directories at the moment is often insufficient to find a suitable package, since package names are often amusing but uninformative acronyms.

Portage can't ignore categories, see the bit above about cat/pkg-ver (cpv from this point on) conflicts. cpvs can't conflict, pure and simple under the current layout, which is enforced by the single category/fs layout. What are we gaining? Ability to find a package under two categories?

> The benefits include
> 1) no more "moving packages around the tree"

cpv conflict.
You aren't moving the fs position of it, but it still requires walking the tree and updating all atoms that reference the old position.

> 2) categories can be added to a package in the most natural way

Elaborate.

> 3) overlays can be tidier

Eh?

> well, it's a big downside...

E'yep. :)

> Having said that, some things could be done now. If a flat package namespace is desirable, the existing package name clashes could be resolved by renaming the few packages that clash.

74 packages, roughly, out of 9429 roughly.

> Category could be added as a field in metadata.xml, so that a package could "belong" to multiple categories. The query/search tools could be enhanced to scan this metadata (perhaps including the current category directory as an implied entry in the metadata.xml).

If that's the goal of the "belong to N categories" thread, strictly searching, sure, although I don't like it. It can't become an atom for *DEPEND due to the cpv nonconflict
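A count like "74 packages, roughly, out of 9429" comes from looking for package names that appear under more than one category. The script used for it is not in the thread; one rough way to do such a check, assuming a list of category/package strings as input (the sample uses the sudo example from this thread plus one filler entry), is:

```python
from collections import defaultdict


def find_name_clashes(cat_pkgs):
    """Map each package name found in 2+ categories to its sorted categories."""
    by_name = defaultdict(set)
    for cp in cat_pkgs:
        category, name = cp.split("/", 1)
        by_name[name].add(category)
    return {name: sorted(cats) for name, cats in by_name.items() if len(cats) > 1}


# Illustrative input; a real run would list every category/package in the tree.
tree_listing = ["app-admin/sudo", "app-vim/sudo", "mail-client/mutt"]

clashes = find_name_clashes(tree_listing)
for name, categories in clashes.items():
    print(name, categories)
```

Every name this reports would need renaming before the category could be dropped from atoms, which is separate from (and much smaller than) the number of dependency atoms that would have to be rewritten.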
Re: [gentoo-dev] Re: New category proposal
On Wed, May 11, 2005 at 07:09:20, Brian Harring wrote:
> One thing that just clicked in the skull on why flat-tree has issues;
>
> currently it's possible to have a package with the same name, yet a differing category (app-vim/sudo vs app-admin/sudo).

A flat package namespace would necessitate renaming any existing package name clashes.

The essential problem with categories is that you will always have confusion in some cases as to which category a package should be in; it's natural that some packages will make sense in more than one category. A good example of this problem with categories can be seen in the CDDB/FreeDB track listing database which does a similar category thing with category/album-hash (the problem is pretty acute there, not least because the probability of clashes in the album-hash is high, but also because people disagree whether their favourite album is new-age, folk, etc).

Here's my suggestion, for what it's worth :)

The layout on disk and the semantics of categories do not need to be related.

I like the idea of using the first character of a package name as the sub-directory name. This can be extended more deeply as and when necessary to avoid over-large directories which cause problems on some filesystems. e.g. for sudo you get "s/sudo" and vim-sudo "v/vim-sudo". This is architecture-neutral, rsyncable, scalable, and not too difficult for users to parse manually (see later for searching through categories). If the algorithm portage would use to locate a package is such that it doesn't mandate the depth (i.e. tries "package", "p/package" if "p/" exists, "p/a/package" if "p/a/" exists) then overlays can have a different depth to the rsync tree; if you only have a few packages in overlay then they need not be in subdirectories at all.

The key here is to separate the category (metadata) and filesystem layout (implementation detail) from the concept of package name.
This opens up all sorts of possibilities, for example different layouts in CVS, on mirrors and on clients (some kind of custom rsync would be necessary) - but that's going perhaps too far... Categories become metadata, formally (this is the root of the problem - including the category in the package name is a pollution of the package name). Once they become properly understood and implemented as metadata, a package being a member of more than one category is a natural consequence. Whether category should be in the ebuild or in metadata.xml is another issue. We already have some metadata in the ebuilds (e.g. DESCRIPTION). However that's really a separate debate; the question of whether all metadata that isn't version-dependent and doesn't impact the emerge process should be moved from ebuilds to metadata.xml... Portage would essentially ignore categories. Some support would be necessary to allow the user to query categories (since 'ls /usr/portage/' would no longer work) - but searching for packages is already a function and would just need to be adapted (and perhaps optimised ;) ). Indeed just listing out portage directories at the moment is often insufficient to find a suitable package, since package names are often amusing but uninformative acronyms. The real problem with implementing the above is the amount of work it would take to modify portage and the tree, for relatively little gain at the moment. I'm certainly not volunteering :) The benefits include 1) no more "moving packages around the tree" 2) categories can be added to a package in the most natural way 3) overlays can be tidier The downsides include 1) Massive upheaval in the portage tree 2) Massive upheaval in the portage tree 3) Massive upheaval in the portage tree well, it's a big downside... Having said that, some things could be done now. If a flat package namespace is desirable, the existing package name clashes could be resolved by renaming the few packages that clash.
Category could be added as a field in metadata.xml, so that a package could "belong" to multiple categories. The query/search tools could be enhanced to scan this metadata (perhaps including the current category directory as an implied entry in the metadata.xml). Kev. -- gentoo-dev@gentoo.org mailing list
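Editorial sketch of the depth-agnostic lookup described in the message above ("package", then "p/package" if "p/" exists, then "p/a/package" if "p/a/" exists). This is only an illustration under stated assumptions, not portage code: the function name locate_package and the max_depth parameter are hypothetical, and the ambiguity between a one-letter package name and a prefix directory is deliberately left unhandled.

```python
import os
import tempfile


def locate_package(tree_root, name, max_depth=2):
    """Find the directory for package `name` without fixing the tree depth.

    Tries tree_root/name first, then descends one letter at a time
    (tree_root/n/name, tree_root/n/a/name, ...) for as long as the
    prefix directory exists. Returns the path, or None if not found.
    Hypothetical helper: not part of portage.
    """
    current = tree_root
    for depth in range(max_depth + 1):
        candidate = os.path.join(current, name)
        if os.path.isdir(candidate):
            return candidate
        if depth >= len(name):
            break
        # Descend only if the next single-letter prefix directory exists.
        current = os.path.join(current, name[depth])
        if not os.path.isdir(current):
            break
    return None


if __name__ == "__main__":
    # An rsync-style tree with one prefix level and a flat overlay.
    with tempfile.TemporaryDirectory() as rsync_tree, \
            tempfile.TemporaryDirectory() as overlay:
        os.makedirs(os.path.join(rsync_tree, "s", "sudo"))
        os.makedirs(os.path.join(overlay, "sudo"))
        print(locate_package(rsync_tree, "sudo"))
        print(locate_package(overlay, "sudo"))
```

Because the same function resolves both layouts, an overlay with a handful of packages could stay flat while the main tree uses one or two prefix levels, which is the property the message asks for.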
Re: [gentoo-dev] Re: New category proposal
On Wed, May 11, 2005 at 01:27:46PM +0900, Georgi Georgiev wrote: > As to whether the categories are good or not... think about it. If they > were good, would we still be seeing packages moving around the tree? > That's why I think that multiple categories are a necessity. Unless of > course, packages stop getting moved around and Gentoo can guarantee that > all packages will stay at their current location. Keep in mind the tree is in constant flux, new packages added, packages removed, etc. Of course there will be a bit of reorganization, unless we add every possible category under the sun (even then, $10 on some weird esoteric category being requested shortly after such a change) ;) Point I'm getting at is that the need for a better grouping occurs depending on the packages being added. One thing that just clicked in the skull on why flat-tree has issues; currently it's possible to have a package with the same name, yet a differing category (app-vim/sudo vs app-admin/sudo). Since our tree layout is based upon category, if you tried shifting the focus of it to packages _in any way_, you would explicitly disallow same name packages, different category. Doesn't matter how you structure the tree, if you do lookup into the tree based on package, not category, you disallow same named packages. > What about the Mozilla suite. What in the world is it doing in > www-client? After all, the Mozilla suite is > - a www browser (www-client) > - a mail client (mail-client) > - a calendar > - an html composer > - an irc client (net-irc) > > Might as well go to net-misc :-/ This is why I commented that there are exceptions; the question is whether the exceptions are annoying enough that the level of change required gets implemented (I posit no, but that's because I see issues w/ the resulting namespace, and am lazy). > - I hate the moves of packages between categories which causes enough > problems as it is. I also find the arguments of where to put what > pointless.
Who cares if it is mail-client/mutt or net-mail/mutt as > long as it stays in one place and is accessible by its name "mutt". If > you think that mail-client is more descriptive than net-mail, The category labelling of it matters for those who go grokking for an app to use, but don't know the possibilities. Example: "well, let's see what mail clients exist, and pick one of 'em for use based upon the description, since I've had it with my current mail client"... > then add > "keywords" (for those who hate the idea of multiple categories) to the > metadata of each package and let emerge -s search by keyword. Does > "mutt" not belong to net-mail? It does, but mail-client is better. > Still, that is no reason to remove its relation to "net-mail". Cache > the keyword information to make the search as fast as possible and > you're done with the searchability part. You can now safely forget > about this thing called "categories" as they become irrelevant, and > hopefully never move another package. > - I also hate being unable to find exactly the package that I need right > away. I want to check mutt's ebuild... cd /usr/portage/... what next? > Is it at the same place that I remember it was the last time I > checked? Do I *have* to know what category it belongs to? Of course I > can do "cd /usr/portage/*/mutt", but shell completion on the mutt part > won't work on this one. Mutt's not quite the example for the necessity > of completions, but it gets worse with longer names like > mozilla-firefox-bin. Re: yeah, it's fricking annoying, agreed. I'd state a faster tool is preferable, rather than a reorganization (imo). See above about why flattening the tree so it's package-centric rather than category-centric has issues. The consequence is that you have to start moving category specific metadata into the package name when valid conflicts occur: the sudo/vim example above would require vim-sudo or vim-plugins-sudo. Debian does this, they (ab|)use a flattened namespace.
I strongly dislike it, even compared to the consequences of our N category approach. Granted they lack (afaik) category data, but the consequences of flattening the namespace still stand imo. :) So... next possibility is doing the additional categories via extra metadata (xyz is *primarily* cat a, but also is cat b and c). Complaints over speed would easily triple if this was added; if you don't find a package within a (on disk dir) category's namespace, you have to walk the metadata for *all* ebuilds to verify that there isn't another package that has allegiance to that category. Yes, this can be cached, but it is a pita and is added complexity (in other words, gains need to offset this extra cost). > - Personal overlays. I think this a point that's clear enough. Gentoo > devs may have scripts that keep the tree in sync after the > loved-by-all move of a package, but that doesn't apply to us, mere > mortals. Got me th
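Editorial sketch of the name-clash concern raised in this message (app-vim/sudo vs app-admin/sudo): a flat, package-centric lookup has to reject or rename any package whose bare name appears under more than one category. The helper below is hypothetical and takes a plain list of "category/package" strings as input; it does not read a real tree or use any portage API.

```python
from collections import defaultdict


def find_name_clashes(atoms):
    """Map each bare package name to its categories, keeping only clashes.

    `atoms` is an iterable of "category/package" strings. Hypothetical
    helper for illustration; not part of portage.
    """
    categories_by_name = defaultdict(list)
    for atom in atoms:
        category, _, name = atom.partition("/")
        categories_by_name[name].append(category)
    # A clash is any name that lives in more than one category.
    return {
        name: sorted(categories)
        for name, categories in categories_by_name.items()
        if len(categories) > 1
    }


if __name__ == "__main__":
    tree = ["app-vim/sudo", "app-admin/sudo", "mail-client/mutt"]
    for name, categories in find_name_clashes(tree).items():
        print(name, categories)
```

Every name this reports would need category information folded into it (vim-sudo, for instance) before a flat namespace could hold both packages.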
Re: [gentoo-dev] Re: New category proposal
maillog: 10/05/2005-22:30:56(-0500): Brian Harring types > Re: having a package claimed by multiple categories... eh. yeah, > that's a bit valid although I'd think it's either A) an indication > our categories need to be adjusted a bit, or B) (hopefully) a rare > case. :) No, no, please not A). 8-O As to whether the categories are good or not... think about it. If they were good, would we still be seeing packages moving around the tree? That's why I think that multiple categories are a necessity. Unless of course, packages stop getting moved around and Gentoo can guarantee that all packages will stay at their current location. What about the Mozilla suite. What in the world is it doing in www-client? After all, the Mozilla suite is - a www browser (www-client) - a mail client (mail-client) - a calendar - an html composer - an irc client (net-irc) Might as well go to net-misc :-/ My personal reasons to start the topic about the flat tree: - I hate the moves of packages between categories which causes enough problems as it is. I also find the arguments of where to put what pointless. Who cares if it is mail-client/mutt or net-mail/mutt as long as it stays in one place and is accessible by its name "mutt". If you think that mail-client is more descriptive than net-mail, then add "keywords" (for those who hate the idea of multiple categories) to the metadata of each package and let emerge -s search by keyword. Does "mutt" not belong to net-mail? It does, but mail-client is better. Still, that is no reason to remove its relation to "net-mail". Cache the keyword information to make the search as fast as possible and you're done with the searchability part. You can now safely forget about this thing called "categories" as they become irrelevant, and hopefully never move another package. - I also hate being unable to find exactly the package that I need right away. I want to check mutt's ebuild... cd /usr/portage/... what next?
Is it at the same place that I remember it was the last time I checked? Do I *have* to know what category it belongs to? Of course I can do "cd /usr/portage/*/mutt", but shell completion on the mutt part won't work on this one. Mutt's not quite the example for the necessity of completions, but it gets worse with longer names like mozilla-firefox-bin. - Personal overlays. I think this is a point that's clear enough. Gentoo devs may have scripts that keep the tree in sync after the loved-by-all move of a package, but that doesn't apply to us, mere mortals. Disclaimer: I did not intend to be offensive even if at times I seem to. I was not being sarcastic either. -- /\ Georgi Georgiev /\ I'm not an Iranian!! I voted for Dianne /\ \/[EMAIL PROTECTED]\/ Feinstein!! \/ /\ +81(90)2877-8845 /\ /\
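Editorial sketch of the "keywords in metadata, cached for search" idea from the message above. It assumes the per-package keywords have already been read into a dictionary; how they would be stored (ebuild or metadata.xml) is left open in the thread, so no file format is assumed here. All function names and the JSON cache format are hypothetical.

```python
import json
from collections import defaultdict


def build_keyword_index(package_keywords):
    """Invert {package: [keywords]} into {keyword: [packages]}."""
    index = defaultdict(set)
    for package, keywords in package_keywords.items():
        for keyword in keywords:
            index[keyword].add(package)
    # Sorted lists keep the cache file stable between runs.
    return {keyword: sorted(packages) for keyword, packages in index.items()}


def save_index(index, path):
    """Write the index to a JSON cache file (format is an assumption)."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(index, handle, sort_keys=True)


def load_index(path):
    """Read an index previously written by save_index."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def search(index, keyword):
    """Return every package tagged with `keyword`, or an empty list."""
    return index.get(keyword, [])


if __name__ == "__main__":
    keywords = {
        "mutt": ["mail-client", "net-mail"],
        "mozilla": ["www-client", "mail-client", "net-irc"],
    }
    index = build_keyword_index(keywords)
    print(search(index, "mail-client"))
```

The walk over every package happens once, when the index is built; each search afterwards is a single dictionary lookup, which is the trade-off between caching cost and lookup speed that the thread argues about.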
Re: [gentoo-dev] Re: New category proposal
On Tue, May 10, 2005 at 08:04:04PM +0900, Georgi Georgiev wrote: > maillog: 10/05/2005-11:28:21(+0200): Martin Schlemmer types > > On Mon, 2005-05-09 at 13:07 -0400, Aron Griffis wrote: > > > Georgi Georgiev wrote:[Sun May 08 2005, 08:19:20PM EDT] > > > > Would it be inappropriate to start bitching (again) about a flat > > > > tree where each package can go in multiple categories? > > > > > > That's something I'd love to see eventually... I mean the flat tree, > > > not the complaining ;-) > > > > > > > Problem with flat tree, is the search times might then suck even more, > > as last I heard, too many dirs/files in one directory have a huge speed > > penalty. > > The flat tree does not imply a flat hierarchy on disk. Files and > directories can still be organized in a more optimized manner. For > example -- put each package in a directory of its first letter. Maybe > even two letters if you think that the winner 'g' with 736 packages is > too many. This would basically require a totally separate rsync module for newer portage versions that would handle it. Or portage would have to support both, which (frankly) is something of a no go; it's too fricking ugly imo. Beyond that, drop the optimized manner in reference to speed; addressed below. Optimized for human readability? Not so sure, digging through debian's directory structure to dig out certain files has always driven me slightly insane while doing so... > This is only true when the portage tree is stored on a filesystem. I > recall some effort being made in making portage support reading the > portage tree from a zipfile, so we may eventually see some other > backends that would not suffer from this problem. Down the line, viable (should be able to basically go nuts in terms of how you store the tree, locally, remotely, etc). Now? eh, tiz a ways off. Regarding speed comments about look up in the tree, frankly it's a bit minor imo.
Initial installed package scan is a heavier hit (it's required for even an emerge --help). The bits in this thread about using xyz fs for the tree are trying to address the effects, not the cause of potentially slow cp_list/cp_all lookups; if the tree were marked as frozen (non-modifiable, iow users don't modify an ebuild here and there), and portage had frozen support (working on it), you could work directly from the cache instead, which would be a bit faster (at the very least, fewer syscalls, (open|close|read)dir, stat'ing, etc). The speed portion of the arg in other words, I don't think is valid. Better to focus on what benefits the poor human who has to go digging through the tree manually, than try to make portage go faster via it w/ dir structuring/underlying fs. Re: having a package claimed by multiple categories... eh. yeah, that's a bit valid although I'd think it's either A) an indication our categories need to be adjusted a bit, or B) (hopefully) a rare case. :) My 2 cents, at least. ~Brian -- gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] Re: New category proposal
maillog: 10/05/2005-11:28:21(+0200): Martin Schlemmer types > On Mon, 2005-05-09 at 13:07 -0400, Aron Griffis wrote: > > Georgi Georgiev wrote: [Sun May 08 2005, 08:19:20PM EDT] > > > Would it be inappropriate to start bitching (again) about a flat > > > tree where each package can go in multiple categories? > > > > That's something I'd love to see eventually... I mean the flat tree, > > not the complaining ;-) > > > > Problem with flat tree, is the search times might then suck even more, > as last I heard, too many dirs/files in one directory have a huge speed > penalty. The flat tree does not imply a flat hierarchy on disk. Files and directories can still be organized in a more optimized manner. For example -- put each package in a directory of its first letter. Maybe even two letters if you think that the winner 'g' with 736 packages is too many. This is only true when the portage tree is stored on a filesystem. I recall some effort being made in making portage support reading the portage tree from a zipfile, so we may eventually see some other backends that would not suffer from this problem. If that's the only problem you're having with the flat tree, should I consider you a supporter? -- / Georgi Georgiev/ The Golden Rule of Arts and Sciences: He who / \ [EMAIL PROTECTED]\ has the gold makes the rules.\ / +81(90)2877-8845/ / 
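Editorial sketch of the first-letter bucketing suggested in the message above, with a check for when a bucket gets large enough to justify a two-letter prefix. The package names come from this thread; the function names, the prefix_len parameter and the size limit are hypothetical, and a single directory named after the prefix (for example "gn") is an assumption about how "two letters" would be laid out.

```python
from collections import Counter


def bucket_sizes(package_names, prefix_len=1):
    """Count how many packages land in each prefix directory."""
    return Counter(name[:prefix_len] for name in package_names)


def oversized_buckets(package_names, limit, prefix_len=1):
    """Return the buckets holding more than `limit` packages."""
    sizes = bucket_sizes(package_names, prefix_len)
    return {bucket: count for bucket, count in sizes.items() if count > limit}


if __name__ == "__main__":
    names = ["gammu", "gnokii", "gsmlib", "gnome-phone-manager",
             "wammu", "mutt", "sudo"]
    print(bucket_sizes(names))
    print(oversized_buckets(names, limit=3))
    # Splitting on two letters spreads the crowded 'g' bucket out.
    print(bucket_sizes(names, prefix_len=2))
```

Run against the full package list, a check like this would show whether a bucket such as 'g' (736 packages, per the message) is actually too large for the filesystem in use before committing to a deeper layout.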
Re: [gentoo-dev] Re: New category proposal
On Mon, 2005-05-09 at 13:07 -0400, Aron Griffis wrote: > Georgi Georgiev wrote:[Sun May 08 2005, 08:19:20PM EDT] > > Would it be inappropriate to start bitching (again) about a flat > > tree where each package can go in multiple categories? > > That's something I'd love to see eventually... I mean the flat tree, > not the complaining ;-) > Problem with flat tree, is the search times might then suck even more, as last I heard, too many dirs/files in one directory have a huge speed penalty. -- Martin Schlemmer Gentoo Linux Developer, Desktop/System Team Developer Cape Town, South Africa
Re: [gentoo-dev] Re: New category proposal
Georgi Georgiev wrote: [Sun May 08 2005, 08:19:20PM EDT] > Would it be inappropriate to start bitching (again) about a flat > tree where each package can go in multiple categories? That's something I'd love to see eventually... I mean the flat tree, not the complaining ;-) Regards, Aron -- Aron Griffis Gentoo Linux Developer
Re: [gentoo-dev] Re: New category proposal
maillog: 09/05/2005-01:50:04(+0200): Lars Weiler types > * Collins Richey <[EMAIL PROTECTED]> [05/05/08 17:01 -0600]: > > You could always borrow from the Germans and call it app-handy. > > Yeah! That's pure Denglisch :) > > And while we are on it, add all packages for presentations > into an "app-beamer" group ;-) > > Well, back on topic. Some of the suggested packages will > not work with GSM-phones only, but also with DECT-phones. > And if we include VoIP-Applications, they can finally get > into a better home than net-misc... app-telephony? > phone-mobile/phone-net? Would it be inappropriate to start bitching (again) about a flat tree where each package can go in multiple categories? -- (* Georgi Georgiev (* No small art is it to sleep: it is necessary (* *)[EMAIL PROTECTED]*) for that purpose to keep awake all day. -- *) (* +81(90)2877-8845 (* Nietzsche(*
Re: [gentoo-dev] Re: New category proposal
On Sunday 08 May 2005 04:46 pm, Alin Nastac wrote: > R Hill wrote: > > this doesn't include anything like VOIP of course. btw i think > > "cellphone" is an Americanism. i worked for AT&T Wireless before they > > were bought by Cingular and the term "cellphone" was discouraged for > > that reason. maybe just app-phone? > > hmm... I think it should include "cell" or "mobile" in one way or the > other - "phone" is just too generic. app-mobile sounds good to me ... then just use metadata.xml to include a 'fuller' description :P -mike -- gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] Re: New category proposal
* Collins Richey <[EMAIL PROTECTED]> [05/05/08 17:01 -0600]: > You could always borrow from the Germans and call it app-handy. Yeah! That's pure Denglisch :) And while we are on it, add all packages for presentations into an "app-beamer" group ;-) Well, back on topic. Some of the suggested packages will not work with GSM-phones only, but also with DECT-phones. And if we include VoIP-Applications, they can finally get into a better home than net-misc... app-telephony? phone-mobile/phone-net? Regards, Lars -- Lars Weiler <[EMAIL PROTECTED]> +49-171-1963258 Gentoo Linux PowerPC: Manager and Release Engineer Gentoo Infrastructure : CVS Administrator Gentoo Public Relations : Assistance for Europe
Re: [gentoo-dev] Re: New category proposal
On 5/8/05, W.Kenworthy <[EMAIL PROTECTED]> wrote: > In Oz, cellphone is only used in american movies, here they are called > "mobile phones" (formal), "mobiles" (common usage) and "mob" when > written (e.g., Mob: 0419...) > > There's also the upcoming "cell" processor architecture that may clash > in the future. > > How about app-mobphone or app-mobilephone or perhaps > app-mobilephoneutils or some variant of? > You could always borrow from the Germans and call it app-handy. -- Collins When I saw the Iraqi people voting three weeks ago, 8 million of them, it was the start of a new Arab world The Berlin Wall has fallen. - Lebanese Druze leader Walid Jumblatt -- gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] Re: New category proposal
In Oz, cellphone is only used in american movies, here they are called "mobile phones" (formal), "mobiles" (common usage) and "mob" when written (e.g., Mob: 0419...) There's also the upcoming "cell" processor architecture that may clash in the future. How about app-mobphone or app-mobilephone or perhaps app-mobilephoneutils or some variant of? BillK On Sun, 2005-05-08 at 11:57 -0600, R Hill wrote: > Alin Nastac wrote: > > Hi folks, > > > > I think we should make a new category called app-cellphone containing > > the following packages: > > net-dialup/gammu > > net-dialup/gnokii > > net-dialup/wammu > > net-wireless/gnome-phone-manager > > > > Yes, I know. It is a short list, but shouldn't be a category > > representative for its content? > > net-wireless/obexftp > net-misc/sms > net-misc/linuxsms > net-misc/ksms > net-misc/gsmlib > net-misc/esms > media-sound/bemused > media-plugins/xmms-btexmms > kde-misc/kmobiletools } > kde-base/kandy} these could probably stay in kde > app-pda/x70talk > app-pda/bitpim > app-misc/ringtonetools > > this doesn't include anything like VOIP of course. btw i think > "cellphone" is an Americanism. i worked for AT&T Wireless before they > were bought by Cingular and the term "cellphone" was discouraged for > that reason. maybe just app-phone? > > --de. > -- gentoo-dev@gentoo.org mailing list
Re: [gentoo-dev] Re: New category proposal
R Hill wrote: > > > this doesn't include anything like VOIP of course. btw i think > "cellphone" is an Americanism. i worked for AT&T Wireless before they > were bought by Cingular and the term "cellphone" was discouraged for > that reason. maybe just app-phone? hmm... I think it should include "cell" or "mobile" in one way or the other - "phone" is just too generic. I am not a native English speaker (duh, what a surprise :)), so I'm open to suggestions.
[gentoo-dev] Re: New category proposal
Alin Nastac wrote: Hi folks, I think we should make a new category called app-cellphone containing the following packages: net-dialup/gammu net-dialup/gnokii net-dialup/wammu net-wireless/gnome-phone-manager Yes, I know. It is a short list, but shouldn't be a category representative for its content? net-wireless/obexftp net-misc/sms net-misc/linuxsms net-misc/ksms net-misc/gsmlib net-misc/esms media-sound/bemused media-plugins/xmms-btexmms kde-misc/kmobiletools } kde-base/kandy } these could probably stay in kde app-pda/x70talk app-pda/bitpim app-misc/ringtonetools this doesn't include anything like VOIP of course. btw i think "cellphone" is an Americanism. i worked for AT&T Wireless before they were bought by Cingular and the term "cellphone" was discouraged for that reason. maybe just app-phone? --de. -- gentoo-dev@gentoo.org mailing list