Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)
On 25 June 2011 00:57, Nathan Phillip Brink bi...@gentoo.org wrote:
> On Wed, Jun 22, 2011 at 08:57:47PM -0400, Wyatt Epp wrote:
> > <cat>
> >   <tag>media</tag>
> >   <tag>video</tag>
> >   <tag>kde</tag>
> >   <tag>editors</tag>
> > </cat>

I'm strongly of the mind that by making the tag system arbitrarily flat, you might be prematurely limiting yourself, as well as risking a future where the tag index is a sea of meaningless words.

Tags, in my mind, should be grouped by the sort of information they are trying to convey, as opposed to being arbitrary and completely un-grouped.

The present category system only has one namespace, which is more or less what-you-use-it-for, and if your tag system is likewise going to take that vector as the only approach, you will ultimately end up duplicating the category system, albeit without the present limitation that means one package can only exist in one place.

This need not be the case: we can suggest alternative tag namespaces, such as the sorts of files it supports working with, the sorts of things it can read, and the sorts of things it can write.

At present, things that migrate one type of media to another, such as pdf -> image, image -> pdf, image -> video, video -> images, etc. have to be forced into a sort of useless categorisation system.

However, if via tag data we were able to annotate a) what can be written and b) what can be read, this system could be leveraged to epic proportions of win.

    tag-lookup --supporting $( file ./foo );

    Read/Write:
      foobarnator - Blah blah blah
    Read:
      foo-bar - Blah blah
      foo-bjaz - Blah blah blah
    Write:
      a2foo - Blah Blah

    tag-lookup --verbose --supporting $( file ./foo );

    Read/Write:
      foobarnator - Blah blah blah
        - reads x, y, z, foo
        - writes a, b, c, foo
    Read:
      foo-bar - Blah blah
        - reads foo
        - writes text
      foo-bjaz - Blah blah blah
        - reads foo, bar
        - writes text, mp3
    Write:
      a2foo - Blah Blah
        - reads mp3, png, jpeg
        - writes foo

As a side note, it may be beneficial to tag a package version specifically for some of the above-mentioned features, especially if you wish to support my provides-binary suggestion, because the shipped binary may change from one version/slot to another.

I'm not sure if there's a way to provide data on a per-version level yet in metadata.xml, but I am assuming there's not, as I don't see it documented.

    <pkgmetadata>
      <versionspecific>
        <slot>2</slot>
        <pkgmetadata>
          ... normal stuff
        </pkgmetadata>
      </versionspecific>
      <versionspecific>
        <maxversion>1.0</maxversion>
        <maxversion>1.999</maxversion>
        <pkgmetadata>
          ... normal stuff
        </pkgmetadata>
      </versionspecific>
    </pkgmetadata>

Or something similar.

--
Kent

perl -e print substr( \edrgmaM SPA NOcomil.ic\\@tfrken\, \$_ * 3, 3 ) for ( 9,8,0,7,1,6,5,4,3,2 );

http://kent-fredric.fox.geek.nz
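
Editorial sketch: the read/write lookup described in the message above could be prototyped roughly as follows. This is only an illustration under stated assumptions: `PACKAGES` and `tag_lookup` are hypothetical names, the package data is the placeholder data from the message, and no such tool exists in Portage.

```python
# Hypothetical index: package name -> (description, formats read, formats written).
# The entries mirror the placeholder packages used in the message above.
PACKAGES = {
    "foobarnator": ("Blah blah blah", {"x", "y", "z", "foo"}, {"a", "b", "c", "foo"}),
    "foo-bar": ("Blah blah", {"foo"}, {"text"}),
    "foo-bjaz": ("Blah blah blah", {"foo", "bar"}, {"text", "mp3"}),
    "a2foo": ("Blah Blah", {"mp3", "png", "jpeg"}, {"foo"}),
}


def tag_lookup(fmt, packages=PACKAGES):
    """Group package names by how they support the format `fmt`."""
    result = {"Read/Write": [], "Read": [], "Write": []}
    for name in sorted(packages):
        _description, reads, writes = packages[name]
        if fmt in reads and fmt in writes:
            result["Read/Write"].append(name)
        elif fmt in reads:
            result["Read"].append(name)
        elif fmt in writes:
            result["Write"].append(name)
    return result


if __name__ == "__main__":
    # Print the grouped result in the same shape as the mock output above.
    for group, names in tag_lookup("foo").items():
        print(group + ":")
        for name in names:
            print("  %s - %s" % (name, PACKAGES[name][0]))
```

Keeping "reads" and "writes" as two separate sets per package is what makes the three-way grouping possible; a single flat tag set would need a naming convention to recover the same distinction.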
Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)
On Sat, Jun 25, 2011 at 02:49, Kent Fredric kentfred...@gmail.com wrote:
> I'm strongly of the mind that by making the tag system arbitrarily flat, you might be prematurely limiting yourself, as well as risking a future where the tag index is a sea of meaningless words.
>
> Tags, in my mind, should be grouped by the sort of information they are trying to convey, as opposed to being arbitrary and completely un-grouped.
>
> The present category system only has one namespace, which is more or less what-you-use-it-for, and if your tag system is likewise going to take that vector as the only approach, you will ultimately end up duplicating the category system, albeit without the present limitation that means one package can only exist in one place.
>
> This need not be the case: we can suggest alternative tag namespaces, such as the sorts of files it supports working with, the sorts of things it can read, and the sorts of things it can write.
>
> At present, things that migrate one type of media to another, such as pdf -> image, image -> pdf, image -> video, video -> images, etc. have to be forced into a sort of useless categorisation system.
>
> However, if via tag data we were able to annotate a) what can be written and b) what can be read, this system could be leveraged to epic proportions of win.

Okay, apologies in advance for my long-windedness. I hope this all makes sense to everyone.

I should probably clarify that clinging strictly to flatness is not what I'm proposing. Reality has borne out the need for implications and aliases in sanitising an unruly dataset with a complex user-generated index, while arbitrary democratised group building has improved some aspects of discovery. However, I would consider these features to be a lower priority than having a system at all. So to break it down:

Tags - a concise vocabulary used for search. In their default state they are untyped and non-hierarchical. They identify traits of a package. I suggest lower-case, simple, descriptive naming conventions. Highest priority.
Example: alien {{converter nogui package_management reads_tgz reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm writes_pkg writes_slp writes_lsb}}

Alias - a relationship between two tags establishing equivalence. A query for the left term returns the results of the right. This type of relationship helps reduce dictionary clutter. Low priority.
Example: sound = audio. Attempting to add sound to a package will instead add audio, and searches for sound will return the results for audio.

Implication - a relationship between two tags where the presence of the left term necessarily requires the right. This relationship reduces menial work. Low priority.
Example: mpd -> audio. Adding mpd to the package will also add audio.

Kent, your idea is pretty interesting and I rather like it. Fortunately, it's completely possible within the context of the basic flat layout, as I outlined with alien above. It probably looks ugly to you-- this is no illusion; it's pretty ugly. But it also grants us the flexibility to get a basic system in place quickly and without a lot of hassle. We get 90% of the benefit up front, and can extend it as necessary.

Unfortunately for real hierarchical methods, people still have difficulty with even simple metadata systems. Fetch some MP3s off the internet and check their tags, or look at search engine queries, and you'll find an entire class of people hampered by what is currently a largely alien art.

In the end, this system needs to be usable by people, and by keeping it primarily flat, we ease the conceptual overhead of its implementation and its use. If it can't be implemented on itch-scratching timescales, we have failed. If people can't use it with very little learning curve, we have failed.

A word on vocabulary: As you've no doubt noticed, there seems to be a degree of combinatorial explosion of tags in the method I propose. In practical use, it's not as bad as it looks. For Gentoo, I'd recommend a basic canonical list of general tags based on the current category system (subject to discussion and addition/subtraction), incorporating suggestions like Kent's as they come up. It's okay to control the vocabulary. What you find is that after the initial implementation, it grows fairly slowly. (Even with reads_* and writes_* the number will probably be south of 500 tags for a long time; the current categories dissolve into about 175 tags from what I can see.)

Regards,
Wyatt
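
Editorial sketch: the alias and implication relationships defined in the message above can be modelled with two small lookup tables. The names `ALIASES`, `IMPLICATIONS`, `canonical`, and `expand` are hypothetical, and the only data used is the pair of examples from the message (sound = audio, mpd implies audio); this is one possible reading of the proposal, not an existing implementation.

```python
# Hypothetical relationship tables, filled with the examples from the message.
ALIASES = {"sound": "audio"}        # the left term is replaced by the right term
IMPLICATIONS = {"mpd": {"audio"}}   # the left term also adds the right terms


def canonical(tag, aliases=ALIASES):
    """Follow alias links until a non-aliased tag is reached."""
    seen = set()
    while tag in aliases and tag not in seen:  # `seen` guards against alias cycles
        seen.add(tag)
        tag = aliases[tag]
    return tag


def expand(tags, aliases=ALIASES, implications=IMPLICATIONS):
    """Return the full tag set after applying aliases and implications."""
    result = set()
    pending = [canonical(t, aliases) for t in tags]
    while pending:
        tag = pending.pop()
        if tag in result:
            continue
        result.add(tag)
        # Every implied tag is canonicalised too, then processed in turn.
        pending.extend(canonical(t, aliases) for t in implications.get(tag, ()))
    return result
```

Under these assumptions, `expand({"mpd"})` yields both mpd and audio, and a search for sound would first be passed through `canonical` so that it queries audio instead.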
Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)
On Wed, Jun 22, 2011 at 08:57:47PM -0400, Wyatt Epp wrote:
> Tags are basically keywords you can use to describe packages, allowing you to easily search and explore your options based on what the package actually does (if we want to get technical, anything that identifies a package is a sort of tag: name, version, license, set, checksum, etc.). It's just a vocabulary that eases the burden of human lookup.
>
> The categories we have now are essentially (pairs of) tags tied to a treelike structure in an actual filesystem, and I'd wager that's a decent place to start, too-- probably the most prominent problem I can see with the current method comes from these edge cases where one category is obviously not enough.
>
> The obvious solution is probably to just stick our semantic metadata into the metadata.xml. So for...say, media-video/kdenlive, <cat>media-video</cat>[1] becomes more like this:
>
> <cat>
>   <tag>media</tag>
>   <tag>video</tag>
>   <tag>kde</tag>
>   <tag>editors</tag>
> </cat>

I'm going to just interpret this as a suggestion for a modification to metadata.xml ;-). Could this not just be:

    <tags>
      <tag>kde</tag>
      <tag>editors</tag>
    </tags>

Then in the category's metadata.xml, at media-video/metadata.xml, you can fill in the rest:

    <tags>
      <tag>media</tag>
      <tag>video</tag>
    </tags>

It would be nice to take advantage of the existing categories in Gentoo instead of having to duplicate all of this information over and over -- if this is to be done with metadata.xml.

--
binki

Look out for missing or extraneous apostrophes!
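
Editorial sketch: the inheritance idea in the message above (package tags plus the tags of the enclosing category) amounts to a set union. The snippet below assumes the proposed `tags`/`tag` elements, which are a suggestion from this thread and not part of the real metadata.xml format; `read_tags` and `effective_tags` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Proposed (not real) metadata.xml fragments, taken from the message above.
CATEGORY_XML = "<tags><tag>media</tag><tag>video</tag></tags>"  # media-video/metadata.xml
PACKAGE_XML = "<tags><tag>kde</tag><tag>editors</tag></tags>"    # kdenlive's metadata.xml


def read_tags(xml_text):
    """Collect the text of every <tag> element in an XML fragment."""
    root = ET.fromstring(xml_text)
    return {el.text.strip() for el in root.iter("tag") if el.text}


def effective_tags(category_xml, package_xml):
    """A package's tags are its own tags plus those inherited from its category."""
    return read_tags(category_xml) | read_tags(package_xml)


if __name__ == "__main__":
    print(sorted(effective_tags(CATEGORY_XML, PACKAGE_XML)))
```

With this layout, kdenlive only has to declare kde and editors itself; media and video come from the category file.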
Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)
On Wed, 22 Jun 2011 21:55:18 +1200 Kent Fredric kentfred...@gmail.com wrote:
> I'd love a tag solution, that'd be nice, is there a GLEP for it yet? And if so, how long will it take to get this tag feature supported by EAPI standards?

The slow parts are coming up with a good design, getting the Council to approve it, and getting Portage to implement it. The fast part is getting the PMS bit done.

The problem with tags is that all we've heard so far is "we should have tags!", with no description of what tags are, what they'll solve or how they're used.

--
Ciaran McCreesh
Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)
On Wed, Jun 22, 2011 at 14:19, Ciaran McCreesh ciaran.mccre...@googlemail.com wrote:
> On Wed, 22 Jun 2011 21:55:18 +1200 Kent Fredric kentfred...@gmail.com wrote:
> > I'd love a tag solution, that'd be nice, is there a GLEP for it yet? And if so, how long will it take to get this tag feature supported by EAPI standards?
>
> The slow parts are coming up with a good design, getting the Council to approve it, and getting Portage to implement it. The fast part is getting the PMS bit done.
>
> The problem with tags is that all we've heard so far is "we should have tags!", with no description of what tags are, what they'll solve or how they're used.
>
> --
> Ciaran McCreesh

Tags are basically keywords you can use to describe packages, allowing you to easily search and explore your options based on what the package actually does (if we want to get technical, anything that identifies a package is a sort of tag: name, version, license, set, checksum, etc.). It's just a vocabulary that eases the burden of human lookup.

The categories we have now are essentially (pairs of) tags tied to a treelike structure in an actual filesystem, and I'd wager that's a decent place to start, too-- probably the most prominent problem I can see with the current method comes from these edge cases where one category is obviously not enough.

The obvious solution is probably to just stick our semantic metadata into the metadata.xml. So for...say, media-video/kdenlive, <cat>media-video</cat>[1] becomes more like this:

    <cat>
      <tag>media</tag>
      <tag>video</tag>
      <tag>kde</tag>
      <tag>editors</tag>
    </cat>

The canonical tag list needn't even expand beyond what we have already (for the time being; attempting to keep your vocabulary entirely static is a Bad Thing. Humans are amazing at finding new things that need tagging. Getting ahead of myself, though). In the practical sense, we can probably just whip out a quick script and get 98% coverage; package maintainers should be encouraged to add relevant tags to the packages under their care as needed.

--Wyatt, hoping this text is plain as it says it is. Sorry if it's not.

[1] Let's just assume for the sake of argument that kdenlive actually has a cat field in its metadata file.
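
Editorial sketch: one possible reading of the "quick script" mentioned above is to seed each package's tags by splitting its category name on hyphens, so that media-video yields media and video. The function names are hypothetical, and real categories would likely need manual cleanup afterwards (some category names are a single word, and not every half is a meaningful tag).

```python
def tags_from_category(category):
    """Split a category name such as 'media-video' into seed tags."""
    return [part for part in category.split("-") if part]


def seed_tags(atoms):
    """Map each 'category/package' atom to the tags derived from its category."""
    seeded = {}
    for atom in atoms:
        category, _sep, _name = atom.partition("/")
        seeded[atom] = tags_from_category(category)
    return seeded


if __name__ == "__main__":
    # kdenlive is the example package used in the message above.
    for atom, tags in seed_tags(["media-video/kdenlive"]).items():
        print(atom, tags)
```

Package-specific tags such as kde or editors would still have to be added by maintainers, as the message suggests.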