Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-25 Thread Kent Fredric
On 25 June 2011 00:57, Nathan Phillip Brink bi...@gentoo.org wrote:
 On Wed, Jun 22, 2011 at 08:57:47PM -0400, Wyatt Epp wrote:

 <cat>
 <tag>media</tag>
 <tag>video</tag>
 <tag>kde</tag>
 <tag>editors</tag>
 </cat>


I'm strongly of the mind that by making the tag system arbitrarily
flat, you might be prematurely limiting yourself, as well as risking a
future where the tag index is a sea of meaningless words.

Tags, in my mind, should be grouped by the sort of information they are
trying to convey, as opposed to being arbitrary and completely
un-grouped.

The present category system only has one namespace, which is more or
less what-you-use-it-for, and if your tag system is likewise going
to take that vector as the only approach, you will ultimately end up
duplicating the category system, albeit without the present limitation
that means one package can only exist in one place.

This need not be the case: we can suggest alternative tag namespaces,
such as the sorts of files a package supports working with, the sorts
of things it can read, and the sorts of things it can write.

At present, things that migrate one type of media to another, such as
pdf -> image, image -> pdf, image -> video, video -> images, etc.,
have to be forced into a sort of useless categorisation system.

However, if via tag data, we were able to annotate a) what can be
written and b) what can be read, this system could be leveraged to
epic proportions of win.

   tag-lookup --supporting $( file ./foo );

   Read/Write:
     foobarnator - Blah blah blah
   Read:
     foo-bar - Blah blah
     foo-bjaz - Blah blah blah
   Write:
     a2foo - Blah Blah


   tag-lookup --verbose --supporting $( file ./foo );

   Read/Write:
     foobarnator - Blah blah blah
       - reads x, y, z, foo
       - writes a, b, c, foo
   Read:
     foo-bar - Blah blah
       - reads foo
       - writes text
     foo-bjaz - Blah blah blah
       - reads foo, bar
       - writes text, mp3
   Write:
     a2foo - Blah Blah
       - reads mp3, png, jpeg
       - writes foo
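
For illustration only, here is a minimal Python sketch of how such a lookup could group packages once read/write data exists. The Package class, the lookup_supporting function and the sample data are all hypothetical; they mirror the placeholder names in the mock output above and do not correspond to any existing Portage tool.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """Hypothetical record holding the formats a package reads and writes."""
    name: str
    description: str
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def lookup_supporting(packages, fmt):
    """Group package names by how they support the format `fmt`."""
    result = {"Read/Write": [], "Read": [], "Write": []}
    for pkg in packages:
        can_read = fmt in pkg.reads
        can_write = fmt in pkg.writes
        if can_read and can_write:
            result["Read/Write"].append(pkg.name)
        elif can_read:
            result["Read"].append(pkg.name)
        elif can_write:
            result["Write"].append(pkg.name)
    return result


# Sample data copied from the placeholder listing; not real packages.
PACKAGES = [
    Package("foobarnator", "Blah blah blah",
            {"x", "y", "z", "foo"}, {"a", "b", "c", "foo"}),
    Package("foo-bar", "Blah blah", {"foo"}, {"text"}),
    Package("foo-bjaz", "Blah blah blah", {"foo", "bar"}, {"text", "mp3"}),
    Package("a2foo", "Blah Blah", {"mp3", "png", "jpeg"}, {"foo"}),
]

if __name__ == "__main__":
    for heading, names in lookup_supporting(PACKAGES, "foo").items():
        print(f"{heading}:")
        for name in names:
            print(f"  {name}")
```

Keeping reads and writes as separate sets is what lets a converter like the hypothetical a2foo show up under "Write" for foo without also claiming to read it.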


As a side note, it may be beneficial to tag a package version
specifically for some of the above mentioned features. Especially if
you wish to support my provides-binary suggestion, because the
shipped binary may change from one version/slot to another.

I'm not sure if there's a way to provide data on a per-version level
yet in metadata.xml, but I am assuming there's not, as I don't see it
documented.

<pkgmetadata>
  <versionspecific>
    <slot>2</slot>
    <pkgmetadata>
      ... normal stuff
    </pkgmetadata>
  </versionspecific>
  <versionspecific>
    <maxversion>1.0</maxversion>
    <maxversion>1.999</maxversion>
    <pkgmetadata>
      ... normal stuff
    </pkgmetadata>
  </versionspecific>
</pkgmetadata>


Or something similar.



-- 
Kent

perl -e  print substr( \edrgmaM  SPA NOcomil.ic\\@tfrken\, \$_ * 3,
3 ) for ( 9,8,0,7,1,6,5,4,3,2 );

http://kent-fredric.fox.geek.nz



Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-25 Thread Wyatt Epp
On Sat, Jun 25, 2011 at 02:49, Kent Fredric kentfred...@gmail.com wrote:
 I'm strongly of the mind that by making the tag system arbitrarily
 flat, you might be prematurely limiting yourself, as well as risking a
 future where the tag index is a sea of meaningless words.

 Tags, in my mind, should be grouped by the sort of information they are
 trying to convey, as opposed to being arbitrary and completely
 un-grouped.

 The present category system only has one namespace, which is more or
 less what-you-use-it-for, and if your tag system is likewise going
 to take that vector as the only approach, you will ultimately end up
 duplicating the category system, albeit without the present limitation
 that means one package can only exist in one place.

 This need not be the case: we can suggest alternative tag namespaces,
 such as the sorts of files a package supports working with, the sorts
 of things it can read, and the sorts of things it can write.

 At present, things that migrate one type of media to another, such as
 pdf -> image, image -> pdf, image -> video, video -> images, etc.,
 have to be forced into a sort of useless categorisation system.

 However, if via tag data, we were able to annotate a) what can be
 written and b) what can be read, this system could be leveraged to
 epic proportions of win.

Okay, apologies in advance for my long-windedness.  I hope this all
makes sense to everyone.

I should probably clarify that clinging strictly to flatness is not
what I'm proposing.  Reality has borne out the need for implications
and aliases in sanitising an unruly dataset with a complex
user-generated index, while arbitrary democratised group building has
improved some aspects of discovery.  However, I would consider these
features to be a lower priority than having a system at all.

So to break it down:
Tags - a concise vocabulary used for search.  In their default state
they are untyped and non-hierarchical.  They identify traits of a
package.  Suggest using lower-case and simple, descriptive naming
conventions. Highest priority.
Example: alien {{converter nogui package_management reads_tgz
reads_rpm reads_pkg reads_slp reads_lsb writes_tgz writes_rpm
writes_pkg writes_slp writes_lsb}}

Alias - a relationship between two tags establishing equivalence.
Query of the left term returns results of the right.  This type of
relationship helps reduce dictionary clutter. Low priority.
Example: sound = audio.  Attempting to add sound to a package will
instead add audio and searches for sound will return the results for
audio.

Implication - a relationship between two tags where the presence of
the left term necessarily requires the right.  This relationship
reduces menial work.  Low priority.
Example: mpd -> audio.  Adding mpd to the package will also add audio.
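
To make the two relationships concrete, here is a rough Python sketch, assuming aliases and implications are kept as plain lookup tables. The names ALIASES, IMPLICATIONS, canonical and expand are invented for this example and are not part of any existing tool; the table contents are just the sound/audio and mpd/audio examples given above.

```python
# Hypothetical tables built from the examples above.
ALIASES = {"sound": "audio"}        # left term is replaced by the right
IMPLICATIONS = {"mpd": {"audio"}}   # left term also adds the right


def canonical(tag, aliases=ALIASES):
    """Follow alias links until a canonical tag is reached."""
    seen = set()
    while tag in aliases and tag not in seen:
        seen.add(tag)               # guard against accidental alias loops
        tag = aliases[tag]
    return tag


def expand(tags, aliases=ALIASES, implications=IMPLICATIONS):
    """Return the canonical tag set with all implied tags added."""
    result = set()
    pending = [canonical(t, aliases) for t in tags]
    while pending:
        tag = pending.pop()
        if tag in result:
            continue
        result.add(tag)
        for implied in implications.get(tag, ()):
            pending.append(canonical(implied, aliases))
    return result


if __name__ == "__main__":
    print(sorted(expand({"mpd"})))
    print(sorted(expand({"sound", "nogui"})))
```

The same expand step could be applied both when a maintainer adds tags and when a user types a query, so that either side may use an alias without the dictionary growing.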

Kent, your idea is pretty interesting and I rather like it.
Fortunately, it's completely possible within the context of the basic
flat layout, as I outlined with Alien above.  It probably looks ugly
to you-- this is no illusion; it's pretty ugly.  But it also grants us
the flexibility to get a basic system in place quickly and without a
lot of hassle.  We get 90% of the benefit up front, and can extend it
as necessary.

Unfortunately for real hierarchical methods, people still have
difficulty with even simple metadata systems.  Fetch some MP3s off the
internet and check their tags or look at search engine queries and
you'll find an entire class of people hampered by what is currently a
largely alien art.  In the end, this system needs to be usable by
people and by keeping it primarily flat, we ease the conceptual
overhead of its implementation and its use.  If it can't be
implemented on itch-scratching timescales, we have failed.  If people
can't use it with very little learning curve, we have failed.

A word on vocabulary:
As you've no doubt noticed, there seems to be a degree of combinatoric
explosion of tags in the method I propose.  In practical use, it's not
as bad as it looks.  For Gentoo, I'd recommend a basic canonical
list of general tags based on the current category system (subject to
discussion and addition/subtraction) and incorporate suggestions like
Kent's as they come up.  It's okay to control the vocabulary.  What
you find is that after the initial implementation, it grows fairly
slowly. (Even with reads_* and writes_* the number will probably be
south of 500 tags for a long time; the current categories dissolve
into about 175 tags from what I can see.)

Regards,
Wyatt



Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-24 Thread Nathan Phillip Brink
On Wed, Jun 22, 2011 at 08:57:47PM -0400, Wyatt Epp wrote:
 Tags are basically keywords you can use to describe packages, allowing
 you to easily search and explore your options based on what the
 package actually does (if we want to get technical, anything that
 identifies a package is a sort of tag: name, version, license, set,
 checksum, etc.).  It's just a vocabulary that eases the burden of
 human lookup.  The categories we have now are essentially (pairs of)
 tags tied to a treelike structure in an actual filesystem, and I'd
 wager that's a decent place to start, too-- probably the most
 prominent problem I can see with the current method comes from these
 edge cases where one category is obviously not enough.  The obvious
 solution is probably to just stick our semantic metadata into the
 metadata.xml.  So for...say, media-video/kdenlive,
 <cat>media-video</cat>[1] becomes more like this:
 
 <cat>
 <tag>media</tag>
 <tag>video</tag>
 <tag>kde</tag>
 <tag>editors</tag>
 </cat>

I'm going to just interpret this as a suggestion for a modification to
metadata.xml ;-). Could this not just be:

  <tags>
    <tag>kde</tag>
    <tag>editors</tag>
  </tags>

Then in the category's metadata.xml, at media-video/metadata.xml, you
can fill in the rest:

  <tags>
    <tag>media</tag>
    <tag>video</tag>
  </tags>

It would be nice to take advantage of the existing categories in
Gentoo instead of having to duplicate all of this information over and
over -- if this is to be done with metadata.xml.
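
As a sketch of how a tool might combine the two files, here is some Python using the standard xml.etree module. It assumes the proposed tags/tag elements sit directly under the root element of each metadata.xml; that layout, and the read_tags and effective_tags helpers, are assumptions made for this example and not an existing schema or API.

```python
import xml.etree.ElementTree as ET

# Category-level file (media-video/metadata.xml) with the shared tags.
# The <tags> element is the proposal from this thread, not current schema.
CATEGORY_XML = """<catmetadata>
  <tags>
    <tag>media</tag>
    <tag>video</tag>
  </tags>
</catmetadata>"""

# Package-level file (media-video/kdenlive/metadata.xml) with the rest.
PACKAGE_XML = """<pkgmetadata>
  <tags>
    <tag>kde</tag>
    <tag>editors</tag>
  </tags>
</pkgmetadata>"""


def read_tags(xml_text):
    """Return the tag names listed under <tags> in one metadata document."""
    root = ET.fromstring(xml_text)
    return [el.text.strip() for el in root.findall("./tags/tag") if el.text]


def effective_tags(category_xml, package_xml):
    """Category tags first, then package tags, without duplicates."""
    merged = []
    for tag in read_tags(category_xml) + read_tags(package_xml):
        if tag not in merged:
            merged.append(tag)
    return merged


if __name__ == "__main__":
    print(effective_tags(CATEGORY_XML, PACKAGE_XML))
```

With inheritance done at lookup time like this, each package file only needs the tags that its category does not already supply.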

-- 
binki

Look out for missing or extraneous apostrophes!




Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-22 Thread Ciaran McCreesh
On Wed, 22 Jun 2011 21:55:18 +1200
Kent Fredric kentfred...@gmail.com wrote:
 I'd love a tag solution, that'd be nice, is there a GLEP for it yet?
 And if so, how long will it take to get this tag feature supported
 by EAPI standards?

The slow parts are coming up with a good design, getting the Council to
approve it, and getting Portage to implement it. The fast part is
getting the PMS bit done.

The problem with tags is that all we've heard so far is "we should have
tags!", with no description of what tags are, what they'll solve or how
they're used.

-- 
Ciaran McCreesh




Re: [gentoo-dev] Tags (Was: RFC: split up media-sound/ category)

2011-06-22 Thread Wyatt Epp
On Wed, Jun 22, 2011 at 14:19, Ciaran McCreesh
ciaran.mccre...@googlemail.com wrote:
 On Wed, 22 Jun 2011 21:55:18 +1200
 Kent Fredric kentfred...@gmail.com wrote:
 I'd love a tag solution, that'd be nice, is there a GLEP for it yet?
 And if so, how long will it take to get this tag feature supported
 by EAPI standards?

 The slow parts are coming up with a good design, getting the Council to
 approve it, and getting Portage to implement it. The fast part is
 getting the PMS bit done.

 The problem with tags is that all we've heard so far is "we should have
 tags!", with no description of what tags are, what they'll solve or how
 they're used.

 --
 Ciaran McCreesh


Tags are basically keywords you can use to describe packages, allowing
you to easily search and explore your options based on what the
package actually does (if we want to get technical, anything that
identifies a package is a sort of tag: name, version, license, set,
checksum, etc.).  It's just a vocabulary that eases the burden of
human lookup.  The categories we have now are essentially (pairs of)
tags tied to a treelike structure in an actual filesystem, and I'd
wager that's a decent place to start, too-- probably the most
prominent problem I can see with the current method comes from these
edge cases where one category is obviously not enough.  The obvious
solution is probably to just stick our semantic metadata into the
metadata.xml.  So for...say, media-video/kdenlive,
<cat>media-video</cat>[1] becomes more like this:

<cat>
<tag>media</tag>
<tag>video</tag>
<tag>kde</tag>
<tag>editors</tag>
</cat>

The canonical tag list needn't even expand beyond what we have already
(for the time being; attempting to keep your vocabulary entirely
static is a Bad Thing.  Humans are amazing at finding new things that
need tagging.  Getting ahead of myself, though).

In the practical sense, we can probably just whip out a quick script
and get 98% coverage; package maintainers should be encouraged to add
relevant tags to the packages under their care as needed.
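
A quick script of that sort might look roughly like the Python below. It assumes the starter tags are simply the hyphen-separated parts of the category name; the function names are made up for this sketch, and nothing here reads a real Portage tree.

```python
def tags_from_category(category):
    """Split a category such as 'media-video' into its component tags."""
    return [part for part in category.split("-") if part]


def tags_for_atom(atom):
    """Derive starter tags for a 'category/package' string."""
    category, _, _package = atom.partition("/")
    return tags_from_category(category)


if __name__ == "__main__":
    for atom in ("media-video/kdenlive", "media-sound/mpd"):
        print(atom, tags_for_atom(atom))
```

For media-video/kdenlive this yields only media and video; tags such as kde and editors would still have to be added by the maintainer.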

--Wyatt, hoping this text is plain as it says it is.  Sorry if it's not.

[1] Let's just assume for the sake of argument that kdenlive actually
has a <cat> field in its metadata file.