Re: What is the best place for package meta-data ?

2009-12-17 Thread Olivier Berger
On Wednesday, 16 December 2009 at 10:14 +0900, Charles Plessy wrote:

 Dear Guillem and Olivier,
 
 yes, I have been pointed to DOAP (and PackageMap) on the debian-qa and
 debian-mentors mailing lists. I have spent a couple of hours this week reading
 about the “Semantic Web” and related topics. My conclusion is that the
 languages for linking concepts that are formalised in RDF files (XML, Notation
 3, Turtle, N-Triples, …) are too complex compared to simple YAML files.
 However, if we consider DOAP as a simple list of keywords on which to
 standardise, then I can do my best to stick to them as far as possible.

RDF+XML may be too complex and verbose, and it can probably be criticized
on many other counts, but it is nevertheless the only standard so far that
helps construct the Semantic Web (unless you use other forms of RDF, like
RDFa)... so do whatever you want, but if you limit yourself to custom
ad-hoc local formats and don't use standards, you'll limit the potential
reuse of what you did in applications nobody has thought of yet.

It's up to you to eventually think beyond current needs, and then open
the door for others to build on top of what you did, on the Semantic
Web ;)

Best regards,

-- 
Olivier BERGER olivier.ber...@it-sudparis.eu
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)


--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org



Re: What is the best place for package meta-data ?

2009-12-15 Thread Charles Plessy
On Mon, Dec 14, 2009 at 12:37:20PM +0100, Guillem Jover wrote:
 
 Given that this is supposed to be upstream-only information, are you
 aware of DOAP [0]? It seems to me it would be better to reuse already
 existing infrastructure than to create yet a new one, that only a
 subset of Debian might end up using. This could also be submitted
 upstream I think.
 
 It's XML though, but it could be easily transformed to any output format
 we'd want to use, say control-style for example, which most of our tools
 already handle, if need be.

Dear Guillem and Olivier,

yes, I have been pointed to DOAP (and PackageMap) on the debian-qa and
debian-mentors mailing lists. I have spent a couple of hours this week reading
about the “Semantic Web” and related topics. My conclusion is that the
languages for linking concepts that are formalised in RDF files (XML, Notation
3, Turtle, N-Triples, …) are too complex compared to simple YAML files.
However, if we consider DOAP as a simple list of keywords on which to
standardise, then I can do my best to stick to them as far as possible.

This would allow us to do the reverse of what you propose: it would ease the
translation of the metadata we collect from a simple YAML format (which is very
similar to Debian ‘paragraph’ control files) to XML, should a volunteer be
willing to do it.

Let's finish with an example: how to declare a homepage.

In YAML
---

Homepage: http://toto.example.com


In XML
--

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
 xmlns:foaf="http://xmlns.com/foaf/0.1/"
 xmlns="http://usefulinc.com/ns/doap#">

<Project>
 <homepage rdf:resource="http://toto.example.com" />
</Project>

</rdf:RDF>


I hope this makes the case that if we want the package maintainers
themselves to enter the information (which is what I propose), then the
complexity of RDF is a strong barrier to adoption.
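
For anyone who later volunteers to do the translation to XML, here is a
minimal Python sketch of how it could work, using only the standard library.
It is an illustration under stated assumptions, not an existing tool: the
function names and the FIELD_TO_DOAP table are hypothetical, only
single-line 'Name: contents' fields are handled, and the output uses a
"doap:" prefix rather than a default namespace.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical mapping from our field names to DOAP property names.
FIELD_TO_DOAP = {"Homepage": "homepage"}


def parse_fields(text):
    """Parse single-line 'Name: contents' fields into a dict."""
    fields = {}
    for line in text.splitlines():
        # Skip indented lines, comments and lines without a separator.
        if ":" in line and not line.startswith((" ", "#")):
            name, value = line.split(":", 1)
            fields[name.strip()] = value.strip()
    return fields


def fields_to_doap(fields):
    """Return an RDF/XML string describing one doap:Project."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("doap", DOAP_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP_NS}}}Project")
    for name, value in fields.items():
        prop = FIELD_TO_DOAP.get(name)
        if prop is None:
            continue  # no DOAP equivalent in our table: leave it out
        ET.SubElement(project, f"{{{DOAP_NS}}}{prop}",
                      {f"{{{RDF_NS}}}resource": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fields_to_doap(parse_fields("Homepage: http://toto.example.com\n")))
```

The maintainer only ever edits the one-line form; the verbose form is
generated, which is the point of keeping the keywords aligned with DOAP.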

In summary: I will try to use the same keywords as DOAP, in order to keep the
door open, but I think that using something as complex as RDF is premature.

Have a nice day,

-- 
Charles Plessy
Tsurumi, Kanagawa, Japan





Re: What is the best place for package meta-data ?

2009-12-14 Thread Guillem Jover
Hi!

On Sun, 2009-08-02 at 18:47:00 +0900, Charles Plessy wrote:
 In the Debian Med and Science teams, we are looking for efficient ways to
 document slow-changing metadata relevant to our packages, in particular:

 Alternatives will be much easier to build if we manage to centralise the
 information in a common place. This could be in the source packages
 themselves, either in a dedicated file or in debian/control (but not
 necessarily ending in the Packages and Sources files), or in the file we
 use to create our metapackages. Ultimately we would like to be able to
 have this information flow in places like the Ultimate Debian Database
 and the web pages proposing the packages for download.

Given that this is supposed to be upstream-only information, are you
aware of DOAP [0]? It seems to me it would be better to reuse already
existing infrastructure than to create yet a new one, that only a
subset of Debian might end up using. This could also be submitted
upstream I think.

It's XML though, but it could be easily transformed to any output format
we'd want to use, say control-style for example, which most of our tools
already handle, if need be.
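
To give an idea of what such a transformation could look like, here is a
minimal Python sketch using only the standard library. It is an assumption
about one possible approach, not an existing tool: the function name and the
DOAP_TO_FIELD table are hypothetical, and only direct children of a DOAP
Project element are looked at.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical mapping from DOAP property names to control-style field names.
DOAP_TO_FIELD = {"homepage": "Homepage", "name": "Name"}

SAMPLE = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 xmlns="http://usefulinc.com/ns/doap#">
<Project>
 <name>toto</name>
 <homepage rdf:resource="http://toto.example.com" />
</Project>
</rdf:RDF>
"""


def doap_to_control(xml_text):
    """Return control-style 'Name: contents' lines for each DOAP Project."""
    root = ET.fromstring(xml_text)
    lines = []
    for project in root.iter(f"{{{DOAP_NS}}}Project"):
        for child in project:
            if not child.tag.startswith("{"):
                continue  # element without a namespace: not DOAP
            namespace, _, local = child.tag[1:].partition("}")
            field = DOAP_TO_FIELD.get(local) if namespace == DOAP_NS else None
            if field is None:
                continue  # property we have no field for
            # URLs are carried in rdf:resource, literals in the element text.
            value = child.get(f"{{{RDF_NS}}}resource") or (child.text or "").strip()
            if value:
                lines.append(f"{field}: {value}")
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    print(doap_to_control(SAMPLE), end="")
```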

[0] http://trac.usefulinc.com/doap
http://www.oss-watch.ac.uk/resources/doap.xml

regards,
guillem





Re: What is the best place for package meta-data ?

2009-12-14 Thread Olivier Berger
On Monday, 14 December 2009 at 12:37 +0100, Guillem Jover wrote:
 Hi!
 
 On Sun, 2009-08-02 at 18:47:00 +0900, Charles Plessy wrote:
  In the Debian Med and Science teams, we are looking for efficient ways to
  document slow-changing metadata relevant to our packages, in particular:
 
  Alternatives will be much easier to build if we manage to centralise the
  information in a common place. This could be in the source packages
  themselves, either in a dedicated file or in debian/control (but not
  necessarily ending in the Packages and Sources files), or in the file we
  use to create our metapackages. Ultimately we would like to be able to
  have this information flow in places like the Ultimate Debian Database
  and the web pages proposing the packages for download.
 
 Given that this is supposed to be upstream-only information, are you
 aware of DOAP [0]? It seems to me it would be better to reuse already
 existing infrastructure than to create yet a new one, that only a
 subset of Debian might end up using. This could also be submitted
 upstream I think.
 
 It's XML though, but it could be easily transformed to any output format
 we'd want to use, say control-style for example, which most of our tools
 already handle, if need be.
 
 [0] http://trac.usefulinc.com/doap
 http://www.oss-watch.ac.uk/resources/doap.xml
 

+1 for the use of RDF and appropriate ontologies, like DOAP, FOAF and
others, such as those related to scientific publications, as already
discussed previously on the -da list IIRC.

My 2 cents,

-- 
Olivier BERGER olivier.ber...@it-sudparis.eu
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Ingénieur Recherche - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)





Re: What is the best place for package meta-data ?

2009-08-08 Thread Charles Plessy
On Thu, Aug 06, 2009 at 09:17:22PM +0200, Paul Wise wrote:
 
 I'd put the homepage in a user category and the VCS URLs in a
 developer category.
 
 The data in that database is gathered from .changes files and binary
 and source packages uploaded to ftp-master, except for debtags and
 translated descriptions (IIRC, not sure how those get in). Re-using
 that workflow for meta-data updates, say, by uploading metadata
 updates in .changes files instead of full packages could be useful.

Interesting idea…

I was about to propose putting all upstream-related metadata in a YAML-encoded
file, but if a .changes file is to be generated, the Debian control format may
be preferable. Anyway, this would only make a difference if there were
multi-line field contents. Here is an example:

aqwa『debian-med』$ cat samtools/debian/upstream-metadata.yaml 
DOI: 10.1093/bioinformatics/btp352
Homepage: http://samtools.sourceforge.net
PMID: 19505943
Reference: |
 @article{HengLi06082009,
 author = {Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard and 1000 Genome Project Data Processing Subgroup,  },
 title = {{The Sequence Alignment/Map (SAM) Format and SAMtools}},
 journal = {Bioinformatics},
 volume = {},
 number = {},
 pages = {btp352},
 doi = {10.1093/bioinformatics/btp352},
 year = {2009},
 URL = {http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp352v1},
 eprint = {http://bioinformatics.oxfordjournals.org/cgi/reprint/btp352v1.pdf}
 }
Repository: https://samtools.svn.sourceforge.net/svnroot/samtools

The advantage of the YAML format is that it is trivial to parse using existing
libraries:

aqwa『debian-med』$ perl -MYAML -e '$/ = undef; my ($fields) = Load(<STDIN>); print $fields->{DOI}' < samtools/debian/upstream-metadata.yaml
10.1093/bioinformatics/btp352
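
For comparison, here is a minimal Python sketch that reads the same file
without any YAML library. It is only an illustration: the function name is
hypothetical, and it understands nothing but 'Name: contents' lines and
'Name: |' literal blocks indented by one space, as in the example above. A
real YAML parser would also handle deeper indentation, typed values (here
everything stays a string) and the rest of the syntax.

```python
def parse_upstream_metadata(text):
    """Parse 'Name: contents' lines and 'Name: |' literal blocks into a dict."""
    fields = {}
    block_name = None   # field whose literal block is currently being read
    block_lines = []

    def close_block():
        nonlocal block_name, block_lines
        if block_name is not None:
            fields[block_name] = "\n".join(block_lines).strip("\n") + "\n"
            block_name, block_lines = None, []

    for line in text.splitlines():
        # Indented or blank lines belong to the open literal block, if any.
        if block_name is not None and (line.startswith(" ") or not line.strip()):
            block_lines.append(line[1:])  # drop the single indentation space
            continue
        close_block()
        if not line.strip() or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"not a 'Name: contents' line: {line!r}")
        value = value.strip()
        if value == "|":
            block_name = name.strip()
        else:
            fields[name.strip()] = value
    close_block()
    return fields


if __name__ == "__main__":
    with open("samtools/debian/upstream-metadata.yaml") as handle:
        print(parse_upstream_metadata(handle.read())["DOI"])
```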

I am unsure if it is a good idea to manage multi-line upstream meta-data
anyway. Are there other opinions on this ?

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan





Re: What is the best place for package meta-data ?

2009-08-06 Thread Andreas Tille
On Wed, Aug 05, 2009 at 07:47:25PM +0900, Charles Plessy wrote:
 However, I realised that the Ultimate Debian Database, which I thought would
 be a nice place to host the data, works on a retrieving model rather than a
 pushing model. Before elaborating a complex workaround involving an
 intermediate place where maintainers could push their meta-data, can anybody
 think of an alternative? Andreas Tille suggested the Package Entropy
 Tracker to me, but it would limit the system to packages hosted in a
 Subversion repository. That said, since many of the packages that made us dig
 into that question (software for which we would like to provide registration
 and bibliographic information) are mostly stored in Subversion, that may not
 be a blocker for making a proof of principle…

Well, I think my mail [1] was a bit misinterpreted.  *Currently* all gatherers
that feed information into UDD use a retrieving model.  But PET needs
a pushing model, and now we might have another case where pushing
information makes much more sense than the currently implemented gatherers.
I did not intend to copy the PET solution (even if it is somewhat similar
to what we might need); I rather wanted to mention that chances are good
that a pushing model might be implemented as well if the nature of the data
and their use suggest it.  No decisions have been made yet, but at least it
was discussed in the PET BoF [2] at DebConf (but I don't think any
recordings are available).

Kind regards

 Andreas.

[1] http://lists.debian.org/debian-med/2009/08/msg9.html
[2] https://penta.debconf.org/dc9_schedule/events/515.en.html

-- 
http://fam-tille.de
Klarmachen zum Ändern!





Re: What is the best place for package meta-data ?

2009-08-06 Thread Paul Wise
On Wed, Aug 5, 2009 at 12:47 PM, Charles Plessy <ple...@debian.org> wrote:

 I think that you asked the key question, and that the answer will help us to
 sort out the metadata contents in Debian packages.

 Currently, debian/control contains:

  - Information for the package manager (dpkg). For instance, the package
    name, the build dependencies, the binary dependencies, the Essential
    field,…

  - Information for the archive manager (apt). For instance, the section and
    the priority, the package description,…

I'd put the package description in a user category or just its own category.

  - Information for the online user. For instance the homepage and VCS URLs.

I'd put the homepage in a user category and the VCS URLs in a
developer category.

 Typically, information for the archive manager that is provided by a package
 repository can differ from the contents of the source package. Descriptions
 can be translated, the section can be overridden (the Section: field in the
 source package is not authoritative), Debtags can be added, …

 Information for the online user could follow the same logic: a copy could be
 included in the source packages, for the benefit of providing it in a central
 place and to give an easy interface to the package maintainers, but the one
 that the users get on-line could be refreshed independently of package
 uploads.

 I was thinking of proposing a supplementary file in the debian directory
 following the ‘Name: contents’ convention of Debian control files (same as
 YAML if we do not do wrapping), that maintainers could update in the source
 package’s VCS (or at worst on their local hard drive) and use to push the
 meta-data to a central database between two uploads if need be.

 However, I realised that the Ultimate Debian Database, which I thought would
 be a nice place to host the data, works on a retrieving model rather than a
 pushing model. Before elaborating a complex workaround involving an
 intermediate place where maintainers could push their meta-data, can anybody
 think of an alternative? Andreas Tille suggested the Package Entropy
 Tracker to me, but it would limit the system to packages hosted in a
 Subversion repository. That said, since many of the packages that made us dig
 into that question (software for which we would like to provide registration
 and bibliographic information) are mostly stored in Subversion, that may not
 be a blocker for making a proof of principle…

It seems to me that the canonical location for all this metadata we have
about packages is dak's database on ftp-master; the Packages/Sources
files are generated from there.

The data in that database is gathered from .changes files and binary
and source packages uploaded to ftp-master, except for debtags and
translated descriptions (IIRC, not sure how those get in). Re-using
that workflow for meta-data updates, say, by uploading metadata
updates in .changes files instead of full packages could be useful.

Splitting up the Packages/Sources files into more granular pieces
would be nice, but we need to define which fields should go into which
sets, and which set of sets should be the default. For
compatibility, it could continue to generate Packages/Sources as they
are and add Packages-Homepage, Packages-dpkg, Packages-user etc files
for updated apt/dpkg to use, allowing us to avoid waiting a whole
release cycle to use this stuff.
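
As a rough illustration of the split, here is a minimal Python sketch that
cuts one Packages stanza into per-file pieces. Everything in it is an
assumption made for the example: the function names, the sample stanza, and
above all the FIELD_SETS table, since which field goes into which set is
precisely what still needs defining.

```python
# Hypothetical assignment of fields to the split files.
FIELD_SETS = {
    "Packages-dpkg": {"Version", "Depends", "Essential"},
    "Packages-user": {"Description"},
    "Packages-Homepage": {"Homepage"},
}

# Made-up stanza, for illustration only.
SAMPLE = """\
Package: samtools
Version: 0.1.5c-1
Depends: libc6
Homepage: http://samtools.sourceforge.net
Description: short description
 long description line
"""


def parse_stanza(text):
    """Parse one control-style stanza into a dict, keeping field order."""
    fields = {}
    last = None
    for line in text.splitlines():
        if line[:1] in (" ", "\t") and last is not None:
            fields[last] += "\n" + line  # continuation of a multi-line field
        elif ":" in line:
            name, value = line.split(":", 1)
            last = name
            fields[name] = value.strip()
    return fields


def split_stanza(fields):
    """Return {file name: stanza text}, repeating Package in every piece."""
    pieces = {}
    for filename, wanted in FIELD_SETS.items():
        selected = {k: v for k, v in fields.items() if k in wanted}
        if selected:
            lines = [f"Package: {fields['Package']}"]
            lines += [f"{name}: {value}" for name, value in selected.items()]
            pieces[filename] = "\n".join(lines) + "\n"
    return pieces


if __name__ == "__main__":
    for filename, stanza in split_stanza(parse_stanza(SAMPLE)).items():
        print(f"== {filename} ==")
        print(stanza)
```

Every piece repeats the Package field, so a client that only cares about
one use-case can fetch a single small file and still join it with the
others later.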

-- 
bye,
pabs

http://wiki.debian.org/PaulWise





Re: What is the best place for package meta-data ?

2009-08-06 Thread Paul Wise
On Thu, Aug 6, 2009 at 9:17 PM, Paul Wise <p...@debian.org> wrote:

 Splitting up the Packages/Sources files into more granular pieces
 would be nice, but we need to define which fields should go into which
 sets, and which set of sets should be the default. For
 compatibility, it could continue to generate Packages/Sources as they
 are and add Packages-Homepage, Packages-dpkg, Packages-user etc files
 for updated apt/dpkg to use, allowing us to avoid waiting a whole
 release cycle to use this stuff.

Looks like this bit is already being worked on:

http://lists.debian.org/debian-announce/2009/msg00010.html

 * Move of packages' long descriptions into a separate translated
   package list, which will facilitate their translation and also
   provide a smaller footprint for embedded systems thanks to smaller
   Packages files.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise





Re: What is the best place for package meta-data ?

2009-08-05 Thread Charles Plessy
On Sun, Aug 02, 2009 at 01:37:29PM +0200, Paul Wise wrote:
 I think tying such information to a source or binary package is a bad
 idea since it changes independently of the package. I have similar
 issues with the Homepage field and to a lesser extent, watch files.
 
 Do you think that apt needs to have access to this information?

Hi Paul,

I think that you asked the key question, and that the answer will help us to
sort out the metadata contents in Debian packages.

Currently, debian/control contains:

 - Information for the package manager (dpkg). For instance, the package name,
   the build dependencies, the binary dependencies, the Essential field,…

 - Information for the archive manager (apt). For instance, the section and
   the priority, the package description,…

 - Information for the online user. For instance the homepage and VCS URLs.

Typically, information for the archive manager that is provided by a package
repository can differ from the contents of the source package. Descriptions can
be translated, the section can be overridden (the Section: field in the source
package is not authoritative), Debtags can be added, …

Information for the online user could follow the same logic: a copy could be
included in the source packages, for the benefit of providing it in a central
place and to give an easy interface to the package maintainers, but the one
that the users get on-line could be refreshed independently of package uploads.

I was thinking of proposing a supplementary file in the debian directory
following the ‘Name: contents’ convention of Debian control files (same as YAML
if we do not do wrapping), that maintainers could update in the source
package’s VCS (or at worst on their local hard drive) and use to push the
meta-data to a central database between two uploads if need be.

However, I realised that the Ultimate Debian Database, which I thought would be
a nice place to host the data, works on a retrieving model rather than a
pushing model. Before elaborating a complex workaround involving an
intermediate place where maintainers could push their meta-data, can anybody
think of an alternative? Andreas Tille suggested the Package Entropy
Tracker to me, but it would limit the system to packages hosted in a Subversion
repository. That said, since many of the packages that made us dig into that
question (software for which we would like to provide registration and
bibliographic information) are mostly stored in Subversion, that may not be a
blocker for making a proof of principle…

Have a nice day,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan





Re: What is the best place for package meta-data ?

2009-08-05 Thread Goswin von Brederlow
Charles Plessy <ple...@debian.org> writes:

 On Sun, Aug 02, 2009 at 01:37:29PM +0200, Paul Wise wrote:
 I think tying such information to a source or binary package is a bad
 idea since it changes independently of the package. I have similar
 issues with the Homepage field and to a lesser extent, watch files.
 
 Do you think that apt needs to have access to this information?

 Hi Paul,

 I think that you asked the key question, and that the answer will help us to
 sort out the metadata contents in Debian packages.

 Currently, debian/control contains:

  - Information for the package manager (dpkg). For instance, the package
    name, the build dependencies, the binary dependencies, the Essential
    field,…

  - Information for the archive manager (apt). For instance, the section and
    the priority, the package description,…

  - Information for the online user. For instance the homepage and VCS URLs.

- Information for the BTS (maintainer)

- Information for DAK (maintainer, uploader, DM-Allowed)

MfG
Goswin





What is the best place for package meta-data ?

2009-08-02 Thread Charles Plessy
Dear all,

In the Debian Med and Science teams, we are looking for efficient ways to
document slow-changing metadata relevant to our packages, in particular:

 - Bibliographic information: which article to cite when a piece of software is
   used in a published scientific work. This can be summarised by a digital
   object identifier, like http://dx.doi.org/10.1016/S0168-9525(00)02024-2, or
   the same without the resolver part (http://dx.doi.org/).

 - Registration information: where is the registration page that the users
   would have had to go through if they had not used our packages.

These two pieces of information are crucial to researchers who periodically
need to justify the spending of public money on maintaining academic software.
Usefulness is often measured by counting citations or registered users, and
the Debian popcon scores are not a good replacement since they are either
skewed (install) or underestimates (votes), and in any case they only count
the Debian contribution.
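
Since the bibliographic identifier mentioned above can be written with or
without the resolver part, whatever stores it will probably want a single
canonical form. Here is a minimal Python sketch of that normalisation; the
function names are hypothetical, and the only check made is that the bare
identifier starts with "10.", as DOIs do.

```python
RESOLVER = "http://dx.doi.org/"


def bare_doi(value):
    """Return the DOI without the resolver part, whichever form was given."""
    value = value.strip()
    if value.startswith(RESOLVER):
        value = value[len(RESOLVER):]
    if not value.startswith("10."):
        raise ValueError(f"does not look like a DOI: {value!r}")
    return value


def doi_url(value):
    """Return the resolvable URL for a DOI given in either form."""
    return RESOLVER + bare_doi(value)


if __name__ == "__main__":
    print(bare_doi("http://dx.doi.org/10.1016/S0168-9525(00)02024-2"))
    print(doi_url("10.1016/S0168-9525(00)02024-2"))
```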

One possibility to guide our users to the upstream registration page is to use
Debconf. I think that I do not need to explain on this list why it is not
satisfactory.

Alternatives will be much easier to build if we manage to centralise the
information in a common place. This could be in the source packages themselves,
either in a dedicated file or in debian/control (but not necessarily ending up
in the Packages and Sources files), or in the file we use to create our
metapackages. Ultimately we would like this information to flow into places
like the Ultimate Debian Database and the web pages proposing the packages for
download.

If the concept is popular, it could be expanded, in particular to software
that allows users to contribute some money (the now famous “Paypal
buttons”).

The two possibilities that do not require much new development, storing the
meta-data in the source packages or in metapackages, have opposite
features. In particular, if the meta-data is in the source packages, but not in
the Packages and Sources files on the mirrors, then it becomes difficult to
access it when the source package is not stored in a VCS.  Conversely, if we
store the meta-data in our metapackages, the data is easy to access for us and
for the users of the Debian Blends, but not for those who use the packages
directly, unless we commit ourselves to feeding central information places like
the Ultimate Debian Database and to keeping the information up-to-date, which
will at best be limited to the packages relevant to our projects.

I am therefore seeking comments and insights to better manage our packages'
metadata.

Have a nice Sunday,

-- 
Charles Plessy
Debian Med packaging team,
http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan





Re: What is the best place for package meta-data ?

2009-08-02 Thread Paul Wise
I think tying such information to a source or binary package is a bad
idea since it changes independently of the package. I have similar
issues with the Homepage field and to a lesser extent, watch files.

Do you think that apt needs to have access to this information?

The Packages/Sources files are already huge, which is problematic
for embedded systems with little storage space.

I've been slowly coming to the conclusion that the monolithic Packages
files need to be split into different parts for different use-cases,
e.g. dependency resolution, checksums/filenames, descriptions,
homepages, debtags. I'm not sure exactly how to split up the existing
info though. With this split could come extra meta-data, like
PackageMap/CPE names, screenshot URLs, number of bugs, number of
lintian info/warning/etc.

-- 
bye,
pabs

http://wiki.debian.org/PaulWise

