Re: What is the best place for package meta-data ?
On Wednesday 16 December 2009 at 10:14 +0900, Charles Plessy wrote:
> Dear Guillem and Olivier,
>
> yes, I have been pointed to DOAP (and PackageMap) on the debian-qa and debian-mentors mailing lists. I have spent a couple of hours this week reading about the “Semantic web” and related things. My conclusion is that the languages for linking concepts that are formalised in RDF files (XML, Notation 3, Turtle, N-triples, …) are too complex compared to simple YAML files. However, if we consider DOAP as a simple list of keywords on which to standardise, then I can do my best to stick to them as far as possible.

RDF+XML may be too complex and verbose, and it can probably be criticised on many other counts, but it is nevertheless a standard, and so far the only one that helps construct the Semantic Web (unless you use other forms of RDF, such as RDFa)... So do whatever you want, but if you limit yourself to custom, ad-hoc local formats and don't use standards, you'll limit the potential reuse of what you did for so-far unexpected applications.

It's up to you to eventually think beyond current needs, and then open the door for others to build on top of what you did, on the Semantic Web ;)

Best regards,
--
Olivier BERGER olivier.ber...@it-sudparis.eu
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)

--
To UNSUBSCRIBE, email to debian-devel-requ...@lists.debian.org with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Re: What is the best place for package meta-data ?
On Mon, Dec 14, 2009 at 12:37:20PM +0100, Guillem Jover wrote:
> Given that this is supposed to be upstream-only information, are you aware of DOAP [0]? It seems to me it would be better to reuse already existing infrastructure than to create yet another one that only a subset of Debian might end up using. This could also be submitted upstream, I think.
>
> It's XML though, but it could be easily transformed to any output format we'd want to use, say control-style for example, which most of our tools already handle, if need be.

Dear Guillem and Olivier,

yes, I have been pointed to DOAP (and PackageMap) on the debian-qa and debian-mentors mailing lists. I have spent a couple of hours this week reading about the “Semantic web” and related things. My conclusion is that the languages for linking concepts that are formalised in RDF files (XML, Notation 3, Turtle, N-triples, …) are too complex compared to simple YAML files. However, if we consider DOAP as a simple list of keywords on which to standardise, then I can do my best to stick to them as far as possible.

This would allow us to do the reverse of what you propose: it would ease the translation of the metadata we collect from a simple YAML format (which is very similar to Debian ‘paragraph’ control files) to XML, if a volunteer came forward to do so.

Let's finish with an example: how to declare a homepage.

In YAML
-------

  Homepage: http://toto.example.com

In XML
------

  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:foaf="http://xmlns.com/foaf/0.1/"
           xmlns="http://usefulinc.com/ns/doap#">
    <Project>
      <homepage rdf:resource="http://toto.example.com" />
    </Project>
  </rdf:RDF>

I hope this demonstrates that if we want the package maintainers themselves to enter the information (which is what I propose), then the complexity of RDF is a strong barrier to adoption.

In summary: I will try to use the same keywords as DOAP, in order to keep a door open, but I think that using something as complex as RDF is premature.

Have a nice day,

--
Charles Plessy
Tsurumi, Kanagawa, Japan
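The YAML-to-XML direction mentioned above could look roughly like the following. This is only a minimal sketch, not an existing Debian tool: `parse_flat`, `fields_to_doap` and the `FIELD_TO_DOAP` mapping are hypothetical names, and only the `Homepage` field from the example is mapped to its DOAP property.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DOAP_NS = "http://usefulinc.com/ns/doap#"

# Hypothetical mapping from flat field names to DOAP property names.
FIELD_TO_DOAP = {"Homepage": "homepage"}


def parse_flat(text):
    """Parse single-line 'Name: contents' fields into a dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")  # split on the first colon only
        if sep and name.strip():
            fields[name.strip()] = value.strip()
    return fields


def fields_to_doap(fields):
    """Serialise the mapped fields as a DOAP Project in RDF/XML."""
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("doap", DOAP_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    project = ET.SubElement(root, f"{{{DOAP_NS}}}Project")
    for name, value in fields.items():
        prop = FIELD_TO_DOAP.get(name)
        if prop is None:
            continue  # no DOAP equivalent known to this sketch
        ET.SubElement(project, f"{{{DOAP_NS}}}{prop}",
                      {f"{{{RDF_NS}}}resource": value})
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(fields_to_doap(parse_flat("Homepage: http://toto.example.com\n")))
```

The point of the sketch is that maintainers would only ever edit the one-line form; the verbose RDF/XML would be generated for whoever wants to consume it.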
Re: What is the best place for package meta-data ?
Hi!

On Sun, 2009-08-02 at 18:47:00 +0900, Charles Plessy wrote:
> In the Debian Med and Science teams, we are looking for efficient ways to document slow-changing metadata relevant to our packages, in particular:
>
> Alternatives will be much easier to build if we manage to centralise the information in a common place. This could be in the source packages themselves, either in a dedicated file or in debian/control (but not necessarily ending up in the Packages and Sources files), or in the file we use to create our metapackages. Ultimately we would like this information to flow into places like the Ultimate Debian Database and the web pages proposing the packages for download.

Given that this is supposed to be upstream-only information, are you aware of DOAP [0]? It seems to me it would be better to reuse already existing infrastructure than to create yet another one that only a subset of Debian might end up using. This could also be submitted upstream, I think.

It's XML though, but it could be easily transformed to any output format we'd want to use, say control-style for example, which most of our tools already handle, if need be.

[0] http://trac.usefulinc.com/doap
    http://www.oss-watch.ac.uk/resources/doap.xml

regards,
guillem
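The transformation from DOAP to a control-style format suggested above might be sketched as follows. It is an illustration under assumptions: `doap_to_control` is a hypothetical helper, the capitalised field names are an invented convention, and only direct children of the DOAP `Project` element are handled (nested resources such as repositories are ignored).

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DOAP = "{http://usefulinc.com/ns/doap#}"


def doap_to_control(doap_xml):
    """Flatten the simple properties of a DOAP Project into 'Name: value' lines."""
    root = ET.fromstring(doap_xml)
    project = root if root.tag == DOAP + "Project" else root.find(DOAP + "Project")
    if project is None:
        raise ValueError("no doap:Project element found")
    lines = []
    for child in project:
        if not child.tag.startswith(DOAP):
            continue  # skip properties from other vocabularies
        name = child.tag[len(DOAP):]
        # A property is either a link (rdf:resource) or literal text.
        value = child.get(RDF + "resource") or (child.text or "").strip()
        if value:
            lines.append(f"{name.capitalize()}: {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
        ' xmlns="http://usefulinc.com/ns/doap#">'
        '<Project><name>toto</name>'
        '<homepage rdf:resource="http://toto.example.com"/></Project>'
        '</rdf:RDF>'
    )
    print(doap_to_control(sample), end="")
```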
Re: What is the best place for package meta-data ?
On Monday 14 December 2009 at 12:37 +0100, Guillem Jover wrote:
> Hi!
>
> On Sun, 2009-08-02 at 18:47:00 +0900, Charles Plessy wrote:
> > In the Debian Med and Science teams, we are looking for efficient ways to document slow-changing metadata relevant to our packages, in particular:
> >
> > Alternatives will be much easier to build if we manage to centralise the information in a common place. This could be in the source packages themselves, either in a dedicated file or in debian/control (but not necessarily ending up in the Packages and Sources files), or in the file we use to create our metapackages. Ultimately we would like this information to flow into places like the Ultimate Debian Database and the web pages proposing the packages for download.
>
> Given that this is supposed to be upstream-only information, are you aware of DOAP [0]? It seems to me it would be better to reuse already existing infrastructure than to create yet another one that only a subset of Debian might end up using. This could also be submitted upstream, I think.
>
> It's XML though, but it could be easily transformed to any output format we'd want to use, say control-style for example, which most of our tools already handle, if need be.
>
> [0] http://trac.usefulinc.com/doap
>     http://www.oss-watch.ac.uk/resources/doap.xml

+1 for the use of RDF and appropriate ontologies, like DOAP, FOAF and others, such as those related to scientific publications, as already discussed previously on -da list IIRC.

My 2 cents,
--
Olivier BERGER olivier.ber...@it-sudparis.eu
http://www-public.it-sudparis.eu/~berger_o/ - OpenPGP-Id: 1024D/6B829EEC
Research Engineer - Dept INF
Institut TELECOM, SudParis (http://www.it-sudparis.eu/), Evry (France)
Re: What is the best place for package meta-data ?
On Thu, Aug 06, 2009 at 09:17:22PM +0200, Paul Wise wrote:
> I'd put the homepage in a user category and the VCS URLs in a developer category.
>
> The data in that database is gathered from .changes files and binary and source packages uploaded to ftp-master, except for debtags and translated descriptions (IIRC, not sure how those get in). Re-using that workflow for meta-data updates, say, by uploading metadata updates in .changes files instead of full packages could be useful.

Interesting idea… I was about to propose putting all upstream-related metadata in a YAML-encoded file, but if a .changes file is to be generated, the Debian control format may be preferable. In any case, this would only make a difference if there were multi-line field contents. Here is an example:

  aqwa『debian-med』$ cat samtools/debian/upstream-metadata.yaml
  DOI: 10.1093/bioinformatics/btp352
  Homepage: http://samtools.sourceforge.net
  PMID: 19505943
  Reference: |
    @article{HengLi06082009,
      author = {Li, Heng and Handsaker, Bob and Wysoker, Alec and Fennell, Tim and Ruan, Jue and Homer, Nils and Marth, Gabor and Abecasis, Goncalo and Durbin, Richard and 1000 Genome Project Data Processing Subgroup, },
      title = {{The Sequence Alignment/Map (SAM) Format and SAMtools}},
      journal = {Bioinformatics},
      volume = {},
      number = {},
      pages = {btp352},
      doi = {10.1093/bioinformatics/btp352},
      year = {2009},
      URL = {http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btp352v1},
      eprint = {http://bioinformatics.oxfordjournals.org/cgi/reprint/btp352v1.pdf}
    }
  Repository: https://samtools.svn.sourceforge.net/svnroot/samtools

The advantage of the YAML format is that it is trivial to parse using existing libraries:

  aqwa『debian-med』$ perl -MYAML -e '$/ = undef; my ($fields) = Load(<STDIN>); print $fields->{DOI}' < samtools/debian/upstream-metadata.yaml
  10.1093/bioinformatics/btp352

I am unsure whether it is a good idea to manage multi-line upstream meta-data anyway. Are there other opinions on this?

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
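The same lookup can be sketched in Python without a YAML library. This is only an illustration: `parse_upstream_metadata` is a hypothetical helper that understands just the subset used in the example above (`Name: value` lines and `|` literal blocks), not full YAML, and it returns every value as a string. It also shows where multi-line fields add complexity.

```python
import textwrap


def parse_upstream_metadata(text):
    """Parse 'Name: value' lines and '|' literal blocks into a dict of strings."""
    fields = {}
    block_key = None   # field whose indented block is being collected
    block_lines = []

    def close_block():
        # Remove the common indentation; trailing-newline handling is simplified.
        fields[block_key] = textwrap.dedent("\n".join(block_lines)).strip()

    for line in text.splitlines():
        if block_key is not None:
            if line.startswith(" ") or not line.strip():
                block_lines.append(line)
                continue
            close_block()            # an unindented line ends the block
            block_key, block_lines = None, []
        if not line.strip() or line.startswith("#"):
            continue
        name, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            raise ValueError(f"not a 'Name: value' line: {line!r}")
        if value.strip() == "|":
            block_key = name.strip()
        else:
            fields[name.strip()] = value.strip()
    if block_key is not None:
        close_block()
    return fields


if __name__ == "__main__":
    sample = (
        "DOI: 10.1093/bioinformatics/btp352\n"
        "Homepage: http://samtools.sourceforge.net\n"
        "PMID: 19505943\n"
    )
    print(parse_upstream_metadata(sample)["DOI"])
```

With single-line fields only, the file is at once valid YAML and a valid Debian control paragraph; the block handling is the only part that differs between the two formats.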
Re: What is the best place for package meta-data ?
On Wed, Aug 05, 2009 at 07:47:25PM +0900, Charles Plessy wrote:
> However, I realised that the Ultimate Debian Database, which I thought would be a nice place to host the data, works on a retrieving model rather than a pushing model.
>
> Before elaborating a complex workaround involving an intermediate place where maintainers could push their meta-data, can anybody think of an alternative? Andreas Tille suggested the Package Entropy Tracker to me, but it would limit the system to packages hosted in a Subversion repository. That said, since many of the packages that led us to dig into that question (software for which we would like to provide registration and bibliographic information) are mostly stored in Subversion, that may not be a blocker for making a proof of principle…

Well, I think my mail [1] was a bit misinterpreted. *Currently* all gatherers feeding information into UDD use a retrieving model. But PET needs a pushing model, and now we might have another case where pushing information makes much more sense than the currently implemented gatherers. I did not intend to copy the PET solution (even if it is somewhat similar to what we might need); rather, I wanted to mention that chances are good that a pushing model might be implemented as well if the nature of the data and their use suggest it. No decisions have been made yet, but at least it was discussed in the PET BoF [2] at DebConf (though I don't think any recordings are available).

Kind regards

      Andreas.

[1] http://lists.debian.org/debian-med/2009/08/msg9.html
[2] https://penta.debconf.org/dc9_schedule/events/515.en.html

--
http://fam-tille.de
Klarmachen zum Ändern!
Re: What is the best place for package meta-data ?
On Wed, Aug 5, 2009 at 12:47 PM, Charles Plessy <ple...@debian.org> wrote:
> I think that you asked the key question, and that the answer will help us to sort out the metadata contents in Debian packages. Currently, debian/control contains:
>
> - Information for the package manager (dpkg). For instance, the package name, the build dependencies, the binary dependencies, the Essential field,…
>
> - Information for the archive manager (apt). For instance, the section and the priority, the package description,…

I'd put the package description in a user category or just its own category.

> - Information for the online user. For instance the homepage and VCS URLs.

I'd put the homepage in a user category and the VCS URLs in a developer category.

> Typically, information for the archive manager that is provided by a package repository can differ from the contents of the source package. Descriptions can be translated, the section can be overridden (the Section: field in the source package is not authoritative), Debtags can be added, …
>
> Information for the online user could follow the same logic: a copy could be included in the source packages, for the benefit of providing it in a central place and to give an easy interface to the package maintainers, but the one that the users get on-line could be refreshed independently of package uploads.
>
> I was thinking of proposing a supplementary file in the debian directory following the ‘Name: contents’ convention of Debian control files (same as YAML if we do not do wrapping), that maintainers could update in the source package’s VCS (or at worst on their local hard drive) and use to push the meta-data to a central database between two uploads if need be.
>
> However, I realised that the Ultimate Debian Database, which I thought would be a nice place to host the data, works on a retrieving model rather than a pushing model.
>
> Before elaborating a complex workaround involving an intermediate place where maintainers could push their meta-data, can anybody think of an alternative? Andreas Tille suggested the Package Entropy Tracker to me, but it would limit the system to packages hosted in a Subversion repository. That said, since many of the packages that led us to dig into that question (software for which we would like to provide registration and bibliographic information) are mostly stored in Subversion, that may not be a blocker for making a proof of principle…

It seems to me that the canonical location for all this metadata we have about packages is dak's database on ftp-master; the Packages/Sources files are generated from there.

The data in that database is gathered from .changes files and binary and source packages uploaded to ftp-master, except for debtags and translated descriptions (IIRC, not sure how those get in). Re-using that workflow for meta-data updates, say, by uploading metadata updates in .changes files instead of full packages could be useful.

Splitting the Packages/Sources files into more granular pieces would be nice, but which fields should go into which sets needs defining, as does which set of sets should be the default. For compatibility, the archive could continue to generate Packages/Sources as they are and add Packages-Homepage, Packages-dpkg, Packages-user etc. files for updated apt/dpkg to use, allowing us to avoid waiting a whole release cycle to use this stuff.

--
bye,
pabs

http://wiki.debian.org/PaulWise
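The split into per-purpose files proposed above can be sketched like this. The assignment of fields to sets in `FIELD_SETS` is purely hypothetical (the thread explicitly leaves it undefined), and `parse_stanzas` and `split_by_set` are illustrative helpers, not part of dak or apt.

```python
# Hypothetical assignment of fields to the sets named in the proposal.
FIELD_SETS = {
    "dpkg": {"Version", "Architecture", "Depends", "Pre-Depends", "Essential"},
    "user": {"Description"},
    "Homepage": {"Homepage"},
}


def parse_stanzas(text):
    """Parse control-style paragraphs (blank-line separated) into dicts."""
    stanzas, current, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            if current:
                stanzas.append(current)
            current, last = {}, None
            continue
        if line[0] in " \t" and last is not None:
            current[last] += "\n" + line  # continuation of a multi-line field
            continue
        name, _, value = line.partition(":")
        last = name.strip()
        current[last] = value.strip()
    if current:
        stanzas.append(current)
    return stanzas


def split_by_set(stanzas, field_sets=FIELD_SETS):
    """Return {set name: stanzas reduced to Package plus that set's fields}."""
    out = {name: [] for name in field_sets}
    for stanza in stanzas:
        for set_name, wanted in field_sets.items():
            subset = {k: v for k, v in stanza.items()
                      if k == "Package" or k in wanted}
            if len(subset) > 1:  # skip packages with nothing in this set
                out[set_name].append(subset)
    return out
```

Each value of the returned mapping would correspond to one generated file (for example the `dpkg` entry to a Packages-dpkg file), while the full Packages file keeps being produced unchanged for compatibility.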
Re: What is the best place for package meta-data ?
On Thu, Aug 6, 2009 at 9:17 PM, Paul Wise <p...@debian.org> wrote:
> Splitting the Packages/Sources files into more granular pieces would be nice, but which fields should go into which sets needs defining, as does which set of sets should be the default. For compatibility, the archive could continue to generate Packages/Sources as they are and add Packages-Homepage, Packages-dpkg, Packages-user etc. files for updated apt/dpkg to use, allowing us to avoid waiting a whole release cycle to use this stuff.

Looks like this bit is already being worked on:

http://lists.debian.org/debian-announce/2009/msg00010.html

> * Move of packages' long descriptions into a separate translated package list, which will facilitate their translation and also provide a smaller footprint for embedded systems thanks to smaller Packages files.

--
bye,
pabs

http://wiki.debian.org/PaulWise
Re: What is the best place for package meta-data ?
On Sun, Aug 02, 2009 at 01:37:29PM +0200, Paul Wise wrote:
> I think tying such information to a source or binary package is a bad idea since it changes independently of the package. I have similar issues with the Homepage field and to a lesser extent, watch files. Do you think that apt needs to have access to this information?

Hi Paul,

I think that you asked the key question, and that the answer will help us to sort out the metadata contents in Debian packages. Currently, debian/control contains:

- Information for the package manager (dpkg). For instance, the package name, the build dependencies, the binary dependencies, the Essential field,…

- Information for the archive manager (apt). For instance, the section and the priority, the package description,…

- Information for the online user. For instance the homepage and VCS URLs.

Typically, information for the archive manager that is provided by a package repository can differ from the contents of the source package. Descriptions can be translated, the section can be overridden (the Section: field in the source package is not authoritative), Debtags can be added, …

Information for the online user could follow the same logic: a copy could be included in the source packages, for the benefit of providing it in a central place and to give an easy interface to the package maintainers, but the one that the users get on-line could be refreshed independently of package uploads.

I was thinking of proposing a supplementary file in the debian directory following the ‘Name: contents’ convention of Debian control files (same as YAML if we do not do wrapping), that maintainers could update in the source package’s VCS (or at worst on their local hard drive) and use to push the meta-data to a central database between two uploads if need be.

However, I realised that the Ultimate Debian Database, which I thought would be a nice place to host the data, works on a retrieving model rather than a pushing model.

Before elaborating a complex workaround involving an intermediate place where maintainers could push their meta-data, can anybody think of an alternative? Andreas Tille suggested the Package Entropy Tracker to me, but it would limit the system to packages hosted in a Subversion repository. That said, since many of the packages that led us to dig into that question (software for which we would like to provide registration and bibliographic information) are mostly stored in Subversion, that may not be a blocker for making a proof of principle…

Have a nice day,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
Re: What is the best place for package meta-data ?
Charles Plessy <ple...@debian.org> writes:
> On Sun, Aug 02, 2009 at 01:37:29PM +0200, Paul Wise wrote:
> > I think tying such information to a source or binary package is a bad idea since it changes independently of the package. I have similar issues with the Homepage field and to a lesser extent, watch files. Do you think that apt needs to have access to this information?
>
> Hi Paul,
>
> I think that you asked the key question, and that the answer will help us to sort out the metadata contents in Debian packages. Currently, debian/control contains:
>
> - Information for the package manager (dpkg). For instance, the package name, the build dependencies, the binary dependencies, the Essential field,…
>
> - Information for the archive manager (apt). For instance, the section and the priority, the package description,…
>
> - Information for the online user. For instance the homepage and VCS URLs.

- Information for the BTS (maintainer)

- Information for DAK (maintainer, uploader, DM-Allowed)

Regards,
Goswin
What is the best place for package meta-data ?
Dear all,

In the Debian Med and Science teams, we are looking for efficient ways to document slow-changing metadata relevant to our packages, in particular:

- Bibliographic information: which article to cite when a piece of software is used in a published scientific work. This can be summarised by a digital object identifier, like http://dx.doi.org/10.1016/S0168-9525(00)02024-2, which can be given with or without the resolver part (http://dx.doi.org/).

- Registration information: where is the registration page that the users would have had to go through if they had not used our packages.

These two pieces of information are crucial to researchers who periodically need to justify the spending of public money on maintaining academic software. Usefulness is often measured by counting citations or registered users, and the Debian popcon scores are not a good replacement since they are either skewed (install) or under-estimations (votes), and in any case they only count the Debian contribution.

One possibility to guide our users to the upstream registration page is to use Debconf. I think that I do not need to explain on this list why that is not satisfactory.

Alternatives will be much easier to build if we manage to centralise the information in a common place. This could be in the source packages themselves, either in a dedicated file or in debian/control (but not necessarily ending up in the Packages and Sources files), or in the file we use to create our metapackages. Ultimately we would like this information to flow into places like the Ultimate Debian Database and the web pages proposing the packages for download. If the concept is popular, it could be expanded, in particular to software that allows users to contribute money (the now famous “Paypal buttons”).

The two possibilities that do not require much new development, storing the meta-data in the source packages or in metapackages, have opposite trade-offs. In particular, if the meta-data is in the source packages, but not in the Packages and Sources files on the mirrors, then it becomes difficult to access when the source package is not stored in a VCS. Conversely, if we store the meta-data in our metapackages, the data is easy to access for us and the users of the Debian Blends, but not for those who use the packages directly, unless we commit ourselves to feeding central information places like the Ultimate Debian Database and to keeping the information up to date, which will at best be limited to the packages relevant to our projects.

I am therefore seeking comments and insights to better manage our packages' metadata.

Have a nice Sunday,

--
Charles Plessy
Debian Med packaging team, http://www.debian.org/devel/debian-med
Tsurumi, Kanagawa, Japan
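Since a DOI may be recorded with or without its resolver part, any tool collecting this field would probably want one canonical form. A minimal sketch follows, with hypothetical helper names `bare_doi` and `doi_url`; the list of accepted prefixes is an assumption beyond the single http://dx.doi.org/ form used in the message.

```python
# Prefixes treated as the "resolver part"; only http://dx.doi.org/ appears in
# the message, the others are assumptions added for robustness.
RESOLVER_PREFIXES = (
    "http://dx.doi.org/",
    "https://dx.doi.org/",
    "http://doi.org/",
    "https://doi.org/",
    "doi:",
)


def bare_doi(value):
    """Return the DOI without any resolver prefix."""
    value = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def doi_url(value, resolver="http://dx.doi.org/"):
    """Return a resolvable URL for a DOI given in either form."""
    return resolver + bare_doi(value)
```

Storing the bare form and adding the resolver only when generating web pages keeps the metadata independent of any particular resolver address.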
Re: What is the best place for package meta-data ?
I think tying such information to a source or binary package is a bad idea since it changes independently of the package. I have similar issues with the Homepage field and to a lesser extent, watch files. Do you think that apt needs to have access to this information?

The Packages/Sources files are already huge, which is problematic for embedded systems with little storage space. I've been slowly coming to the conclusion that the monolithic Packages files need to be split into different parts for different use-cases, e.g. dependency resolution, checksums/filenames, descriptions, homepages, debtags. I'm not sure exactly how to split up the existing info though.

With this split could come extra meta-data, like PackageMap/CPE names, screenshot URLs, number of bugs, number of lintian info/warning/etc.

--
bye,
pabs

http://wiki.debian.org/PaulWise