Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
The attached text is a first draft of a proposed extension to the Description field to explicitly handle bulleted lists. The extended syntax allows list items to be treated specially by frontends (for instance, bullet characters can be replaced with graphics, and the body of the list item can be word-wrapped); current Descriptions should either parse correctly or be no worse off than they currently are.

Current versions of aptitude implement this proposal.

  Daniel

Abstract:

This document describes a proposed extension to the Description formatting policy (policy section 5.6.13) to support better formatting of bullet lists in package descriptions. The proposed policy is primarily a formalization of existing best practice regarding bullets; most current package descriptions will parse as expected with no changes, and packages that do not can easily be converted to the new format without degrading presentation in legacy package management tools.

The Description extensions described in this document are presently implemented in aptitude (>= 0.4.0).

Background and Motivation:

Policy 5.6.23 provides for preformatted lines in descriptions. These are lines beginning with at least two spaces, and will be displayed verbatim; they are either unwrapped or hard-wrapped. The current best-practice approach to including bullet lists in package descriptions is to write each line of the list as a preformatted line; for instance,

= snip here =
 QEMU is a FAST! processor emulator: currently the package supports
 arm, powerpc, sparc and x86 emulation. By using dynamic translation
 it achieves reasonable speed while being easy to port on new host
 CPUs. QEMU has two operating modes:
 .
  * User mode emulation: QEMU can launch Linux processes compiled for
    one CPU on another CPU.
  * Full system emulation: QEMU emulates a full system, including a
    processor and various peripherials. It enables easier testing and
    debugging of system code. It can also be used to provide virtual
    hosting of several virtual PC on a single server.
 .
 As QEMU requires no host kernel patches to run, it is very safe and
 easy to use.
= snip here =

This convention has several serious limitations, however. Perhaps most importantly, it does not gracefully accommodate smaller terminals; while other paragraphs are word-wrapped by conforming user interfaces, word-wrapping of these preformatted paragraphs is (rightly) forbidden. This leads to poor readability when the terminal size is decreased; for instance, formatting to 60 columns produces:

= snip here =
 QEMU is a FAST! processor emulator: currently the package
 supports arm, powerpc, sparc and x86 emulation. By using
 dynamic translation it achieves reasonable speed while
 being easy to port on new host CPUs. QEMU has two operating
 modes:

  * User mode emulation: QEMU can launch Linux processes compi
 led for
    one CPU on another CPU.
  * Full system emulation: QEMU emulates a full system, includ
 ing a
    processor and various peripherials. It enables easier test
 ing and
    debugging of system code. It can also be used to provide
    virtual hosting of several virtual PC on a single server.
 .
 As QEMU requires no host kernel patches to run, it is very
 safe and easy to use.
= snip here =

In contrast, the proposed mechanism allows this description to be formatted in 60 columns as follows:

= snip here =
 QEMU is a FAST! processor emulator: currently the package
 supports arm, powerpc, sparc and x86 emulation. By using
 dynamic translation it achieves reasonable speed while
 being easy to port on new host CPUs. QEMU has two operating
 modes:

  * User mode emulation: QEMU can launch Linux processes
    compiled for one CPU on another CPU.
  * Full system emulation: QEMU emulates a full system,
    including a processor and various peripherials. It
    enables easier testing and debugging of system code. It
    can also be used to provide virtual hosting of several
    virtual PC on a single server.

 As QEMU requires no host kernel patches to run, it is very
 safe and easy to use.
= snip here =

Extensions to the syntax of Description blocks:

As mentioned above, all lines beginning with two or more spaces are treated identically under current Policy. This proposal introduces the concept of a /bulleted block/. A bulleted block consists of a series of lines such that:

  (1) The first line begins with N > 2 spaces,

  (2) The first non-space character of the first line is a bullet character, and

  (3) Each subsequent line begins with at least N + 1 + M spaces, where M is the number of spaces immediately following the first non-space character of the first line.

For the purposes of this definition, the bullet characters are [*-+].

The following are examples of bulleted blocks:

= snip here =
  * If Peter Piper picked a peck of pickled
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
On Mon, Dec 12, 2005 at 03:09:52PM -0800, Daniel Burrows wrote:
> The attached text is a first draft of a proposed extension to the
> Description field to explicitly handle bulleted lists. The extended

wow! that's quite a document. i'm glad to see that people are focusing on the Really Big problems facing debian today.

okay, that was a bit punchy... sorry i couldn't help it :)

seriously though, i think the proposal is quite well written. the only critique i have is that i think it's maybe going a little too far out there to talk about nested lists, as i can't imagine them being at all practical in what's supposed to be a short, informative description of a package.

	sean
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
[Daniel Burrows]
> (1) The first line begins with N > 2 spaces,

Don't you mean N >= 2?

> (2) The first non-space character of the first line is a bullet character, and
>
> (3) Each subsequent line begins with at least N + 1 + M spaces, where M is the number of spaces immediately following the first non-space character of the first line.

I think that's overgeneral. I do not see the point in allowing M != 1. I say keep the spec simple; forcing frontends to include parsing complexity that doesn't even add anything useful is a Bad Thing. (If aptitude wants to handle M != 1 in order to make certain legacy descriptions look better, fine, but I don't think it should be explicitly condoned.)

> For the purposes of this definition, the bullet characters are [*-+].

Agreed, 'o' was always a hack, though one I myself have been guilty of.

> The individual lines within a bulleted block should be parsed as if N+M characters were stripped from the left-hand side of the block (if M=0, the initial bullet character should be treated as if it were a space).

See above about supporting M == 0. It doesn't even look good in legacy frontends. And as you say later on a related subject:

> Furthermore, fixing the description in this case is trivial.

Peter
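The three conditions and the stripping rule quoted above can be sketched in a few lines of Python. This is only an illustration of the draft as worded in this thread, reading condition (1) as N >= 2; the function names are hypothetical and nothing here is taken from aptitude's implementation.

```python
BULLETS = "*+-"  # the three bullet characters named in the draft


def leading_spaces(line):
    """Number of space characters at the start of the line."""
    return len(line) - len(line.lstrip(" "))


def bullet_params(first_line):
    """Return (N, M) for a bullet line, or None if it is not one."""
    n = leading_spaces(first_line)
    # (1) N >= 2 leading spaces, (2) first non-space character is a bullet.
    if n < 2 or n >= len(first_line) or first_line[n] not in BULLETS:
        return None
    rest = first_line[n + 1:]
    m = len(rest) - len(rest.lstrip(" "))
    return n, m


def bulleted_block_length(lines):
    """How many leading lines of `lines` form one bulleted block (0 if none)."""
    if not lines:
        return 0
    params = bullet_params(lines[0])
    if params is None:
        return 0
    n, m = params
    count = 1
    for line in lines[1:]:
        # (3) continuation lines need at least N + 1 + M leading spaces.
        if not line.strip() or leading_spaces(line) < n + 1 + m:
            break
        count += 1
    return count


def block_body(block):
    """Strip N + M characters from every line of one bulleted block."""
    n, m = bullet_params(block[0])
    body = [line[n + m:] for line in block]
    if m == 0:
        # With M == 0 the bullet itself survives the cut; treat it as a space.
        body[0] = " " + body[0][1:]
    return body
```

After stripping, every line of the block starts with a single space, so the body looks like an ordinary word-wrapped paragraph and could be fed back through the same parser, which is how nesting would fall out of the rule.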
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
Daniel Burrows writes:

> The attached text is a first draft of a proposed extension to the
> Description field to explicitly handle bulleted lists.

That's quite a complex document for something I believe should be quite simple.

When I use Emacs, it can reflow text (M-q) by looking at the indentation level of the following lines. It can even cope with bullets, outdents, indents, etc. If a frontend display routine could handle that, that would solve the problem generically, and would handle any level of indentation required.

Specifically regarding bullets: We now have UTF-8 encoded control files, so why not simply use the UCS bullet character (U+2022)?

Regards,
Roger

-- 
Roger Leigh
  Printing on GNU/Linux?  http://gimp-print.sourceforge.net/
  Debian GNU/Linux  http://www.debian.org/
  GPG Public Key: 0x25BFB848. Please sign and encrypt your mail.
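For illustration, reflowing a single list item by its indentation can be sketched with Python's standard `textwrap` module. This is not Emacs's fill algorithm and not what any frontend actually does; `reflow_item` is a hypothetical helper that assumes its input is one item whose first line carries the bullet.

```python
import textwrap


def reflow_item(lines, width):
    """Re-wrap one list item, keeping its bullet and a hanging indent."""
    first = lines[0]
    indent = len(first) - len(first.lstrip(" "))
    bullet = first[indent]
    body = first[indent + 1:]
    gap = len(body) - len(body.lstrip(" "))
    # Join the hard-wrapped pieces back into one run of text.
    text = " ".join(part.strip() for part in [body] + list(lines[1:]))
    return textwrap.wrap(
        text,
        width=width,
        initial_indent=" " * indent + bullet + " " * gap,
        subsequent_indent=" " * (indent + 1 + gap),
    )


item = [
    "  * User mode emulation: QEMU can launch Linux processes compiled for",
    "    one CPU on another CPU.",
]
for line in reflow_item(item, 60):
    print(line)
```

The hanging indent is derived from the first line only, so continuation lines end up aligned under the item text whatever width is requested.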
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
On Mon, Dec 12, 2005 at 06:38:59PM -0500, sean finney was heard to say:
> On Mon, Dec 12, 2005 at 03:09:52PM -0800, Daniel Burrows wrote:
> > The attached text is a first draft of a proposed extension to the
> > Description field to explicitly handle bulleted lists. The extended
>
> wow! that's quite a document. i'm glad to see that people are focusing on the Really Big problems facing debian today.
>
> okay, that was a bit punchy... sorry i couldn't help it :)
>
> seriously though, i think the proposal is quite well written. the only critique i have is that i think it's maybe going a little too far out there to talk about nested lists, as i can't imagine them being at all practical in what's supposed to be a short, informative description of a package.

That's a fair point, but I felt that there wasn't any reason to artificially restrict the sorts of lists that could be handled when nested lists can be dealt with in such a natural fashion. There are at least a few cases where I can imagine two-level lists being useful, and they seem to exist in the wild (see samhain and xml-core, for instance).

One interesting question that I can see is whether to treat a *word-wrapped* line that starts with a bullet character as a bulleted item. Doing so would make several more natural ways of expressing lists work, especially nested lists -- the example in my document is actually wrong! On the other hand, doing this at the top-level would mean that conforming descriptions wouldn't degrade cleanly, while doing it only for sub-lists is inelegant.

  Daniel
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
On Mon, Dec 12, 2005 at 03:09:52PM -0800, Daniel Burrows wrote:
> The attached text is a first draft of a proposed extension to the Description field to explicitly handle bulleted lists. The extended syntax allows list items to be treated specially by frontends (for instance, bullet characters can be replaced with graphics, and the body of the list item can be word-wrapped); current Descriptions should either parse correctly or be no worse off than they currently are.
>
> Current versions of aptitude implement this proposal.

Excellent initiative to work on this.

> Extensions to the syntax of Description blocks:
>
> As mentioned above, all lines beginning with two or more spaces are treated identically under current Policy. This proposal introduces the concept of a /bulleted block/. A bulleted block consists of a series of lines such that:

s/lines/series of items/, otherwise you'd have a series of lines that each consist of a number of lines: ambiguous (or at least confusing) wording.

> (1) The first line begins with N > 2 spaces,

That should be N >= 2 I presume; otherwise, your examples are inconsistent with this definition. Also, I think that for inclusion in policy the text should make clear that the leading space required for line continuation in the control file block is included in N.

> (2) The first non-space character of the first line is a bullet character, and
>
> (3) Each subsequent line begins with at least N + 1 + M spaces, where M is the number of spaces immediately following the first non-space character of the first line.

I'd note that M might be as small as zero. This is implied by the lack of prohibition in (2), but it can't hurt to be clear.

> For the purposes of this definition, the bullet characters are [*-+].

Better write [*+-], to prevent needless ambiguity with regex syntax, where [*-+] actually means '*' or '+', excluding '-'. Or just list the three characters.
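The regex point can be checked directly. The sketch below uses Python's `re` module purely as an example engine (the thread does not say which regex dialect a frontend would use); in it, a '-' between two characters of a class denotes a range, and '*' to '+' covers only those two adjacent code points.

```python
import re

AMBIGUOUS = re.compile(r"[*-+]")  # '-' here is a range operator: '*' through '+'
EXPLICIT = re.compile(r"[*+-]")   # trailing '-' is a literal hyphen


def matches(pattern, ch):
    """True if the single character `ch` is accepted by the class."""
    return pattern.fullmatch(ch) is not None


for ch in "*+-":
    print(repr(ch), matches(AMBIGUOUS, ch), matches(EXPLICIT, ch))
```

Listing the three characters in prose, or putting the hyphen last in the class, avoids the problem either way.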
In concreto, I'd suggest using the following definition instead, also covering the nested bulleting that's only defined-by-example below:

<definition>
For inclusion in policy 5.6.12:

* Those starting with two or more spaces. These will be displayed verbatim, unless it is part of a bulleted list. The first line of a bulleted list will start, after the two or more spaces, with a bullet character ('*', '-' or '+'), followed by zero or more spaces, followed by the beginning of the item text. Each line after this line will be either:
  - continued bullet item text, indented at least as far as the beginning of the bullet item text on the line of the last bullet, or
  - a subsequent bullet item, with the same bullet item character, indented at the same level, with the item text also starting at the same (or deeper) level as the first bullet item
  - a nested bullet item, according to the rules of a first top-level bullet item line, but indented at least as deep as the bullet item text of the current level
  Every other line not matching the definition above is considered to be part of a verbatim text block, and the bullet item list is then supposed to be terminated on the preceding line.
</definition>

I do think this does make it a bit complicated though... hm.

> The following are examples of bulleted blocks:
>
> = snip here =
>   * If Peter Piper picked a peck of pickled peppers, how many pickled peppers did Peter Piper pick?
>
>   *Fourscore and seven years ago, our forefathers brought forth upon this continent a new nation.
>  .
>   -- Abraham Lincoln, 16th President of the United States of America

According to your and my definition, this would be a bullet item too, like (in HTML) <li>- Abraham Lincoln (...) America</li>? Or how do I misread your definition then, if this is *not* an instance of a bulleted list?

Alternative:

A totally different way might be to exploit policy's opportunity to extending the description syntax. Policy 5.6.12 explicitly states that lines starting with a dot should not be used and are left open for future expansions. So why not use *that*? Like, definition by example:

Description: Great package to moo
 This package totally rocks, because:
 . It has super cow powers
 . The internet access point at the Wig and Pen bar is using this package too, and customers are happy with it
 . This includes Lachlan McOmish
 . And more stuff
 This is a verbatim block (indented at least one space more than normal continuation bullet text would need to be)
 . It's written in whitespace, with a level editor in brainfuck

This will allow for both verbatim blocks in bulleted lists, will leave verbatim blocks totally like they are today, no redefinitions, and enables to be defined more easily (no need to care for what happens with looks-like-bulleted-but-isn't-bulleted stuff).

Only disadvantage I see is closing the door for even further
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
On Mon, Dec 12, 2005 at 05:49:02PM -0600, Peter Samuelson was heard to say:
> [Daniel Burrows]
> > (1) The first line begins with N > 2 spaces,
>
> Don't you mean N >= 2?
>
> > (2) The first non-space character of the first line is a bullet character, and
> >
> > (3) Each subsequent line begins with at least N + 1 + M spaces, where M is the number of spaces immediately following the first non-space character of the first line.
>
> I think that's overgeneral. I do not see the point in allowing M != 1. I say keep the spec simple; forcing frontends to include parsing complexity that doesn't even add anything useful is a Bad Thing. (If aptitude wants to handle M != 1 in order to make certain legacy descriptions look better, fine, but I don't think it should be explicitly condoned.)

Good point. I've adjusted this in my copy (updated version attached).

  Daniel

Abstract:

This document describes a proposed extension to the Description formatting policy (policy section 5.6.13) to support better formatting of bullet lists in package descriptions. The proposed policy is primarily a formalization of existing best practice regarding bullets; most current package descriptions will parse as expected with no changes, and packages that do not can easily be converted to the new format without degrading presentation in legacy package management tools.

The Description extensions described in this document are presently implemented in aptitude (>= 0.4.0).

Background and Motivation:

Policy 5.6.23 provides for preformatted lines in descriptions. These are lines beginning with at least two spaces, and will be displayed verbatim; they are either unwrapped or hard-wrapped. The current best-practice approach to including bullet lists in package descriptions is to write each line of the list as a preformatted line; for instance,

= snip here =
 QEMU is a FAST! processor emulator: currently the package supports
 arm, powerpc, sparc and x86 emulation. By using dynamic translation
 it achieves reasonable speed while being easy to port on new host
 CPUs. QEMU has two operating modes:
 .
  * User mode emulation: QEMU can launch Linux processes compiled for
    one CPU on another CPU.
  * Full system emulation: QEMU emulates a full system, including a
    processor and various peripherials. It enables easier testing and
    debugging of system code. It can also be used to provide virtual
    hosting of several virtual PC on a single server.
 .
 As QEMU requires no host kernel patches to run, it is very safe and
 easy to use.
= snip here =

This convention has several serious limitations, however. Perhaps most importantly, it does not gracefully accommodate smaller terminals; while other paragraphs are word-wrapped by conforming user interfaces, word-wrapping of these preformatted paragraphs is (rightly) forbidden. This leads to poor readability when the terminal size is decreased; for instance, formatting to 60 columns produces:

= snip here =
 QEMU is a FAST! processor emulator: currently the package
 supports arm, powerpc, sparc and x86 emulation. By using
 dynamic translation it achieves reasonable speed while
 being easy to port on new host CPUs. QEMU has two operating
 modes:

  * User mode emulation: QEMU can launch Linux processes compi
 led for
    one CPU on another CPU.
  * Full system emulation: QEMU emulates a full system, includ
 ing a
    processor and various peripherials. It enables easier test
 ing and
    debugging of system code. It can also be used to provide
    virtual hosting of several virtual PC on a single server.
 .
 As QEMU requires no host kernel patches to run, it is very
 safe and easy to use.
= snip here =

In contrast, the proposed mechanism allows this description to be formatted in 60 columns as follows:

= snip here =
 QEMU is a FAST! processor emulator: currently the package
 supports arm, powerpc, sparc and x86 emulation. By using
 dynamic translation it achieves reasonable speed while
 being easy to port on new host CPUs. QEMU has two operating
 modes:

  * User mode emulation: QEMU can launch Linux processes
    compiled for one CPU on another CPU.
  * Full system emulation: QEMU emulates a full system,
    including a processor and various peripherials. It
    enables easier testing and debugging of system code. It
    can also be used to provide virtual hosting of several
    virtual PC on a single server.

 As QEMU requires no host kernel patches to run, it is very
 safe and easy to use.
= snip here =

Extensions to the syntax of Description blocks:

As mentioned above, all lines beginning with two or more spaces are treated identically under current Policy. This proposal introduces the concept of a /bulleted block/. A bulleted block consists of a series of lines such that:

  (1) The first line begins with N >= 2 spaces,

  (2) The first
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
On Tue, Dec 13, 2005 at 12:01:52AM +0000, Roger Leigh was heard to say:
> Daniel Burrows writes:
>
> > The attached text is a first draft of a proposed extension to the
> > Description field to explicitly handle bulleted lists.
>
> That's quite a complex document for something I believe should be quite simple.
>
> When I use Emacs, it can reflow text (M-q) by looking at the indentation level of the following lines. It can even cope with bullets, outdents, indents, etc. If a frontend display routine could handle that, that would solve the problem generically, and would handle any level of indentation required.

The heart of the document describes how to do this in a simple and precise way. The first section explains some reasons that it's useful to recognize bulleted lists, while the last couple sections have implementation notes and analysis of the impact on current frontends and Descriptions.

> Specifically regarding bullets: We now have UTF-8 encoded control files, so why not simply use the UCS bullet character (U+2022)?

It might make sense to recognize the Unicode bullet character, but forcing people to use it is not a good idea for several reasons, with backwards-compatibility being a major one.

  Daniel
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
On Tue, Dec 13, 2005 at 01:21:41AM +0100, Jeroen van Wolffelaar was heard to say:
> On Mon, Dec 12, 2005 at 03:09:52PM -0800, Daniel Burrows wrote:
> > Extensions to the syntax of Description blocks:
> >
> > As mentioned above, all lines beginning with two or more spaces are treated identically under current Policy. This proposal introduces the concept of a /bulleted block/. A bulleted block consists of a series of lines such that:
>
> s/lines/series of items/, otherwise you'd have a series of lines that each consist of a number of lines: ambiguous (or at least confusing) wording.

Each bulleted block is one bullet item -- the syntax doesn't care about sequences of items, just individual ones (although the frontend is free to treat sequences differently from lone items). This can probably be made clearer. How about this wording?

= snip here =
As mentioned above, all lines beginning with two or more spaces are treated identically under current Policy. This proposal introduces the concept of a /bulleted block/. A bulleted block represents the contents of a single list item; it consists of a series of lines such that:
= snip here =

> > (2) The first non-space character of the first line is a bullet character, and
> >
> > (3) Each subsequent line begins with at least N + 1 + M spaces, where M is the number of spaces immediately following the first non-space character of the first line.
>
> I'd note that M might be as small as zero. This is implied by the lack of prohibition in (2), but it can't hurt to be clear.

As was noted in another reply, it probably makes sense to require M == 1 in the spec; only a few packages that I've seen use anything else and it looks best in legacy frontends.

> > For the purposes of this definition, the bullet characters are [*-+].
>
> Better write [*+-], to prevent needless ambiguity with regex syntax, where [*-+] actually means '*' or '+', excluding '-'. Or just list the three characters.

Good point. I'm afraid that I'm guilty of being a bit lazy here.

> In concreto, I'd suggest using the following definition instead, also covering the nested bulleting that's only defined-by-example below:

It actually is defined, but you have to read between the lines (it's the bit about parsing as if N+M spaces were stripped). Unfortunately, my example was wrong! (see attachment)

> For inclusion in policy 5.6.12:
> [snip]

As you noted, this is a bit hard to read; I'll probably tackle the problem of writing this up for Policy at some point in the future, once there's some agreement on what the format should be.

> > The following are examples of bulleted blocks:
> >
> > = snip here =
> >   * If Peter Piper picked a peck of pickled peppers, how many pickled peppers did Peter Piper pick?
> >
> >   *Fourscore and seven years ago, our forefathers brought forth upon this continent a new nation.
> >  .
> >   -- Abraham Lincoln, 16th President of the United States of America
>
> According to your and my definition, this would be a bullet item too, like (in HTML) <li>- Abraham Lincoln (...) America</li>? Or how do I misread your definition then, if this is *not* an instance of a bulleted list?

Yes, that's a good point. My example includes something that parses incorrectly :-). Note, however, that in the wild there are exactly two packages that break this way in aptitude out of the whole archive (checked with a regexp search), and that even these would not break if we required a space after the bullet character.

> Alternative:
>
> A totally different way might be to exploit policy's opportunity to extending the description syntax. Policy 5.6.12 explicitly states that lines starting with a dot should not be used and are left open for future expansions. So why not use *that*? Like, definition by example:

I don't like this approach as much for a couple of reasons -- it's not as natural (the format I proposed looks fine without any interpretation at all), you'd have to edit a lot of packages (most bulleted lists are in the right format already, in my observation), and it might not display properly in legacy frontends (frontends might display them literally, but Policy is silent on how these lines should be handled; aptitude IGNORES everything past the dot!)

  Daniel
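The effect of requiring a space after the bullet character (M == 1) can be sketched as two line classifiers. The patterns below are illustrative assumptions written for this example; they are not the regexp Daniel used to search the archive, nor aptitude's parser.

```python
import re

# Draft rule: two or more spaces, a bullet, any number of spaces, then text.
LOOSE = re.compile(r"^ {2,}[*+-] *\S")
# Stricter rule discussed in the thread: exactly one space after the bullet.
STRICT = re.compile(r"^ {2,}[*+-] \S")


def is_bullet_line(line, require_space=True):
    """True if `line` would start a bulleted block under the chosen rule."""
    pattern = STRICT if require_space else LOOSE
    return pattern.match(line) is not None


attribution = "  -- Abraham Lincoln, 16th President of the United States of America"
print(is_bullet_line(attribution, require_space=False))
print(is_bullet_line(attribution, require_space=True))
```

Under the loose rule the attribution line is taken as a bullet whose text begins with "- Abraham"; under the strict rule it stays an ordinary preformatted line, as does "*Fourscore" with no space after the asterisk.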
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
Daniel Burrows writes:

> On Tue, Dec 13, 2005 at 12:01:52AM +0000, Roger Leigh was heard to say:
> > Daniel Burrows writes:
> >
> > > The attached text is a first draft of a proposed extension to the
> > > Description field to explicitly handle bulleted lists.
> >
> > That's quite a complex document for something I believe should be quite simple.
> >
> > When I use Emacs, it can reflow text (M-q) by looking at the indentation level of the following lines. It can even cope with bullets, outdents, indents, etc. If a frontend display routine could handle that, that would solve the problem generically, and would handle any level of indentation required.
>
> The heart of the document describes how to do this in a simple and precise way. The first section explains some reasons that it's useful to recognize bulleted lists, while the last couple sections have implementation notes and analysis of the impact on current frontends and Descriptions.

Sure. This was not meant to be overly critical. It's just that Emacs has already solved the problem, and can even cope with the case of a bullet appearing as the first character of a paragraph line. You could just copy that algorithm.

> > Specifically regarding bullets: We now have UTF-8 encoded control files, so why not simply use the UCS bullet character (U+2022)?
>
> It might make sense to recognize the Unicode bullet character, but forcing people to use it is not a good idea for several reasons, with backwards-compatibility being a major one.

ACK.

-- 
Roger Leigh
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
[Daniel Burrows]
> (1) The first line begins with N >= 2 spaces,

> (3) Each subsequent line begins with at least N + 2 spaces.

Hm. That brings up the minor point of whether N should ever be anything but (2 * nest_level). I don't feel strongly about that one, though.

Also, in the Best Practices section, it would be good to decide whether blank lines (" .\n") should be used before or after bulleted lists, or between items. This is something existing packages are not at all consistent about. My feeling is that a blank line should be used before and after, but *not* in between items.
Re: Proposal/Request for Comments: Formally extending package Descriptions to handle bulleted lists.
[Peter Samuelson]
> Hm. That brings up the minor point of whether N should ever be anything but (2 * nest_level).

Or if you consider nest_level to be zero-based, (2 + 2 * nest_level).

It occurs to me, though, that some might prefer the raw presentation look of (2 + 3 * nest_level). I might even prefer it myself. Specifying either of those, or leaving it open as it is now, would be fine with me.
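As a worked version of that arithmetic, the sketch below computes N for a zero-based nesting level under either convention. The function name and the `step` parameter are made up for this example.

```python
def bullet_indent(nest_level, step=2):
    """Leading spaces (N) before the bullet at a zero-based nesting level."""
    if nest_level < 0:
        raise ValueError("nest_level must be zero or greater")
    return 2 + step * nest_level


for level in range(3):
    print(level, bullet_indent(level), bullet_indent(level, step=3))
```

With a step of 2, a nested bullet sits exactly under the parent's item text (which starts at N + 2 when one space follows the bullet); with a step of 3 it is pushed one column further in, which is the raw look mentioned above.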