Re: DOCBOOK: On the size of DocBook...
[this is a long posting -- I didn't have time to make it more concise]

Norman Walsh [EMAIL PROTECTED] writes:
> The recent thread about DocBook and LaTeX raised the issue of the size of DocBook (measured as the number of elements). (It's not the first thread to raise the issue, just the most recent.)
[...]
> Whenever I think of adding new elements to DocBook, I think about these content models and wonder if it's really worth it.
>
> Now, in a sense, this is completely unfair. It's quite possible that the proposed element is just as valuable, to someone certainly and to everyone maybe, as, say, errorcode. The fact that errorcode got there first doesn't seem like a very satisfying criterion on which to choose between them.

I absolutely agree with this. I don't think the "there are already too many elements" argument should prevent the TC from giving very careful and objective consideration to adding elements that really should be in there. Take the proposed url element, for example. I'm sure not everybody would agree that it should be added, but I think Elliotte Rusty Harold has stated some good reasons for adding it -- and for adding it as a new element, not as a new class value on another element.

<aside>
[I know I've already said this, but I'll take the opportunity to trot the hobbyhorse back out...]
I think we also need to be careful about trying to solve the "there are already too many elements" issue just by adding new class values on existing elements -- systemitem or whatever -- rather than adding new elements. It seems like adding new class values increases the complexity of the DTD just as much, but does it in a way that obscures the complexity more. What I mean is, when that's done, it still adds to the overall number of logical units or semantic components in the DTD. It just adds them in a way that makes those logical units:
 * less intuitive to users
 * less versatile (you can't sub-class attributes)
</aside>

[...]
> A few things occur to me.
>
> 1. The difference between 400 elements and 800 elements isn't significant, just add 'em all.

Sort of a straw man, I think :-)

> 2. 400 is just too many, we need to make DocBook smaller.

A straw man with a little less straw? Given the backward-compatibility issues and user-community needs, this seems like the least-likely-to-happen solution -- and maybe the least desirable.

> 3. Some sort of pizza cutter a la TEI could be invented to allow selection of just the right elements. (But what will that do to interchange?!)
>
> 4. Refactoring the parameter entity structure in a more satisfying way might make it easier to customize, which would offer some sort of a compromise between 1 and 3.

Definitely not straw men. I think we ought to consider these carefully.

(For anybody who doesn't know what the TEI pizza cutter is: basically, it's a sort of configurator that lets you choose the sets of elements you want to include in, or exclude from, the DTD you use for authoring your documents, and then generates a custom DTD that includes just the element sets you want and excludes the rest.)

First, I think implementing 3 might actually require that we do 4. I'm not sure a really useful pizza cutter would even be practical with the current parameter entity organization, at least not the parameter entity organization at the information-pool level. I think TEI was actually designed around the specific requirement to include/exclude element sets at the information-pool level, and DocBook much less so.

But all that said, I wonder whether that kind of parameter-entity reorg is possible and/or prudent. There's a paragraph in Eve Maler and Jeanne El Andaloussi's Developing SGML DTDs that reads:

  Some DTD implementors choose to store declarations for individual element types (particularly those in the information pool) in separate modules, building up a so-called element library that can be recombined in different ways for different DTDs. However, in our experience the complex interdependencies between information pool elements are easier to understand and maintain if the entire information pool is stored in a single module, with marked sections used to modularize individual element types.

Anyway, about the question at the end of number 3 above -- "But what will that do to interchange?" -- it seems like interchange isn't an issue if
 * the customized DTDs are strict subsets of the complete DTD
 * and users/user communities treat their customized DTDs as authoring DTDs and continue to use the full DTD for validation (that is, don't expect that documents that others interchange with their community will validate against their custom authoring-DTD subset)

Which makes me think of another possibility to add -- something that's sort of already been discussed on this list:

5. The DocBook TC, with suggestions/feedback from the various DocBook user communities, produces a set of standard off-the-shelf strict-subset
Re: DOCBOOK: On the size of DocBook...
On Thu, Sep 05, 2002 at 09:39:25AM -0400, Norman Walsh wrote:
> 3. Some sort of pizza cutter a la TEI could be invented to allow selection of just the right elements. (But what will that do to interchange?!)

People tempted by this approach may want to have a look at the dtd-customizer specs at http://savannah.gnu.org/projects/dtd-cust/. Note that the July slides contain newer ideas than the specs, which I still have to update.

The idea is to allow people to define DTD subsets (or any variant, for those wanting that) in a simple and intuitive way. That should allow current DTD-aware authoring tools to provide a more accurate environment, while still allowing the use of standard stylesheets.

--
Yann Dirson [EMAIL PROTECTED] http://www.alcove.com/
Technical support manager
Senior Free-Software Consultant
Debian developer ([EMAIL PROTECTED])
Re: DOCBOOK: On the size of DocBook...
At 12:42 06/09/2002, Michael Smith wrote:
> Anyway, about the question at the end of number 3 above -- "But what will that do to interchange?" -- it seems like interchange isn't an issue if
> * the customized DTDs are strict subsets of the complete DTD
> * and users/user communities treat their customized DTDs as authoring DTDs and continue to use the full DTD for validation (that is, don't expect that documents that others interchange with their community will validate against their custom authoring-DTD subset)

I had an 'ah ha' moment at xml-extreme this year. People don't give a ... about markup validity. It's our XML tools that do. The 'authoring' environment vs the 'interchange' environment? Hence Michael's point is quite valid. If they are all good subsets then we shouldn't see that problem.

> One of the values of having a set of standard strict-subset authoring DTDs is that they would be carefully considered by the TC, potentially a lot more carefully than the possibly-not-compatible-with-one-another ad-hoc custom authoring DTDs that users from the same community might end up creating, propagating and using.

What impact might that have on the stylesheets, Norm? Divergent sets of stylesheets for pizza slices?

> What I mean is, I think maybe there are some identifiable DocBook user sub-communities within which users have the same basic markup needs -- their needs within their community are not that radically different from one another. If the TC doesn't produce a subset that meets their needs, and that community is not well-organized enough to produce a suitable custom authoring DTD on its own, we risk having individual users within those communities producing conflicting, sub-optimal customizations.

IMO it's the combination of DTD and stylesheets that makes it what it is. One without the other would be a minor nicety.
> My experience is that users and user communities -- especially those that might be considered casual document authors (for example, individual open-source developers who write docs for their own applications) -- really, really don't like to be told, "DocBook is highly customizable -- go ahead and customize it to meet your needs." It seems like what they typically want instead is something that just works right off the shelf.

Bottom line: it's just too hard unless you've been there before? Time could be better spent elsewhere.

> That's it for now. But I really hope we can continue the discussion about this and maybe arrive at some resolutions.

Picking up Paul's point, user demand is for 'less necessity' for customisation, i.e. easier out-of-the-box usage. Fewer tags in a vertical slice of pizza, i.e. still valid to BBdocbook (big brother), but 'appropriate' to my niche? The stylesheets? I'd leave that to Norm. I have a nasty feeling they *could* ride such a divide?

regards DaveP
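The "good subset" condition Michael and Dave describe can be made concrete with a small check. What follows is a minimal Python sketch under a deliberately simplified, assumed model: each DTD is reduced to a mapping from element name to the set of child element names its content model allows, ignoring order, occurrence indicators, attributes and #PCDATA. The names `subset_violations`, `full_dtd` and `authoring_subset`, and the trimmed content models themselves, are hypothetical illustrations, not DocBook's real declarations or any existing tool.

```python
from typing import Dict, List, Set

# Simplified content models: element name -> set of child element names it
# may contain. Order, occurrence indicators (?, *, +), attributes and #PCDATA
# are deliberately ignored.
ContentModels = Dict[str, Set[str]]


def subset_violations(subset: ContentModels, full: ContentModels) -> List[str]:
    """Return the reasons `subset` is not a subset of `full` (empty if it is)."""
    problems: List[str] = []
    for element, children in subset.items():
        if element not in full:
            problems.append(f"<{element}> is not declared in the full DTD")
            continue
        extra = children - full[element]
        if extra:
            problems.append(f"<{element}> allows {sorted(extra)} which the full DTD does not")
    return problems


# Hypothetical, heavily trimmed content models, for illustration only.
full_dtd: ContentModels = {
    "article": {"title", "section", "para"},
    "section": {"title", "para", "programlisting", "section"},
    "para": {"emphasis", "filename", "ulink"},
    "title": {"emphasis"},
    "emphasis": set(),
    "filename": set(),
    "ulink": set(),
    "programlisting": set(),
}

# A hypothetical "casual author" slice: fewer elements, narrower models.
authoring_subset: ContentModels = {
    "article": {"title", "section"},
    "section": {"title", "para"},
    "para": {"emphasis"},
    "title": set(),
    "emphasis": set(),
}

print(subset_violations(authoring_subset, full_dtd))  # []
print(subset_violations({"para": {"emphasis", "url"}}, full_dtd))
```

Because this model ignores sequence and required children, passing the check is at best a necessary condition for a real DTD subset, not a sufficient one; validating documents against the full DTD, as Michael proposes, remains the actual interchange test.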
Re: DOCBOOK: On the size of DocBook...
At 21:32 2002 09 05 -0400, ed nixon wrote:
> Paul Grosso wrote:
>> At 15:36 2002 09 05 -0400, ed nixon wrote:
>>> Paul Grosso wrote:
>> <snip/>
>> A big problem for me is that I still have not seen a satisfactory explanation of the user requirement(s) that is(are) driving this discussion.
>
> You are right, of course. I think the people who read this list regularly become acculturated to the atmosphere of an exchange. For example, the implicit, assumed goal that I saw emerging is two-fold:
> 1. significantly accelerate the learning curve of new and inexperienced users of the DocBook schema

I understand the desire to reduce the learning curve, but I'm not convinced reducing the size of the DTD does that. Reducing the number of tags in the DTD to N doesn't make it any easier for a user to learn the M < N tags s/he wants to use. You can make it easier to learn by just learning the ones you need; you don't have to reduce the number in the DTD.

> 2. further reduce the support overhead of this and the APPS list significantly for a certain class of question by simplifying and/or compartmentalizing.
> I detected that in the wind over the past months; I assume there are others who have developed the same impression. Perhaps not.

Most people ask "what tag do I use to do X?" I don't see how removing tags so that X can no longer be done reduces the overhead.

> <snip/>
>> What user requirements do bolting or unbolting components on a per-application basis address?
> Are there significant and identifiable genres of DocBook application? For example, is there a significant delimitable difference between the markup required for software versus hardware documentation? Are there other types of publication that lend themselves to a segmentation exercise of some sort?

I believe splitting up the DocBook app into genres just adds yet another complication to learning it. (Which genre do I learn, and what if it turns out the way the DocBook committee split things into genres doesn't work for my app?)
>> Personally, I see three disadvantages of that:
>> 1. someone has to do the bolting for a given application;
>> 2. the tools have to support bolting/unbolting;
>> 3. as soon as you bolt together one setup, someone is going to want/use/expect a tag you didn't bolt in.
> Yes. And that's what I meant by the difficult challenge of what to do, how and when. But the DocBook direction has always been toward customization. Is there an easier way, or a more generic way of doing it?
>> Or, put in terms of user requirements, I see your suggestion accrues negative rather than positive points in the corresponding three (plus) user requirement areas:
>> 1. I can use the off-the-shelf DocBook application with no extra work.
> Could you please list them for me? I gather Epic makes the claim and there are one, perhaps two, of the under-$100 editors. Are there a significant number of others? Using current, XML versions of DocBook? I use XMetaL and it is, unfortunately, a fair amount of work just getting something into the editing area that looks half-decent. Getting to a really smooth and robust editing environment for an MS Word convert is a tremendous amount of work.

I'm saying the statement #1 above is a user requirement (or perhaps "user goal" would be more accurate). I'm not saying it is necessarily currently a true statement. I have to admit I haven't done a survey lately, but I do believe there are tools out there. Even if you're just talking about using XSLT, you can point your XSLT processor to the DTD and stylesheets on the web right now without touching them. If you require bolting, you can no longer do that.

>> 2. I can find lots of tools that handle my application with little or no configuration.
> Lots? Same as above?

Same as above.

>> 3. I can expect all of DocBook to be available; I can use TDG as a reference with no surprises;
> All of DocBook, and the general newbie, somewhat doubtful about this new markup-and-controlled-editing thing, freezes in his tracks, confounded by a wealth of (to him) meaningless choice.
> No surprises? This is clearly not the case, Paul, otherwise there would be significantly less traffic on this list. Every day there are surprises and inconsistencies because it is all a living, breathing, evolving system.

There would be more surprises if someone found a tag in TDG that they thought did what they want, tried to use it, and was told that tag doesn't exist. Right now, the list never hears about those who used TDG, found a tag, tried it and liked it. Once you have different DTDs, people are going to find that DocBook isn't just DocBook, TDG doesn't really document their application, and someone on the list will say "you should use the FOO tag," but the user will then find their genre doesn't include the FOO tag, and we will have a much bigger mess.

paul
Re: DOCBOOK: On the size of DocBook...
At 20:42 2002 09 06 +0900, Michael Smith wrote:
> Anyway, about the question at the end of number 3 above -- "But what will that do to interchange?" -- it seems like interchange isn't an issue if
> * the customized DTDs are strict subsets of the complete DTD
> * and users/user communities treat their customized DTDs as authoring DTDs and continue to use the full DTD for validation (that is, don't expect that documents that others interchange with their community will validate against their custom authoring-DTD subset)

But that leads me to conclude you don't really want to change/subset the DTD; you just want some way to reduce the set a given author has to understand/work with. And I don't see that requirement as being addressed at the DTD level; I see it being addressed at the tool level and/or the documentation/education level.

paul
Re: DOCBOOK: On the size of DocBook...
At 09:39 2002 09 05 -0400, Norman Walsh wrote:
> The recent thread about DocBook and LaTeX raised the issue of the size of DocBook (measured as the number of elements). (It's not the first thread to raise the issue, just the most recent.)
>
> Certainly one of the complaints that new users make about DocBook is that it's too big.

Yep, it's big. And I'm a minimalist at heart, and I share the concern about adding elements to DocBook. But just what are the user requirements in this issue? In what way does having 400 (or 800) elements in the DTD affect the end user whose document contains 10 different elements?

> A few things occur to me.
> 1. The difference between 400 elements and 800 elements isn't significant, just add 'em all.
> 2. 400 is just too many, we need to make DocBook smaller.
> 3. Some sort of pizza cutter a la TEI could be invented to allow selection of just the right elements. (But what will that do to interchange?!)
> 4. Refactoring the parameter entity structure in a more satisfying way might make it easier to customize, which would offer some sort of a compromise between 1 and 3.
> Any thoughts?

As far as I see it, 4 shares the interchange/interoperability issues of 3. Either your application handles an element or it doesn't. I don't see how point 4 reduces any of the effort associated with creating DocBook-aware tools or maintaining the DocBook application.

Again, just what are we trying to accomplish? Only point 2 will make a dent in the effort to produce tools and maintain the application. And I don't see that any of the points make a dent in the end user experience.

If users are saying "when I go to enter a tag, my tool shows me hundreds of possibilities and that overwhelms me," then my answer is to fix this problem at the tool level. For example, the tool should provide a way for the user (or a site administrator) to configure things so that only the tags a user expects to use are shown in the tag-choosing panel.

The only other effect of size is performance. And I suggest that any attempt to save milliseconds in performance is going to be overshadowed by the hours spent on interoperability problems inherent in approaches 3 and 4 above.

So I don't have a particularly satisfying response. I think we should try to avoid adding elements when there is no strong reason, but if we feel a new element is important to a non-trivial population and it is within the scope of the DocBook application's purpose, we can add it. If you're trying to come up with some automatic, unbiased way to decide whether we add a new element, I don't have a better answer than our current process -- the TC gets to decide. If you don't like that, join the TC.

paul
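Paul's question about the author "whose document contains 10 different elements" is easy to check empirically for any given document. Here is a minimal Python sketch using the standard library's `xml.etree.ElementTree` to count how many distinct element names a document actually uses. The sample article and the `element_usage` helper are hypothetical illustrations; the sample has no DOCTYPE, so no DTD is fetched or consulted.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical DocBook-style article (no DOCTYPE, so nothing is fetched).
SAMPLE = """\
<article>
  <title>Installing Foo</title>
  <section>
    <title>Requirements</title>
    <para>Edit <filename>foo.conf</filename> before you <emphasis>start</emphasis>.</para>
  </section>
  <section>
    <title>Running</title>
    <para>Type <command>foo</command> at the prompt.</para>
  </section>
</article>
"""


def element_usage(xml_text: str) -> Counter:
    """Count occurrences of each element name in a document (root included)."""
    root = ET.fromstring(xml_text)
    return Counter(elem.tag for elem in root.iter())


usage = element_usage(SAMPLE)
print(f"{len(usage)} distinct elements: {sorted(usage)}")
# prints: 7 distinct elements: ['article', 'command', 'emphasis', 'filename', 'para', 'section', 'title']
```

Run over a real document collection, a count like this shows how small the working vocabulary of a given author or community is compared with the size of the full DTD, which is the gap the thread is arguing about.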
RE: DOCBOOK: On the size of DocBook...
Just my 2c here - spit on it and throw it in the gutter as you see fit:

Taking Visio as an example of an approach that would work well for me: when creating a new document, the user isn't presented with a window containing all the available symbols. Instead, they pull up the relevant window (or windows) containing the set (or sets) they need for a particular type of design task. If they need a symbol that isn't provided by the symbol windows they currently have open, they pull up the relevant window and use whatever they need.

Dividing up the tags into (hopefully) logical subsets appropriate for a particular purpose would IMHO make life a lot easier. The problems arise when trying to decide what those subsets should be and what they should contain :)

I know that my productivity is degraded when I have to frequently scroll down through a long list that contains more than I need in order to find what I'm looking for. And I'm only using Simplified DocBook...

Peter

> Again, just what are we trying to accomplish? Only point 2 will make a dent in the effort to produce tools and maintain the application. And I don't see that any of the points make a dent in the end user experience.
>
> If users are saying "when I go to enter a tag, my tool shows me hundreds of possibilities and that overwhelms me," then my answer is to fix this problem at the tool level.

?Or at the documentation level? My tools don't tell me what those words mean; TDG does.

> For example, the tool should provide a way for the user (or a site administrator) to configure things so that only the tags a user expects to use are shown in the tag-choosing panel.

Except for when I do that odd job that needs another set?

> The only other effect of size is performance. And I suggest that any attempt to save milliseconds in performance is going to be overshadowed by the hours spent on interoperability problems inherent in approaches 3 and 4 above.

Sorry Paul, I don't see that. It's my head that can't handle it, not the tools. Hence the interop issue is a non-starter for me.

> So I don't have a particularly satisfying response. I think we should try to avoid adding elements when there is no strong reason, but if we feel a new element is important to a non-trivial population and it is within the scope of the DocBook application's purpose, we can add it.

?Status quo? Seems to me that's how you operate now (TC that is)

Regards DaveP
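Peter's Visio-style idea can be sketched without touching the DTD at all: the editor offers the union of whichever tag palettes the author has opened, intersected with the elements the full DTD allows at the cursor. The following minimal Python sketch assumes that model; the palette names and contents in `PALETTES`, the `visible_tags` helper, and the `valid_here` set (standing in for whatever the editor derives from the full DTD's content model) are all hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical palettes: named groups of DocBook element names.
PALETTES: Dict[str, Set[str]] = {
    "structure": {"article", "section", "title", "para"},
    "software": {"command", "filename", "option", "programlisting"},
    "gui": {"guibutton", "guimenu", "guilabel"},
}


def visible_tags(open_palettes: Iterable[str], valid_here: Set[str]) -> List[str]:
    """Tags to offer: those in any open palette that are also valid at the cursor."""
    offered: Set[str] = set()
    for name in open_palettes:
        offered |= PALETTES[name]
    return sorted(offered & valid_here)


# Hypothetical set of elements the full DTD allows inside a <para>.
inside_para = {"emphasis", "command", "filename", "guibutton", "option", "ulink"}

print(visible_tags(["structure", "software"], inside_para))
# prints: ['command', 'filename', 'option']
print(visible_tags(["structure", "software", "gui"], inside_para))
# prints: ['command', 'filename', 'guibutton', 'option']
```

Since documents are still validated against the full DTD, opening another palette for "that odd job" changes only what is offered, not what is valid, which is roughly the tool-level fix Paul proposes.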
Re: DOCBOOK: On the size of DocBook...
Paul Grosso wrote:
> At 19:01 2002 09 05 +0100, Dave Pawson wrote:
> <snip/>
>> ?Status quo? Seems to me that's how you operate now (TC that is)
> Yes, that's what I'm suggesting.

Isn't that a little like: "Let's discuss this issue by not discussing it; we'll solve it by sweeping it from under our carpet, across the room and under the user's carpet."

Facetiousness aside, with apologies to Paul:
- it's highly unlikely all tools being used currently can (easily) be configured in the way you suggest (and it *is* a good idea for tools that work that way, although it raises the cost of implementing DocBook even further in any particular context.)
- this approach ignores the possible (shall I say probable?) benefits of working through some house cleaning or reorganizing or refactoring of DocBook that might benefit everyone -- users, Technical Committee, volunteer support folks, and stylesheet developers/maintainers.

Easily bolting on or unbolting components on a per-application basis is an idea that has tremendous appeal, at least to me. On the other hand, this approach *does* circumvent the inevitable difficulties and complexities of actually deciding what to do, how and when. This is a key factor when I consider the amount of time and energy donated to the cause by the DocBook core team. It's a complex short-term-pain / long-term-gain question.

Regards.
...edN
Re: DOCBOOK: On the size of DocBook...
At 15:36 2002 09 05 -0400, ed nixon wrote:
> Paul Grosso wrote:
>> At 19:01 2002 09 05 +0100, Dave Pawson wrote:
>> <snip/>
>>> ?Status quo? Seems to me that's how you operate now (TC that is)
>> Yes, that's what I'm suggesting.
> Isn't that a little like: "Let's discuss this issue by not discussing it; we'll solve it by sweeping it from under our carpet, across the room and under the user's carpet."

No, I'm saying that my opinion in this discussion is that we may have already found the best cost/benefit tradeoff. But it's a reasonable discussion to have. A big problem for me is that I still have not seen a satisfactory explanation of the user requirement(s) that is(are) driving this discussion.

> Facetiousness aside, with apologies to Paul:
> - it's highly unlikely all tools being used currently can (easily) be configured in the way you suggest (and it *is* a good idea for tools that work that way, although it raises the cost of implementing DocBook even further in any particular context.)
> - this approach ignores the possible (shall I say probable?) benefits of working through some house cleaning or reorganizing or refactoring of DocBook that might benefit everyone -- users, Technical Committee, volunteer support folks, and stylesheet developers/maintainers.
> Easily bolting on or unbolting components on a per-application basis is an idea that has tremendous appeal, at least to me.

What user requirements do bolting or unbolting components on a per-application basis address?

Personally, I see three disadvantages of that:
1. someone has to do the bolting for a given application;
2. the tools have to support bolting/unbolting;
3. as soon as you bolt together one setup, someone is going to want/use/expect a tag you didn't bolt in.

Or, put in terms of user requirements, I see your suggestion accrues negative rather than positive points in the corresponding three (plus) user requirement areas:
1. I can use the off-the-shelf DocBook application with no extra work.
2. I can find lots of tools that handle my application with little or no configuration.
3. I can expect all of DocBook to be available; I can use TDG as a reference with no surprises; I can transfer my knowledge gained using other DocBook applications to this one; all I have to tell someone else is "use DocBook" and I'll be able to interchange with them.

paul
Re: DOCBOOK: On the size of DocBook...
Paul Grosso wrote:
> At 15:36 2002 09 05 -0400, ed nixon wrote:
>> Paul Grosso wrote:
> <snip/>
> A big problem for me is that I still have not seen a satisfactory explanation of the user requirement(s) that is(are) driving this discussion.

You are right, of course. I think the people who read this list regularly become acculturated to the atmosphere of an exchange. For example, the implicit, assumed goal that I saw emerging is two-fold:
1. significantly accelerate the learning curve of new and inexperienced users of the DocBook schema
2. further reduce the support overhead of this and the APPS list significantly for a certain class of question by simplifying and/or compartmentalizing.
I detected that in the wind over the past months; I assume there are others who have developed the same impression. Perhaps not.

<snip/>
> What user requirements do bolting or unbolting components on a per-application basis address?

Are there significant and identifiable genres of DocBook application? For example, is there a significant delimitable difference between the markup required for software versus hardware documentation? Are there other types of publication that lend themselves to a segmentation exercise of some sort?

I'm by no means as up on the DTD as I should be, so I can empathize with the reader who, for example, can't figure out the rationale of the modularization structure of DocBook and, therefore, doesn't really know where to begin when it comes to cutting out the chaff or adding some special stuff. Is it purely, abstractly structural -- block elements, inline elements, etc.? Or is it based on some other sort of semantic categorization?

Another aspect: I'm vaguely aware that DocBook will validate at a significant number of levels and for a significant number of publication components, but how or why this happens is not clear to me. Pulling them all together is another question. Would refactoring the modules facilitate my understanding? I don't know.
For people up to their armpits in DocBook on a daily basis, these musings probably sound ignorant. They are. But those people, I submit, are a small minority, certainly a small minority of the potential user base. Others, like myself, have to *look* for opportunities to use DocBook, in the midst of a whole range of other demands, tools and approaches to getting something down on paper or up on the web.

> Personally, I see three disadvantages of that:
> 1. someone has to do the bolting for a given application;
> 2. the tools have to support bolting/unbolting;
> 3. as soon as you bolt together one setup, someone is going to want/use/expect a tag you didn't bolt in.

Yes. And that's what I meant by the difficult challenge of what to do, how and when. But the DocBook direction has always been toward customization. Is there an easier way, or a more generic way of doing it?

> Or, put in terms of user requirements, I see your suggestion accrues negative rather than positive points in the corresponding three (plus) user requirement areas:
> 1. I can use the off-the-shelf DocBook application with no extra work.

Could you please list them for me? I gather Epic makes the claim and there are one, perhaps two, of the under-$100 editors. Are there a significant number of others? Using current, XML versions of DocBook? I use XMetaL and it is, unfortunately, a fair amount of work just getting something into the editing area that looks half-decent. Getting to a really smooth and robust editing environment for an MS Word convert is a tremendous amount of work.

> 2. I can find lots of tools that handle my application with little or no configuration.

Lots? Same as above?

> 3. I can expect all of DocBook to be available; I can use TDG as a reference with no surprises;

All of DocBook, and the general newbie, somewhat doubtful about this new markup-and-controlled-editing thing, freezes in his tracks, confounded by a wealth of (to him) meaningless choice.

No surprises? This is clearly not the case, Paul, otherwise there would be significantly less traffic on this list. Every day there are surprises and inconsistencies because it is all a living, breathing, evolving system.

Perhaps what's needed is some sort of map of the terrain that outlines some recommended ways of getting up the curve on the DTD and on the stylesheets and then...

Cheers.
...edN
Re: DOCBOOK: On the size of DocBook...
At 19:55 05/09/2002, Paul Grosso wrote:
> The tool is merely subsetting the list of tags it shows the user when the user goes to a menu of "tags I can insert here." But it's still valid to insert (or have) any tag in the full DTD, and you can always click the button on the tool that says "show me all tags" instead of just the most-used subset. (For example, Epic Editor supports this kind of thing.)

As per the Microsoft 'learning' pull-downs? I don't use it, so it's not on the pull-down list? I need to hit the 'expand' arrow to see the full list?

>>> The only other effect of size is performance. And I suggest that any attempt to save milliseconds in performance is going to be overshadowed by the hours spent on interoperability problems inherent in approaches 3 and 4 above.
>> Sorry Paul, I don't see that. It's my head that can't handle it, not the tools. Hence the interop issue is a non-starter for me.
> Suppose your head can only remember the simplified subset. So what if your doctype declaration points to the full DTD? All your head has to handle is the set you're using. In what way does having lots of tags in the DTD that you are never going to learn or use give your head a problem? If you haven't learned about them and don't use them, how can the fact that they're in the DTD bother your head -- your head doesn't even know they are there.

I think we are suggesting the same thing from a different angle. emacs offers me 100 items at C-c C-e. I scan for what I know, ignoring those I don't. I still need them there for the 'odd jobs', so I want them to be there (for the tools and interop).

regards DaveP
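The behaviour Paul describes (a short menu of commonly used tags, with a "show me all tags" escape hatch, while every tag in the full DTD stays valid) and the adaptive pull-downs Dave compares it to can be sketched in a few lines. This is a minimal Python illustration under assumed behaviour, not a description of how Epic Editor or emacs actually implement their menus; `TagMenu`, `short_length`, `record_insert` and `valid_here` are hypothetical names, and `valid_here` stands in for the set of elements the full DTD allows at the cursor.

```python
from collections import Counter
from typing import List, Set


class TagMenu:
    """Hypothetical 'insert tag' menu: most-used tags first, full list on demand."""

    def __init__(self, short_length: int = 5) -> None:
        self.short_length = short_length
        self.usage: Counter = Counter()

    def record_insert(self, tag: str) -> None:
        """Remember that the author inserted `tag` (drives the short list)."""
        self.usage[tag] += 1

    def choices(self, valid_here: Set[str], show_all: bool = False) -> List[str]:
        """Tags to offer at the cursor; `valid_here` comes from the full DTD."""
        if show_all:
            return sorted(valid_here)
        # most_common() orders by count; ties keep first-seen order.
        used = [tag for tag, _ in self.usage.most_common() if tag in valid_here]
        return used[: self.short_length]


menu = TagMenu(short_length=2)
for tag in ["emphasis", "command", "emphasis", "filename", "command", "emphasis"]:
    menu.record_insert(tag)

inside_para = {"emphasis", "filename", "command", "ulink", "guibutton"}
print(menu.choices(inside_para))                 # ['emphasis', 'command']
print(menu.choices(inside_para, show_all=True))  # ['command', 'emphasis', 'filename', 'guibutton', 'ulink']
```

In this sketch, the short list starts out empty until the author has inserted something, so a real tool would probably seed it from a site-configured default set, the administrator-level configuration Paul mentions.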