I frequently write DocBook documents collaboratively with other authors. 
So that each of us can work on the document in parallel, we use a 
version control system (CVS, to be specific) to manage the document.

This works great, and because DocBook/XML is a text-based format the 
version control system is able to resolve most changes/merges with no 
problem.

What is a problem, though, is that with most DocBook editors, changing a 
single word in a paragraph causes a very large "diff" in the file.  This 
makes it difficult to find what exactly changed in the document from 
version to version.

For example, a single spelling change in a large paragraph could cause 
the entire paragraph to be replaced in the new version.  This can make 
tracing through document histories VERY painful and frustrating.

As an example, take the following document:

<sect1>
    <title>Section 1</title>
    <para>This is the first paragraph of text. CHANGE THIS. There
    is more text to follow, but this should show the problem fine.
    If only the save option wrote this paragraph out better.</para>
</sect1>

Now imagine that I wanted to change the section of text at "CHANGE 
THIS".  Then it would look like:

<sect1>
    <title>Section 1</title>
    <para>This is the first paragraph of text. Diff. There is more
    text to follow, but this should show the problem fine. If only
    the save option wrote this paragraph out better.</para>
</sect1>

The problem is that by changing 2 words I have created a file diff for 
every line of the paragraph (because words were shifted to new lines).
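
To put a number on it, here is a minimal Python sketch (not anything XXE 
does itself) that compares the two versions above line by line using the 
standard difflib module.  The helper name changed_lines is my own, made 
up for illustration.

```python
import difflib

before = """<sect1>
    <title>Section 1</title>
    <para>This is the first paragraph of text. CHANGE THIS. There
    is more text to follow, but this should show the problem fine.
    If only the save option wrote this paragraph out better.</para>
</sect1>
""".splitlines()

after = """<sect1>
    <title>Section 1</title>
    <para>This is the first paragraph of text. Diff. There is more
    text to follow, but this should show the problem fine. If only
    the save option wrote this paragraph out better.</para>
</sect1>
""".splitlines()


def changed_lines(old, new):
    """Hypothetical helper: collect lines that a line-based diff would report."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    removed, added = [], []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            removed.extend(old[i1:i2])  # lines only in the old version
            added.extend(new[j1:j2])    # lines only in the new version
    return removed, added


removed, added = changed_lines(before, after)
print(len(removed), "lines removed,", len(added), "lines added")
```

All three lines of the paragraph are reported as removed and re-added, 
even though only two words were edited.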

XXE already supports a save option that sets a maximum line length, 
but this doesn't really solve the problem, although it can help 
minimize it so that only the lines on or after the changed word are 
considered different.

One "simple" solution may be to add a save option that starts a new 
line after each end-of-sentence punctuation mark.  This could probably 
be implemented in the same place as the current max line length code.
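
To make the idea concrete, here is a rough sketch of the line-breaking 
rule in Python.  It is only an illustration under simple assumptions: 
the function name sentence_per_line and the regular expression are my 
own, it treats any ".", "!" or "?" followed by whitespace as the end of 
a sentence (so abbreviations such as "e.g." would be split too), and it 
works on plain paragraph text rather than on real XML with inline 
elements.

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_per_line(text, indent="    "):
    """Hypothetical helper: write each sentence of a paragraph on its own line."""
    # Collapse existing whitespace so the previous wrapping does not matter.
    normalized = " ".join(text.split())
    sentences = SENTENCE_END.split(normalized)
    return "\n".join(indent + sentence for sentence in sentences)


paragraph = (
    "This is the first paragraph of text. CHANGE THIS. There\n"
    "is more text to follow, but this should show the problem fine.\n"
    "If only the save option wrote this paragraph out better."
)
print(sentence_per_line(paragraph))
```

Because the output depends only on where sentences end, and not on how 
the text was wrapped before, editing one sentence leaves the lines of 
the other sentences untouched.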

So, with this option the example would look like:

<sect1>
    <title>Section 1</title>
    <para>This is the first paragraph of text.
    CHANGE THIS.
    There is more text to follow, but this should show the problem fine.
    If only the save option wrote this paragraph out better.</para>
</sect1>

Then a single change would look like:

<sect1>
    <title>Section 1</title>
    <para>This is the first paragraph of text.
    Diff.
    There is more text to follow, but this should show the problem fine.
    If only the save option wrote this paragraph out better.</para>
</sect1>

The diff in this case is much better because it only shows the single 
line that changed instead of showing the entire paragraph.
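
For comparison, this small Python sketch builds a unified diff of the 
sentence-per-line layout with the standard difflib module.  The file 
labels rev1.xml and rev2.xml are placeholders I made up, not real files.

```python
import difflib

before = [
    "<sect1>",
    "    <title>Section 1</title>",
    "    <para>This is the first paragraph of text.",
    "    CHANGE THIS.",
    "    There is more text to follow, but this should show the problem fine.",
    "    If only the save option wrote this paragraph out better.</para>",
    "</sect1>",
]

# Apply the same edit as in the example: replace one sentence.
after = ["    Diff." if line == "    CHANGE THIS." else line for line in before]

# "rev1.xml" and "rev2.xml" are placeholder labels for the diff header.
diff = list(
    difflib.unified_diff(before, after, "rev1.xml", "rev2.xml", n=1, lineterm="")
)
print("\n".join(diff))
```

The resulting hunk has one removed line and one added line, with the 
neighbouring sentences appearing only as unchanged context.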

Any opinions on this feature? Other ideas?  Would it be difficult to 
implement?

If my description is unclear, please ask questions and I will try to 
clarify the problem.

Thanks,
Allen



