The pretty-print feature is guaranteed to add whitespace, so when you re-import that 
output you have to deal with the extra whitespace you've added.  What I did for a 
while was not pretty-print my output at all, and instead use a small Perl script I 
wrote to pretty-print a file whenever I needed to take a look.

use strict;

# Static definitions.
my $PAT_TOKEN = '\w+:[-\w]+';
my $PAT_ATTR = '[-\w]+\s*=\s*"@todo"';

# Read command line arguments.
my ($fn) = @ARGV;
die "No input file specified.  Stopping" unless $fn;
# Three-argument open avoids surprises with odd file names; $! reports the cause.
open(FH, '<', $fn) or die "Error opening file $fn: $!.  Stopping";

my $indent = 0;
my $carry = '';
while (my $line = <FH>) {
        $line = $carry . $line;
        $carry = '';
        while ($line) {
                my ($tag, $extra) = split '>', $line, 2;
                $carry = $line and last unless $extra;
                $line = $extra;
                if ($tag =~ m|^<|) {
                        if ($tag =~ m|^</|) {
                                --$indent;
                        }

                        # Ensure encoding lies on first line of document.
                        if ($tag !~ m|^<\?|) {
                                indent($indent);
                        }

                        print "$tag>";

                        if ($tag =~ m|^<\w|) {
                                if ($tag !~ m|/$|) {
                                        ++$indent;
                                }
                        }
                }
                else {
                        print "$tag>";

                        if ($tag =~ m|</|) {
                                --$indent;
                        }
                }
        }
} 

print "\n$carry\n";

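close(FH);

exit;

# Assumed helper (not shown in the original post): start a new line and
# indent it by the given depth.
sub indent {
        my ($depth) = @_;
        print "\n", "\t" x ($depth > 0 ? $depth : 0);
}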

Adam Heinz
Senior Software Developer
Exstream Software

-----Original Message-----
From: T MacAdam [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 05, 2004 9:27 PM
To: [EMAIL PROTECTED]
Subject: Re: Still no luck with blank lines around CDATA sections
(easily repeatable!)

I did some debugging tonight and it is as you
suspected, caused by the changes centering around
revision 1.44.  Also, as Adam suggested, I could
revert to the old behavior by setting the
XMLUni::fgDOMWRTWhitespaceInElementContent feature to
false, except that Xerces will not allow me to set
that feature to false due to the following (line 284
of DOMWriterImpl.cpp):

static const bool  featuresSupported[] = {
    false, true,  // canonical-form
    true,  true,  // discard-default-content
    true,  true,  // entity
    true,  true,  // format-pretty-print
    false, true,  // normalize-characters
    true,  true,  // split-cdata-sections
    false, true,  // validation
    true,  false, // whitespace-in-element-content
    true,  true,   // byte-order-mark
    true,  true   // xml-declaration
};

It looks like support for setting that feature to
false is disabled.  Does anyone know why?
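
To make the table easier to read, here is a minimal standalone sketch of how
such a pair-per-feature array can be consulted.  It is not the Xerces code:
the enum names and the canSetFeature() helper are hypothetical, and it assumes
the first column means "may be set to true" and the second "may be set to
false", which matches the behaviour I saw for whitespace-in-element-content.

```cpp
#include <cassert>

// Hypothetical feature ids, listed in the same order as the table rows.
enum FeatureId {
    CANONICAL_FORM,
    DISCARD_DEFAULT_CONTENT,
    ENTITIES,
    FORMAT_PRETTY_PRINT,
    NORMALIZE_CHARACTERS,
    SPLIT_CDATA_SECTIONS,
    VALIDATION,
    WHITESPACE_IN_ELEMENT_CONTENT,
    BYTE_ORDER_MARK,
    XML_DECLARATION
};

// Two entries per feature, copied from the table quoted above.
static const bool featuresSupported[] = {
    false, true,  // canonical-form
    true,  true,  // discard-default-content
    true,  true,  // entity
    true,  true,  // format-pretty-print
    false, true,  // normalize-characters
    true,  true,  // split-cdata-sections
    false, true,  // validation
    true,  false, // whitespace-in-element-content
    true,  true,  // byte-order-mark
    true,  true   // xml-declaration
};

// Assumption: column 0 answers "can this be set to true?",
// column 1 answers "can this be set to false?".
bool canSetFeature(FeatureId id, bool value) {
    assert(id >= CANONICAL_FORM && id <= XML_DECLARATION);
    return featuresSupported[2 * id + (value ? 0 : 1)];
}
```

Under that reading, canSetFeature(WHITESPACE_IN_ELEMENT_CONTENT, false) is
the only lookup in the whitespace row that comes back false, which is the
refusal described above.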

I think I also understand why files take on more
and more blank lines the more you read and write them.
It seems that if you use the "format-pretty-print"
and "whitespace-in-element-content" features together,
then the following happens:

1. The DOMWriter writes out two whitespace-only TEXT
nodes, one before and one after the CDATA section,
because the "whitespace-in-element-content" feature
is set.

2. The "format-pretty-print" feature causes blank
lines to be written before and after the CDATA section
when the CDATA itself is being written.

3. Next time the XML file is read, the two
whitespace-only nodes before and after the CDATA now
contain the extra blank lines added in step 2.  On the
next write, the process starts at step 1 again, writes
the (now longer) TEXT nodes, then adds more blank
lines before and after the CDATA in step 2, and the
TEXT nodes are even larger when read in next time (and
so on...).  
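
The three steps above can be reproduced with a toy model.  This is only a
sketch and not Xerces code: the Element struct and the serialize(), parse()
and roundTrips() helpers are hypothetical, the "pretty-printer" is assumed to
add exactly one newline on each side of the CDATA section, and the CDATA
payload is assumed not to contain "]]>".

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model of <e>[TEXT][CDATA][TEXT]</e>.
struct Element {
    std::string before;  // whitespace-only TEXT node before the CDATA section
    std::string cdata;   // CDATA payload
    std::string after;   // whitespace-only TEXT node after the CDATA section
};

// Steps 1 and 2: the TEXT nodes are written verbatim, and the assumed
// pretty-printer adds one newline on each side of the CDATA section.
std::string serialize(const Element& e) {
    return "<e>" + e.before + "\n<![CDATA[" + e.cdata + "]]>\n" + e.after + "</e>";
}

// Step 3: on re-read, everything between the tags and the CDATA section
// (including the newlines added by the writer) lands in the TEXT nodes.
Element parse(const std::string& xml) {
    const std::string open = "<![CDATA[";
    const std::string close = "]]>";
    const std::size_t start = xml.find(open);
    assert(start != std::string::npos);
    const std::size_t end = xml.find(close, start);
    assert(end != std::string::npos);

    Element e;
    e.before = xml.substr(3, start - 3);                  // skip "<e>"
    const std::size_t payload = start + open.size();
    e.cdata = xml.substr(payload, end - payload);
    const std::size_t tail = end + close.size();
    e.after = xml.substr(tail, xml.size() - 4 - tail);    // drop "</e>"
    return e;
}

// Write and re-read the element n times.
Element roundTrips(Element e, int n) {
    for (int i = 0; i < n; ++i) {
        e = parse(serialize(e));
    }
    return e;
}
```

In this model each TEXT node gains one newline per write/read cycle, so a
node that starts out as a single "\n" holds four newlines after three round
trips, while the CDATA payload itself is unchanged.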

I don't know if that counts as a bug, but it's
probably not desirable to anyone.  I guess the answer
is just not to use those two features together...?

Tom.




--- Gareth Reakes <[EMAIL PROTECTED]>
wrote:

> > Hey,
> 
> 
> > On the other hand, there are at least two people who would rather
> > DOMWriter didn't insert blank lines.  Here's what I'd suggest: try
> > hacking it up as I suggested.  If you like the results, file a bug,
> > and one of us can attach a patch.  Further discussion can be part
> > of the bug for posterity.
> >
> >
> 
> 
> Sounds good to me. I don't have a chance to look now, but my gut
> feeling is that this "introduced" it
>
> revision 1.44
> date: 2003/11/24 11:10:58;  author: gareth;  state: Exp;  lines: +6 -3
> Fix for bug 22917. Patch by Adam Heinz.
>
> so we better make sure that it does not break whatever that bug is.
> 
> 
> Gareth
> 
> 
> --
> Gareth Reakes, Managing Director      Parthenon Computing
> +44-1865-811184                       http://www.parthcomp.com
> 
> 



                

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

