Re: Round-tripping again

Libor Kramolis Thu, 19 Sep 2002 01:00:55 -0700

I would appreciate round-tripping support in Xerces. It is really 
necessary for XML editors/tools -- broken user indentation is annoying.


+1

Regards,
Libor


Alex Rosen wrote:
> A few weeks ago I e-mailed this list, asking about adding round-tripping
> support to Xerces - i.e. the ability to output the exact same XML file as
> was read in, or at least very close to it. In other words, preserving more
> of the non-infoset information that normally gets dropped.
> 
> I spent some time working on this, and have a prototype done, which uses
> Augmentations to pass in more information about the "raw text" of the
> original document than Xerces normally gives. An example is the amount of
> whitespace between attributes. Saving this extra information (and using it
> on output) means that if the user puts each attribute on its own line, that
> will be preserved on output, instead of collapsing them back onto one line.
> These sorts of modifications are semantically equivalent, but it really
> annoys users when you reformat their document out from under them.
> 
> The particular project that needs this is a dom4j project, so I also created
> a special dom4j reader that takes this extra information that's given by the
> parser and stores it in each dom4j node it creates, and a writer that uses
> this saved information to write out a more accurate version of the output
> document. (This could easily be extended to DOM and JDOM.) I've attached an
> example. Sample.xml is the source file, rt-output.xml is the output using
> the new round-trip-enabled Xerces/dom4j code, and the other two are the
> output using standard Xerces/dom4j (in both standard and pretty-printing
> modes). Not everything is identical, but it's much, much better.
> 
> I think it would be nice if this feature were added to Xerces. I think it
> fulfills a significant need, and I don't think it adds any overhead when
> it's not turned on, and probably minimal overhead with it turned on. It
> currently doesn't cover many of the less-used areas of XML (notations, etc.)
> but I think it does a very good job of covering the common cases.
> 
> There also happened to be a similar thread going on at the same time as my
> original post, that I'd like to respond to:
> 
> http://marc.theaimsgroup.com/?l=xerces-j-dev&m=103029884901546&w=2
> 
> 
>>I can understand the cases in which people would like to
>>be able to do this but I also realize what it would take
>>to implement it. ;)
> 
> 
> I don't think the implementation is too bad. It's not trivial, but not
> unreasonably complex, I don't think.
> 
> 
>>The "limited usefulness" that I was referring to was the
>>fact that reporting character offsets only works if the
>>parsed source is already a character stream. If it's
>>anything else (say a byte stream in UTF8 or Shift_JIS)
>>then the application can't map those offsets back to the
>>source without re-reading the file.
> 
> 
> But there's *always* a character stream (Reader). Xerces creates one if it's
> not handed one. The easy way is to have Xerces send the actual text along to
> the user. (The other way is to have the user override createReader() to get
> his hands on the relevant character stream, which turns out to be a little
> ugly, but works fine.) Thus it's always applicable, even when you hand
> Xerces an InputStream. And I think it would be useful to a significant
> number of users.
> 
> So is there any chance of this modification making it into Xerces? I'd be
> happy to send a patch once it's cleaned up a bit.
> 
> Thanks,
> Alex
> 
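
To make the attribute-whitespace case above concrete, here is a minimal
sketch in plain Java. It is not Alex's prototype and does not use Xerces,
Augmentations, or dom4j. All names (RoundTripSketch, StartTag, Attr, parse,
write) are hypothetical. The scanner is deliberately simplified: it assumes
a single start tag with double-quoted attribute values and no whitespace
around '='. It only illustrates the idea of saving the raw whitespace before
each attribute and replaying it on output.

```java
import java.util.ArrayList;
import java.util.List;

public class RoundTripSketch {

    /** One attribute plus the raw whitespace that preceded it in the source. */
    static final class Attr {
        final String leadingWhitespace;
        final String name;
        final String value;

        Attr(String leadingWhitespace, String name, String value) {
            this.leadingWhitespace = leadingWhitespace;
            this.name = name;
            this.value = value;
        }
    }

    /** A start tag with its attributes and any whitespace before the closing '>'. */
    static final class StartTag {
        final String name;
        final List<Attr> attrs = new ArrayList<>();
        String trailingWhitespace = "";

        StartTag(String name) {
            this.name = name;
        }
    }

    /** Simplified scanner (assumption: input looks like <name a="v" b="w">). */
    static StartTag parse(String raw) {
        int i = 1; // skip '<'
        int start = i;
        while (i < raw.length() && !Character.isWhitespace(raw.charAt(i)) && raw.charAt(i) != '>') {
            i++;
        }
        StartTag tag = new StartTag(raw.substring(start, i));
        while (true) {
            // Capture the whitespace run exactly as written (spaces, tabs, newlines).
            int wsStart = i;
            while (i < raw.length() && Character.isWhitespace(raw.charAt(i))) {
                i++;
            }
            String ws = raw.substring(wsStart, i);
            if (i >= raw.length() || raw.charAt(i) == '>') {
                tag.trailingWhitespace = ws;
                return tag;
            }
            int nameStart = i;
            while (raw.charAt(i) != '=') {
                i++;
            }
            String name = raw.substring(nameStart, i);
            i += 2; // skip '=' and the opening quote
            int valueStart = i;
            while (raw.charAt(i) != '"') {
                i++;
            }
            String value = raw.substring(valueStart, i);
            i++; // skip the closing quote
            tag.attrs.add(new Attr(ws, name, value));
        }
    }

    /** Writes the tag; with roundTrip=false the whitespace collapses to single spaces. */
    static String write(StartTag tag, boolean roundTrip) {
        StringBuilder out = new StringBuilder("<").append(tag.name);
        for (Attr a : tag.attrs) {
            out.append(roundTrip ? a.leadingWhitespace : " ");
            out.append(a.name).append("=\"").append(a.value).append('"');
        }
        if (roundTrip) {
            out.append(tag.trailingWhitespace);
        }
        return out.append('>').toString();
    }

    public static void main(String[] args) {
        String source = "<item\n    id=\"1\"\n    name=\"x\">";
        StartTag tag = parse(source);
        System.out.println(write(tag, false)); // attributes collapsed onto one line
        System.out.println(write(tag, true));  // one attribute per line, as in the source
    }
}
```

In a real parser the saved whitespace would have to travel with the parse
events and be stored on each tree node, which is the role the quoted message
gives to Augmentations and the special dom4j reader and writer.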

-- 
Libor Kramolis, Software Engineer      | <[EMAIL PROTECTED]>
NetBeans/Sun Microsystems, XML Project | http://xml.netbeans.org/


