Andy Clark wrote:
>
> First, I'd like to look at what's currently in the API and then
> discuss some points of design that I'd like to see in the
> serializers.
>
> DOMSerializer: I'm sort of surprised that there are methods to
> serialize a Document, Element, and DocumentFragment but nothing
> for a generic Node. In fact, if you wanted to serialize a text
> node or entity reference, you would first have to remove or
> clone it into a DocumentFragment and serialize that. And it is
> impossible to serialize things like attributes outside of their
> container elements. Would it be enough to have the following
> method?
>
>   public void serialize(Node node) throws IOException;

I tried to stick to the W3C model, which defines a document or a
document fragment, so if you want to print just an element, I think
it makes sense to use a document fragment. As for serializing a Node
that happens to be an Attribute, keep in mind that we're trying to
define an API used by a lot of serializers. The question that should
be raised is: would it be trivial for them to support it? Would a PDF
serializer support that?

> But I think that we could do without it altogether and just
> make it possible to register new methods with the serializer
> factory. But I'll get to that in a minute.

If there is agreement on that, I'll just make Method (which is
designed to hold the default output method names, nothing more) part
of the helpers class, or kill it. I think it makes sense for
documenting the common methods (see comments below), but it's not
essential for anything to work.

> And the type of the method could be the mime type which would
> avoid the need of a set/getMediaType on the OutputFormat object.
> And if this thing is really representing the mime type, perhaps
> it should be called such instead of "Method". It would tie in
> better with existing standards.

XSLT defines an output method that takes one of three names (xml,
html, text) or a qualified name for additional methods (like PDF,
SVG, etc.). It then defines media-type as a separate value. I don't
like it, but it's part of the spec and the serializers have to
support that for the sake of XSLT processing.

To select a serializer you use the method name. Generally serializers
do not care about the media type, but if we have a Servlet getting an
XSLT response, it would probably want to use the media type as the
content type. This is why getOutputFormat() exists: to extract the
output format and determine the media type.

The default output formats (and more can be supported) are defined in
the helpers class, all of which provide values for both method and
media type.
In addition, the factory allows one to get an output format suitable
for a given output method, so you can determine the media type. Not
the best design, I agree, but one which follows the XSLT spec.

> OutputFormat: It seems like a good idea to have a kind of
> properties object like OutputFormat. But it seems that the
> OutputFormat (and in fact the whole serializer API) is based
> on serializing to a text markup syntax. This sort of jumps
> the gun on what I'd like to say in general about the
> serialization API so I won't go any further at this point.
> Check out my comments below regarding this matter.

No, the serializer API does not assume markup; it was designed to
support PDF, JPEG, and other binary formats. An implementation should
by default support the three common text formats, but the API is
designed so other formats can be introduced as well.

Once again, if you read the XSLT spec, it clearly defines xml, html
and text; it does not define, but allows, other output methods. I
followed the same guidelines in coming up with this API.

> Serializer: I noticed that this design makes use of the SAX
> interfaces but not of the traversal APIs added with DOM Level
> 2. Is there a way that we could leverage those interfaces?

It would make sense to support traversal for the DOMSerializer. What
would be the API requirements for that (other than
serialize(iterator))?

> SerializerFactory: There's no way to dynamically register
> OutputMethods or Serializers. I think that there should be
> a way to do this.

By definition the SerializerFactory is one way - but not the only
way - of obtaining serializers. You can also construct them directly.
So there is no need to go overboard with over-generalizing it.

For registering serializers, I actually had a method for it, but I
had to pull it out and rethink it, since it would work better if it
registered both a serializer and a default OutputFormat. I would
definitely like to see a registration mechanism in the final API.

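
To make the registration idea concrete, here is a rough sketch of a
registry that pairs each output method with a serializer and a default
format carrying the media type. Every name in it (SerializerRegistry,
Format, register, describe) is made up for illustration and is not
part of the proposed API; the real Serializer and OutputFormat would
carry far more than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch only; not the proposed serializer API. */
final class SerializerRegistry {

    /** Stand-in for OutputFormat: just the method and media type. */
    static final class Format {
        final String method;
        final String mediaType;

        Format(String method, String mediaType) {
            this.method = method;
            this.mediaType = mediaType;
        }
    }

    /** Stand-in for Serializer; a real one would write to a stream or writer. */
    interface Serializer {
        String describe();
    }

    /** One registration: how to create the serializer, plus its default format. */
    private static final class Entry {
        final Supplier<Serializer> factory;
        final Format defaultFormat;

        Entry(Supplier<Serializer> factory, Format defaultFormat) {
            this.factory = factory;
            this.defaultFormat = defaultFormat;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    /** Registers a serializer and its default format under one method name. */
    void register(String method, String mediaType, Supplier<Serializer> factory) {
        entries.put(method, new Entry(factory, new Format(method, mediaType)));
    }

    /** Returns the default format for a method, or null if none is registered. */
    Format getOutputFormat(String method) {
        Entry entry = entries.get(method);
        return entry == null ? null : entry.defaultFormat;
    }

    /** Creates a new serializer for the format's method. */
    Serializer getSerializer(Format format) {
        Entry entry = entries.get(format.method);
        if (entry == null) {
            throw new IllegalArgumentException(
                "No serializer registered for method: " + format.method);
        }
        return entry.factory.get();
    }

    public static void main(String[] args) {
        SerializerRegistry registry = new SerializerRegistry();
        // Each factory is a Supplier that returns a trivial Serializer.
        registry.register("xml", "text/xml", () -> () -> "xml serializer");
        registry.register("html", "text/html", () -> () -> "html serializer");

        // A servlet could look up the format by method name and use its
        // media type as the response content type.
        Format format = registry.getOutputFormat("html");
        System.out.println(format.mediaType);
        System.out.println(registry.getSerializer(format).describe());
    }
}
```

Selection stays keyed on the method name, as XSLT requires, while the
media type rides along in the default format for callers that need it.
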
> And overall, I'm not sure if we'd be allowed to drop stuff
> into the org.xml package namespace. Arkin: have you checked
> on this? And will any of this be superseded by DOM Level 3?
> at least on the DOM serialization side, that is... Perhaps
> Arnaud or someone else on the W3C committee can shed light
> on this.

We are not yet dropping anything. There are two proposals, the
Serializer API and the XSLT processing API (TRAX), which we are
proposing in a larger forum as a vendor-neutral API. We intend to use
org.xml for that, if we get permission. Until we do, it's only
available as a proposal and not in the CVS.

> Okay, now I'd like to make a few comments about what I'd
> like to see in a serialization API. First, I don't strictly
> see serialization as an output to some text markup. As

+1

> such, I would like a split between binary and character
> serializers. Currently, there are both setOutputStream()
> and setWriter() methods on the Serializer objects. If
> possible, I'd like setOutputStream() only be on binary
> serializers and setWriter() be used on character serializers.

I don't see why the serializers should not support both. An output
stream is also used in many applications for character output (even
if we both agree they should use a Writer). It should certainly be
understood that a GIF serializer only uses setOutputStream, but then,
a PDF serializer to Base64 encoding could use either one.

> All of the current serializer implementations (XML, HTML,
> XHTML) would be character serializers and the OutputFormat
> object seems to go very well with this. On the binary side,
> however, I can see a situation where SVG gets serialized to
> a JPEG image. I realize that this overlaps XSL Formatting
> Objects, though. Perhaps a better example would be an XML
> serializer that outputs to WBXML.

SVG to JPEG is certainly something within the scope of serializers,
just like XML to PDF.
Although I would advocate that XML support be added directly to
Acrobat Reader and rely on FO, instead of XML -> FO -> PDF, some
transformations like that make sense.

With that in mind, the serializer API is part of the XSLT API and
should support outputting all such possible transformations. XML,
HTML, XHTML and Text are considered the default output methods, which
I assume every XSLT processor or even XML parser would like to
support.

There should not be a conflict with getting XML to DB. The Serializer
and OutputFormat objects are extensible, so they should allow you to
add additional properties, e.g.:

  DBSerializer ser;
  DBOutputFormat format;

  ser = new DBSerializer();
  format = new DBOutputFormat();
  format.setSQLSyntax( "SQL92" );
  ser.setConnection( jdbc.getConnection() );
  ser.setTableName( "po" );
  ser.asDOMSerializer().serialize( doc );

If these were registered in the factory then:

  format = SerializerFactory.getOutputFormat( "wbxml:db" );
  ser = SerializerFactory.getSerializer( format );

arkin

>
> Is anyone else thinking along these lines?
>
> --
> Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]

--
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org