Andy Clark wrote:
> 
> First, I'd like to look at what's currently in the API and then
> discuss some points of design that I'd like to see in the
> serializers.
> 
> DOMSerializer: I'm sort of surprised that there are methods to
> serialize a Document, Element, and DocumentFragment but nothing
> for a generic Node. In fact, if you wanted to serialize a text
> node or entity reference, you would first have to remove or
> clone it into a DocumentFragment and serialize that. And it is
> impossible to serialize things like attributes outside of their
> container elements. Would it be enough to have the following
> method?
> 
>   public void serialize(Node node) throws IOException;

I tried to stick to the W3C model which defines a document or a document
fragment, so if you want to print just an element, I think it makes
sense to use a document fragment.
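
To make the document-fragment route concrete, here is a minimal sketch
using only the standard org.w3c.dom and JAXP calls. FragmentWrapper and
its methods are hypothetical names, not part of the proposed API. It
copies a node into a DocumentFragment, and it rejects attribute and
document nodes, since the DOM does not allow either as a fragment child.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Node;

// Hypothetical helper for illustration; not part of the proposed API.
final class FragmentWrapper {

    // Returns a DocumentFragment holding a deep copy of the node, so the
    // original node stays where it is in its own tree.
    static DocumentFragment wrap(Node node) {
        short type = node.getNodeType();
        if (type == Node.ATTRIBUTE_NODE || type == Node.DOCUMENT_NODE) {
            // The DOM does not allow these as children of a fragment.
            throw new IllegalArgumentException("cannot wrap node type " + type);
        }
        DocumentFragment fragment = node.getOwnerDocument().createDocumentFragment();
        fragment.appendChild(node.cloneNode(true));
        return fragment;
    }

    // Creates an empty document with whatever JAXP implementation is installed.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
    }

    public static void main(String[] args) {
        Document doc = newDocument();
        Node text = doc.createTextNode("hello");
        DocumentFragment fragment = wrap(text);
        // A DOMSerializer would be handed the fragment at this point.
        System.out.println(fragment.getFirstChild().getNodeValue());
    }
}
```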

As for serializing a Node that happens to be an Attribute, keep in mind
that we're trying to define an API used by a lot of serializers. The
question that should be raised is: would it be trivial for them to
support it? Would a PDF serializer support that?



> But I think that we could do without it altogether and just
> make it possible to register new methods with the serializer
> factory. But I'll get to that in a minute.

If there is an agreement on that, I'll just make Method (which is
designed to hold the default output method names, nothing more) part of
the helpers class or kill it. I think it makes sense for documenting
the common methods (see comments below), but it's not essential for
anything to work.


> And the type of the method could be the mime type which would
> avoid the need of a set/getMediaType on the OutputFormat object.
> And if this thing is really representing the mime type, perhaps
> it should be called such instead of "Method". It would tie in
> better with existing standards.

XSLT defines an output method which has one of three names (xml, html,
text) or a qualified name for additional methods (like PDF, SVG, etc.). It
then defines media-type as a separate value. I don't like it, but it's
part of the spec and the serializers have to support that for the sake
of XSLT processing.

To select a serializer you use the method name. Generally serializers do
not care about the media type, but if we have a Servlet getting an XSLT
response, it would probably want to use the media type as the content
type. This is why getOutputFormat() exists, to extract the output format
and determine the media type.

The default output formats (and more can be supported) are defined in
the helpers class, all of which provide values for both method and media
type. In addition, the factory allows one to get an output format
suitable for a given output method, so you can determine the media type.
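
As a rough illustration of how a caller such as a Servlet might turn
the method and media type into a content type, the sketch below falls
back to the defaults XSLT gives for the three standard methods
(text/xml, text/html, text/plain) when no media type is set.
MediaTypeDefaults and contentTypeFor are hypothetical names, not part
of the proposal.

```java
// Hypothetical helper for illustration; not part of the proposed API.
final class MediaTypeDefaults {

    // Returns the explicit media type if one was set, otherwise the XSLT
    // default for the three standard output methods.
    static String contentTypeFor(String method, String mediaType) {
        if (mediaType != null && !mediaType.isEmpty()) {
            return mediaType;
        }
        switch (method) {
            case "xml":
                return "text/xml";
            case "html":
                return "text/html";
            case "text":
                return "text/plain";
            default:
                // Additional (qualified-name) methods have no defined default.
                throw new IllegalArgumentException(
                        "no default media type for method: " + method);
        }
    }

    public static void main(String[] args) {
        // A Servlet would pass this value to response.setContentType(...).
        System.out.println(contentTypeFor("html", null));
    }
}
```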

Not the best design, I agree, but one which follows the XSLT specs.



> OutputFormat: It seems like a good idea to have a kind of
> properties object like OutputFormat. But it seems that the
> OutputFormat (and in fact the whole serializer API) is based
> on serializing to a text markup syntax. This sort of jumps
> the gun on what I'd like to say in general about the
> serialization API so I won't go any further at this point.
> Check out my comments below regarding this matter.

No, the serializer API does not assume markup, it was designed to
support PDF, JPEG, and other binary formats. An implementation should by
default support the three common text formats, but the API is designed
so other formats can be introduced as well.

Once again, if you read the XSLT spec, it clearly defines xml, html and
text, and allows, but does not define, other output methods. I followed the
same guidelines in coming up with this API.


> Serializer: I noticed that this design makes use of the SAX
> interfaces but not of the traversal APIs added with DOM Level
> 2. Is there a way that we could leverage those interfaces?

Would make sense to support traversal for the DOMSerializer.

What would be the API requirements for that (other than
serialize(iterator))?
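
For reference, an iterator-based entry point could look roughly like
the sketch below. It uses the DOM Level 2 traversal interfaces
(DocumentTraversal, NodeIterator, NodeFilter) and behaves like a very
small "text" method serializer. IteratorTextSerializer and its methods
are hypothetical names, and the sketch assumes the installed DOM
implementation supports the traversal module.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.traversal.DocumentTraversal;
import org.w3c.dom.traversal.NodeFilter;
import org.w3c.dom.traversal.NodeIterator;

// Hypothetical sketch; not part of the proposed API.
final class IteratorTextSerializer {

    // Concatenates the value of every node the iterator returns, in the
    // order the iterator returns them.
    static String serialize(NodeIterator iterator) {
        StringBuilder out = new StringBuilder();
        for (Node n = iterator.nextNode(); n != null; n = iterator.nextNode()) {
            out.append(n.getNodeValue());
        }
        return out.toString();
    }

    // Builds <po><item>widget</item> x2</po> and serializes its text nodes.
    static String demo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element root = doc.createElement("po");
        Element item = doc.createElement("item");
        item.appendChild(doc.createTextNode("widget"));
        root.appendChild(item);
        root.appendChild(doc.createTextNode(" x2"));
        doc.appendChild(root);

        if (!(doc instanceof DocumentTraversal)) {
            throw new IllegalStateException("DOM implementation lacks traversal support");
        }
        // SHOW_TEXT restricts the iteration to text nodes.
        NodeIterator texts = ((DocumentTraversal) doc)
                .createNodeIterator(root, NodeFilter.SHOW_TEXT, null, true);
        return serialize(texts);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The filter argument is where traversal would earn its keep: the caller
decides which nodes are visited and the serializer only sees the
result.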


> SerializerFactory: There's no way to dynamically register
> OutputMethods or Serializers. I think that there should be
> a way to do this.

By definition the SerializerFactory is one way - but not the only way -
of obtaining serializers. You can also construct them directly. So no
need to go overboard with over-generalizing it.

For registering serializers, I actually had a method for it, but I had
to pull it out and rethink it, since it would work better if it
registered both a serializer and a default OutputFormat.

I would definitely like to see a registration mechanism in the final
API.
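
One possible shape for such a mechanism, registering a serializer
together with its default output format under a method name, is
sketched below. Every type here (SerializerRegistry, Format, Ser) is a
hypothetical stand-in, not the proposed Serializer or OutputFormat, and
the media type used in main is only illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch; none of these types are the proposed API.
final class SerializerRegistry {

    // Stand-in for OutputFormat: only the method name and media type.
    static final class Format {
        final String method;
        final String mediaType;

        Format(String method, String mediaType) {
            this.method = method;
            this.mediaType = mediaType;
        }
    }

    // Stand-in for Serializer: only exposes the format it was built with.
    interface Ser {
        Format format();
    }

    // A serializer factory paired with the default format for its method.
    private static final class Entry {
        final Function<Format, Ser> factory;
        final Format defaultFormat;

        Entry(Function<Format, Ser> factory, Format defaultFormat) {
            this.factory = factory;
            this.defaultFormat = defaultFormat;
        }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Registers the serializer factory and its default format in one call.
    void register(String method, Function<Format, Ser> factory, Format defaultFormat) {
        entries.put(method, new Entry(factory, defaultFormat));
    }

    // Returns the default format registered for the method.
    Format getOutputFormat(String method) {
        return lookup(method).defaultFormat;
    }

    // Creates a serializer for the method named by the format.
    Ser getSerializer(Format format) {
        return lookup(format.method).factory.apply(format);
    }

    private Entry lookup(String method) {
        Entry entry = entries.get(method);
        if (entry == null) {
            throw new IllegalArgumentException("no serializer registered for: " + method);
        }
        return entry;
    }

    public static void main(String[] args) {
        SerializerRegistry registry = new SerializerRegistry();
        registry.register("wbxml:db", format -> () -> format,
                new Format("wbxml:db", "application/vnd.wap.wbxml"));
        Format format = registry.getOutputFormat("wbxml:db");
        Ser ser = registry.getSerializer(format);
        System.out.println(ser.format().mediaType);
    }
}
```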


> And overall, I'm not sure if we'd be allowed to drop stuff
> into the org.xml package namespace. Arkin: have you checked
> on this? And will any of this be superceded by DOM Level 3?
> at least on the DOM serialization side, that is... Perhaps
> Arnaud or someone else on the W3C commitee can shed light
> on this.

We are not yet dropping anything. There are two proposals, the
Serializer API and the XSLT processing API (TRAX) which we are proposing
in a larger forum as a vendor-neutral API. We intend to use org.xml for
that, if we get permission for that. Until we get that, it's only
available as a proposal and not in the CVS.


> Okay, now I'd like to make a few comments about what I'd
> like to see in a serialization API. First, I don't strictly
> see serialization as an output to some text markup. As

+1

> such, I would like a split between binary and character
> serializers. Currently, there are both setOutputStream()
> and setWriter() methods on the Serializer objects. If
> possible, I'd like setOutputStream() only be on binary
> serializers and setWriter() be used on character serializers.

I don't see why the serializers should not support both. An output
stream is also used in many applications for character output (even if
we both agree they should use Writer). It should certainly be understood
that a GIF serializer only uses setOutputStream, but then, a PDF
serializer to Base64 encoding could use either one.
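
A character serializer can support both targets cheaply by wrapping the
stream in a Writer, as in the sketch below. DualTargetTextSerializer is
a hypothetical class, and passing the encoding to setOutputStream is an
assumption made to keep the example self-contained; in the proposed
design the encoding would presumably come from the OutputFormat.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not part of the proposed API.
final class DualTargetTextSerializer {

    private Writer writer;

    // Character target: used as-is.
    void setWriter(Writer writer) {
        this.writer = writer;
    }

    // Byte target: wrapped in a Writer that applies the given encoding.
    void setOutputStream(OutputStream out, Charset encoding) {
        this.writer = new OutputStreamWriter(out, encoding);
    }

    // Writes the text to whichever target was set last and flushes it.
    void serialize(String text) {
        if (writer == null) {
            throw new IllegalStateException("no output target set");
        }
        try {
            writer.write(text);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DualTargetTextSerializer ser = new DualTargetTextSerializer();
        ser.setOutputStream(bytes, StandardCharsets.UTF_8);
        ser.serialize("<po/>");
        System.out.println(bytes.size() + " bytes written");
    }
}
```

A purely binary serializer (the GIF case) would simply not offer
setWriter, or would reject it.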


> All of the current serializer implementations (XML, HTML,
> XHTML) would be character serializers and the OutputFormat
> object seems to go very well with this. On the binary side,
> however, I can see a situation where SVG gets serialized to
> a JPEG image. I realize that this overlaps XSL Formatting
> Objects, though. Perhaps a better example would be an XML
> serializer that outputs to WBXML.

SVG to JPEG is certainly something within the scope of serializers, just
like XML to PDF. Although I would advocate that XML support be added
directly to Acrobat Reader and rely on FO, instead of XML -> FO -> PDF,
some transformations like that make sense. With that in mind, the
serializer API is part of the XSLT API and should support outputting of
all such possible transformations.

XML, HTML, XHTML and Text are considered the default output methods,
which I assume every XSLT processor or even XML parser would like to
support.

There should not be a conflict with getting XML to DB. The Serializer
and OutputFormat objects are extensible, so they should allow you to add
additional properties, e.g.:

  DBSerializer   ser;
  DBOutputFormat format;

  ser = new DBSerializer();
  format = new DBOutputFormat();
  format.setSQLSyntax( "SQL92" );
  ser.setConnection( jdbc.getConnection() );
  ser.setTableName( "po" );
  ser.asDOMSerializer().serialize( doc );


If these were registered in the factory then:

  format = SerializerFactory.getOutputFormat( "wbxml:db" );
  ser = SerializerFactory.getSerializer( format );

arkin

> 
> Is anyone else thinking along these lines?
> 
> --
> Andy Clark * IBM, JTC - Silicon Valley * [EMAIL PROTECTED]

-- 
----------------------------------------------------------------------
Assaf Arkin                                           www.exoffice.com
CTO, Exoffice Technologies, Inc.                        www.exolab.org

