Re: Framework for StAX-based model loading

Jim Marino Tue, 21 Mar 2006 19:01:16 -0800

I have an additional question. If I have a custom complex type, say Foo (:-)), what steps do I need to take to have that bound into a component property for the StAX and SDO approaches? E.g.:


public class Foo{


    private String bar;

    public void setBar(String val){
        bar = val;
    }

    public String getBar(){
        return bar;
    }

    private Foo foo;

    public void setFoo( Foo val){
        foo = val;
    }

    public Foo getFoo(){
        return foo;
    }

}


Jim


On Mar 21, 2006, at 6:22 PM, Jean-Sebastien Delfino wrote:

Frank Budinsky wrote:
I think Jeremy's point that this might be an example of using the "wrong hammer on the screw" depends on how structured (vs. open) the physical model actually is. More generally than just whether SDO is the right approach for this particular application, I think it's a question of whether it's worth using any Java binding technology on a model that is full of open and mixed content - and even worse, the open content is not described by a schema (i.e., lax). Whether it's SDO, JAXB, or some other XML binding technology, the programming model that results is more DOM-like than the nice static model that we'd like. In the SDO case you end up doing a lot of DOM-like access using Sequences (accessing "mixed", "any", and "anyAttribute" properties) - hardly a clean and beautiful Java binding. I'm not sure if the threshold of where the physical model is too loosely defined to map to a clean Java binding has been crossed or not in this case, but it looks close anyway. If it wasn't, we wouldn't need a logical model.
I think the other problems that Jeremy points out need to get fixed anyway, and SDO should be a competitive Java binding technology, as well as all the other things.
Frank.




Jeremy Boynes <[EMAIL PROTECTED]> 03/21/2006 02:44 PM
Please respond to tuscany-dev

To: [email protected]
cc:
Subject: Re: Framework for StAX-based model loading






Jean-Sebastien Delfino wrote:
Jeremy Boynes wrote:
Jim Marino wrote:
Hi Jeremy,
Could you briefly enumerate what you see as the benefits of the StAX
framework over alternatives?
The final straw that prompted me to do this was the amount of
classloader wrangling we ended up doing in the Tomcat code a couple of
weeks ago. We need to keep track of context classloader switching
between loading the model and loading any application code. There is
plenty of room for subtle errors to creep in.
The classloader issues we ran into with the current SDO implementation
need to be solved. I am not sure that they are a sufficient reason to
stop using SDO and move to a different technology. I'm surprised to see
that we have a databinding technology in Tuscany but we are running away
from it when we encounter our first problems with it. I think we should
spend a little more time trying to fix it instead of running away from
it. By the way, the classloader problems we ran into a couple of weeks
ago were not just caused by SDO; this was a combination of SDO, Axis2
and some of the factories used under the covers by Axis2, all having
different requirements in terms of the "current" class loader.
I wouldn't say "running away" (that's a little dramatic) but I do think
it is fair to say that we ran into problems due to the way our SDO
implementation couples the type system to the thread's classloader.
The goal of SDO is not to be just another XML data binding technology;
its purpose is to "simplify data programming so developers can focus on
business logic instead of the underlying technology." It does this
through abstraction of that technology, by providing a data-graph
oriented API and capabilities like containment and change tracking.

We are not using those capabilities. We're actually only using a tiny
fraction of its capabilities. I think it is reasonable to evaluate
whether it is the right hammer to use on this screw.
The SDO solution (actually this would be true of any XML->POJO binding)
was fine when the logical model was an exact replica of the XML files.
However, to support more logical unit testing (and other uses) the model
has now shifted back to being more of a true configuration model. This
means we can't just slurp the XML into objects and use them directly; we
need to read in the POJOs and then run a transformation on them. This
adds an additional phase to the load process that needs maintenance.
The logical model was never an exact replica of the XML files. Whatever
technology you use you'll need to do the following:
1. handle the parsing/loading from XML
2. transform what you get from the XML into a logical model
The current SDO based approach separated the two concerns. With this new
StAX approach we do (1) and (2) together. I think this will create
complexity over time.
I would describe our problem slightly differently saying we need:
1) to parse the incoming bytestream
2) to use the parse results to build the logical model
The first of these is being handled by StAX, the second by the code we
provide that reacts to parse events and builds the model. This seems
like a fairly clear separation of concerns.

I would also break down the SDO-based solution differently:
1) SDO parses the incoming bytestream
2) SDO builds its physical model to represent the XML
3) the SCDLModelContentHandler parses the physical model
   and generates a partial logical model
4) linkers generated in 3) run to complete the logical model

I think this is less separated. The implementation of
SCDLModelContentHandler is tightly coupled to the physical model
generated by SDO. The linkers are also coupled to the element handlers
(the caseXXXX methods) that parse each physical model object.
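
To make the "linkers" idea in steps 3) and 4) concrete, here is a minimal, self-contained sketch of the deferred-linking pattern. It is not the Tuscany code: the class LinkerSketch and its nested ComponentType are hypothetical stand-ins, and only the general idea (queue Runnables while walking the physical model, then run them to complete the logical model) is taken from the description above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LinkerSketch {
    // Hypothetical stand-in for a logical model object.
    static final class ComponentType {
        final List<String> services = new ArrayList<>();
    }

    // Walks the "physical" service names, queues one linker per name,
    // then runs the queued linkers to complete the logical model.
    static ComponentType build(List<String> physicalServiceNames) {
        final ComponentType componentType = new ComponentType();
        final List<Runnable> linkers = new ArrayList<>();

        // Pass 1: create deferred link actions; nothing is attached yet.
        for (final String name : physicalServiceNames) {
            linkers.add(() -> componentType.services.add(name));
        }

        // Pass 2: run the linkers in the order they were queued.
        for (Runnable linker : linkers) {
            linker.run();
        }
        return componentType;
    }

    public static void main(String[] args) {
        ComponentType ct = build(Arrays.asList("HelloService", "ByeService"));
        System.out.println(ct.services);
    }
}
```

The second pass is what couples the linkers to whatever state the first pass captured, which is the coupling the paragraph above is pointing at.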

I'd also point out that the implementation of the content handler for
the core assembly model is different from other extensions: the former
uses a (code-generated) case dispatcher, the others tend to use a single
method with manually coded instanceof tests.
Having a container system there able to manage the loaders means
extending the model is easy - an extension just needs to contribute its
model elements and an XML handler. There is no need to codegen a separate
XML model and write a transformer. There is also only one extension
registry rather than two (the SDO type registry and the SCDL loader
registry).
I don't see why codegen is a problem. In general I'd rather get some
code generated than write it myself. I agree that with the SDO approach
you have to register the generated model and your handler/transformer. I
don't really understand the difference you make between a handler and a
transformer and why it's easier to write a handler than to codegen a
model and write a transformer. With the SDO based approach you need to
write code that gets data out of an SDO model with nice generated getter
methods. With a StAX based approach (and it would be very similar with a
DOM or SAX based approach) you get the data out of a more weakly typed
model. Frankly I prefer to write:
String name=component.getName();
than
String name=reader.getAttributeValue(null, "name");
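
For readers who want to try the second form, here is a small runnable sketch of the weakly typed StAX access, using only the standard javax.xml.stream API. The class AttributeAccessSketch, its nested Component holder and the <component> sample document are assumptions made for illustration; they are not the generated SDO classes or the Tuscany loader.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AttributeAccessSketch {
    // Hypothetical typed holder, standing in for a generated model class.
    static final class Component {
        private final String name;
        Component(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Scans for the first <component> element and reads its "name"
    // attribute by string key; returns null if no such element exists.
    static Component loadComponent(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "component".equals(reader.getLocalName())) {
                        // Weakly typed: a misspelled key just yields null.
                        return new Component(reader.getAttributeValue(null, "name"));
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Component c = loadComponent("<component name=\"HelloWorld\"/>");
        System.out.println(c.getName());
    }
}
```

Once the value is copied into a typed object, callers use getName() either way; the string-keyed lookup is confined to the loader.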
I wouldn't have such a big issue with codegen if it generated the code
that we wanted. However, the codegen here is generating objects that
represent the physical structure of the XML rather than ones that map to
the logical model. This leads to the need for another parser/transform
(the SCDLModelContentHandler) to convert from the physical
representation to the logical one.
The result is we have both the complexity of code generation *and* the
complexity of a manually coded parser/transformer.
The XML handling is pluggable - it just uses the standard StAX APIs
rather than internal hooks to our SDO implementation and/or EMF. A
validating StAX parser can be used if required; semantic validation is
still being performed in the model and builders.
We have been working to remove the dependencies on EMF, so again I don't
see how this can motivate moving to StAX. One of the goals of Tuscany is
to provide a good SDO story anyway to people who want to load an XML
document into an SDO model, without requiring any hooks into our SDO
implementation or EMF. Again I think we should all work to improve our
SDO story instead of using something else. For example I think that we
should improve SDO to provide a good integration with StAX.
The point I was making here is that there are multiple implementations
of StAX available that can be used and that the loader is not tied to
one of them. This may be useful for people embedding Tuscany in other
environments where they have some preference over which implementation
should be used.
We are already working to integrate StAX support with SDO to allow SDOs
to be (de)serialized from/to StAX event streams. We are already using
this for better AXIOM integration in the web-services stack.
Having a good SDO implementation is a primary objective for the project.
However, as I pointed out at the start, SDO is more than just an XML
binding technology and we need to make sure its SDO-ness excels. We also
need to realize that it is not the universal solution for databinding
and that other technologies may work better in specific scenarios. I
think we have one of those scenarios.
Code footprint is better as there is no intermediate form. Performance
and memory footprint are probably better too. However, I don't see that
as a major factor as we are only reading config data here (i.e. it's
once per deployment not once per request).
I agree, code and memory footprint are better with the StAX approach.
On the downside, StAX is a technology that may not be as familiar to
people. However, I think it has enough similarity to DOM/SAX to be
readily understood. It is also heavily used in Axis2 so we will be
seeing it anyway.
We will need to modify the parsing code if the XML changes, whereas with
SDO or another XML->POJO solution that would be handled by generation.
However, the need to transform the model after load means that we have
custom parsing code anyway, just running on the POJOs (see
SCDLModelContentHandler).
This is the point that raises the biggest concern for me. I have been
there before multiple times, implemented models and loaders for changing
XML specifications. Usually the kind of approach demonstrated by this
StAX loader looks very simple and very tempting at the beginning of the
project, but ends up in a mess and maintenance nightmare in the longer
term.
Unfortunately I think that is just an aspect of this problem. I have seen
XML-binding based solutions used in other projects and they have also
proved to be very fragile even in the face of static schemas. If we had
a pure SDO solution I would be less concerned; my concern here is about
the parse/transform code that we have to maintain to convert the
physical to the logical model.
If we want to avoid that we need the following:
- more work on this StAX loader to clearly separate the pure
loading/de-serialization aspect from the physical -> logical
transformation aspects; if you mix the two aspects it will be a recipe
for disaster after a year of maintenance adjusting to changes in the XML
See above - I think the same applies to the SCDLModelContentHandler
- define intermediate data structures (close to the physical model) to
avoid polluting the logical model with XML specific info (I'm thinking
of all the cases where we're going to have to store intermediate / half
loaded data and complete/resolve things in a second pass; properties and
wires are two examples that come to mind); if we don't do that and start
adding stuff to the logical model to facilitate the loading phase we
will make a mess of the logical model
I don't think we have needed to populate the logical model with physical
artifacts. I did make some changes to support the StAX loader such as
storing reference targets as pointers rather than using a Reference
object directly. I would contend that is a normalization that better
represents the logical model - it certainly cut out a couple of bugs
that were the result of inconsistent updates to the denormalized
References.
- make the StAX loader approach more complete before we jump to switch
to it, to really understand the impact; for example the StAX loaders do
not handle subsystems or property types at the moment. I think that the
support for properties in particular is going to be interesting and will
generate some work to integrate between StAX and SDO if we want to
support SDO properties.
Given subsystems are changing and not really supported yet by the
runtime this does not seem like a major issue. In fact, looking at how
subsystems are handled in the SCDLModelContentHandler, adding StAX
loader support for them would be trivial. How about you have a go at
doing it and see how hard/easy adding things to the StAX framework
actually is?
For properties, I don't think either solution is fully fleshed out yet
so using that as a basis for comparison is a little unfair. For example,
I would ask how user-defined types are loaded into the SDO type system,
or how non-SDO property type support would be added (e.g. to load
properties as JAXB objects)?

With the StAX solution we have the advantage that by using a standard
technology it will be easier to add in other binding frameworks. We are
already working on a StAX->SDO deserializer that would give us a
solution when the user was using an SDO; further, JAXB, XMLBeans and (I
believe) Castor all support StAX sources so it should be trivial to
integrate those.
- come up with a brand new solution for serializing/saving models; I am
sure we will run into use cases where we want to save an assembly, a set
of wires, a module component configuration etc; StAX only handles the
loading part, so if we're not using SDO here again we'll have to invent
something else to handle serialization/saving of the models...
We need to do that anyway. If the logical model was an SDO then we could
just write it out, but it isn't. We would need to develop and maintain a
serializer equivalent to the SCDLModelContentHandler and that is likely
to be just as much manual code to maintain as a StAX serializer.
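
As a rough idea of what a hand-written StAX serializer might involve, the sketch below writes a <service> element with the standard javax.xml.stream.XMLStreamWriter. The class ServiceWriterSketch and its method are hypothetical, and the element and attribute names simply mirror the loader examples in this thread; nothing here is existing Tuscany code.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class ServiceWriterSketch {
    // Serializes a service name and its Java interface name as a
    // <service> element containing an empty <interface.java> child.
    static String writeService(String name, String interfaceName) {
        try {
            StringWriter out = new StringWriter();
            XMLStreamWriter writer =
                    XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement("service");
            writer.writeAttribute("name", name);
            writer.writeEmptyElement("interface.java");
            writer.writeAttribute("interface", interfaceName);
            writer.writeEndElement(); // closes <service>
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writeService("HelloService", "example.Hello"));
    }
}
```

Each model element would need a writer like this, which is the manual code the reply above weighs against an SDO-based serializer.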
My main concern is about the complexity of maintaining all this code.
Just the (incomplete) support for the core SCDL is already about 750
lines of code, mixing parsing logic and mapping/construction of the
logical model, using dynamic APIs like
reader.getAttributeValue(null, "name"), compared to the 550 lines of
model transformation code using strongly typed APIs generated from the
SCDL XSD in the SCDLModelContentHandler (with more complete support for
the core SCDL). I anticipate many changes in the SCDL XSD and the
logical model during the course of this year. The StAX based loader
approach may look appealing now at the beginning of the project, but we
will only succeed with it if we have committed contributors and
committers ready to maintain this code, make it complete, and adjust it
each time we change the SCDL XSD (and it's going to be pretty painful,
compared to just rebuilding to regenerate the code from the XSD and
adjusting the transformer where it's broken by the changes).
I think the "adjust the transformer" part here is the key phrase. This
is going to be just as much work as maintaining any StAX handler (as
they are essentially doing the same thing).
If you're going to compare lines of code, the honest metric here would
be to compare the number of lines related to parsing and transformation
and to exclude the code used to create the logical model. But lines of
code alone are not the only metric; when I compare blocks out of
SCDLModelContentHandlerImpl with the StAX equivalent the latter seems
less complex (at least to me).

For example, the <service> code in SCDLModelContentHandlerImpl is:

    public Object caseService(Service object) {
        final org.apache.tuscany.model.assembly.Service service =
            factory.createService();
        service.setName(object.getName());

        linkers.add(new Runnable() {
            public void run() {
                currentComponentType.getServices().add(service);
            }
        });

        currentService = service;
        return service;
    }

and for the contained <interface.java>

    public Object caseJavaInterface(JavaInterface object) {
        final JavaServiceContract serviceContract =
            factory.createJavaServiceContract();
        serviceContract.setScope(Scope.INSTANCE);

        serviceContract.setInterfaceName(object.getInterface());
        serviceContract.setCallbackInterfaceName(
            object.getCallbackInterface());
        linkServiceContract(object, serviceContract);

        return serviceContract;
    }

and

    private void linkServiceContract(Object object,
            final ServiceContract serviceContract) {
        Object container = ((DataObject) object).getContainer();
        if (container instanceof Service) {

            // Set a service contract on a service
            final org.apache.tuscany.model.assembly.Service service =
                currentService;
            linkers.add(new Runnable() {
                public void run() {
                    service.setServiceContract(serviceContract);
                }
            });
        } else if (container instanceof Reference) {

            // Set a service contract on a reference
            final org.apache.tuscany.model.assembly.Reference reference =
                currentReference;
            linkers.add(new Runnable() {
                public void run() {
                    reference.setServiceContract(serviceContract);
                }
            });
        } else if (container instanceof ExternalService) {

            // Set a service contract on an external service
            final org.apache.tuscany.model.assembly.ExternalService
                externalService = currentExternalService;
            linkers.add(new Runnable() {
                public void run() {
                    externalService.getConfiguredService().getService()
                        .setServiceContract(serviceContract);
                }
            });
        } else if (container instanceof EntryPoint) {

            // Set a service contract on an entry point
            final org.apache.tuscany.model.assembly.EntryPoint entryPoint =
                currentEntryPoint;
            linkers.add(new Runnable() {
                public void run() {
                    entryPoint.getConfiguredService().getService()
                        .setServiceContract(serviceContract);
                    entryPoint.getConfiguredReference().getReference()
                        .setServiceContract(serviceContract);
                }
            });
        }
    }

whereas in the StAX framework the <service> handler is:

    public Service load(XMLStreamReader reader, ResourceLoader resourceLoader)
            throws XMLStreamException, ConfigurationLoadException {
        assert SERVICE.equals(reader.getName());
        Service service = factory.createService();
        service.setName(reader.getAttributeValue(null, "name"));

        while (true) {
            switch (reader.next()) {
            case START_ELEMENT:
                AssemblyModelObject o = registry.load(reader, resourceLoader);
                if (o instanceof ServiceContract) {
                    service.setServiceContract((ServiceContract) o);
                }
                reader.next();
                break;
            case END_ELEMENT:
                return service;
            }
        }
    }

and the <interface.java> handler is

    public JavaServiceContract load(XMLStreamReader reader,
            ResourceLoader resourceLoader)
            throws XMLStreamException, ConfigurationLoadException {
        assert AssemblyConstants.INTERFACE_JAVA.equals(reader.getName());
        JavaServiceContract serviceContract =
            factory.createJavaServiceContract();
        serviceContract.setScope(Scope.INSTANCE);
        serviceContract.setInterfaceName(
            reader.getAttributeValue(null, "interface"));
        serviceContract.setCallbackInterfaceName(
            reader.getAttributeValue(null, "callbackInterface"));
        return serviceContract;
    }
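
The two handlers above depend on the factory, registry and model classes, so they cannot be run in isolation. As a standalone approximation, the sketch below follows the same event loop using only javax.xml.stream. Everything named here (ServiceLoaderSketch, its nested Service and JavaContract classes, the skipToEnd helper) is hypothetical. One deliberate difference, which is an assumption of the sketch and not the framework's contract: each loader consumes its own end tag, so the caller does not need the extra reader.next() after a child load.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ServiceLoaderSketch {
    // Hypothetical logical model classes.
    static final class JavaContract {
        String interfaceName;
        String callbackInterfaceName;
    }

    static final class Service {
        String name;
        JavaContract contract;
    }

    // Consumes events up to the END_ELEMENT matching the current START_ELEMENT.
    static void skipToEnd(XMLStreamReader reader) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Reader is on <interface.java>; returns with the reader on its end tag.
    static JavaContract loadInterface(XMLStreamReader reader) throws XMLStreamException {
        JavaContract contract = new JavaContract();
        contract.interfaceName = reader.getAttributeValue(null, "interface");
        contract.callbackInterfaceName = reader.getAttributeValue(null, "callbackInterface");
        skipToEnd(reader);
        return contract;
    }

    // Reader is on <service>; returns with the reader on </service>.
    static Service loadService(XMLStreamReader reader) throws XMLStreamException {
        Service service = new Service();
        service.name = reader.getAttributeValue(null, "name");
        while (true) {
            switch (reader.next()) {
            case XMLStreamConstants.START_ELEMENT:
                if ("interface.java".equals(reader.getLocalName())) {
                    service.contract = loadInterface(reader);
                } else {
                    skipToEnd(reader); // ignore unknown children
                }
                break;
            case XMLStreamConstants.END_ELEMENT:
                return service;
            default:
                break; // whitespace, comments, etc.
            }
        }
    }

    static Service parse(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            reader.nextTag(); // advance to the root start tag
            Service service = loadService(reader);
            reader.close();
            return service;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Service s = parse("<service name=\"HelloService\">"
                + "<interface.java interface=\"example.Hello\"/></service>");
        System.out.println(s.name + " -> " + s.contract.interfaceName);
    }
}
```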
I am not completely opposed to a StAX based loader approach. I can even
see some benefits:
- smaller code and memory footprint compared to a solution based on
generated code
- faster loading (I'm actually not sure how much we'll gain on a typical
SCA module file, but I'm guessing that it'll be faster)
- and more important IMO... more flexible parsing (for example we could
relax a little the requirement for some of the namespaces to be
specified in SCDL, this could help simplify SCDL files a little)
But I'm concerned that we're going to have to spend a lot of energy to
make it really work well in the long term (energy which could be spent
on many other aspects we need to cover in Tuscany). Basically I will be
OK with this StAX loader approach only if we have enough people in the
group really volunteering to take responsibility for it, maintain it, and
adjust it to all the upcoming changes to the XSD or the logical model.
People just need to think about it and realize that it's going to be a
lot of work.
I think we need to take responsibility for whichever option we choose.
Maintaining the loading code in the face of an evolving spec is going to
require work from all of us. The choice here comes down to what we are
willing to look after:

A) an SDO model, code generation, a SCDLModelContentHandler and its
   linker blocks, a set of SCDLModelLoader impls,
   and Thread classloader management code, or

B) a set of StAXElementLoader's
As I said at the start, I wasn't even looking at this until we ran into
the Thread classloader problems during the Tomcat integration. I still
don't think we have resolved all of them and think we are going to run
into a new set of challenges with other property binding frameworks.
I would like to put this to rest soon. Both versions are out there for
folk to look at and they are both complete enough to run our current
examples.
I think the StAX version is simpler yet just as flexible, avoids some of
the problems seen to date, and provides an easy integration path with
other frameworks (including SDO), and unless there is a strong technical
argument against I'd like to switch over.

--
Jeremy
The SCDL schemas are there: http://svn.apache.org/repos/asf/incubator/tuscany/java/spec/sca/src/main/resources/schemas. Most of the base elements have an xsd:any or anyAttribute in addition to their structured content. This is typical when you define a language like SCDL to allow for extensions of the language without making changes to the base schema. Besides that, SCDL looks pretty structured to me.
I think that the (A) vs (B) comparison above is unfair. We don't need to maintain an SDO model, it is generated from the SCDL XSDs; we are not adding maintenance requirements on the code generation, it is already part of the SDO sub-project; and I don't think that the classloader related issues are relevant here (again the classloader issues we ran into were caused by different libraries having different requirements on the "current" classloader; this will need to be fixed in SDO but is not only an SDO issue).
I am not convinced at all that the StAXElementLoaders are going to remain as simple and flexible as Jeremy thinks. My main question remains: Is anybody volunteering to take responsibility for this code?
--
Jean-Sebastien
