Re: Framework for StAX-based model loading

Jean-Sebastien Delfino Mon, 20 Mar 2006 23:01:46 -0800

Jeremy Boynes wrote:

Jim Marino wrote:

Hi Jeremy,
Could you briefly enumerate what you see as the benefits to the StAX framework over alternatives?


The final straw that prompted me to do this was the amount of
classloader wrangling we ended up doing in the Tomcat code a couple of
weeks ago. We need to keep track of context classloader switching
between loading the model and loading any application code. There is
plenty of room for subtle errors to creep in.

The classloader issues we ran into with the current SDO implementation need to be solved. I am not sure that they are a sufficient reason to stop using SDO and move to a different technology. I'm surprised to see that we have a databinding technology in Tuscany but we are running away from it when we encounter our first problems with it. I think we should spend a little more time trying to fix it instead of running away from it. By the way, the classloader problems we ran into a couple of weeks ago were not just caused by SDO; this was a combination of SDO, Axis2 and some of the factories used under the covers by Axis2, all having different requirements in terms of "current" class loader.

The SDO solution (actually this would be true of any XML->POJO binding)
was fine when the logical model was an exact replica of the XML files.
However, to support more logical unit testing (and other uses) the model
has now shifted back to being more of a true configuration model. This
means we can't just slurp the XML into objects and use them directly, we
need to read in the POJOs and then run a transformation on them. This
adds an additional phase to the load process that needs maintenance.

The logical model was never an exact replica of the XML files. Whatever technology you use, you'll need to do the following:

1. handle the parsing/loading from XML
2. transform what you get from the XML into a logical model

The current SDO based approach separated the two concerns. With this new StAX approach we do (1) and (2) together. I think this will create complexity over time.

Having a container system there able to manage the loaders means
extending the model is easy - an extension just needs to contribute its
model elements and an XML handler. There is no need to codegen a separate
XML model and write a transformer. There is also only one extension
registry rather than two (the SDO type registry and the SCDL loader
registry).
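
A single loader registry of the kind described above could look roughly like the following. This is only a minimal sketch under stated assumptions: the class `LoaderRegistrySketch`, the `ElementLoader` interface and the `urn:example:ext` namespace are hypothetical names chosen for illustration and are not taken from the Tuscany code; only the `javax.xml.stream` and `javax.xml.namespace` calls are standard API.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch: one registry keyed by element QName.
class LoaderRegistrySketch {

    // Hypothetical extension contract: build a model object from the
    // element the reader is currently positioned on.
    interface ElementLoader {
        Object load(XMLStreamReader reader) throws XMLStreamException;
    }

    private final Map<QName, ElementLoader> loaders = new HashMap<>();

    // An extension contributes its handler for one qualified element name.
    void register(QName element, ElementLoader loader) {
        loaders.put(element, loader);
    }

    // Advance to the root start element and dispatch to the loader
    // registered for its qualified name.
    Object loadRoot(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        ElementLoader loader = loaders.get(reader.getName());
                        if (loader == null) {
                            throw new IllegalArgumentException(
                                    "No loader registered for " + reader.getName());
                        }
                        return loader.load(reader);
                    }
                }
                throw new IllegalArgumentException("Document has no root element");
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        LoaderRegistrySketch registry = new LoaderRegistrySketch();
        // Hypothetical extension element; the loader just returns its "id".
        registry.register(new QName("urn:example:ext", "thing"),
                reader -> reader.getAttributeValue(null, "id"));
        System.out.println(registry.loadRoot(
                "<ext:thing xmlns:ext=\"urn:example:ext\" id=\"x\"/>"));
    }
}
```

Keying on `QName` rather than on the local name means two extensions can use the same element name in different namespaces without colliding.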

I don't see why codegen is a problem. In general I'd rather get some code generated than write it myself. I agree that with the SDO approach you have to register the generated model and your handler/transformer. I don't really understand the difference you make between a handler and a transformer and why it's easier to write a handler than to codegen a model and write a transformer. With the SDO based approach you need to write code that gets data out of an SDO model with nice generated getter methods. With a StAX based approach (and it would be very similar with a DOM or SAX based approach) you get the data out of a more weakly typed model. Frankly I prefer to write:

String name=component.getName();
than
String name=reader.getAttributeValue(null, "name");
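
To put the two styles side by side in a runnable form, here is a minimal sketch. It assumes a hypothetical `<component name="...">` element and a hand-written `Component` class standing in for what a code generator would produce; neither is taken from the actual SCDL schema or the Tuscany sources. Only the `javax.xml.stream` calls are standard API.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StaxAttributeSketch {

    // Hypothetical stand-in for a class generated from the XSD.
    static final class Component {
        private final String name;
        Component(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Weakly typed access: scan for the first <component> start tag and
    // read its "name" attribute by string key. A misspelled key is not a
    // compile error; the call simply returns null.
    static String readNameWithStax(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "component".equals(reader.getLocalName())) {
                        return reader.getAttributeValue(null, "name");
                    }
                }
                return null; // no <component> element found
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<module><component name=\"Calculator\"/></module>";
        // StAX style: attribute looked up by string at run time.
        System.out.println(readNameWithStax(xml));
        // Generated-model style: the compiler checks the getter exists.
        System.out.println(new Component("Calculator").getName());
    }
}
```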

The XML handling is pluggable - it just uses the standard StAX APIs
rather than internal hooks to our SDO implementation and/or EMF. A
validating StAX parser can be used if required; semantic validation is
still being performed in the model and builders.

We have been working to remove the dependencies on EMF, so again I don't see how this can motivate moving to StAX. One of the goals of Tuscany is to provide a good SDO story anyway to people who want to load an XML document into an SDO model, without requiring any hooks into our SDO implementation or EMF. Again I think we should all work to improve our SDO story instead of using something else. For example I think that we should improve SDO to provide a good integration with StAX.

Code footprint is better as there is no intermediate form. Performance
and memory footprint are probably better too. However, I don't see that
as a major factor as we are only reading config data here (i.e. it's
once per deployment not once per request).

I agree, code and memory footprint are better with the StAX approach.

On the downside, StAX is a technology that may not be as familiar to
people. However, I think it has enough similarity to DOM/SAX to be
readily understood. It is also heavily used in Axis2 so we will be
seeing it anyway.

We will need to modify the parsing code if the XML changes whereas with
SDO or another XML->POJO solution that would be handled by generation.
However, the need to transform the model after load means that we have
custom parsing code anyway just running on the POJOs (see
SCDLModelContentHandler).

This is the point that raises the biggest concern for me. I have been there before multiple times, implementing models and loaders for changing XML specifications. Usually the kind of approach demonstrated by this StAX loader looks very simple and very tempting at the beginning of the project, but ends up as a mess and a maintenance nightmare in the longer term.


If we want to avoid that we need the following:

- more work on this StAX loader to clearly separate the pure loading/de-serialization aspect from the physical -> logical transformation aspects; if you mix the two aspects it will be a recipe for disaster after a year of maintenance adjusting to changes in the XML
- define intermediate data structures (close to the physical model) to avoid polluting the logical model with XML specific info (I'm thinking of all the cases where we're going to have to store intermediate / half loaded data and complete/resolve things in a second pass, properties and wires are two examples that come to mind); if we don't do that and start adding stuff to the logical model to facilitate the loading phase we will make a mess of the logical model
- make the StAX loader approach more complete before we jump to switch to it, to really understand the impact; for example the StAX loaders do not handle subsystems or property types at the moment, and I think that the support for properties in particular is going to be interesting and will generate some work to integrate between StAX and SDO if we want to support SDO properties
- come up with a brand new solution for serializing/saving models; I am sure we will run into use cases where we want to save an assembly, a set of wires, a module component configuration etc; StAX only handles the loading part, so if we're not using SDO here again we'll have to invent something else to handle serialization/saving of the models...
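
The first two points in the list above (separating de-serialization from transformation, and using intermediate structures that are resolved in a second pass) can be sketched as follows. This is an illustration under stated assumptions, not the Tuscany loader: the `component` and `wire` element names, the `source`/`target` attributes and all class names are hypothetical and do not follow the real SCDL schema.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class TwoPassLoaderSketch {

    // Intermediate (physical) structures: they mirror the XML and keep
    // unresolved references as plain strings.
    static final class RawWire {
        final String source;
        final String target;
        RawWire(String source, String target) { this.source = source; this.target = target; }
    }

    static final class RawModule {
        final List<String> componentNames = new ArrayList<>();
        final List<RawWire> wires = new ArrayList<>();
    }

    // Logical model: no XML-specific state; wires are object references.
    static final class Component {
        final String name;
        final List<Component> targets = new ArrayList<>();
        Component(String name) { this.name = name; }
    }

    // Pass 1: pure de-serialization. Nothing is resolved here, so the
    // order of elements in the document does not matter.
    static RawModule parse(String xml) {
        RawModule raw = new RawModule();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String element = reader.getLocalName();
                    if ("component".equals(element)) {
                        raw.componentNames.add(reader.getAttributeValue(null, "name"));
                    } else if ("wire".equals(element)) {
                        raw.wires.add(new RawWire(
                                reader.getAttributeValue(null, "source"),
                                reader.getAttributeValue(null, "target")));
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return raw;
    }

    // Pass 2: physical -> logical transformation. Wire endpoints are
    // resolved against the full set of components.
    static Map<String, Component> resolve(RawModule raw) {
        Map<String, Component> components = new LinkedHashMap<>();
        for (String name : raw.componentNames) {
            components.put(name, new Component(name));
        }
        for (RawWire wire : raw.wires) {
            Component source = components.get(wire.source);
            Component target = components.get(wire.target);
            if (source == null || target == null) {
                throw new IllegalArgumentException(
                        "Unresolved wire: " + wire.source + " -> " + wire.target);
            }
            source.targets.add(target);
        }
        return components;
    }
}
```

Because the half-loaded state lives only in `RawModule`, the logical `Component` class never needs fields that exist purely to help the loader.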

My main concern is about the complexity of maintaining all this code. Just the (incomplete) support for the core SCDL is already about 750 lines of code, mixing parsing logic and mapping/construction of the logical model, using dynamic APIs like reader.getAttributeValue(null, "name"), compared to the 550 lines of model transformation code using strongly typed APIs generated from the SCDL XSD in the SCDLModelContentHandler (with more complete support for the core SCDL). I anticipate many changes in the SCDL XSD and the logical model during the course of this year. The StAX based loader approach may look appealing now at the beginning of the project, but we will only succeed with it if we have committed contributors and committers ready to maintain this code, make it complete, and adjust it each time we change the SCDL XSD (and it's going to be pretty painful, compared to just rebuilding to regen the code from the XSD and adjusting the transformer where it's broken by the changes).

I am not completely opposed to a StAX based loader approach. I can even see some benefits:
- smaller code and memory footprint compared to a solution based on generated code
- faster loading (I'm actually not sure how much we'll gain on a typical SCA module file, but I'm guessing that it'll be faster)
- and more important IMO... more flexible parsing (for example we could relax a little the requirement for some of the namespaces to be specified in SCDL, this could help simplify SCDL files a little)

But I'm concerned that we're going to have to spend a lot of energy to make it really work well in the long term (energy which could be spent on many other aspects we need to cover in Tuscany). Basically I will be OK with this StAX loader approach only if we have enough people in the group really volunteering to take responsibility for it, maintain it, and adjust it to all the upcoming changes to the XSD or the logical model. People just need to think about it and realize that it's going to be a lot of work.

 what comes to mind - if anyone else can see any issues please let
me know.

--
Jeremy



--
Jean-Sebastien
