On 05.10.2011 14:24, Marzia Forli wrote:
Thank you very much Steven,
I have just constructed the 'hello world' StAX pipeline and begun to examine
the mentioned 'ExampleComplexTransformer', and I am slowly wrapping my head
around this whole stuff ;-)
Correct me if I am wrong, but the StAX and SAX 'PipelineComponent's are
fundamentally different,
Yes, indeed.
StAX and SAX are different approaches to handling XML data in Java.
SAX is much older and basically pushes different bits from the original
document to your "Handler", which then decides what to do with that
information.
StAX uses XMLEvents and basically you ask for the next event when you're
ready to process it.
(That's the event mode. There's also a cursor-based API in StAX, but
we're not using that in cocoon-stax.)
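To make the push/pull difference concrete, here is a minimal sketch that uses only the JDK's own SAX and StAX APIs, not any Cocoon classes. The class name `PushVsPull` and both helper methods are made up for illustration. Both methods collect the element names of a small document: with SAX the parser calls the handler, with StAX the calling code asks for each event.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Illustrative only: PushVsPull is a hypothetical name, not part of Cocoon.
public class PushVsPull {

    // SAX (push): the parser drives and calls back into our handler.
    static List<String> elementNamesWithSax(String xml) throws Exception {
        final List<String> names = new ArrayList<String>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    // StAX event mode (pull): we drive and ask for the next event.
    static List<String> elementNamesWithStax(String xml) throws Exception {
        List<String> names = new ArrayList<String>();
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a><b/><c>text</c></a>";
        System.out.println(elementNamesWithSax(xml));
        System.out.println(elementNamesWithStax(xml));
    }
}
```

Both calls should yield the same list of names; the difference is only in who controls the loop.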
in case of stax, the core of transformer is method 'produceEvents' where
there is code like this:
    if (getParent().hasNext()) {
        XMLEvent event = getParent().nextEvent();
        addEventToQueue(event);
    }
You are right about my further questions ;-) After I constructed the
above-mentioned pipeline and inserted a DummyTransformer with the above code
block together with
    System.out.println(event);
I see all my (DummyTransformer) output first, followed by the XMLSerializer
output, and I am a bit puzzled: it looks as if the whole pipeline buffers the
events and at the end bursts them through the XMLSerializer?
The pipeline itself will not buffer any XML information.
Every event is processed completely before the next one is requested.
That's why large files can be processed just as easily as very small files.
What you're seeing might be a buffering in the underlying XML handling
or some other buffering outside cocoon.
Do you use System.out as output stream for the pipeline or do you
produce into another stream, that is then written to the console?
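The one-event-at-a-time behaviour can be checked with plain StAX, independent of Cocoon. The sketch below is an assumption-laden stand-in, not the cocoon-stax implementation: `PullOrderDemo` is a hypothetical class, an `EventReaderDelegate` plays the role of the transformer, and a simple loop plays the role of the serializer. Each stage appends to a shared log instead of printing, so no output buffering can blur the order.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import javax.xml.stream.util.EventReaderDelegate;

// Illustrative only: a two-stage pull chain built from JDK classes.
public class PullOrderDemo {

    static List<String> trace(String xml) throws XMLStreamException {
        final List<String> log = new ArrayList<String>();
        XMLEventReader source = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));

        // Stand-in for a transformer: logs each event as it passes through.
        XMLEventReader transformer = new EventReaderDelegate(source) {
            @Override
            public XMLEvent nextEvent() throws XMLStreamException {
                XMLEvent event = super.nextEvent();
                log.add("transform " + event.getEventType());
                return event;
            }
        };

        // Stand-in for a serializer: pulls one event, handles it, pulls the next.
        while (transformer.hasNext()) {
            XMLEvent event = transformer.nextEvent();
            log.add("serialize " + event.getEventType());
        }
        transformer.close();
        return log;
    }

    public static void main(String[] args) throws XMLStreamException {
        for (String line : trace("<a><b/></a>")) {
            System.out.println(line);
        }
    }
}
```

The log entries alternate between the two stages. If console output instead shows all transformer lines first, a likely explanation is that the serializer writes into a stream that is only flushed at the end, while `System.out.println` appears immediately.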
Up to now I think that my solution would be to make a BatchingTransformer which
passes the events through until InSubtreeNavigator.fulfillsCriteria becomes true,
then starts batching the events until fulfillsCriteria becomes false, then hands
the buffered Queue<XMLEvent> to the rewriter, and after that enqueues the
resulting events via addAllEventsToQueue of my transformer... What do you think,
is it a dumb approach or what?
The intention of the navigators is to let you know where you are.
You define criteria (like inside a certain sub-tree, a certain attribute
is present and/or has a certain value, etc) and then let the Navigators
decide if those criteria have been met.
Based on that you can decide to drop, replace or create XMLEvents.
Complex transformations might require buffering a sizeable part of the
document before you can make a decision.
Depending on how large that part is, memory consumption may or may not
become a problem.
There's really no right or wrong here: Do what works!
The general principle is always:
Everything that is added to the queue (with addEventToQueue) will reach
your XMLSerializer.
Everything else will be gone.
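The batching idea discussed above can be sketched with plain StAX as follows. This is only an illustration under stated assumptions: `SubtreeBatcher` and its `Rewriter` interface are hypothetical names, the depth counter stands in for what a navigator would tell you, and the `out` list stands in for the transformer's event queue. It does not use any cocoon-stax classes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: buffers one named subtree at a time and rewrites it.
public class SubtreeBatcher {

    /** Hypothetical hook that turns a buffered subtree into replacement events. */
    public interface Rewriter {
        List<XMLEvent> rewrite(List<XMLEvent> subtree);
    }

    static List<XMLEvent> process(XMLEventReader in, String subtreeName,
                                  Rewriter rewriter) throws XMLStreamException {
        List<XMLEvent> out = new ArrayList<XMLEvent>();    // stands in for the event queue
        List<XMLEvent> buffer = new ArrayList<XMLEvent>(); // events of the current subtree
        int depth = 0;                                     // > 0 while inside the subtree

        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            boolean opensSubtree = event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(subtreeName);
            if (depth == 0 && !opensSubtree) {
                out.add(event);                 // outside the subtree: pass through
                continue;
            }
            buffer.add(event);
            if (event.isStartElement()) {
                depth++;
            } else if (event.isEndElement()) {
                depth--;
            }
            if (depth == 0) {                   // subtree closed: rewrite and emit
                out.addAll(rewriter.rewrite(buffer));
                buffer = new ArrayList<XMLEvent>();
            }
        }
        return out;
    }

    public static void main(String[] args) throws XMLStreamException {
        XMLEventReader in = XMLInputFactory.newInstance().createXMLEventReader(
                new StringReader("<root><keep/><item><x>1</x></item><keep/></root>"));
        // This example rewriter simply drops every buffered <item> subtree.
        List<XMLEvent> out = process(in, "item",
                subtree -> Collections.<XMLEvent>emptyList());
        for (XMLEvent event : out) {
            System.out.println(event);
        }
    }
}
```

Memory use is bounded by the largest single subtree, since the buffer is discarded after each rewrite. Only what the rewriter returns ends up in `out`; anything it leaves out is gone.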
Also, I need to fiddle with XMLInputFactory properties (you know,
isValidating, isCoalescing...) and maybe later use a non-default XMLInputFactory
(woodstox). Is that possible?
Thanks for your patience...
I'm afraid those settings cannot be configured yet. Congratulations!
You're the first to ask for this.
However, that is easily added.
You can either create your own version of
org.apache.cocoon.stax.component.XMLGenerator and set the properties you
need
or - even better - we add this to our XMLGenerator.
Same goes for woodstox.
Are you using trunk or one of the released versions?
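For reference, this is roughly what setting those properties looks like on a plain `javax.xml.stream.XMLInputFactory`, outside of Cocoon. It is a sketch, not the XMLGenerator code: `ConfiguredInput` and its methods are hypothetical names, and how a customised generator would hand the factory to the pipeline is left open here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Illustrative only: configures a StAX input factory with standard properties.
public class ConfiguredInput {

    static XMLInputFactory newFactory(boolean coalescing) {
        // newInstance() looks up the StAX implementation; with Woodstox on the
        // classpath it would typically be found through the service-provider
        // mechanism (an assumption about the deployment, not verified here).
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.valueOf(coalescing));
        // Not every implementation supports validation, so it is switched off here.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, Boolean.FALSE);
        return factory;
    }

    // Collects the text of every character event the reader reports.
    static List<String> textChunks(String xml, boolean coalescing)
            throws XMLStreamException {
        XMLEventReader reader = newFactory(coalescing)
                .createXMLEventReader(new StringReader(xml));
        List<String> chunks = new ArrayList<String>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isCharacters()) {
                chunks.add(event.asCharacters().getData());
            }
        }
        reader.close();
        return chunks;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<a>x<![CDATA[y]]>z</a>";
        System.out.println(textChunks(xml, true));
        System.out.println(textChunks(xml, false));
    }
}
```

With coalescing switched on, adjacent text and CDATA sections are reported as a single character event; without it, the number of chunks depends on the implementation.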
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]