On 05.10.2011 at 14:24, Marzia Forli wrote:
Thank you very much Steven,
I have just built the 'hello world' StAX pipeline and started to examine
the 'ExampleComplexTransformer' you mentioned, and I am slowly wrapping my head
around all of this ;-)
Correct me if I am wrong, but the StAX and SAX 'PipelineComponent's are
fundamentally different,
Yes, indeed.

StAX and SAX are different approaches to handling XML data in Java.
SAX is much older and basically pushes different bits from the original document to your "Handler", which then decides what to do with that information.

StAX uses XMLEvents: you ask for the next event when you are ready to process it. (That is the event mode; there is also a cursor-based API in StAX, but we are not using it in cocoon-stax.)
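
To make the push/pull difference concrete, here is a minimal sketch that uses only the JDK's own SAX and StAX APIs (no Cocoon classes). The class and method names are made up for this illustration. Both methods collect the element names of a small document: with SAX the parser calls the handler, with StAX the code asks for each event itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of cocoon-stax.
public class PushVersusPull {

    // SAX (push): the parser drives the loop and calls back into the handler.
    static List<String> elementNamesWithSax(String xml) {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
        return names;
    }

    // StAX event mode (pull): our code drives the loop and requests each event.
    static List<String> elementNamesWithStax(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    names.add(event.asStartElement().getName().getLocalPart());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        String xml = "<root><a/><b>text</b></root>";
        System.out.println(elementNamesWithSax(xml));
        System.out.println(elementNamesWithStax(xml));
    }
}
```

The while loop in the StAX variant has the same shape as the hasNext()/nextEvent() calls a transformer makes on its parent.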

in the case of StAX, the core of a transformer is the 'produceEvents' method,
which contains code like this:

        if (getParent().hasNext()) {
                XMLEvent event = getParent().nextEvent();
                addEventToQueue(event);
        }

You are right about my further questions ;-) After I constructed the
above-mentioned pipeline and inserted a DummyTransformer with the above code block
together with
        System.out.println(event);
I see all of my (DummyTransformer) output first, followed by the
XMLSerializer output. I am a bit puzzled: it looks as if the whole pipeline buffers
the events and at the end bursts them through an XMLSerializer?
The pipeline itself will not buffer any XML information.
Every event is processed completely before the next one is requested. That's why large files can be processed just as easily as very small ones.

What you're seeing might be buffering in the underlying XML handling or some other buffering outside Cocoon. Are you using System.out as the output stream for the pipeline, or do you write to another stream that is then written to the console?

So far I think my solution would be to write a BatchingTransformer that
passes events through until InSubtreeNavigator.fulfillsCriteria is fulfilled, then
starts batching the events it reads until fulfillsCriteria is false, then
hands the buffered Queue<XMLEvent> to a rewriter, and finally enqueues the
resulting events via addAllEventsToQueue of my transformer... What do you think, is it
a dumb approach or what?

The intention of the navigators is to let you know where you are.
You define criteria (like inside a certain sub-tree, a certain attribute is present and/or has a certain value, etc) and then let the Navigators decide if those criteria have been met.
Based on that you can decide to drop, replace or create XMLEvents.

Complex transformations might require buffering a sizeable part of the document before you can make a decision. Depending on how large that part is, memory consumption may or may not become a problem.
There's really no right or wrong here: Do what works!


The general principle is always:
Everything that is added to the queue (with addEventToQueue) will reach your XMLSerializer.
Everything else will be gone.
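
The batching idea and the queue principle can be tried out in isolation with plain javax.xml.stream. The following is only a sketch under assumptions, not cocoon-stax code: the class, the `rewrite` method and the local `outputQueue` are hypothetical stand-ins for a transformer, a rewriter and the queue filled by addEventToQueue, and a simple depth counter replaces the navigator. Events outside the chosen subtree pass straight through, events inside it are buffered and rewritten as a batch, and comments are never queued, so they disappear from the output.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical stand-alone sketch; it does not use any cocoon-stax classes.
public class SubtreeBatchingSketch {

    private static final XMLEventFactory EVENTS = XMLEventFactory.newInstance();

    // Hypothetical rewriter: upper-cases all character data of a buffered subtree.
    static List<XMLEvent> rewrite(List<XMLEvent> subtree) {
        List<XMLEvent> result = new ArrayList<>();
        for (XMLEvent event : subtree) {
            if (event.isCharacters()) {
                String data = event.asCharacters().getData();
                result.add(EVENTS.createCharacters(data.toUpperCase()));
            } else {
                result.add(event);
            }
        }
        return result;
    }

    static String transform(String xml, String subtreeName) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance()
                    .createXMLEventReader(new StringReader(xml));
            StringWriter out = new StringWriter();
            XMLEventWriter writer = XMLOutputFactory.newInstance()
                    .createXMLEventWriter(out);
            Queue<XMLEvent> outputQueue = new ArrayDeque<>(); // stand-in for the transformer queue
            List<XMLEvent> buffer = new ArrayList<>();
            int depth = 0; // greater than 0 while inside the subtree

            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (depth > 0) {
                    // Inside the subtree: buffer until its end element is reached.
                    buffer.add(event);
                    if (event.isStartElement()) depth++;
                    if (event.isEndElement()) depth--;
                    if (depth == 0) {
                        outputQueue.addAll(rewrite(buffer));
                        buffer.clear();
                    }
                } else if (event.isStartElement() && subtreeName.equals(
                        event.asStartElement().getName().getLocalPart())) {
                    depth = 1;
                    buffer.add(event);
                } else if (event.getEventType() != XMLStreamConstants.COMMENT) {
                    outputQueue.add(event); // pass through unchanged
                }
                // Comments outside the subtree are not queued, so they are gone.

                // Only what is in the queue reaches the serializing side.
                while (!outputQueue.isEmpty()) {
                    writer.add(outputQueue.remove());
                }
            }
            writer.flush();
            writer.close();
            return out.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><!-- note --><keep>abc</keep>"
                + "<item><name>foo</name></item></doc>";
        System.out.println(transform(xml, "item"));
    }
}
```

Memory use here grows with the size of the largest buffered subtree, which is the trade-off mentioned above.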


I also need to fiddle with XMLInputFactory properties (you know,
isValidating, isCoalescing...) and maybe later use a non-default XMLInputFactory
(Woodstox). Is that possible?
Thanks for your patience...

I'm afraid those settings cannot be configured yet. Congratulations! You're the first to ask for this.
However, that is easily added.

You can either create your own version of org.apache.cocoon.stax.component.XMLGenerator and set the properties you need
or - even better - we add this to our XMLGenerator.
Same goes for woodstox.
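
For reference, this is roughly what setting those properties looks like with the standard javax.xml.stream API. It is a sketch with hypothetical class and method names and shows only the factory side, not how a custom generator would be wired into a pipeline. The Woodstox remark in the comment is an assumption about the usual provider lookup, not something tested here.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical demo class, not part of cocoon-stax.
public class ConfiguredInputFactory {

    static XMLInputFactory createFactory() {
        // Assumption: with Woodstox on the classpath, newInstance() would
        // normally return its factory through the standard provider lookup.
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Merge adjacent text and CDATA sections into a single Characters event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        // Keep DTD validation switched off.
        factory.setProperty(XMLInputFactory.IS_VALIDATING, false);
        return factory;
    }

    // Returns the data of every Characters event the configured reader produces.
    static List<String> readText(String xml) {
        List<String> chunks = new ArrayList<>();
        try {
            XMLEventReader reader = createFactory()
                    .createXMLEventReader(new StringReader(xml));
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isCharacters()) {
                    chunks.add(event.asCharacters().getData());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Reading failed", e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        System.out.println(readText("<a>one<![CDATA[two]]>three</a>"));
    }
}
```

A custom generator would create its event reader from a factory configured like this instead of from a default one.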

Are you using trunk or one of the released versions?




---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



