Re: [orbeon-user] XPL semantics

Erik Bruchez Fri, 28 Jan 2005 14:43:03 -0800

Eric van der Vlist wrote:

> The fact that we can define targets doesn't mean that the execution
> of a processor is always explicit. Look for instance at make. When
> you give no target, the first one in the Makefile is chosen. A rule
> that won't break current applications could be that when you specify
> no target, all the processors without output are executed.

I think breaking compatibility is not a big problem, because XPL 1.0
will probably feature a version attribute.

Most of the time I don't like implicit rules, especially when they
look arbitrary. In your example, an XPL programmer would have to say,
ok, if I add a target, then this processor executes. But if I don't,
all of them execute. And then scratch his / her head ;-) I would
prefer a unique ground rule, even if it has some non-intuitive
ramifications.

> It would make sense IMO to be able to define the "default target" in
> the pipeline and to be able to overwrite it in the pipeline
> invocation (like a xsl:param).

But then you need a way of passing that parameter to the pipeline
invocation. As long as that's done through an XML Infoset, that would
be fine. Otherwise, again, you add a new concept to the language, that
of being able to pass parameters that are not expressed as XML
Infosets.

> You refer to one of them as a "black hole" :-) ... We could call
> then "actions" but that could be confusing with PFC actions as they
> can have outputs...

Here we are looking at an XPL that is completely independent from
OPS. So if the right term was "action", we would maybe go for it. But
I don't think "action" is the right term. Producing a serialized XML
document on stdout does not sound more like an "action" to me than
transforming a document into another through XSLT.

Others use the term "sink". In the beginning, we constantly used
"generator", "transformer", and "serializer", maybe under the
influence of Cocoon. But in XPL those denominations don't make that
much sense purely from the perspective of the inputs and outputs they
have, because often processors that "generate" an XML Infoset, like a
URL "generator", actually take an XML Infoset input to configure them
as well. So you can't just look at the processor as a black box and
say, it's a generator because it has only an output, because in fact
it has an input as well!

So those terms seem more appropriate to describe a particular
processor subjectively, based on functionality, rather than based on
the inputs and outputs they have.

For now the spec just talks about "processors that don't have any
connected outputs" ;-)

>>Maybe this could be changed to performing the initialization phase on
>>all the processors, like suggested in the post linked above. This
>>could make the processing model a little easier to understand, because
>>then even a processor with an unconnected output has an opportunity to
>>do something on an equal footing with processors that do not have
>>outputs. If it wants to perform some actions during its initialization
>>phase, it may do so; if it wants to use the lazy approach and do its
>>work only when its output(s) is/are read, it may do so as well.
>
> Yes, at first thought, I think that this would be better...

This has pretty big implications though, on the execution of things
like p:choose, on sub-pipelines, etc. I temporarily convinced myself
yesterday that this was after all not so good ;-) And that if we
decide to be "lazy" in the execution process, it is better to accept
the implications. But that's probably not the end of it.

> What would also be most useful is to document for each processor,
> what is done during each initialisation and read phase.

Absolutely. Usually, the solution is simple, because when you have
outputs, nothing visible is done during the initialization phase,
everything appears to be done in read phases, in a lazy approach
(things are used whenever they are needed). With processors that don't
have outputs, everything is done in the initialization phase.
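
To make the two phases concrete, here is a minimal sketch in Python. It
is a toy model, not the OPS implementation: the Transformer and
Serializer classes and the log list are made up for illustration, and
only the start()/read() split mirrors the phases described above.

```python
class Transformer:
    """Toy processor with an output: all of its work happens on read."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def start(self):
        # Initialization phase: nothing visible is done.
        pass

    def read(self):
        # Read phase: the work is done lazily, when the output is needed.
        self.log.append(f"{self.name}: read")
        return "<doc/>"


class Serializer:
    """Toy processor without outputs: all of its work happens on start."""

    def __init__(self, name, source, log):
        self.name = name
        self.source = source
        self.log = log

    def start(self):
        # Initialization phase: pull the input, then produce the side effect.
        document = self.source.read()
        self.log.append(f"{self.name}: wrote {document}")


log = []
xslt = Transformer("xslt", log)
serializer = Serializer("serializer", xslt, log)

xslt.start()
log_after_xslt_start = list(log)  # nothing has been logged yet
serializer.start()                # pulls from xslt, then writes
```

In this model the transformer only does something because the
serializer reads from it during its own initialization.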

> It could also be nice to provide a kind of visualisation (graphical
> or not) of the flows of actions that can be expected from a pipe. Do
> you think a XSLT transformation could do that (take a XPL document
> and generate a XHTML (or SVG) representation of what can be
> expected) or would that be too complex?

Yes in theory, but it is difficult. We tried doing this at some point
but the issue of actually doing a layout was complex and we gave
up. There are some open source libraries that can help us do that
though.
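
If the layout is delegated to an external tool, extracting the graph
itself is the easy part. The following is only a rough sketch, in
Python rather than XSLT, that emits Graphviz DOT text. The xpl_to_dot
function is hypothetical, the namespace URI is written from memory and
should be treated as an assumption, and the sample pipeline leaves out
the configuration inputs a real one would need. Only top-level
processors and simple "#id" references are handled.

```python
import xml.etree.ElementTree as ET

# Assumed XPL namespace URI; adjust if the real one differs.
P = "{http://www.orbeon.com/oxf/pipeline}"

# Simplified sample pipeline (config inputs omitted on purpose).
XPL = """<p:config xmlns:p="http://www.orbeon.com/oxf/pipeline">
  <p:processor name="oxf:url-generator">
    <p:output name="data" id="source"/>
  </p:processor>
  <p:processor name="oxf:xslt">
    <p:input name="data" href="#source"/>
    <p:output name="data" id="result"/>
  </p:processor>
  <p:processor name="oxf:xml-serializer">
    <p:input name="data" href="#result"/>
  </p:processor>
</p:config>"""


def xpl_to_dot(xpl_text):
    """Return DOT text with one edge per '#id' connection."""
    root = ET.fromstring(xpl_text)
    processors = root.findall(f"{P}processor")

    # Map each output id to the name of the processor producing it.
    producers = {}
    for proc in processors:
        for out in proc.findall(f"{P}output"):
            if out.get("id"):
                producers[out.get("id")] = proc.get("name")

    lines = ["digraph xpl {"]
    for proc in processors:
        target = proc.get("name")
        for inp in proc.findall(f"{P}input"):
            href = inp.get("href", "")
            # Anything other than a plain "#id" reference is skipped.
            source = producers.get(href[1:]) if href.startswith("#") else None
            if source:
                lines.append(f'  "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot = xpl_to_dot(XPL)
```

Nodes are keyed by processor name here, so two processors with the same
name would collapse into one node; a real tool would need unique ids
and would have to descend into p:choose branches as well.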

>>Whether the behavior remains the same or not depends on how you write
>>the pipeline, but it is likely to remain the same. Clearly, reading
>>the output may cause other tasks to be performed (i.e. other
>>processors to be executed related to producing that output). In theory
>>it could even cause the order of execution of processors to be
>>different, but in your particular examples with serializers only, the
>>execution order would remain the same, because all processors without
>>outputs are guaranteed to be executed first.
>
> Hmmm... In that case, I wouldn't want the serialisers to be called when
> there is an output...
>
> The following pipe should be valid:
>
> <p:config>
>
> <p:param name="data" type="input"/>
> <p:param name="target" type="input"/>
> <p:param name="data" type="output"/>
>
> <p:choose href="#target">
>   <p:when test="target='xml'"> xml serializer </p:when>
>   <p:when test="target='html'"> html serializer </p:when>
> </p:choose>
>
> <p:processor name="oxf:identity">
>   <p:input name="data" href="#data"/>
>   <p:output name="data" ref="data"/>
> </p:processor>
>
> </p:config>
>
> and given what you've said when a target is specified and the output
> isn't used, only the init phase of one of the serialisers would be
> called while when a target isn't specified and the output is used
> only the identity processor would be called.

Not quite. If you don't connect the "data" output of the pipeline when
you use it, the pipeline doesn't have any connected outputs. So it
executes as a "serializer", which means that it goes through the
initialization phase. During that phase, one of the two serializers,
XML or HTML, will be executed through the execution of the p:choose.

If you do connect an output, when you read that output, the pipeline
also goes through the initialization phase. So your XML or HTML
serializer is also run.

Which seems fair to me, because, as you said, reading the pipeline
output should if possible not change the behavior of the pipeline. In
this particular case, you are getting what you want from that point of
view.
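
The two cases can be played through with a small Python model of the
quoted pipeline. This is a sketch only: the ChoosePipeline class is
invented for the purpose, the serializers are reduced to log entries,
and the rule that initialization runs at most once is my assumption.

```python
class ChoosePipeline:
    """Toy stand-in for the quoted pipeline: a choose over two
    serializers, plus an identity processor bound to the data output."""

    def __init__(self, data, target):
        self.data = data
        self.target = target
        self.initialized = False
        self.log = []

    def start(self):
        # Initialization phase: the choose runs, and with it whichever
        # serializer matches the target input.
        if self.initialized:
            return  # assumption: initialization happens at most once
        self.initialized = True
        if self.target == "xml":
            self.log.append("xml serializer")
        elif self.target == "html":
            self.log.append("html serializer")

    def read_data(self):
        # Reading the output goes through initialization first, then the
        # identity processor copies the input through.
        self.start()
        self.log.append("identity")
        return self.data


# Case 1: the data output is not connected, so only start() is called.
unconnected = ChoosePipeline("<doc/>", "html")
unconnected.start()

# Case 2: the data output is connected and read.
connected = ChoosePipeline("<doc/>", "html")
result = connected.read_data()
```

In both cases the HTML serializer runs; connecting the output only adds
the identity step.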

> If I understand correctly, the pipe itself would then be considered
> as having an output and wouldn't be called at all (from another
> pipe) if this output isn't connected and one would have to use the
> null serializer to force its execution...

There are two use cases:

1. Call the pipeline from some Java code. Here you would have control
   through an API over what you do with the pipeline: initialize it,
   and read from it. So let's forget this one.

2. Call the pipeline through the pipeline processor within another
   pipeline. In this case, if the processor does not have any
   connected outputs, it is run as a serializer, so it is initialized
   when the calling pipeline initializes. If it does have a connected
   output, it is initialized when that output is read, and then a read
   phase follows.
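
For the second use case, the timing can be sketched as follows. Again
this is a toy Python model under my own assumptions, not the pipeline
processor's code: SubPipeline and CallingPipeline are hypothetical
names, and the events list only records the order in which things
happen.

```python
class SubPipeline:
    """Toy pipeline processor called from another pipeline."""

    def __init__(self, name, events):
        self.name = name
        self.events = events
        self.initialized = False

    def start(self):
        # Initialization phase, assumed to run at most once.
        if not self.initialized:
            self.initialized = True
            self.events.append(f"{self.name}: initialized")

    def read(self):
        # Reading a connected output initializes first, then reads.
        self.start()
        self.events.append(f"{self.name}: read")
        return "<data/>"


class CallingPipeline:
    """Toy caller: children is a list of (sub_pipeline, connected) pairs,
    where connected says whether the child's output is connected."""

    def __init__(self, children, events):
        self.children = children
        self.events = events

    def start(self):
        self.events.append("caller: initialized")
        for child, connected in self.children:
            if not connected:
                # No connected output: run as a serializer right away.
                child.start()


events = []
a = SubPipeline("a", events)  # output not connected
b = SubPipeline("b", events)  # output connected
caller = CallingPipeline([(a, False), (b, True)], events)

caller.start()
events_after_start = list(events)
b.read()  # some downstream processor eventually reads b's output
```

After caller.start() only "a" has been initialized; "b" is initialized
later, at the moment its output is read.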

> That makes sense, but I feel a little bit uneasy by the way all that
> is working. But I can't even explain why! Maybe that's just the fact
> that actual actions (such as sending a HTTP response) is done in a
> method called "init" which hurts me...

In the current implementation the method is actually called start()
;-) This said, I called this phase "initialization", but maybe we can
find a better term. Maybe it should just be an "execution" phase and
zero or more "read" phases. The issue is that both phases do belong to
what is considered the execution of the pipeline.

The only thing that bothers me slightly at the moment is the
determination of when a processor without connected outputs is
executed. As you pointed out initially, that's what bothered you
too. What I see at this point is that we can:

1. Leave things the way they are.

2. Try to figure out some kind of "target" approach, by which we would
   remove the automatic execution of processors without connected
   outputs. This could be real targets a la ant, or simply a mechanism
   to tell whether such processors are executed or not.

3. Go the other way and put all processors on an equal footing, by
   saying that processors with connected outputs are also initialized.
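
To compare the three options side by side, here is a small Python
sketch of which processors would go through the initialization phase
under each one. The Proc type, the policy names and the targets
parameter are all made up for illustration, and option 2 is modelled in
the simplest way I can think of (a list of processor names to run).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proc:
    name: str
    connected_outputs: int


def started_at_init(procs, policy, targets=()):
    """Return the names of processors started when the pipeline starts."""
    if policy == "current":
        # Option 1: every processor without connected outputs.
        return [p.name for p in procs if p.connected_outputs == 0]
    if policy == "targets":
        # Option 2: no automatic execution, only the named targets.
        return [p.name for p in procs
                if p.connected_outputs == 0 and p.name in targets]
    if policy == "all":
        # Option 3: every processor is initialized.
        return [p.name for p in procs]
    raise ValueError(f"unknown policy: {policy}")


procs = [
    Proc("xslt", 1),
    Proc("xml-serializer", 0),
    Proc("html-serializer", 0),
]

option_1 = started_at_init(procs, "current")
option_2 = started_at_init(procs, "targets", targets=("html-serializer",))
option_3 = started_at_init(procs, "all")
```

With option 3 the xslt processor would be initialized too, and it would
then be up to that processor whether it does anything in that phase.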

> The fact that it seems possible to emulate targets with a simple
> choose would tend to show that the current balance between
> simplicity and features is good!

Thanks! But don't believe that we are inflexible on XPL at this point.

> What about a compact syntax (ala RELAX NG)?

That could be good too, but probably after the good old verbose XML
syntax is specified - unless there is a volunteer to make a proposal
in parallel.

-Erik

