Re: [Taverna-hackers] semantics of control links

Jan Hidders Mon, 22 Jun 2009 06:16:09 -0700

On 22 Jun 2009, at 13:08, Stian Soiland-Reyes wrote:

> On Mon, Jun 22, 2009 at 11:16, Jan Hidders<[email protected]> wrote:
>
>>>> So it will now say that if a processor iterates over [["a","b"],[],
>>>> ["c"]] the iteration for position 1 will not start until the messages
>>>> for position 1.1 (the string "a"), position 1.2 (the string "b") and
>>>> position 1 (closing the list) have arrived. This would be correct, yes?
>>> I am confused, this depends on the depth of the input ports.  If input
>>> ports are all depth 0, no invocation would be done, as the second
>>> input port is presented with [] - iteration over an empty list yields
>>> no iterations.
>> Sorry. I assumed nesting depth 1 for the port.
>
> OK, now I get you :-)


Yea. :-)

> Then yeah, it would wait until a full list was available, either ["a",
> "b"], [] or ["c"] (if these are made by iterations, and multithreading
> or merges are in effect above, [] or ["c"] might come first).  The
> full list is always sent AFTER the individual items inside it, so it
> would implicitly mean after "a" and "b" have been produced.

Clear.

>> We thought you meant (when we discussed pipelined services) pipelining
>> at a lower level than the level at which the iteration happens. Let me
>> try to explain. Assume a processor P with one input and one output
>> port. Suppose we iterate over the list ["a", "b"]. The input port has
>> nesting depth 1 so P iterates over the strings.
>
> No - there would be no iteration in this case if the port depth is 1,
> remember 1 is list of values, depth 0 is an individual value.
>
> I assume below that you mean the port has depth 0, which means there
> would be two iterations.

Ugh. Yes, depth 0 is what I meant.

>> Let's say for the
>> first iteration (for value "a") the result is ["x", "y"], and for the
>> second iteration (for value "b") the result is ["u"].
>
> OK, so the output port does have depth 1.
>
>> Currently we
>> allow that the "x" can rise through the invocation stack and could
>> even be already consumed by another processor. So we don't wait for
>> the "y" value to arrive or the list to be closed.
>
> Yes, this can happen with pipelining processors, like BioMart.
>
> The processor would have to return, assuming the output port is called "o1":
>
> [0]: o1="x"
> [1]: o1="y"
> []: o1 = ["x", "y"]

If we speak about the whole processor, not just the single iteration,
wouldn't that be:
[0,0]: o1="x"
[0,1]: o1="y"
[0]: o1 = ["x", "y"]

After all, later we still (might) get:
[1,0]: o1 = "u"
[1]: o1 = ["u"]
[]: o1 = [["x","y"], ["u"]]

Btw., are you really sending the whole ["x", "y"] at the closing of the
list? That would seem a bit redundant. Or is it just conceptually
there?
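
To make the whole-processor indexing above concrete, here is a rough
sketch in Python. It is not Taverna code: the function name emit_events
is made up for illustration, and it assumes that the closing event of a
list carries the full list, which is exactly the point the question
above leaves open. It walks a nested result and yields (index, value)
pairs so that every item comes before the event that closes its list.

```python
def emit_events(value, index=()):
    """Yield (index, value) pairs, items before their enclosing list.

    Hypothetical helper for illustration only; the closing event is
    assumed to carry the complete list.
    """
    if isinstance(value, list):
        for i, item in enumerate(value):
            # Descend first, so children are emitted before the parent.
            yield from emit_events(item, index + (i,))
    # A leaf value, or the closing event of a list, is emitted last.
    yield index, value


# The two iteration results from the example, as one depth-2 value.
results = [["x", "y"], ["u"]]
events = list(emit_events(results))

for index, value in events:
    print(list(index), "o1 =", value)
```

This sequential walk produces only one of the allowed orders; with
pipelining or multithreading the events for [1,0] and [1] could just as
well arrive before those for [0,0], [0,1] and [0].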

> [0] or [1] can come in any order, but the [] must come after them. (In
> an extended version with depth 2 indexes, [2,1] must come before [2] -
> but that does not mean that [3,4] can't come before [2]. )

Clear.
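
The ordering rule quoted above can be written down as a small check.
This is only a sketch in Python under my reading of the rule: an index
must not arrive after any index that is a proper prefix of it, and
nothing else is constrained. The name is_valid_order is hypothetical,
and the check looks at ordering only, not at whether every list is
eventually closed.

```python
def is_valid_order(indices):
    """Return True if no list is closed before one of its items arrives.

    Hypothetical helper: `indices` is the arrival order of event
    indices, each given as a tuple or list of ints.
    """
    seen = set()
    for idx in indices:
        idx = tuple(idx)
        # If a proper prefix of idx was already seen, the enclosing
        # list was closed before this item arrived.
        if any(len(s) < len(idx) and idx[:len(s)] == s for s in seen):
            return False
        seen.add(idx)
    return True


print(is_valid_order([(1,), (0,), ()]))           # [1] before [0] is fine
print(is_valid_order([(3, 4), (2, 1), (2,)]))     # [3,4] may precede [2]
print(is_valid_order([(2,), (2, 1)]))             # [2] closed too early
```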

>> We assumed that this
>> is the kind of pipelining you wanted. What we already had is that the
>> value ["x", "y"] is produced and can be consumed by the next processor
>> without waiting for ["u"].
>
> OK, I got confused and thought you meant different output ports. In
> this example there's only one input port and one output port.
>
> Unless the next processor is expecting depth 2 (lists of lists) then
> yes, it can start crunching on ["x", "y"] before ["u"] is produced. If
> the processor is expecting depth 0 it can even start before "y" is
> produced (but this would mean that the producing processor needs to
> support activity pipelining).
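
The relation between the consumer's port depth and the events it can
start on can be sketched as follows (Python, illustration only). It
assumes the whole-processor output has depth 2, as in the example, so
an event with an index of length k carries a value of depth 2 - k. The
function name consumable and the event list are made up for the sketch,
and port depths larger than the output depth are not handled.

```python
def consumable(events, total_depth, port_depth):
    """Return the values a port of depth `port_depth` can start on.

    Hypothetical helper: an event whose index has length k is assumed
    to carry a value of depth total_depth - k.
    """
    wanted_len = total_depth - port_depth
    return [value for index, value in events if len(index) == wanted_len]


# One possible arrival order for the example output port "o1".
events = [
    ((0, 0), "x"),
    ((0, 1), "y"),
    ((0,), ["x", "y"]),
    ((1, 0), "u"),
    ((1,), ["u"]),
    ((), [["x", "y"], ["u"]]),
]

for depth in (0, 1, 2):
    print("depth", depth, "->", consumable(events, 2, depth))
```

A depth-0 consumer can start on "x" as soon as the first event arrives,
a depth-1 consumer on ["x", "y"] before "u" exists, and a depth-2
consumer only on the final closing event.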

Great. But now I'm unsure about an earlier remark you made (I've also
included my question that you were responding to):

>> Do you just mean to say here that all output ports always produce
>> values, or are you talking about synchronization here, i.e., they
>> produce it more or less at the same moment?
>
> They would all produce values, and the values would come at exactly
> the same moment - the values are returned in a Map that flows up the
> dispatch stack - at the top of which the Processor will disassemble it
> and send it out on the individual processor output ports.


In principle a processor could have two output ports that each produce
their values as a stream stretched out over time, even if it is not
iterating, so then they do not really produce their values at exactly
the same moment, do they? These streams probably need not even start
and stop at the same moment, nor have the same length. What I mean is:
this is not a rule that holds in general. I understand that it holds in
many cases, but then that is due to the semantics of the activities in
the list of activities associated with the processor, not something the
processor enforces. Correct?

The only thing that the processor would enforce is that every  
iteration has completely stopped once it has sent its last output  
message.

Apologies for taking so much of your time, btw., but we are on a  
deadline with our paper and really need to get this correct.

-- Jan Hidders

