From a user perspective, adding support for workflow port semantic types
would be very useful. It would make it possible to validate workflows by
ensuring that input and output ports are compatible before running the
workflow or while constructing the workflow. It would also be useful when
looking for workflows. For example, if I have a workflow that outputs a
SwissProt ID, find all workflows that take SwissProt IDs. Or find shims
that convert SwissProt IDs into GenPept IDs.
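
A minimal sketch of the kind of check and lookup I have in mind, in
Python. Everything here is hypothetical: the Port and WorkflowSummary
classes, the type labels such as "SwissProtID", and the example
catalogue are made up for illustration and are not part of Scufl2 or
Taverna. It also assumes the simplest possible rule, that two ports are
compatible only when their semantic types are identical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Port:
    name: str
    semantic_type: str  # hypothetical label, e.g. "SwissProtID"


@dataclass(frozen=True)
class WorkflowSummary:
    name: str
    inputs: tuple
    outputs: tuple


def compatible(source: Port, sink: Port) -> bool:
    """Assumed rule: a link is valid only if both semantic types match."""
    return source.semantic_type == sink.semantic_type


def workflows_accepting(catalogue, semantic_type):
    """Names of workflows with at least one input port of the given type."""
    return [wf.name for wf in catalogue
            if any(p.semantic_type == semantic_type for p in wf.inputs)]


def shims(catalogue, from_type, to_type):
    """Workflows with exactly one input of from_type and one output of to_type."""
    return [wf.name for wf in catalogue
            if len(wf.inputs) == 1 and len(wf.outputs) == 1
            and wf.inputs[0].semantic_type == from_type
            and wf.outputs[0].semantic_type == to_type]


# Made-up catalogue, purely for illustration.
catalogue = [
    WorkflowSummary("fetch_sequence",
                    (Port("id", "SwissProtID"),),
                    (Port("seq", "ProteinSequence"),)),
    WorkflowSummary("swissprot_to_genpept",
                    (Port("in", "SwissProtID"),),
                    (Port("out", "GenPeptID"),)),
    WorkflowSummary("blast",
                    (Port("query", "ProteinSequence"),),
                    (Port("hits", "BlastReport"),)),
]
```

A real implementation would probably want subtype relations between
semantic types rather than plain string equality, but the lookups would
have the same shape.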
Regards,
Mark Fortner
On Tue, Nov 22, 2011 at 3:36 AM, Stian Soiland-Reyes <
[email protected]> wrote:
> Sorry for not giving you the results of this meeting last month.
>
> Attending were (if I remember correctly):
>
> Alan R Williams, David Withers, Alexandra Nenadic, Rob Haines, Stian
> Soiland-Reyes
>
>
> I presented the options and arguments in:
>
>
> http://dl.dropbox.com/u/794465/presentations/2011-10-25-scufl2-format/index.html
>
>
> The existing SCUFL2 Workflow Bundle format (note: deeper pages in this
> wiki are unfortunately out of date)
>
> > [1] http://www.mygrid.org.uk/dev/wiki/display/developer/Taverna+Workflow+Bundle
> > [2] http://www.mygrid.org.uk/dev/wiki/display/developer/Scufl2-WorkflowBundle
>
>
> Example unpacked Workflow Bundle:
>
>
> https://github.com/myGrid/scufl2/tree/master/scufl2-rdfxml/src/test/resources/uk/org/taverna/scufl2/rdfxml/example
>
> (this is from
> https://github.com/myGrid/scufl2/blob/master/scufl2-rdfxml/src/test/resources/uk/org/taverna/scufl2/rdfxml/example.wfbundle?raw=true
> )
>
>
>
> Unfortunately, the meeting was inconclusive.
>
> From the discussions: (apologies for the colourful style, just to show
> that there was a wide range of opinions! :))
>
>
> - RDF/XML - even when made formal with an XSD - looks "scary"
> --> .. but it's "hidden" inside a ZIP file, so you would not see it
> unless you really want to
> --> Could do plain RDF - but in Turtle which is easier to read.
> -----> What is Turtle ???
>
> - Why a ZIP-file?? I would explode if I was given this as a developer.
> --> To structure the different parts that constitute a workflow (such
> as nested workflows and embedded resources), and allow future
> extensions that we have not thought about yet
> -----> Strawman suggestion: Make the file format "truly" binary (flip
> a bit so it does not unzip) to avoid peeking
> --------> No - the purpose of SCUFL2 was to open up the format!
>
> -- "Why did you go for RDF?"
> ----> "A workflow is by definition linking. The need has also been
> raised to annotate any part of a workflow, or to talk about bits of
> a workflow outside Taverna. Thus having identifiers for every part,
> and linking between them, is important. This is native in RDF; in
> formats like JSON you will have to invent your own standard for this,
> and then think about namespaces, name collisions, etc."
>
> - "Just do a single XML!"
> --> "I'm fine with it as it is now - what do we gain by removing the
> RDF-bits from the XML?"
> --> "What about all the features of the workflow bundle? Attaching
> binaries? Annotations?"
> -----> "Keep both workflow bundle format (as it is) and a flat XML
> format (single file)"
> ----------> "The flat file is meant for what.. if it is to be
> "simpler" but "not complete", then is XML the right choice?"
>
>
> - "People really would rather like an API"
> --> "Which is the SCUFL2 toolkit - but do we also want to maintain
> libraries for Ruby, Python, etc?"
> -----> "Why was the toolkit not used when (internal project for
> web-based wf editor) decided to read/write .t2flow directly instead?"
>
> - "JSON would be good for other programmers, but not as a
> write-by-hand-format"
> ---> "Too many braces and quotes"
>
> - "Yaml is mainly known only in Ruby world"
>
> - "Current format does not include typing information on ports, mime
> types, semantic types, etc. "
> --> "Was intended to do as annotations as it is not affecting
> execution of a workflow"
> -----> "But it's there in the ActivityInputPort in the engine"
> ----------> ".. but in reality only used for activities to mark to
> themselves later if they should resolve a port as binary or string"
>
> -----
>
> So my preferred option (as presented) was to keep the bundle format and
> scufl2 APIs as they are (avoiding further delays to its release), but
> add a new serialisation format which is simpler, but can't necessarily
> express all (existing or possible) Taverna workflows.
>
> However, it was unclear what the use case for this "simpler" format is:
>
> a) Should it be primarily for *writing* workflows (programmatically
> or manually?) - and therefore the API should "fill in the blanks" and
> have many good defaults
>
> b) ..or primarily be for *reading* workflows - and therefore have as
> much information included, even inferred things like list depth on
> links, and annotations like mime types
>
>
>
> It was unclear which direction to take next.
>
> It was suggested to modify the current workflow bundle format to
> use plain XML inside (which it does) but without the RDFie bits. I am
> personally against this. If the current XML with a schema is not
> actually usable for "pure XML" developers, then I would suggest to
> drop the schema bit and just do the bundle files as pure RDF. Then one
> might as well do them as Turtle files instead of overly verbose
> RDF/XML - it would also avoid anyone trying to parse them as XML
> anyway.
>
> Doing a "clean XML" inside would be easy - as it is just to modify the
> current XSD (XML schema) to remove the extra elements/attributes that
> make it valid RDF - but it would then just be a bastardized RDF, in
> which we still need to decide on how to serialise various activity
> configurations, etc.
>
> To me it is unclear what the advantage is of spending more engineering
> time to modify the format of the workflow bundle, if the benefits are not
> that obvious. If we are to do this, then it should be done with that
> particular benefit in mind ("making X easier"), not just a last-minute
> change for the sake of "not having RDF". Just throwing something
> together in the last minute is how we ended up with the current
> .t2flow format.
>
>
> Our interim decision is to keep the Scufl2 workflow bundle format
> as it is now, and finish off the remaining .t2flow activity parsers so
> that we can release the Scufl2 beta + Taverna 3 OSGi platform command
> line alpha before Christmas. (Suggestions for better name welcome!)
>
>
>
>
> So just to kickstart perhaps some email discussions instead:
>
> If you are interested in working with the workflow format outside
> Taverna's code base - could you give a one-line summary of what you
> want to do?
>
> Include details such as whether you want to do this manually or
> programmatically, which programming language you use, which existing
> formats like XML, RDF, JSON, Yaml you are comfortable with, and whether
> you would like a simple-read or simple-write format.
>
> If you want to inspect workflows, then what are you interested in?
> Port information? Crawling links between processors? Details of which
> services are used? Annotations like title/description?
>
> If you want to write workflows, what style are you writing?
> Copy-pasting-style from existing workflow templates, or
> write-from-scratch script-like definitions?
>
>
> On Tue, Oct 11, 2011 at 10:50, Stian Soiland-Reyes <[email protected]>
> wrote:
> >
> > Scufl2 alternative script-like workflow format
> >
> > As you might know, we are developing Scufl2 as the next language and
> > model for specifying Taverna workflows. We have already successfully
> > used this model in an internal release of an OSGi-based command line
> > tool for executing Taverna workflows, and for Taverna 3 we are working
> > towards using this as the workflow model being edited by the
> > workbench.
> >
> > One of the motivations for moving from .t2flow to Scufl2 is to allow
> > third-party tools (like myExperiment) easier access to the workflow
> > structure, for reading, inspection, annotation and creation of
> > workflows. Ideally this should also allow the development of
> > alternative workflow building environments independent of the workflow
> > engine, like a simplified web-based editor.
> >
> > Scufl2 as developed so far includes a Workflow Bundle format [1],
> > using structured folders (normally archived in a ZIP-file for
> > distribution purposes) of RDF/XML files - which can also be
> > created/parsed with an XML Schema if the appropriate xsi:type
> > attribute is included [2]. (Note that the Scufl2 wiki pages are still
> > at a draft stage)
> >
> > Although this format allows for all the Taverna language features to
> > be specified and used in an extensible manner, for instance allowing
> > bundling of data, provenance and runtime libraries, plugins to specify
> > which options they expect, etc., this format is more of an exchange
> > and bundling format than a format for manual editing.
> >
> > The Scufl2 toolkit [3] allows reading and writing the common Scufl2
> > workflow model in many different formats; this is for instance how it
> > reads .t2flow and SCUFL 1 workflows at the moment. There is also an
> > internal textual "debug output" format [4].
> >
> > A requirement has been raised to develop a more user-facing
> > textual format for editing workflows by hand in an editor. The current
> > Scufl2 development is well placed to support this.
> >
> > My suggestion is to do this as an additional SCUFL2 format, as a
> > simple text file in YAML or JSON, for writing regular straight-forward
> > workflows, with enough automagic (such as port depths and iteration
> > strategies) filled in by a combination of SCUFL2 tools and the engine.
> >
> > For instance in made-up-JSON using the external tool activity:
> >
> > {
> >   "workflow": {
> >     "inputs": ["name"],
> >     "outputs": ["greeting"],
> >     "processors": {
> >       "hello": {
> >         "type": "tool",
> >         "command": "echo Hello, %%name%%"
> >       }
> >     },
> >     "links": {
> >       "name": "hello:name",
> >       "hello:STDOUT": "greeting"
> >     }
> >   }
> > }
> >
> > Such a format would not be able to express every .t2flow or Scufl2 WB
> > workflow, but by limiting the scope we can avoid many details which
> > would make the format too verbose or magic, for instance port mapping,
> > alternative activities, and complex activity configurations.
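
To make the made-up format above a bit more concrete, here is a rough
Python sketch of how a tool might read it. This is only an illustration
under some assumptions: the example is written as strict JSON (lists in
square brackets), a link endpoint containing a colon is read as
"processor:port", and an endpoint without a colon is read as a workflow
input or output port. The function name load_simple_workflow is
hypothetical and is not part of the Scufl2 toolkit.

```python
import json

# The hello-world example from the proposal, written as strict JSON.
SIMPLE_WORKFLOW = """
{
  "workflow": {
    "inputs": ["name"],
    "outputs": ["greeting"],
    "processors": {
      "hello": {"type": "tool", "command": "echo Hello, %%name%%"}
    },
    "links": {
      "name": "hello:name",
      "hello:STDOUT": "greeting"
    }
  }
}
"""


def load_simple_workflow(text):
    """Parse the simplified format and check that every link endpoint is known."""
    wf = json.loads(text)["workflow"]
    links = []
    for source, sink in wf["links"].items():
        # A source without a colon must be a workflow input,
        # a sink without a colon must be a workflow output.
        for endpoint, ports in ((source, wf["inputs"]), (sink, wf["outputs"])):
            if ":" in endpoint:
                processor, _port = endpoint.split(":", 1)
                if processor not in wf["processors"]:
                    raise ValueError(f"unknown processor in link: {endpoint}")
            elif endpoint not in ports:
                raise ValueError(f"unknown workflow port in link: {endpoint}")
        links.append((source, sink))
    return wf, links


wf, links = load_simple_workflow(SIMPLE_WORKFLOW)
```

Processor port names such as "hello:name" are deliberately not checked
here, since in this style of format they would be inferred from the
activity (for instance from the %%name%% placeholder) rather than
declared.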
> >
> > I hereby propose a meeting/Skype call to discuss the need and
> > implementation plan of such an alternative serialisation format for
> > SCUFL2 workflows - with the aim of workflows to be easy to edit and
> > read by hand. If you are interested in attending the meeting (no
> > matter if you are part of the myGrid team or not), please mark your
> > availability in the Doodle poll [5].
> >
> > [1] http://www.mygrid.org.uk/dev/wiki/display/developer/Taverna+Workflow+Bundle
> > [2] http://www.mygrid.org.uk/dev/wiki/display/developer/Scufl2-WorkflowBundle
> > [3] https://github.com/mygrid/scufl2/
> > [4] https://github.com/myGrid/scufl2/blob/master/scufl2-api/src/test/resources/uk/org/taverna/scufl2/api/io/HelloWorld.txt
> > [5] http://www.doodle.com/ybruwger8mi5bnn7
> >
> > When
> > Tue 2011-10-11 15:00 – 16:00 London
> > Where
> > Skype / myGrid (map)
> > Calendar
> > [email protected]
> > Who
> > •
> > [email protected] organiser
> > •
> > [email protected]
> > •
> > [email protected]
> >
> >
> ------------------------------------------------------------------------------
> > All the data continuously generated in your IT infrastructure contains a
> > definitive record of customers, application performance, security
> > threats, fraudulent activity and more. Splunk takes this data and makes
> > sense of it. Business sense. IT sense. Common sense.
> > http://p.sf.net/sfu/splunk-d2d-oct
> > _______________________________________________
> > taverna-hackers mailing list
> > [email protected]
> > Web site: http://www.taverna.org.uk
> > Mailing lists: http://www.taverna.org.uk/about/contact-us/
> > Developers Guide: http://www.taverna.org.uk/developers/
>
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>
>