Hi Stian and all,

[cc'ed Carole since she will probably be interested]

My interest was from an interoperability (galaxy-taverna++)
perspective: a simpler format (a subset of t2 features) that could be
mapped to, say, Galaxy features. Galaxy provides a subset of t2
wkf features anyway. Galaxy also uses JSON to represent its
workflows (see attached example). If we do go for it, we should make
sure the two formats are as compatible as possible.

Now (pretty soon actually) we have both galaxy and taverna (plus the
galaxy-taverna generator) in a single virtual machine so possibilities
of interoperability increase. Having something (a simpler+subset
format) to bring the wkf descriptions closer would make it possible to
achieve considerably more integration between the two systems.
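
A rough sketch of what such a mapping could look like, in Python. It assumes a simplified Galaxy-like JSON layout in which each step lists its `input_connections`; the field names are an assumption about that layout rather than a faithful copy of the attached file, and `galaxy_to_links` is a hypothetical helper, not part of any existing toolkit. The output uses the "processor:port" link style from the made-up JSON in Stian's original proposal.

```python
import json

# Hypothetical, simplified Galaxy-like workflow. Real .ga files carry far
# more detail; only the parts needed for the mapping are assumed here.
GALAXY_LIKE = json.loads("""
{
  "steps": {
    "0": {"label": "name", "type": "data_input", "input_connections": {}},
    "1": {"label": "hello", "type": "tool",
          "input_connections": {"name": {"id": 0, "output_name": "output"}}}
  }
}
""")


def galaxy_to_links(workflow):
    """Turn per-step input connections into a flat {source: sink} mapping."""
    steps = workflow["steps"]
    links = {}
    for step in steps.values():
        for port, conn in step["input_connections"].items():
            source = steps[str(conn["id"])]
            if source["type"] == "data_input":
                # Workflow inputs are referred to by bare name.
                source_ref = source["label"]
            else:
                source_ref = "%s:%s" % (source["label"], conn["output_name"])
            # Note: a dict keyed by source keeps only one sink per source.
            links[source_ref] = "%s:%s" % (step["label"], port)
    return links


print(json.dumps(galaxy_to_links(GALAXY_LIKE), indent=2))
```

One thing this makes visible: a plain {source: sink} dictionary cannot express one output feeding several inputs, so a shared format would probably need a list of sinks per source.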

Cheers,
Kostas



On 22 November 2011 12:36, Stian Soiland-Reyes
<[email protected]> wrote:
> Sorry for not giving you the results of this meeting last month.
>
> Attending were (if I remember correctly):
>
> Alan R Williams, David Withers, Alexandra Nenadic, Rob Haines, Stian
> Soiland-Reyes
>
>
> I presented the options and arguments in:
>
>  http://dl.dropbox.com/u/794465/presentations/2011-10-25-scufl2-format/index.html
>
>
> The existing SCUFL2 Workflow Bundle format (note: deeper pages in this
> wiki are unfortunately out of date):
>
>> [1] http://www.mygrid.org.uk/dev/wiki/display/developer/Taverna+Workflow+Bundle
>> [2] http://www.mygrid.org.uk/dev/wiki/display/developer/Scufl2-WorkflowBundle
>
>
> Example unpacked Workflow Bundle:
>
> https://github.com/myGrid/scufl2/tree/master/scufl2-rdfxml/src/test/resources/uk/org/taverna/scufl2/rdfxml/example
>
> (this is from 
> https://github.com/myGrid/scufl2/blob/master/scufl2-rdfxml/src/test/resources/uk/org/taverna/scufl2/rdfxml/example.wfbundle?raw=true)
>
>
>
> Unfortunately, the meeting was inconclusive.
>
> From the discussions: (apologies for the colourful style, just to show
> that there was a wide range of opinions! :))
>
>
> - RDF/XML - even when made formal with an XSD - looks "scary"
> --> .. but it's "hidden" inside a ZIP file, so you would not see it
> unless you really want to
> --> Could do plain RDF - but in Turtle which is easier to read.
> -----> What is Turtle ???
>
> - Why a ZIP-file?? I would explode if I was given this as a developer.
> --> To structure the different parts that constitute a workflow (such
> as nested workflows and embedded resources), and allow future
> extensions that we have not thought about yet
> -----> Strawman suggestion: Make the file format "truly" binary (flip
> a bit so it does not unzip) to avoid peeking
> --------> No - the purpose of SCUFL2 was to open up the format!
>
> - "Why did you go for RDF?"
> --> "A workflow is by definition linking. The need has also been raised
> to annotate any part of a workflow, or to talk about bits of
> a workflow outside Taverna. Thus having identifiers for every part,
> and linking between them, is important. This is native in RDF; in
> formats like JSON you will have to invent your own standard for this,
> and then think about namespaces, name collisions, etc."
>
> - "Just do a single XML!"
> --> "I'm fine with it as it is now - what do we gain by removing the
> RDF-bits from the XML?"
> --> "What about all the features of the workflow bundle? Attaching
> binaries? Annotations?"
> -----> "Keep both workflow bundle format (as it is) and a flat XML
> format (single file)"
> ----------> "The flat file is meant for what.. if it is to be
> "simpler" but "not complete", then is XML the right choice?"
>
>
> - "People really would rather like an API"
> --> "Which is the SCUFL2 toolkit - but do we also want to maintain
> libraries for Ruby, Python, etc?"
> -----> "Why was the toolkit not used when (internal project for
> web-based wf editor) decided to read/write .t2flow directly instead?"
>
> - "JSON would be good for other programmers, but not as a 
> write-by-hand-format"
> ---> "Too many braces and quotes"
>
> - "Yaml is mainly known only in Ruby world"
>
> - "Current format does not include typing information on ports, mime
> types, semantic types, etc."
> --> "Was intended to be done as annotations, as it does not affect
> execution of a workflow"
> -----> "But it's there in the ActivityInputPort in the engine"
> ----------> ".. but in reality only used for activities to mark to
> themselves later if they should resolve a port as binary or string"
>
> -----
>
> So my strong preference (as presented) was to keep the bundle format and
> scufl2 APIs as they are (avoiding further delays to its release), but
> add a new serialisation format which is simpler, but can't necessarily
> express all (existing or possible) Taverna workflows.
>
> However it was unclear what the use case is for this "simpler" format:
>
> a)  Should it be primarily for *writing* workflows (programmatically
> or manually?) - and therefore the API should "fill in the blanks" and
> have many good defaults
>
> b)  ..or primarily be for *reading* workflows - and therefore have as
> much information included, even inferred things like list depth on
> links, and annotations like mime types
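
To make the difference between (a) and (b) concrete, here is a small sketch in Python. It assumes a terse, hand-written processor description and a hypothetical `expand` step that fills in defaults; the default values and the field names (`depth`, `mimetype`) are invented for illustration and are not taken from SCUFL2.

```python
import copy

# Hypothetical defaults a write-oriented API might fill in.
PORT_DEFAULTS = {"depth": 0, "mimetype": "text/plain"}


def expand(terse):
    """Return a read-oriented copy of a terse workflow, defaults made explicit."""
    full = copy.deepcopy(terse)
    for proc in full["processors"].values():
        for direction in ("inputs", "outputs"):
            ports = proc.get(direction, {})
            # Accept a plain list of port names as shorthand for "all defaults".
            if isinstance(ports, list):
                ports = {name: {} for name in ports}
            proc[direction] = {
                name: dict(PORT_DEFAULTS, **settings)
                for name, settings in ports.items()
            }
    return full


# (a) What a person would write by hand.
terse = {
    "processors": {
        "hello": {"inputs": ["name"], "outputs": {"STDOUT": {"depth": 1}}}
    }
}

# (b) What a reading tool would rather be given.
full = expand(terse)
print(full["processors"]["hello"])
```

Seen this way, (a) and (b) could be the same format at two levels of explicitness, with the toolkit converting the terse form into the expanded one.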
>
>
>
> It was unclear what directions to take further.
>
> It was suggested to modify the current workflow bundle format to
> use plain XML inside (which it does) but without the RDFie bits. I am
> personally against this. If the current XML with a schema is not
> actually usable for "pure XML" developers, then I would suggest to
> drop the schema bit and just do the bundle files as pure RDF. Then one
> might as well do them as Turtle files instead of overly verbose
> RDF/XML - it would also avoid anyone trying to parse them as XML
> anyway.
>
> Doing a "clean XML" inside would be easy - as it is just to modify the
> current XSD (XML schema) to remove the extra elements/attributes that
> make it valid RDF - but it would then just be a bastardized RDF, in
> which we still need to decide on how to serialise various activity
> configurations, etc.
>
> To me it is unclear what the advantage is of spending more engineering
> time to modify the format of the workflow bundle, if the benefits are not
> that obvious.  If we are to do this, then it should be done with a
> particular benefit in mind ("making X easier"), not just a last-minute
> change for the sake of "not having RDF". Just throwing something
> together at the last minute is how we ended up with the current
> .t2flow format.
>
>
> Our interim decision is to keep the Scufl2 workflow bundle format
> as it is now, and finish off the remaining .t2flow activity parsers so
> that we can release the Scufl2 beta + Taverna 3 OSGi platform command
> line alpha before Christmas. (Suggestions for better name welcome!)
>
>
>
>
> So just to kickstart perhaps some email discussions instead:
>
> If you are interested in working with the workflow format outside
> Taverna's code base - could you give a one-line summary of what you
> want to do?
>
> Include details such as whether you want to do this manually or
> programmatically, which programming language you use, which existing
> formats like XML, RDF, JSON, Yaml you are comfortable with, and whether
> you would like a simple-read or simple-write format.
>
> If you want to inspect workflows, then what are you interested in?
> Port information? Crawling links between processors? Details of which
> services are used? Annotations like title/description?
>
> If you want to write workflows, what style are you writing?
> Copy-pasting-style from existing workflow templates, or
> write-from-scratch script-like definitions?
>
>
> On Tue, Oct 11, 2011 at 10:50, Stian Soiland-Reyes <[email protected]> 
> wrote:
>>
>>
>> Scufl2 alternative script-like workflow format
>>
>> As you might know, we are developing Scufl2 as the next language and
>> model for specifying Taverna workflows. We have already successfully
>> used this model in an internal release of an OSGi-based command line
>> tool for executing Taverna workflows, and for Taverna 3 we are working
>> towards using this as the workflow model being edited by the
>> workbench.
>>
>> One of the motivations for moving from .t2flow to Scufl2 is to allow
>> third-party tools (like myExperiment) easier access to the workflow
>> structure, for reading, inspection, annotation and creation of
>> workflows. Ideally this should also allow the development of
>> alternative workflow building environments independent of the workflow
>> engine, like a simplified web-based editor.
>>
>> Scufl2 as developed so far includes a Workflow Bundle format [1],
>> using structured folders (normally archived in a ZIP-file for
>> distribution purposes) of RDF/XML files - which can also be
>> created/parsed with an XML Schema if the appropriate xsi:type
>> attribute is included [2]. (Note that the Scufl2 wiki pages are still
>> at a draft stage.)
>>
>> Although this format allows for all the Taverna language features to
>> be specified and used in an extensible manner, for instance allowing
>> bundling of data, provenance and runtime libraries, plugins to specify
>> which options they expect, etc., this format is more of an exchange
>> and bundling format than a format for manual editing.
>>
>> The Scufl2 toolkit [3] allows reading and writing the common Scufl2
>> workflow model in many different formats; this is, for instance, how it
>> reads .t2flow and SCUFL 1 workflows at the moment. There is also an
>> internal textual "debug output" format [4].
>>
>> A requirement has been raised to develop a more user-facing
>> textual format for editing workflows by hand in an editor. The current
>> Scufl2 development is well placed to support this.
>>
>> My suggestion is to do this as an additional SCUFL2 format, as a
>> simple text file in YAML or JSON, for writing regular straight-forward
>> workflows, with enough automagic (such as port depths and iteration
>> strategies) filled in by a combination of SCUFL2 tools and the engine.
>>
>> For instance in made-up-JSON using the external tool activity:
>>
>> {
>>   "workflow": {
>>     "inputs": ["name"],
>>     "outputs": ["greeting"],
>>     "processors": {
>>       "hello": {
>>         "type": "tool",
>>         "command": "echo Hello, %%name%%"
>>       }
>>     },
>>     "links": {
>>       "name": "hello:name",
>>       "hello:STDOUT": "greeting"
>>     }
>>   }
>> }
>>
>> Such a format would not be able to express every .t2flow or Scufl2 WB
>> workflow, but by limiting the scope we can avoid many details which
>> would make the format too verbose or magic, for instance port mapping,
>> alternative activities, and complex activity configurations.
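
A minimal sketch (Python) of how a tool might read such a file, assuming the made-up JSON above is written as valid JSON (arrays in square brackets, commas between members). `parse_ref` and `resolve_links` are hypothetical names; nothing here is an existing SCUFL2 API. It splits each "processor:port" reference and checks that bare names refer to declared workflow inputs or outputs.

```python
import json

SIMPLE = json.loads("""
{
  "workflow": {
    "inputs": ["name"],
    "outputs": ["greeting"],
    "processors": {
      "hello": {"type": "tool", "command": "echo Hello, %%name%%"}
    },
    "links": {
      "name": "hello:name",
      "hello:STDOUT": "greeting"
    }
  }
}
""")


def parse_ref(ref):
    """Split 'processor:port' into a pair; a bare name is a workflow port."""
    processor, sep, port = ref.partition(":")
    return (processor, port) if sep else (None, ref)


def resolve_links(doc):
    """Return links as ((src_proc, src_port), (dst_proc, dst_port)) pairs."""
    wf = doc["workflow"]
    resolved = []
    for source, sink in wf["links"].items():
        src, dst = parse_ref(source), parse_ref(sink)
        # A bare source must be a workflow input, a bare sink a workflow output.
        if src[0] is None and src[1] not in wf["inputs"]:
            raise ValueError("unknown workflow input: %s" % src[1])
        if dst[0] is None and dst[1] not in wf["outputs"]:
            raise ValueError("unknown workflow output: %s" % dst[1])
        # Any named processor must be declared.
        for proc, _ in (src, dst):
            if proc is not None and proc not in wf["processors"]:
                raise ValueError("unknown processor: %s" % proc)
        resolved.append((src, dst))
    return resolved


for src, dst in resolve_links(SIMPLE):
    print(src, "->", dst)
```

Port names on the processor side (`name`, `STDOUT`) are not validated here, since in this sketch they are implied by the tool command rather than declared.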
>>
>> I hereby propose a meeting/Skype call to discuss the need for and
>> implementation plan of such an alternative serialisation format for
>> SCUFL2 workflows - with the aim of making workflows easy to edit and
>> read by hand. If you are interested in attending the meeting (no
>> matter if you are part of the myGrid team or not), please mark your
>> availability in the Doodle poll [5].
>>
>> [1] http://www.mygrid.org.uk/dev/wiki/display/developer/Taverna+Workflow+Bundle
>> [2] http://www.mygrid.org.uk/dev/wiki/display/developer/Scufl2-WorkflowBundle
>> [3] https://github.com/mygrid/scufl2/
>> [4] https://github.com/myGrid/scufl2/blob/master/scufl2-api/src/test/resources/uk/org/taverna/scufl2/api/io/HelloWorld.txt
>> [5] http://www.doodle.com/ybruwger8mi5bnn7
>>
>> When
>> Tue 2011-10-11 15:00 – 16:00 London
>> Where
>> Skype / myGrid
>> Calendar
>> [email protected]
>> Who
>> • [email protected] (organiser)
>> • [email protected]
>> • [email protected]
>>
>> _______________________________________________
>> taverna-hackers mailing list
>> [email protected]
>> Web site: http://www.taverna.org.uk
>> Mailing lists: http://www.taverna.org.uk/about/contact-us/
>> Developers Guide: http://www.taverna.org.uk/developers/
>
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>

Attachment: Galaxy-Workflow-Test1.ga
Description: Binary data
