On Thu, Jun 11, 2009 at 06:52, Yoshinobu Kano <[email protected]> wrote:

> Since I also cannot imagine that a normal NLP tool does not require
> the actual text,
> and the annotations added by the tools tend to be larger than the raw
> text data,
> passing URLs would not be a good option for the connection between
> text mining components.
> However for the Taverna-UCompare/UIMA interface, URLs would make sense
> when the input is a URL referred document.

Note that the reference could be any URI or another kind of reference;
it doesn't have to be a world-wide accessible HTTP-based URL. It could
be as simple as urn:uuid:9321d5b1-8904-43a5-8a21-f92bae6d9fa7

The main point is this: if you want to avoid sending large documents
from a service to Taverna, only to upload them again to the next
service, when those two services could exchange the documents in a more
efficient manner (and to lower Taverna's memory footprint), then using
references like URIs would make this possible. And if you did go for
HTTP URLs (they could be links to stuff within the service), those would
also be accessible to outside services.

> Well that is my question for this Taverna/Bio* community.
> Probably we can assume that the normal input is document based - an
> abstract or a full text of an academic paper.

I guess it would come down to what you decide to do in your workflow,
and what you want to do in your service code. :-)

I would guess that the things you are going to play around with, such
as deciding which algorithms to use, which databases to fetch from,
etc., should be done or initiated by the workflow. The boring number
crunching and analysis should be done by the services.

Another thing is if you want to use external services; then obviously
it would be great if your services played on the same 'level', so you
could make two versions of the same workflow, where one uses your
service and another uses a similar service provided by some Japanese
university.

So it comes down to the actual research that you are planning to do,
really..
:-)

> A good news! This strategy would resolve my concern.
> How many users use 1.7/2.0/2.1b - how much is the backward compatibility?
> Would it be fine to make everything on 2.1b?

Not sure about the usage numbers; 2.1b1 is still quite fresh. 2.x
workflows should be compatible with each other, and 2.x can open 1.x
workflows. However, you can't open a 2.x workflow in 1.x.

Based on the feedback we have received so far, I would recommend
looking at 2.1b1. However, if you are developing your own extensions to
Taverna, do note that many of the APIs have changed between 1.x and
2.x, so you have to decide early. Unfortunately the developer
documentation for 2.x is not very complete yet, but of course you are
free to look at existing source code. You can also use this list to ask
for pointers as to which APIs it would make sense to use, depending on
what extension you are doing.

> Since UIMA/U-Compare has their own workflow system,
> and they have many functionalities including batch processing,
> I need to send a single to the UIMA side workflow that the (list of)
> input has finished, when the Taverna side workflow finishes
> everything.

OK, so you need to communicate to the UIMA side that you are now
'finished'. Then I would use a second processor and a control link, as
I specified earlier. You don't specifically need the last item of the
list - you just need to know that all the items have been sent
individually to UIMA?

> This is due to some of the text mining components are ..

are..? :-)

> Is there any way to notice the end of the list in the BeanShell, say
> some special variable which has such a status?

No. As I said before, the individual services don't have access to
'where' in the iterations they are.

> # I used bsh.shared name space for my implementation, is it a safe
> thing in Taverna?

I doubt that would be very safe.
I'm not sure if you would get interference between different workflow
runs or between different beanshells in the same workflow - but that
should be easy to test.

-- 
Stian Soiland-Reyes, myGrid team
School of Computer Science
The University of Manchester

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
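
A minimal sketch, in Java, of the pass-by-reference idea discussed in
the message above: the first service keeps the large document and hands
the workflow only a urn:uuid reference, which the second service then
resolves itself. Everything here is hypothetical and for illustration
only. DocumentStore, FetchService and TokenCountService are made-up
names, not Taverna or UIMA/U-Compare APIs, and the in-memory map merely
stands in for whatever storage two real services might share.

```java
import java.net.URI;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

class ReferencePassingSketch {

    /** Hypothetical stand-in for storage that both services can reach. */
    static class DocumentStore {
        private final Map<URI, String> documents = new ConcurrentHashMap<>();

        /** Stores a document and returns an opaque urn:uuid reference to it. */
        URI put(String text) {
            URI ref = URI.create("urn:uuid:" + UUID.randomUUID());
            documents.put(ref, text);
            return ref;
        }

        /** Resolves a reference back to the document text. */
        String get(URI ref) {
            String text = documents.get(ref);
            if (text == null) {
                throw new IllegalArgumentException("Unknown reference: " + ref);
            }
            return text;
        }
    }

    /** Hypothetical first service: keeps the document, returns only a reference. */
    static class FetchService {
        private final DocumentStore store;

        FetchService(DocumentStore store) {
            this.store = store;
        }

        URI fetch(String documentText) {
            return store.put(documentText);
        }
    }

    /** Hypothetical second service: resolves the reference on its own side. */
    static class TokenCountService {
        private final DocumentStore store;

        TokenCountService(DocumentStore store) {
            this.store = store;
        }

        int countTokens(URI ref) {
            String trimmed = store.get(ref).trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    public static void main(String[] args) {
        DocumentStore store = new DocumentStore();
        FetchService fetcher = new FetchService(store);
        TokenCountService counter = new TokenCountService(store);

        // The "workflow" only ever holds the small reference, never the text.
        URI ref = fetcher.fetch("an abstract or a full text of an academic paper");
        System.out.println("Reference passed through the workflow: " + ref);
        System.out.println("Token count computed by the second service: "
                + counter.countTokens(ref));
    }
}
```

The workflow side stays small because it only moves the URI around. The
trade-off is that both services must be able to resolve the same
reference, which is an assumption this sketch hides behind the shared
DocumentStore.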
