Hi Stian,

Thank you very much for your help again.
I would like to follow your advice: as far as I understand, that means making an APIConsumer by modifying BiomartActivity.java.

May I have a pointer to any document that describes creating APIConsumer code in general, or information on which *.jar files I need on the classpath when I write and compile my own activity Java code?

Thanks,
-Yoshinobu

On Wed, Jul 8, 2009 at 3:09 PM, Stian Soiland-Reyes<[email protected]> wrote:
> We have not yet exposed pipelining to the interface used by the
> Beanshell scripts.
>
> It is possible to do what you want by implementing your own subclass
> of Activity - you might want to look at the BiomartActivity which does
> this kind of pipelining.
>
> Basically you are able to return several times through the callback
> object in the Activity - you would return with indexes, and in the end
> return the full list.
>
> From an Activity you will also be able to interface with the reference
> manager, so that you can register the data values and get a reference
> back - these are the ones returned and collected in the full list -
> and they should have a smaller memory footprint.
>
> Such an activity would have a granular depth that is lower (say 0)
> than the actual output depth (1) - so it means the end result is depth
> 1, but I'll output the items one at a time at depth 0.
>
>
> I tried making a workflow which implemented its own java.util.List
> subclass and returned a fancy Iterator (which returned new values with
> a 10% chance of reaching end of list), but as the beanshell script
> still has granular output depth 1 no pipelining would occur in the
> workflow before the iterator was finished.
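The callback pattern described above (several partial results, each sent with its index, followed by the full list sent with an empty index) can be sketched in plain Java. This is only an illustration and not the Taverna API: ResultCallback, streamItems and the port name "out" are hypothetical stand-ins, and the values are kept directly in the list, whereas a real Activity would register them with the reference service and keep only the references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class PipelinedListSketch {

    /** Hypothetical stand-in for the Activity callback; only the shape of receiveResult is mimicked. */
    public interface ResultCallback {
        void receiveResult(Map<String, Object> data, int[] index);
    }

    /**
     * Sends every item as a partial result at its own index (depth 0),
     * then sends the complete list (depth 1) with an empty index.
     */
    public static void streamItems(Iterator<String> source, ResultCallback callback) {
        List<Object> soFar = new ArrayList<>();
        int index = 0;
        while (source.hasNext()) {
            String item = source.next();
            // Assumption: a real Activity would store a registered reference here, not the raw value.
            soFar.add(item);

            // Partial result for this single index
            Map<String, Object> partial = new HashMap<>();
            partial.put("out", item);
            callback.receiveResult(partial, new int[] { index });
            index++;
        }

        // Finally return the full list with an empty index
        Map<String, Object> full = new HashMap<>();
        full.put("out", soFar);
        callback.receiveResult(full, new int[0]);
    }

    public static void main(String[] args) {
        streamItems(Arrays.asList("row1", "row2", "row3").iterator(),
                (data, index) -> System.out.println(Arrays.toString(index) + " -> " + data.get("out")));
    }
}
```

Running main prints one line per item ("[0] -> row1" and so on) and finally "[] -> [row1, row2, row3]", so a downstream consumer could start working on each item before the source is exhausted.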
>
> see
> http://taverna.googlecode.com/svn/taverna/engine/net.sf.taverna.t2.activities/tags/activities-0.8/biomart-activity/src/main/java/net/sf/taverna/t2/activities/biomart/BiomartActivity.java
> for an activity that does this currently (because it's working with a
> HTTP-based protocol with database rows sent back tab-separated - it
> can return items even before the full HTTP transfer is finished)
>
> As you see it's slightly trickier than normal because you will have to
> keep track of the list, but the key lines are:
>
>
>     // Register value
>     T2Reference data = referenceService.register(resultLine[i],
>             outputDepth - 1, true, callback.getContext());
>
>     // Populate output map for all ports for this given index
>     partialOutputData.put(outputName, data);
>     // Keep track of values so far
>     outputLists.get(outputName).add((int) index, data);
>
>     // Partial results
>     callback.receiveResult(partialOutputData, new int[] { (int) index });
>
>
>     ..
>
>     // Finally return the full list (of references)
>     outputData = new HashMap();
>     outputData.put(outputName,
>             referenceService.register(outputLists.get(outputName),
>                     outputDepth, true, callback.getContext()));
>     callback.receiveResult(outputData, new int[0]);
>
>
>
> On Wed, Jul 8, 2009 at 08:53, Yoshinobu Kano<[email protected]> wrote:
>> Hi,
>>
>> Thanks to all of your kind help, I have resolved many of the issues I had,
>> but another issue has arisen regarding list generation.
>> May I ask for your help again?
>> I have read the Taverna2-helpset.pdf but could not find a solution.
>>
>> I am trying to create a local worker, which essentially outputs a list
>> (depth 1) without input.
>> However, since the data size could be quite large, I would like to
>> produce this output in a streaming manner using the Taverna built-in
>> behaviour,
>> to avoid loading everything into memory at the same time.
>>
>> What I thought of is to make this component
>> dummy-single-value-in/single-value-out,
>> then feed a dummy list to its input to make use of the Taverna
>> built-in iterator.
>> The problem is that the size of the output list is unknown until
>> the whole process is done, so
>> I would need to change the size of the dummy input list dynamically,
>> depending on the output signal (a boolean: end of the process or not) of
>> the component.
>> Since the list seems to be represented as java.util.List,
>> it might be possible, but that depends on the internal implementation of Taverna
>> -- is it possible to add a new element to the input list dynamically
>> (i.e. during the iteration of that very input list)?
>>
>> Is there any other solution to this problem?
>>
>> Thank you very much in advance,
>>
>> -Yoshinobu
>>
>> On Thu, Jun 11, 2009 at 9:36 AM, Stian
>> Soiland-Reyes<[email protected]> wrote:
>>> On Thu, Jun 11, 2009 at 06:52, Yoshinobu Kano<[email protected]>
>>> wrote:
>>>
>>>
>>>> Since I also cannot imagine that a normal NLP tool does not require
>>>> the actual text,
>>>> and the annotations added by the tools tend to be larger than the raw
>>>> text data,
>>>> passing URLs would not be a good option for the connection between
>>>> text mining components.
>>>> However for the Taverna-UCompare/UIMA interface, URLs would make sense
>>>> when the input is a document referred to by a URL.
>>>
>>> Note that URIs could be any URI or another kind of reference, it
>>> doesn't have to be a world wide accessible HTTP-based URL - it could
>>> be as simple as urn:uuid:9321d5b1-8904-43a5-8a21-f92bae6d9fa7
>>>
>>> The main point is if you want to avoid sending large documents from a
>>> service, to Taverna, and then just upload it again to the next
>>> service, when those two services could exchange the documents in a
>>> more efficient manner (and to lower Taverna's memory footprint), then
>>> using references like URIs would make this possible - and if you did
>>> go for HTTP-urls (it could be links to stuff within the service) those
>>> would also be accessible for outside services.
>>>
>>>
>>>
>>>> Well that is my question for this Taverna/Bio* community.
>>>> Probably we can assume that the normal input is document based - an
>>>> abstract or a full text of an academic paper.
>>>
>>> I guess it would come down to what you decide to do in your workflow,
>>> and what you want to do in your service code. :-)
>>>
>>> I would guess that the things that you are
>>> going to play around with, such as deciding which algorithms to use,
>>> which databases to fetch from, etc, should be done or initiated by the
>>> workflow. The boring number crunching and analysis should be done by
>>> the services.
>>>
>>> Another thing is if you want to use external services, then obviously
>>> it would be great if your services played on the same 'level' so you
>>> could make two versions of the same workflow, where one uses your
>>> service, and another a similar service provided by some Japanese
>>> university.
>>>
>>> So it comes down to the actual research that you are planning to do,
>>> really.. :-)
>>>
>>>
>>>
>>>> Good news! This strategy would resolve my concern.
>>>> How many users use 1.7/2.0/2.1b - how good is the backward compatibility?
>>>> Would it be fine to make everything on 2.1b?
>>>
>>> Not sure about the usage numbers, 2.1b1 is still quite fresh.
>>>
>>> 2.x workflows should be compatible with each other, and 2.x can open
>>> 1.x workflows. However, you can't open a 2.x workflow in 1.x.
>>>
>>> Based on the feedback we have received so far, I would recommend
>>> looking at 2.1b1.
>>>
>>> However, if you are developing your own extensions to Taverna, do note
>>> that many of the APIs have changed between 1.x and 2.x - so you have
>>> to decide early. Unfortunately the developer documentation for 2.x is
>>> not very complete yet, but of course you are free to look at existing
>>> source code. You can also use this list to ask for pointers as to what
>>> APIs it would make sense to use - depending on what extension you are
>>> doing.
>>>
>>>
>>>> Since UIMA/U-Compare has its own workflow system,
>>>> with a lot of functionality including batch processing,
>>>> I need to send a signal to the UIMA side workflow that the (list of)
>>>> input has finished, when the Taverna side workflow finishes
>>>> everything.
>>>
>>> OK, so you need to communicate with the UIMA side that you are now
>>> 'finished'. Then I would use a second processor and a control link, as
>>> I specified earlier.
>>>
>>> You don't specifically need the last item of the list - you just need
>>> to know that all the items have been sent individually to UIMA?
>>>
>>>
>>>> This is due to some of the text mining components are
>>>
>>> .. are..? :-)
>>>
>>>> Is there any way to notice the end of the list in the BeanShell, say
>>>> some special variable which has such a status?
>>>
>>> No. As I said before, the individual services don't have access to
>>> 'where' in the iterations they are.
>>>
>>>
>>>> # I used bsh.shared name space for my implementation, is it a safe
>>>> thing in Taverna?
>>>
>>> I doubt that would be very safe.
>>> I'm not sure if you would get
>>> interferences with different workflow runs or different beanshells in
>>> the same workflow - but that should be easy to test.
>>>
>>>
>>> --
>>> Stian Soiland-Reyes, myGrid team
>>> School of Computer Science
>>> The University of Manchester
>>>
>>
>>
>> --
>> Yoshinobu Kano (Given/Family)
>> [email protected]
>> Project Research Associate, the University of Tokyo / U-Compare Project Lead
>> http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/
>>
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>

--
Yoshinobu Kano (Given/Family)
[email protected]
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
