Re: [Taverna-hackers] Handling Documents

Yoshinobu Kano Thu, 09 Jul 2009 06:55:58 -0700

Hi Stian,

Thank you very much for your help again.


I would like to follow your advice: as far as I understand, that means
making an APIConsumer by modifying BiomartActivity.java.

Could you point me to any documentation that describes how to write
APIConsumer code in general,
or tell me which *.jar files I need on the classpath
when I compile my own activity Java code?

Thanks,

-Yoshinobu

On Wed, Jul 8, 2009 at 3:09 PM, Stian
Soiland-Reyes<[email protected]> wrote:
> We have not yet exposed pipelining to the interface used by the
> Beanshell scripts.
>
> It is possible to do what you want by implementing your own subclass
> of Activity - you might want to look at the BiomartActivity which does
> this kind of pipelining.
>
> Basically you are able to return several times through the callback
> object in the Activity - you would return with indexes, and in the end
> return the full list.
>
> From an Activity you will also be able to interface with the reference
> manager, so that you can register the data values and get a reference
> back - these are the ones returned and collected in the full list -
> and they should have a smaller memory footprint.
>
> Such an activity would have a granular depth that is lower (say 0)
> than the actual output depth (1) - so it means the end result is depth
> 1, but it will output the items one at a time at depth 0.
>
>
> I tried making a workflow which implemented its own java.util.List
> subclass and returned a fancy Iterator (which returned new values with
> a 10% chance of reaching end of list), but as the beanshell script
> still has granular output depth 1 no pipelining would occur in the
> workflow before the iterator was finished.
>
> see 
> http://taverna.googlecode.com/svn/taverna/engine/net.sf.taverna.t2.activities/tags/activities-0.8/biomart-activity/src/main/java/net/sf/taverna/t2/activities/biomart/BiomartActivity.java
> for an activity that does this currently (because it's working with a
> HTTP-based protocol with database rows sent back tab-separated - it
> can return items even before the full HTTP transfer is finished)
>
> As you see it's slightly trickier than normal because you will have to
> keep track of the list, but the key lines are:
>
>
>
> // Register value
> T2Reference data = referenceService.register(resultLine[i],
> outputDepth - 1, true, callback.getContext());
>
> // Populate output map for all ports for this given index
> partialOutputData.put(outputName, data);
> // Keep track of values so far
> outputLists.get(outputName).add((int) index, data);
>
> // Partial results
> callback.receiveResult(partialOutputData, new int[] { (int) index });
>
>
> ..
>
> // Finally return the full list (of references)
> outputData = new HashMap();
> outputData.put(outputName,
>         referenceService.register(outputLists.get(outputName),
>                 outputDepth, true, callback.getContext()));
> callback.receiveResult(outputData, new int[0]);
>
>
>
>
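A minimal, self-contained sketch of the pattern in the key lines quoted
above: each item is sent through the callback with its index as soon as
it is available, and the complete list follows with an empty index.
The ResultCallback interface and the emit method are hypothetical
stand-ins, not Taverna's API; a real activity would register each value
with the reference service and pass T2Reference objects instead of
plain strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class PipeliningSketch {

    /** Hypothetical stand-in for the activity callback (assumption, not Taverna's interface). */
    interface ResultCallback {
        void receiveResult(Map<String, Object> data, int[] index);
    }

    /**
     * Sends every item as a partial result (depth 0) with its position,
     * then sends the whole list (depth 1) with an empty index.
     */
    static void emit(String outputName, Iterable<String> items, ResultCallback callback) {
        List<Object> valuesSoFar = new ArrayList<>();
        int index = 0;
        for (String item : items) {
            // A real activity would register the value here and keep the reference.
            valuesSoFar.add(item);

            // Partial result for this index only
            Map<String, Object> partial = new HashMap<>();
            partial.put(outputName, item);
            callback.receiveResult(partial, new int[] { index });
            index++;
        }

        // Finally return the full list with an empty index
        Map<String, Object> full = new HashMap<>();
        full.put(outputName, valuesSoFar);
        callback.receiveResult(full, new int[0]);
    }

    public static void main(String[] args) {
        // Print the index and the value of every callback invocation
        emit("out", Arrays.asList("row1", "row2", "row3"),
                (data, index) -> System.out.println(
                        Arrays.toString(index) + " " + data.get("out")));
    }
}
```

The callback receives the partial results first and the complete list
last, so a consumer can start working before the producer has finished.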
> On Wed, Jul 8, 2009 at 08:53, Yoshinobu Kano<[email protected]> wrote:
>> Hi,
>>
>> Thanks to all of your kind help, I have resolved many of my issues,
>> but another issue has arisen regarding list generation.
>> May I ask for your help again?
>> I have read the Taverna2-helpset.pdf but could not find a solution.
>>
>> I am trying to create a local worker, which essentially outputs a list
>> (depth 1) without input.
>> However, since the data size could be quite large, I would like to
>> produce this output in a streaming manner using Taverna's built-in
>> behaviour,
>> to avoid loading everything into memory at the same time.
>>
>> My idea is to make this component take a single dummy value in and
>> produce a single value out,
>> then feed a dummy list to its input to make use of Taverna's
>> built-in iterator.
>> The problem is that the size of the output list is unknown until the
>> whole process is done, so
>> I need to change the size of the dummy input list dynamically,
>> depending on the output signal (a boolean: end of the process or not) of
>> the component.
>>
>> Since the list seems to be represented as a java.util.List,
>> it might be possible, depending on Taverna's internal implementation
>> -- is it possible to add a new element to the input list dynamically
>> (i.e. during the iteration over that very input list)?
>>
>> Is there any other solution to this problem?
>>
>> Thank you very much in advance,
>>
>> -Yoshinobu
>>
>> On Thu, Jun 11, 2009 at 9:36 AM, Stian
>> Soiland-Reyes<[email protected]> wrote:
>>> On Thu, Jun 11, 2009 at 06:52, Yoshinobu Kano<[email protected]> wrote:
>>>
>>>
>>>> Since I also cannot imagine that a normal NLP tool does not require
>>>> the actual text,
>>>> and the annotations added by the tools tend to be larger than the raw
>>>> text data,
>>>> passing URLs would not be a good option for the connection between
>>>> text mining components.
>>>> However for the Taverna-UCompare/UIMA interface, URLs would make sense
>>>> when the input is a URL referred document.
>>>
>>> Note that URIs could be any URI or another kind of reference, it
>>> doesn't have to be a world wide accessible HTTP-based URL - it could
>>> be as simple as urn:uuid:9321d5b1-8904-43a5-8a21-f92bae6d9fa7
>>>
>>> The main point is if you want to avoid sending large documents from a
>>> service, to Taverna, and then just upload it again to the next
>>> service, when those two services could exchange the documents in a
>>> more efficient manner (and to lower Taverna's memory footprint), then
>>> using references like URIs would make this possible - and if you did
>>> go for HTTP-urls (it could be links to stuff within the service) those
>>> would also be accessible for outside services.
>>>
>>>
>>>
>>>> Well that is my question for this Taverna/Bio* community.
>>>> Probably we can assume that the normal input is document based - an
>>>> abstract or a full text of an academic paper.
>>>
>>> I guess it would come down to what you decide to do in your workflow,
>>> and what you want to do in your service code. :-)
>>>
>>> I would guess that the things that you are going to play around
>>> with, such as deciding which algorithms to use,
>>> which databases to fetch from, etc., should be done or initiated by the
>>> workflow. The boring number crunching and analysis should be done by
>>> the services.
>>>
>>> Another thing is if you want to use external services, then obviously
>>> it would be great if your services played on the same 'level' so you
>>> could make two versions of the same workflow, where one uses your
>>> service, and another a similar service provided by some Japanese
>>> university.
>>>
>>> So it comes down to the actual research that you are planning to do,
>>> really.. :-)
>>>
>>>
>>>
>>>> Good news! This strategy would resolve my concern.
>>>> How many users use 1.7/2.0/2.1b, and how good is the backward compatibility?
>>>> Would it be fine to build everything on 2.1b?
>>>
>>> Not sure about the usage numbers, 2.1b1 is still quite fresh.
>>>
>>> 2.x workflows should be compatible with each other, and 2.x can open
>>> 1.x workflows. However, you can't open a 2.x workflow in 1.x.
>>>
>>> Based on the feedback we have received so far, I would recommend
>>> looking at 2.1b1.
>>>
>>> However, if you are developing your own extensions to Taverna, do note
>>> that many of the APIs have changed between 1.x and 2.x - so you have
>>> to decide early. Unfortunately the developer documentation for 2.x is
>>> not very complete yet, but of course you are free to look at existing
>>> source code. You can also use this list to ask for pointers as to what
>>> APIs it would make sense to use - depending on what extension you are
>>> doing.
>>>
>>>
>>>> Since UIMA/U-Compare has its own workflow system
>>>> with many features, including batch processing,
>>>> I need to send a signal to the UIMA-side workflow that the (list of)
>>>> input has finished, when the Taverna-side workflow finishes
>>>> everything.
>>>
>>> OK, so you need to communicate with the UIMA side that you are now
>>> 'finished'. Then I would use a second processor and a control link, as
>>> I specified earlier.
>>>
>>> You don't specifically need the last item of the list - you just need
>>> to know that all the items have been sent individually to UIMA?
>>>
>>>
>>>> This is due to some of the text mining components are
>>>
>>> .. are..? :-)
>>>
>>>> Is there any way to detect the end of the list in the BeanShell, say
>>>> some special variable which has such a status?
>>>
>>> No. As I said before, the individual services don't have access to
>>> 'where' in the iterations they are.
>>>
>>>
>>>> # I used bsh.shared name space for my implementation, is it a safe
>>>> thing in Taverna?
>>>
>>> I doubt that would be very safe. I'm not sure if you would get
>>> interference between different workflow runs or different beanshells in
>>> the same workflow - but that should be easy to test.
>>>
>>>
>>>
>>>
>>> --
>>> Stian Soiland-Reyes, myGrid team
>>> School of Computer Science
>>> The University of Manchester
>>>
>>
>>
>>
>> --
>> Yoshinobu Kano (Given/Family)
>> [email protected]
>> Project Research Associate, the University of Tokyo / U-Compare Project Lead
>> http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/
>>
>
>
>
> --
> Stian Soiland-Reyes, myGrid team
> School of Computer Science
> The University of Manchester
>



-- 
Yoshinobu Kano (Given/Family)
[email protected]
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
