[Taverna-hackers] Handling Documents

Yoshinobu Kano Tue, 09 Jun 2009 01:48:52 -0700

Hi,

I am trying to embed our text mining system into a Taverna workflow.
Since I am not a biologist in any sense,
I am not sure how text/documents are handled in the Taverna (or
biological) community.
I have a couple of questions regarding text handling.



In the Results tab, when the Result Type is "Text", newlines are not shown.
Is there any way to display newlines as actual line breaks?
How about wrapping long lines?


Iteration Strategy.
It seems that the iteration strategy is handled using Java
objects like ArrayList, and that
the workflow itself is not iterated.
Is this understanding correct, or is there any way to run the
same workflow repeatedly in a "batch"-like way?
How do you handle a large-scale document set, e.g. all PubMed
papers, while avoiding large memory consumption?
If such a batch mode exists, how can I detect the end of the batch?
Further, what is the most common unit for handling text in this
community: sentence, document, word...?

Any help appreciated!

Thanks,

-Yoshinobu
-- 
Yoshinobu Kano (Given/Family)
[email protected]
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/ http://u-compare.org/kano/

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
