Hi Andrea,

Thanks for your interest in our work.

>>> Further, what is the most popular unit to handle text in this
>>> community - sentence, document, word... ?
>>
>> I do not know.
>
> All of the above, plus paragraphs, phrases, and any other logical
> textual units you can dream up. Of course, I'm no biologist either - I
> want this for social science research. The more flexible you can make
> it, the more potential uses it will have.

Well, our system is meant to cover text mining/NLP in general: we already
have several sentence splitters, tokenizers, named entity taggers,
parsers, and many other kinds of tools ready to use. Our tools are fully
interoperable through the UIMA framework, so no programming is required
to create a workflow, since data type compatibility is guaranteed.
Please visit http://u-compare.org/ for details.
If you need text mining/NLP with less human labor, U-Compare is for you.

What I am not sure about is whether we can assume that the input is a
raw text document in this community...

> one piece of analysis. For our uses, the applications of the tools are
> drastically restricted when only one level of text is allowable, but
> transcending the sentence structure is difficult for NLP.

Do you mean something that has dependencies across sentences, such as
coreference resolution?
Just out of curiosity as an NLP researcher, what sort of analysis are
you planning to perform?

> I would definitely be interested in getting my hands on a text mining
> plugin or service for Taverna. I would immediately be able to do quite
> a few interesting snippets of research that are currently impossible
> for me, starting with analysis of the 3.42 GB CSV of juicy search log
> data that's just gathering virtual dust on my hard drive...

Well, performance/output is another sort of problem.
Our system is scalable and can be launched locally, but you would need
to prepare your own servers to process such a large amount of data.
I think we could provide our system in a Taverna-linked form soon.

Cheers,
-Yoshinobu

>
> Cheers,
>
> Andrea
>
>
> Andrea Wiggins
> PhD Student, School of Information Studies
> Syracuse University
>
> 337 Hinds Hall
> Syracuse, NY 13244
> [email protected]
> www.andreawiggins.com

--
Yoshinobu Kano (Given/Family) [email protected]
Project Research Associate, the University of Tokyo / U-Compare Project Lead
http://www-tsujii.is.s.u-tokyo.ac.jp/
http://u-compare.org/kano/

_______________________________________________
taverna-hackers mailing list
[email protected]
Web site: http://www.taverna.org.uk
Mailing lists: http://www.taverna.org.uk/taverna-mailing-lists/
Developers Guide: http://www.mygrid.org.uk/tools/developer-information
