Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework
Marshall Schor wrote: Hi everyone. I'm restarting the UIMA Proposal thread based on the comments so far, with a revised proposal that more closely follows http://incubator.apache.org/guides/proposal.html. The first paragraph was rewritten to state more clearly what the proposal is, in plainer language. It is also slightly updated, reflecting the submission of UIMA to OASIS for standardization work. I have updated the Wiki accordingly. I wasn't too sure about the leveling of some of the sections; feel free to correct.

--Thilo
Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework
Hello, everyone.

In conjunction with our proposal to Apache for the UIMA framework, we have submitted a Charter to OASIS (www.oasis-open.org) to develop a community-based standard for the UIMA Specification (not the implementation). OASIS is a standards development organization with a good track record of developing and promoting open standards in many areas, including XML processing, SOA, document management and web services.

The UIMA Specification is intended to parallel the efforts of the open-source UIMA Java Framework implementation. It is a platform-independent specification intended to facilitate interoperability of text and multi-modal analytics across modalities, platforms and frameworks. The Apache proposal for the UIMA framework references this work, and would comply with this standard. (A rough analogy might be the Apache web server as the Apache project, and the HTTP protocol standard from W3C as the corresponding standards work.)

If you are interested in developing the UIMA Specification standard, can commit to participating in the technical committee, and want to be one of its founding members, you can do so by following the standard OASIS process for this. The first step would be to become a member of OASIS (http://www.oasis-open.org/join/categories.php) by October 5th (if you're not already a member :-), and to let us know of your intent to be a founding member; we already have four others from different organizations/companies signed up as founding members.

On October 5 there will be an open Call For Participation to OASIS members for participation in the UIMA Technical Committee. The founding membership must be formed before October 5th.

-Marshall Schor
Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework
Based on the previous discussion, this looks like an interesting product, and it has three mentors signed up already. Question to the Incubator PMC: are there any other obstacles to starting an acceptance vote?

Andrus

On Sep 9, 2006, at 4:00 PM, Marshall Schor wrote:

Hello everyone, I'm restarting this thread on the Unstructured Information Management Architecture implementation (UIMA) framework, in the hopes of moving this along better; this time it also has the prefix [PROPOSAL], which I had left out due to over-excitement at doing my first posting to this list :-) .

Please consider this proposal (on the incubator wiki because it is quite long: http://wiki.apache.org/incubator/UimaProposal ), and help us move it along toward getting it voted on by the Incubator PMC.

Two important clarifying emails (as well as the whole previous thread) can be found here:
http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5987788.html
and
http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5986403.html
(There are also hyperlinks to these in the wiki, at the end of the first small section.)

-Marshall

Leo Simons wrote:

On Fri, Aug 25, 2006 at 06:04:04PM +0200, Thilo Goetz wrote:

<snip/> I hope this gives you a better idea what UIMA is about

Yep, this and other explanations made it a lot clearer, thanks! UIMA sounds ambitious and interesting.

cheers, Leo

Niclas Hedhman wrote:

On Thursday 24 August 2006 03:21, Marshall Schor wrote:

Proposal for Incubation Project: Unstructured Information Management Architecture - UIMA

Going from "WTF is this" to "Hmmm... interesting" after Leo's brilliant "please clarify" (reusable as well) mail. I think this is an area that has plenty of potential, possibly with a lot of interested parties in academia at large, and I think ASF could be a good community breeding ground. I'm in favour of this, but not capable of contributing in any form.

Cheers, Niclas

Yonik Seeley wrote:

On 8/26/06, Thilo Goetz [EMAIL PROTECTED] wrote:

From an application perspective, we have great hopes for a cooperation with the Lucene project.

Great, I think this is something I'd like to get involved in! I've been thinking about how Solr integration could work.

You then also need a search engine that can index that extra information and make it available for search.

Without getting into too much detail here, some info could be immediately usable by Lucene-based apps (like entity extraction, where you can add info via a new field in the document). Parts-of-speech type of stuff is currently more difficult, of course.

-Yonik
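Yonik's remark about entity extraction feeding "a new field in the document" can be illustrated with a small sketch. This is not the Lucene or UIMA API: the `Annotation` class, the `to_index_fields` function and the `entity_*` field names are all hypothetical, chosen only to show how labeled spans over a text might be turned into extra per-document fields that a keyword-style index could then store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """A labeled span over the document text (hypothetical structure)."""
    label: str  # e.g. "Person"
    begin: int  # start offset, inclusive
    end: int    # end offset, exclusive


def to_index_fields(text, annotations):
    """Return the raw text plus one list-valued field per entity label."""
    fields = {"body": text}
    for ann in annotations:
        covered = text[ann.begin:ann.end]  # the text the span covers
        field_name = "entity_" + ann.label.lower()
        fields.setdefault(field_name, []).append(covered)
    return fields


text = "Marshall Schor posted the UIMA proposal to the Apache Incubator."
annotations = [
    Annotation("Person", 0, 14),
    Annotation("Organization", 47, 63),
]
fields = to_index_fields(text, annotations)
print(fields)
```

A search application could then query the hypothetical `entity_person` field directly instead of matching the name as a plain keyword. Span-level information such as parts of speech does not reduce to one value per document this way, which fits Yonik's comment that it is the harder case.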
Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework
Having finally read all the emails related to this proposal, I'm very much for this puppy entering ASF and eventually getting it going with Lucene and friends. A few questions.

1. What you are proposing for ASF is the UIMA 2.0 code that currently lives on SF, correct?

2. What about the SDK, and could you tell me/us what's in the SDK that is not in the SF code? (I'm confused, because your proposal includes references to tools for development and design of UIMA components, but doesn't that typically live in an SDK?)

3. I'm a bit puzzled why something that sounds like a framework/pipeline for hooking up components with pre-defined input/output adapters ends up with a 400-page user guide/book. Perhaps I should present this as a question. How come? Or is that user guide for the SDK only?

Otis

----- Original Message -----
From: Marshall Schor [EMAIL PROTECTED]
To: general@incubator.apache.org
Sent: Saturday, September 9, 2006 8:00:57 AM
Subject: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

Hello everyone, I'm restarting this thread on the Unstructured Information Management Architecture implementation (UIMA) framework, in the hopes of moving this along better; this time it also has the prefix [PROPOSAL], which I had left out due to over-excitement at doing my first posting to this list :-) .

Please consider this proposal (on the incubator wiki because it is quite long: http://wiki.apache.org/incubator/UimaProposal ), and help us move it along toward getting it voted on by the Incubator PMC.

Two important clarifying emails (as well as the whole previous thread) can be found here:
http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5987788.html
and
http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5986403.html
(There are also hyperlinks to these in the wiki, at the end of the first small section.)
Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework
Hi, and thanks for taking the time to read all the emails on this. Here are some answers to your questions, below.

Otis Gospodnetic wrote:

Having finally read all the emails related to this proposal, I'm very much for this puppy entering ASF and eventually getting it going with Lucene and friends. A few questions.

1. What you are proposing for ASF is the UIMA 2.0 code that currently lives on SF, correct?

Yes, that is correct.

2. What about the SDK, and could you tell me/us what's in the SDK that is not in the SF code? (I'm confused, because your proposal includes references to tools for development and design of UIMA components, but doesn't that typically live in an SDK?)

The only other thing in the SDK that is not coming to Apache is a version of a semantic search engine (and some associated components) that can index both keywords and labeled spans containing the keywords; this is because Apache already has Lucene, and that engine is a good candidate for extension in this manner. The SDK includes tooling and examples; those are coming. In addition, we're bringing the framework test cases.

3. I'm a bit puzzled why something that sounds like a framework/pipeline for hooking up components with pre-defined input/output adapters ends up with a 400-page user guide/book. Perhaps I should present this as a question. How come? Or is that user guide for the SDK only?

There are several reasons for this. One reason is that the book's first part is actually a general introduction to the rationale behind the framework, followed by a tutorial (chapters 4-7). Our target audience was mainly researchers who worked down in the depths of analytic algorithms, and who didn't necessarily spend much time keeping up to date with newer technologies for building software applications. So we found ourselves giving tutorials, and decided it would be good to include those in the big book.
Besides the framework, we have some tooling (both Eclipse IDE based and standalone); there are chapters on these tools and how to use them. The architecture includes the idea of specifying lots of metadata about the components, in XML, and our early users had a lot of trouble getting the XML right. So we built an Eclipse editor for editing the XML which does a whole bunch of consistency checking, and presents a visual model to the user describing the component metadata in a friendlier way than just XML. The chapter describing this tool is one of the larger ones.

Finally, when you get into the details, you'll find there's more to this than it first appears :-).

Does that help explain the manual length?

-Marshall Schor
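To give a flavour of the kind of consistency checking on component metadata described above, here is a minimal sketch. It does not use UIMA's real descriptor format or the Eclipse editor's logic: the XML layout (`component`, `types`, `inputs`, `outputs`, `typeRef`) and the `check_descriptor` function are invented for illustration. The one rule it applies, that every type a component reads or writes must be declared, is an assumed example of such a check.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified component metadata; NOT the real UIMA schema.
DESCRIPTOR = """
<component name="PersonAnnotator">
  <types>
    <type name="Person"/>
    <type name="Token"/>
  </types>
  <inputs>
    <typeRef name="Token"/>
  </inputs>
  <outputs>
    <typeRef name="Person"/>
    <typeRef name="Organization"/>
  </outputs>
</component>
"""


def check_descriptor(xml_text):
    """Return a list of problems: referenced types that were never declared."""
    root = ET.fromstring(xml_text)
    declared = {t.get("name") for t in root.findall("./types/type")}
    problems = []
    for section in ("inputs", "outputs"):
        for ref in root.findall(f"./{section}/typeRef"):
            name = ref.get("name")
            if name not in declared:
                problems.append(f"{section}: type '{name}' is not declared")
    return problems


problems = check_descriptor(DESCRIPTOR)
for problem in problems:
    print(problem)
```

In this made-up descriptor the output type `Organization` is referenced but never declared, so the check reports it. A hand-edited XML file can easily contain mistakes of this sort, which a schema-aware editor is in a position to catch.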