Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

2006-09-22 Thread Thilo Goetz

Marshall Schor wrote:
Hi everyone.  I'm restarting the UIMA Proposal thread based on the 
comments so far, with a revised proposal that more closely follows 
http://incubator.apache.org/guides/proposal.html.  The first paragraph 
was rewritten to more clearly state what the proposal was, in plainer 
language.  It is also slightly updated, reflecting the submission of 
UIMA to OASIS for standardization work.


I have updated the Wiki accordingly.  I wasn't too sure about the 
leveling of some of the sections; feel free to correct it.


--Thilo


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

2006-09-13 Thread Marshall Schor

Hello, everyone.

In conjunction with our proposal to Apache for the UIMA framework, we 
have submitted a Charter to develop a community based standard for the 
UIMA Specification (not implementation) to OASIS (www.oasis-open.org).  
OASIS is a standards development organization with a good track record 
for developing and promoting open standards around many things including 
XML processing, SOA, document management and web services.


The UIMA Specification is intended to parallel the efforts of the 
open-source UIMA Java Framework implementation. It is a 
platform-independent specification intended to facilitate 
interoperability of text and multi-modal analytics across modalities, 
platforms and frameworks. The Apache proposal for the UIMA framework 
references this work, and would comply with this standard. (A rough 
analogy might be the Apache web server as the Apache project, and the 
HTTP protocol standard from W3C as the corresponding standards work).


If you are interested in developing the UIMA Specification standard, can 
commit to participating in the technical committee and want to be one of 
its founding members, you can do so by following the standard OASIS 
process for this.  The first step would be to become a member of OASIS ( 
http://www.oasis-open.org/join/categories.php) by October 5th (if you're 
not already a member :-), and to let us know of your intent to be a 
founding member; we already have four others from different 
organizations/companies signed up as founding members.


On October 5 there will be an open Call For Participation to OASIS 
members for participation in the UIMA Technical Committee. The founding 
membership must be formed before October 5th.


-Marshall Schor




Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

2006-09-09 Thread Andrus Adamchik
Based on the previous discussion, looks like an interesting product,  
and it has three mentors signed up already. Question to incubator PMC  
- any other obstacles to starting an acceptance vote?


Andrus


On Sep 9, 2006, at 4:00 PM, Marshall Schor wrote:

Hello everyone,

I'm restarting this thread on the Unstructured Information  
Management Architecture implementation (UIMA) framework, in the  
hopes of moving this along better; this time it also has the prefix  
[PROPOSAL] which I had left out due to over-excitement at doing my  
first posting to this list :-) .
Please consider this proposal  (on the incubator wiki because it is  
quite long: http://wiki.apache.org/incubator/UimaProposal ), and  
help us move it along toward getting it voted on by the Incubator PMC.


Two important clarifying emails (as well as the whole previous  
thread) can be found here:
http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5987788.html
and
http://www.nabble.com/Re%3A-Proposal-for-a-new-incubation-project%3A-Unstructured-Information-Management-Architecture---UIMA-p5986403.html
(There are also hyperlinks to these in the wiki, at the end of the  
first small section.)


-Marshall


Leo Simons wrote:

On Fri, Aug 25, 2006 at 06:04:04PM +0200, Thilo Goetz wrote:
<snip/>


I hope this gives you a better idea what UIMA is about



Yep, this and other explanations made it a lot clearer, thanks!

UIMA sounds ambitious and interesting.

cheers,

Leo

Niclas Hedhman wrote:

On Thursday 24 August 2006 03:21, Marshall Schor wrote:



Proposal for Incubation Project: Unstructured Information Management
Architecture - UIMA



Going from "WTF is this" to "Hmmm... interesting" after Leo's  
brilliant "please clarify" (reusable as well) mail.


I think this is an area that has plenty of potential, possibly  
with a lot of interested parties in academia at large. I think ASF  
could be a good community breeding ground.


I'm in favour of this, but not capable of contributing in any form.


Cheers
Niclas


Yonik Seeley wrote:

On 8/26/06, Thilo Goetz [EMAIL PROTECTED] wrote:
From an application perspective, we have great hopes for a cooperation
with the Lucene project.


Great, I think this is something I'd like to get involved in!
I've been thinking about how Solr integration could work.


You then also need a search engine that
can index that extra information and make it available for search.


Without getting into too much detail here, some info could be
immediately usable by Lucene-based apps (like entity extraction, where
you can add info via a new field in the document).  Parts-of-speech
type of stuff is currently more difficult of course.

-Yonik






Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

2006-09-09 Thread Otis Gospodnetic
Having finally read all the emails related to this proposal, I'm very much for 
this puppy entering ASF and eventually getting it going with Lucene and 
friends.

A few questions.
1. What you are proposing for ASF is the UIMA 2.0 code that currently lives on 
SF, correct?
2. What about the SDK, and could you tell me/us what's in the SDK that is not 
in the SF code? (I'm confused, because your proposal includes references to 
tools for development and design of UIMA components, but doesn't that typically 
live in an SDK?)
3. I'm a bit puzzled why something that sounds like a framework/pipeline for 
hooking up components with pre-defined input/output adapters ends up with 
a 400 page user guide/book.  Perhaps I should present this as a question.  How 
come?  Or is that user guide for the SDK only?

Otis






Re: [PROPOSAL] UIMA (Unstructured Information Management Architecture) Framework

2006-09-09 Thread Marshall Schor

Hi, and thanks for taking the time to read all the emails on this.

Here are some answers to your questions, below.

Otis Gospodnetic wrote:

Having finally read all the emails related to this proposal, I'm very much for this 
puppy entering ASF and eventually getting it going with Lucene and friends.

A few questions.
1. What you are proposing for ASF is the UIMA 2.0 code that currently lives on 
SF, correct?
  

Yes, that is correct.

2. What about the SDK, and could you tell me/us what's in the SDK that is not 
in the SF code? (I'm confused, because your proposal includes references to 
tools for development and design of UIMA components, but doesn't that typically 
live in an SDK?)
  
The only thing in the SDK that is not coming to Apache is a 
version of a semantic search engine (and some associated components) 
that can index both keywords and labeled spans containing the 
keywords; this is because Apache already has Lucene, and that engine is 
a good candidate for extension in this manner.  The SDK includes tooling 
and examples; those are coming.  In addition, we're bringing the 
framework test cases.

3. I'm a bit puzzled why something that sounds like a framework/pipeline for 
hooking up components with pre-defined input/output adapters ends up with 
a 400 page user guide/book.  Perhaps I should present this as a question.  How 
come?  Or is that user guide for the SDK only?
  
There are several reasons for this.  One reason is that the book's first 
part is actually a general introduction to the rationale behind the 
framework, followed by a tutorial (chapters 4-7).  Our target audience 
was mainly researchers who worked down in the depths of analytic 
algorithms, and who didn't necessarily spend much time keeping up to 
date with newer technologies for building software applications.  So we 
found ourselves giving tutorials, and decided it would be good to 
include those in the big book.


Besides the framework, we have some tooling (both Eclipse IDE-based and 
standalone); there are chapters on these tools and how to use them.  
The architecture includes the idea of specifying lots of meta-data about 
the components, in XML, and our early users had a lot of trouble getting 
the XML right.  So we built an Eclipse editor for editing the XML which 
does a whole bunch of consistency checking, and presents a visual model 
to the user describing the component meta-data in a friendlier way than 
just XML.  The chapter describing this tool is one of the larger ones. 

Finally, when you get into the details, you'll find there's more to this 
than it first appears :-).


Does that help explain the manual length?

-Marshall Schor


Otis


  


