Re: Using the Cas to compare documents

Thilo Goetz Thu, 25 Jun 2009 03:59:58 -0700

Radwen ANIBA wrote:
> Hi everyone,
> 
> Following the example applications that ship with UIMA helps us understand
> how every component in the UIMA framework works. That's great. But one
> question a developer may ask is how to use the CAS to compare analyzed
> documents.
> 
> The CAS is common to every document, and when analyzing one of them we have
> access to the CAS for writing or updating.
> Let's imagine we have 3 documents to analyze. We write metadata about each
> of them to the CAS, but to take the analysis of the documents further
> it could be very interesting to compare these documents using the CAS,
> either all together or pairwise.
> 
> To illustrate what I'm saying, let's imagine we are looking for email
> addresses inside three big documents using UIMA regexp capabilities.
> A result might look like this:
> 
> Document 1: Number of unique emails: 9 | Number of emails in common with
> Document 2: 10 | Number of emails in common with Document 3: 6
> Document 2: Number of unique emails: 5 | Number of emails in common with
> Document 1: 20 | Number of emails in common with Document 3: 1
> Document 3: Number of unique emails: 4 | Number of emails in common with
> Document 1: 15 | Number of emails in common with Document 2: 3
> 
> This is a simple pairwise cross-comparison of documents using the CAS. My
> question is how to achieve that.
> Do we need to create an additional type system for the common information?
> Do we have to do it dynamically, on the fly?
> 
> Thanks
> 
> Rad
>


Hi Rad,

Using the CAS to do this will get expensive very quickly.  You will
not want to keep every document in its own CAS because of the memory
overhead.  I would probably write the information you're interested
in to an external datastore (e.g., a DB such as Derby) and do the
comparison there.

--Thilo
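
As a rough illustration of the external-datastore suggestion above (not part
of the original message): the sketch below stores the email addresses found
in each document in a database table and does the comparison in SQL. It is
only a sketch under several assumptions. SQLite via Python's standard
sqlite3 module stands in for Derby; the table layout and the function names
(open_store, store_emails, exclusive_counts, shared_counts) are
hypothetical; "unique" is taken to mean addresses that appear in exactly one
document; and addresses are lowercased before storing, which is a
simplification. In a real pipeline the rows would be written by whatever
component consumes each CAS after analysis.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the store; one row per (document, address) pair."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS email ("
        " doc_id TEXT NOT NULL,"
        " address TEXT NOT NULL,"
        " PRIMARY KEY (doc_id, address))"
    )
    return conn


def store_emails(conn, doc_id, addresses):
    """Record the addresses found in one document; duplicates are ignored."""
    conn.executemany(
        "INSERT OR IGNORE INTO email (doc_id, address) VALUES (?, ?)",
        # Assumption: addresses are compared case-insensitively.
        [(doc_id, a.strip().lower()) for a in addresses],
    )
    conn.commit()


def exclusive_counts(conn):
    """Per document: number of addresses that occur in no other document."""
    rows = conn.execute(
        "SELECT d.doc_id, COUNT(x.address) FROM"
        " (SELECT DISTINCT doc_id FROM email) d"
        " LEFT JOIN (SELECT address, MIN(doc_id) AS doc_id FROM email"
        "            GROUP BY address HAVING COUNT(*) = 1) x"
        " ON x.doc_id = d.doc_id"
        " GROUP BY d.doc_id"
    )
    return dict(rows)


def shared_counts(conn):
    """Per document pair (a < b): number of addresses found in both.

    Pairs with nothing in common are absent from the result.
    """
    rows = conn.execute(
        "SELECT a.doc_id, b.doc_id, COUNT(*) FROM email a"
        " JOIN email b ON a.address = b.address AND a.doc_id < b.doc_id"
        " GROUP BY a.doc_id, b.doc_id"
    )
    return {(a, b): n for a, b, n in rows}


if __name__ == "__main__":
    conn = open_store()
    store_emails(conn, "doc1", ["a@example.org", "b@example.org", "c@example.org"])
    store_emails(conn, "doc2", ["B@example.org", "c@example.org", "d@example.org"])
    store_emails(conn, "doc3", ["c@example.org", "e@example.org"])
    print(exclusive_counts(conn))
    print(shared_counts(conn))
```

Only the extracted addresses are kept, so each CAS can be released or reused
as soon as its document has been processed, and the pairwise comparison runs
over the stored rows rather than over CASes held in memory.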
