Radwen ANIBA wrote:
> Hi everyone,
>
> Following some example applications of UIMA allows us to understand how
> every component in the UIMA framework works. That's great. But one question
> that a developer may ask is how to use the CAS to compare analyzed
> documents.
>
> The CAS is common to every document, and when analyzing one of them we have
> access to the CAS for writing or updating.
> Let's imagine we have 3 documents to analyze. We write metadata relative to
> each of them to the CAS, but to go further in the analysis of the documents
> it could be very interesting to compare these documents using the CAS,
> either all together or pairwise.
>
> To illustrate what I'm saying, let's imagine we are looking for email
> addresses inside three big documents using UIMA regexp capabilities.
> A result might look like this:
>
> Document 1 : Number of unique emails 9 | Number of emails in common with
> Document 2 : 10 | Number of emails in common with Document 3 : 6
> Document 2 : Number of unique emails 5 | Number of emails in common with
> Document 1 : 20 | Number of emails in common with Document 3 : 1
> Document 3 : Number of unique emails 4 | Number of emails in common with
> Document 1 : 15 | Number of emails in common with Document 2 : 3
>
> This is a simple pairwise cross-comparison of documents using the CAS. My
> question is how to achieve that.
> Do we need to create an additional Type System for the common information?
> Do we have to do it dynamically, on the fly?
>
> Thanks
>
> Rad
Hi Rad,

using the CAS to do this will get expensive very quickly. You will not want to keep every document in its own CAS because of the memory overhead. I would probably write the information you're interested in to an external datastore (e.g., a DB such as Derby) and do the comparison there.

--Thilo
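
For illustration, here is a minimal sketch of the comparison step once the per-document results have been written outside the CAS. It is not UIMA or Derby code: an in-memory map stands in for the external datastore, and the class name `EmailOverlap`, the helpers `commonCount`, `uniqueCount` and `sampleData`, and the sample addresses are all hypothetical. It also assumes that "unique" means an address found in no other document, which the original question does not spell out.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class EmailOverlap {

    // Number of addresses that appear in both sets (set intersection).
    static int commonCount(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return common.size();
    }

    // Number of addresses in docId that appear in no other document.
    // Assumption: this is what "unique emails" means in the question.
    static int uniqueCount(String docId, Map<String, Set<String>> emailsByDoc) {
        Set<String> unique = new HashSet<>(emailsByDoc.get(docId));
        for (Map.Entry<String, Set<String>> other : emailsByDoc.entrySet()) {
            if (!other.getKey().equals(docId)) {
                unique.removeAll(other.getValue());
            }
        }
        return unique.size();
    }

    // Hypothetical sample data; a real setup would load these sets from
    // the external store that each analysis run wrote to.
    static Map<String, Set<String>> sampleData() {
        Map<String, Set<String>> m = new LinkedHashMap<>();
        m.put("Document 1", new HashSet<>(Arrays.asList("a@x.org", "b@x.org", "c@x.org")));
        m.put("Document 2", new HashSet<>(Arrays.asList("b@x.org", "d@x.org")));
        m.put("Document 3", new HashSet<>(Arrays.asList("b@x.org", "c@x.org", "e@x.org")));
        return m;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> emailsByDoc = sampleData();
        for (String doc : emailsByDoc.keySet()) {
            StringBuilder line = new StringBuilder(doc)
                    .append(" : unique ")
                    .append(uniqueCount(doc, emailsByDoc));
            for (String other : emailsByDoc.keySet()) {
                if (!other.equals(doc)) {
                    line.append(" | common with ").append(other).append(" : ")
                        .append(commonCount(emailsByDoc.get(doc), emailsByDoc.get(other)));
                }
            }
            System.out.println(line);
        }
    }
}
```

Only the small set of extracted addresses per document is kept, so no CAS has to stay in memory after its document has been processed. With a real database the same counts could presumably be computed by a query over a (document, email) table instead of in Java.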
