Multi-Document Processing

Matthew Campbell Mon, 20 Aug 2007 14:28:39 -0700

Hey folks:

I'm looking at a process that runs each document through a bunch of annotators to tag up various information, then I need to do some processing/manipulation of those documents based on the information held in the whole collection. I've been reading up on the CPE, but it looks like it's primarily for running a collection of documents through an AE. I was hoping someone could point me in the right direction for doing the collection-wide processing portion of my process.

I had started out by defining the process as one large aggregate AE and running each document through it, but I don't see a way to go through that initial tagging process for all documents and then move on to the next phase. I then switched gears and tried splitting up each phase into its own AE, but then I lose the complex Sofa mappings I had put together for the previous attempt.

So I guess this could be solved in two ways. One would be that the CPE has some sort of built-in method for doing collection-wide processing and manipulation (e.g., "first identify all location names in all documents, then replace each with a new name, but make sure the new name doesn't appear in any other document"). The other would be to somehow run through the first phase to identify everything, do processing using the resulting collection of JCas objects, then pump each JCas into a second AE for doing post-processing stuff. Somewhere in there would have to be some dynamically-mapped Sofas from the phase 1 AE to the phase 2 AE.
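
As a rough illustration of the second option, the flow might look something like the plain-Java sketch below. It makes no UIMA calls at all: the class name TwoPhaseSketch, its method names, and the "Place<N>" naming scheme are all made up for illustration, plain strings stand in for JCas objects, and naive substring matching against a fixed list stands in for the real annotators.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: strings stand in for JCas objects, no UIMA involved.
class TwoPhaseSketch {

    // Phase 1 (stand-in for the annotators): collect every known location
    // name that occurs in at least one document of the collection.
    static Set<String> findLocations(List<String> docs, Set<String> knownLocations) {
        Set<String> found = new LinkedHashSet<>();
        for (String doc : docs) {
            for (String loc : knownLocations) {
                if (doc.contains(loc)) {
                    found.add(loc);
                }
            }
        }
        return found;
    }

    // Collection-wide step: choose a replacement for each location that
    // does not occur in ANY document. The counter only increases, so no
    // replacement name is handed out twice.
    static Map<String, String> chooseReplacements(List<String> docs, Set<String> found) {
        Map<String, String> mapping = new LinkedHashMap<>();
        int counter = 0;
        for (String loc : found) {
            String candidate;
            do {
                counter++;
                candidate = "Place" + counter; // assumed naming scheme
            } while (appearsInAny(docs, candidate));
            mapping.put(loc, candidate);
        }
        return mapping;
    }

    static boolean appearsInAny(List<String> docs, String text) {
        for (String doc : docs) {
            if (doc.contains(text)) {
                return true;
            }
        }
        return false;
    }

    // Phase 2 (stand-in for the post-processing AE): rewrite each document
    // using the mapping computed over the whole collection.
    static List<String> applyReplacements(List<String> docs, Map<String, String> mapping) {
        List<String> out = new ArrayList<>();
        for (String doc : docs) {
            String rewritten = doc;
            for (Map.Entry<String, String> e : mapping.entrySet()) {
                rewritten = rewritten.replace(e.getKey(), e.getValue());
            }
            out.add(rewritten);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "Alice flew from Paris to Rome.",
                "Place1 is a code name; Rome is not.");
        Set<String> known = new LinkedHashSet<>(List.of("Paris", "Rome", "Oslo"));

        Set<String> found = findLocations(docs, known);
        Map<String, String> mapping = chooseReplacements(docs, found);
        for (String doc : applyReplacements(docs, mapping)) {
            System.out.println(doc);
        }
    }
}
```

The point of the sketch is only the ordering: nothing in phase 2 can start until phase 1 has seen every document, which is exactly the barrier that a single per-document aggregate AE does not seem to provide. It also ignores the Sofa-mapping question entirely.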

I hope that described my goal well enough, and thanks ahead of time for any pointers you guys can throw my way.



-Matt
