Thanks. That answered my question.

On Fri, Nov 6, 2015 at 10:21 AM, buddha <[email protected]> wrote:
> UIMA works best when you are investigating one document at a time. My
> suggestion would be to run the initial pipeline to get the correct
> annotation, which I assume are tokens in your case, then save those off
> into some relational table.
>
> From there, you can run the documents through again and load your df
> values as an external resource, then do the tf the second time.
>
> There are ways to estimate the tf/idf values, but, frankly, the whole
> notion of “document frequency” means you’ve looked at the whole corpus at
> least once.
>
> > On Nov 6, 2015, at 7:12 AM, Christopher Baechle <[email protected]>
> > wrote:
> >
> > I am working with an existing project that is built with UIMA. I am
> > trying to create a tf-idf style score that looks at the set of
> > documents as a whole.
> >
> > Since the rest of the project uses UIMA heavily, I would like to
> > implement this as an annotator if possible, rather than a separate
> > program. Is it possible within UIMA to do this?
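
For anyone finding this thread later, here is a rough sketch of the two-pass idea described above. It is plain Python rather than UIMA code, so none of it is UIMA API: the function names, the in-memory `corpus` list, and the choice of raw-count tf with idf = log(N / df) are all assumptions made for illustration. In a real pipeline the first pass would be the initial UIMA run that stores token counts in a table, and the df values would be loaded as an external resource for the second run.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Pass 1: count, for each term, how many documents contain it."""
    df = Counter()
    for tokens in corpus:
        # set() so a term is counted once per document, not once per occurrence
        df.update(set(tokens))
    return df


def tf_idf(tokens, df, n_docs):
    """Pass 2: score one document using the df table built in pass 1.

    Uses raw term counts for tf and log(N / df) for idf, which is only one
    of several common variants. Assumes every term in `tokens` was seen in
    pass 1, so df[term] is never zero.
    """
    tf = Counter(tokens)
    return {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}


# Hypothetical toy corpus: each document is already tokenized.
corpus = [
    ["uima", "annotator", "token"],
    ["uima", "pipeline"],
    ["token", "token", "score"],
]

df = document_frequencies(corpus)                            # first run over the corpus
scores = [tf_idf(doc, df, len(corpus)) for doc in corpus]    # second run, one document at a time
```

The point of the split is that the second function only ever looks at one document plus a precomputed table, which matches how an annotator sees one document at a time.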
