Re: How to annotate based on document collection

Christopher Baechle Fri, 06 Nov 2015 07:32:56 -0800

Thanks. That answered my question.

On Fri, Nov 6, 2015 at 10:21 AM, buddha <[email protected]> wrote:


> UIMA works best when you are investigating one document at a time. My
> suggestion would be to run the initial pipeline to get the correct
> annotations, which I assume are tokens in your case, then save those off
> into some relational table.
>
> From there, you can run the documents through again, load your df
> values as an external resource, and compute the tf on that second pass.
>
> There are ways to estimate the tf/idf values, but, frankly, the whole
> notion of “document frequency” means you’ve looked at the whole corpus at
> least once.
>
> > On Nov 6, 2015, at 7:12 AM, Christopher Baechle <[email protected]> wrote:
> >
> > I am working with an existing project that is built with UIMA. I am trying
> > to create a tf-idf style score that looks at the set of documents as a
> > whole.
> >
> > Since the rest of the project uses UIMA heavily, I would like to implement
> > this as an annotator if possible, rather than a separate program. Is it
> > possible within UIMA to do this?
>
>
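For anyone finding this thread later, here is a rough sketch of the two-pass idea described above. It is plain Python, not UIMA code: the function names and the toy corpus are made up for illustration, an in-memory Counter stands in for the relational table / external resource, and the weighting (raw term count times log(N/df)) is just one common tf-idf variant assumed here.

```python
import math
from collections import Counter


def document_frequencies(corpus):
    """Pass 1: for each token, count how many documents contain it."""
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))  # set() so a document counts once per token
    return df


def tf_idf(tokens, df, n_docs):
    """Pass 2: score one document against the precomputed df table."""
    tf = Counter(tokens)
    # Assumed weighting: raw term count * log(N / df).
    return {t: tf[t] * math.log(n_docs / df[t]) for t in tf if df[t] > 0}


# Hypothetical, already-tokenized corpus (one list of tokens per document).
corpus = [
    ["uima", "annotator", "token"],
    ["uima", "pipeline", "token", "token"],
    ["uima", "resource"],
]

df = document_frequencies(corpus)                           # first pass over all documents
scores = [tf_idf(doc, df, len(corpus)) for doc in corpus]   # second pass, one document at a time
```

The point of the split is that the second function only ever needs one document plus the df table, which is what lets the second pass fit a one-document-at-a-time pipeline.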
