Hello,

First post to this list. I'm beginning a project that will use
automated text classification to classify congressional bills, and
AI::Categorizer looks like the best framework to use. However, I'm
hitting a snag on what should be a simple operation.

I train an SVM classifier on 1000 documents; this operation goes fine.
I then try to create an instance of AI::Categorizer::Collection::Files
containing 5 unclassified documents. I supply only the path because
the 5 documents are not yet categorized:

   # $path: directory containing the 5 unclassified documents
   my $c = AI::Categorizer::Collection::Files->new(path => $path);
   while (my $document = $c->next) {
       my $hypothesis = $nb->categorize($document);
       print "Best assigned category: ", $hypothesis->best_category, "\n";
       print "All assigned categories: ",
           join(', ', $hypothesis->categories), "\n";
   }

This produces the error:

No category information about '5-508' at /usr/local/share/perl/5.8.7/AI/Categorizer/Collection/Files.pm line 44.
Mandatory parameter 'all_categories' missing in call to AI::Categorizer::Hypothesis->new()

To get around this error I could just supply the categories of the 5
unknown test documents, but in our real-world application we will have
a constant stream of unclassified documents coming in that will
receive human attention only long after they have been automatically
classified.

Is the design intent to allow only test documents that are already
categorized (e.g., for creating confidence statistics)? If so, does
anyone have any suggestions on the preferred way to classify unknown
documents with AI::Categorizer?

Thanks,
Alan
