Hello,

First post to this list. I'm beginning a project that will use automated text classification to classify congressional bills, and AI::Categorizer looks like the best framework to use. However, I'm hitting a snag on what should be a simple operation.
I train an SVM classifier on 1000 documents; this operation goes fine. I then try to create an instance of AI::Categorizer::Collection::Files containing 5 unclassified documents. I supply only the path because the 5 documents are not yet categorized:

    my $c = new AI::Categorizer::Collection::Files( path => "$path");
    while (my $document = $c->next) {
        my $hypothesis = $nb->categorize($document);
        print "Best assigned category: ", $hypothesis->best_category, "\n";
        print "All assigned categories: ", join(', ', $hypothesis->categories), "\n";
    }

This produces the error:

    No category information about '5-508' at /usr/local/share/perl/5.8.7/AI/Categorizer/Collection/Files.pm line 44.
    Mandatory parameter 'all_categories' missing in call to AI::Categorizer::Hypothesis->new()

To get around this error I could just supply the categories of the 5 unknown test documents, but in our real-world application we will have a constant stream of unclassified documents coming in that will receive human attention only long after they have been automatically classified.

Is the design intent to allow only test documents that are already categorized (e.g. for creating confidence statistics)? If so, does anyone have suggestions on the preferred way to classify unknown documents with AI::Categorizer?

Thanks,
Alan