... I've been looking at AI::Categorizer. I have a list of all valid vehicle descriptions (about 8200). For each of these I create a document in a knowledge set, with the content the same as the category:
Briefly:
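use strict;
use warnings;
use AI::Categorizer;
use AI::Categorizer::KnowledgeSet;

my @descriptions;   # fill with the ~8200 valid vehicle descriptions

my $c = AI::Categorizer->new(
    knowledge_set => AI::Categorizer::KnowledgeSet->new(name => 'Vehicles'),
    learner_class => 'AI::Categorizer::Learner::NaiveBayes',
);
my $l = $c->learner;

# One entry per description; the description doubles as its own category.
my %docs;
my $i = 0;
foreach my $content (@descriptions) {
    $docs{$i++} = { content => $content, categories => [$content] };
}

# make_document takes the category names under the 'categories' key.
foreach my $name (keys %docs) {
    $c->knowledge_set->make_document(name => $name, %{ $docs{$name} });
}

# The learner needs to be told which knowledge set to train on.
$l->train(knowledge_set => $c->knowledge_set);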
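Once training has finished, a new description can be matched by wrapping it in a document and asking the learner for a hypothesis. What follows is only a minimal sketch, not the poster's actual code: the three descriptions, the document names and the query string are made-up placeholders, and it assumes AI::Categorizer and Algorithm::NaiveBayes are installed.

```perl
use strict;
use warnings;
use AI::Categorizer;
use AI::Categorizer::KnowledgeSet;
use AI::Categorizer::Document;
use AI::Categorizer::Learner::NaiveBayes;

# Hypothetical stand-ins for the real list of ~8200 descriptions.
my @descriptions = (
    'Ford Focus 1.6 Zetec 5dr',
    'Vauxhall Astra 1.4 Club 3dr',
    'Volkswagen Golf 2.0 GTI 3dr',
);

my $c = AI::Categorizer->new(
    knowledge_set => AI::Categorizer::KnowledgeSet->new(name => 'Vehicles'),
    learner_class => 'AI::Categorizer::Learner::NaiveBayes',
);

# One training document per description; each description is its own category.
my $i = 0;
for my $content (@descriptions) {
    $c->knowledge_set->make_document(
        name       => 'vehicle_' . $i++,   # hypothetical naming scheme
        content    => $content,
        categories => [$content],
    );
}

my $l = $c->learner;
$l->train(knowledge_set => $c->knowledge_set);

# Wrap the free-text input in a document and categorize it.
my $query = AI::Categorizer::Document->new(
    name    => 'query',
    content => 'focus zetec 1.6 ford',     # made-up input
);
my $h    = $l->categorize($query);
my $best = $h->best_category;              # highest-scoring category name

print "Best match: $best\n" if defined $best;
```

If a single best guess is too coarse, `$h->categories` should give every category that scored above the learner's threshold, which may help when several descriptions differ only by trim level.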
Sometimes it works well:
I'm using AI::Categorizer to categorize books and have many of the same questions as you. AI::Categorizer::Learner::KNN is working the best for me and, like you, whenever I try AI::Categorizer::Learner::SVM it blows up every time. I even moved it onto a 64-bit machine with 8 GB of memory and it still won't run. Our training set is over 10,000 books, using the text supplied by Amazon.
-Tim