Hi Bill,

The problem in your example is actually in how you're creating %dochash. You're re-using the same %doc hash for both documents, so both entries in %dochash end up as references to that one hash, and the second set of assignments overwrites the first. Witness this condensed version of your example:

=======================================================
my %doc;
my %dochash;

$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$dochash{SeattleDocTitle} = \%doc;

use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
          'SeattleDocTitle' => {
                                 'content' => 'I like to go to seattle and watch the mariners and stuff',
                                 'name' => 'Seattle'
                               },
          'SeahawksDocTitle' => $VAR1->{'SeattleDocTitle'}
        };
=======================================================
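
To see the sharing directly, here's a minimal sketch (trimmed to the name field only; the $same_ref variable is just something I made up for illustration). Two references to the same hash compare equal numerically, and a change made through %doc shows up under both keys:

```perl
use strict;
use warnings;

my %doc;
my %dochash;

$doc{name} = "Seahawks";
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";             # overwrites the one and only %doc
$dochash{SeattleDocTitle} = \%doc;  # same reference as the first entry

# References to the same hash compare equal with ==
my $same_ref = $dochash{SeahawksDocTitle} == $dochash{SeattleDocTitle};

print "Same underlying hash: ", ($same_ref ? "yes" : "no"), "\n";
print "Seahawks entry now says: $dochash{SeahawksDocTitle}{name}\n";
```

This prints "yes" and then "Seattle" for the Seahawks entry, which is the same thing Data::Dumper is telling you with the $VAR1->{'SeattleDocTitle'} back-reference.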


There are several ways to create the data structure you intend - one way would be something like this:


=======================================================
my %dochash;

$dochash{SeahawksDocTitle} =
  {
   name => "Seahawks",
   content => "The Seahawks are a pretty good team. I enjoy watching them.",
  };

$dochash{SeattleDocTitle} =
  {
   name => "Seattle",
   content => "I like to go to seattle and watch the mariners and stuff",
  };

use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
          'SeattleDocTitle' => {
                                 'name' => 'Seattle',
                                 'content' => 'I like to go to seattle and watch the mariners and stuff'
                               },
          'SeahawksDocTitle' => {
                                  'name' => 'Seahawks',
                                  'content' => 'The Seahawks are a pretty good team. I enjoy watching them.'
                                }
        };
=======================================================
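
Two of the other ways, as a rough sketch rather than the one right answer. It assumes your documents arrive as some kind of list; the @raw array and the %scratch and %copyhash names are hypothetical, invented just for this example. Either declare the hash inside the loop so each pass gets a fresh one, or keep re-using one hash but store a copy of it:

```perl
use strict;
use warnings;

# Hypothetical input list: [ key in %dochash, name, content ]
my @raw = (
    [ "SeahawksDocTitle", "Seahawks",
      "The Seahawks are a pretty good team. I enjoy watching them." ],
    [ "SeattleDocTitle", "Seattle",
      "I like to go to seattle and watch the mariners and stuff" ],
);

# Way 1: declare %doc inside the loop, so every pass gets a brand-new hash.
my %dochash;
for my $entry (@raw) {
    my ($title, $name, $content) = @$entry;
    my %doc = (name => $name, content => $content);
    $dochash{$title} = \%doc;
}

# Way 2: keep re-using one hash, but store a shallow copy of it each time.
my %copyhash;
my %scratch;
for my $entry (@raw) {
    my ($title, $name, $content) = @$entry;
    @scratch{qw(name content)} = ($name, $content);  # hash slice assignment
    $copyhash{$title} = { %scratch };                # new anonymous hash
}

print "$_ => $dochash{$_}{name}\n" for sort keys %dochash;
```

Note that { %scratch } is only a shallow copy: if one of the values is itself a reference (an array of categories, say), the copies will still share whatever that reference points to unless you assign a different one each time.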


Then the following display code shows that the Collection is created properly:

=======================================================
print "Number of docs: ", $collection->count_documents, "\n";
while (my $doc = $collection->next) {
  print $doc->name, " => [", join( ", ", map $_->name, $doc->categories ), "]\n";
}
=======================================================
Number of docs: 2
Seahawks => [trucks, cars]
Seattle => [seattle, baseball]
=======================================================


 -Ken


On Aug 4, 2005, at 7:26 PM, Bill W. wrote:

Hello perl-ai!

I've been playing with AI::Categorizer for a week or two now, and am having difficulties creating a collection object using the InMemory module. I'm new to Perl, OOP, and programming for that matter, but I've managed to get the functionality I'm looking for from AI::Categorizer using Collection::Files. However, it would be much more useful and efficient if I could create the collection from memory. It seems that the collection is created, and I can load it into a knowledge set. I can even train NaiveBayes on the knowledge set and categorize documents (although I'm not sure that it's doing so properly). It seems that it's not acknowledging all of the categories that are included in the collection's documents; it only recognizes one document's category set as the set for the whole collection. The main error I'm getting is when I try to generate a stats_table using:

my $mem_experiment = $l_mem->categorize_collection( collection => $c_mem_test );
print $mem_experiment->stats_table;

Can't take log of 0 at /usr/local/share/perl/5.8.4/Statistics/Contingency.pm line 183.

Can anyone tell me where I'm going wrong? I very much appreciate help from anyone who has gotten this working. And thanks to Ken for creating this great tool.

-Bill


---------code snippet--------
my %doc;
my %dochash;

my $cars = AI::Categorizer::Category->by_name(name => "cars");
my $trucks = AI::Categorizer::Category->by_name(name => "trucks");
my $baseball = AI::Categorizer::Category->by_name(name => "baseball");
my $seattle = AI::Categorizer::Category->by_name(name => "seattle");

push(my @seahawks_categories,$cars,$trucks);
push(my @seattle_categories,$seattle,$baseball);


$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them, and going to Seattle to see them";
$doc{categories} = \@seahawks_categories;
$dochash{SeahawksDocTitle} = \%doc;

$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$doc{categories} = \@seattle_categories;
$dochash{SeattleDocTitle} = \%doc;


my $collection = new AI::Categorizer::Collection::InMemory( data => \%dochash);

return($collection);

