Hi Bill,
The problem in your example is actually in how you're creating
%dochash. You're re-using the single %doc hash for both documents, and
since \%doc is a reference rather than a copy, both entries in %dochash
end up pointing at the same hash, which holds whatever you assigned
last. Witness this condensed version of your example:
=======================================================
my %doc;
my %dochash;
$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching them.";
$dochash{SeahawksDocTitle} = \%doc;
$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and stuff";
$dochash{SeattleDocTitle} = \%doc;
use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
'SeattleDocTitle' => {
'content' => 'I like to go to seattle
and watch the mariners and stuff',
'name' => 'Seattle'
},
'SeahawksDocTitle' => $VAR1->{'SeattleDocTitle'}
};
=======================================================
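If you want to confirm what Data::Dumper is telling you there, the
$VAR1->{'SeattleDocTitle'} entry means both keys hold the very same
reference. Here's a minimal sketch that checks this directly; it reuses
the variable names from the condensed example, and the $same variable is
just something I've introduced for illustration:

```perl
use strict;
use warnings;

my %doc;
my %dochash;

$doc{name} = "Seahawks";
$dochash{SeahawksDocTitle} = \%doc;   # stores a reference to %doc, not a copy

$doc{name} = "Seattle";               # overwrites the same underlying hash
$dochash{SeattleDocTitle} = \%doc;    # a second reference to the same %doc

# References compare numerically by address, so == tells us whether
# both entries point at one and the same hash.
my $same = $dochash{SeahawksDocTitle} == $dochash{SeattleDocTitle};

print "Same hash? ", ($same ? "yes" : "no"), "\n";
print "Seahawks entry name: $dochash{SeahawksDocTitle}{name}\n";
```

This should print "Same hash? yes" followed by "Seahawks entry name:
Seattle", i.e. the first document's data has been overwritten.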
There are several ways to create the data structure you intend - one
way would be something like this:
=======================================================
my %dochash;
$dochash{SeahawksDocTitle} =
  {
    name => "Seahawks",
    content => "The Seahawks are a pretty good team. I enjoy watching them.",
  };
$dochash{SeattleDocTitle} =
  {
    name => "Seattle",
    content => "I like to go to seattle and watch the mariners and stuff",
  };
use Data::Dumper;
print Dumper \%dochash;
=======================================================
$VAR1 = {
'SeattleDocTitle' => {
'name' => 'Seattle',
'content' => 'I like to go to seattle
and watch the mariners and stuff'
},
'SeahawksDocTitle' => {
'name' => 'Seahawks',
'content' => 'The Seahawks are a
pretty good team. I enjoy watching them.'
}
};
=======================================================
Then the following display code shows that the Collection is created
properly:
=======================================================
print "Number of docs: ", $collection->count_documents, "\n";
while (my $doc = $collection->next) {
  print $doc->name, " => [",
        join( ", ", map $_->name, $doc->categories ), "]\n";
}
=======================================================
Number of docs: 2
Seahawks => [trucks, cars]
Seattle => [seattle, baseball]
=======================================================
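Another of the several ways, if you'd rather keep filling in a single
%doc hash step by step as in your original code: store a copy of it
each time instead of a reference to it. This is only a sketch with
shortened content strings, and note that { %doc } is a shallow copy, so
any references stored inside %doc (such as a categories array) would
still be shared unless you assign a fresh one for each document:

```perl
use strict;
use warnings;

my %doc;
my %dochash;

$doc{name}    = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team.";
$dochash{SeahawksDocTitle} = { %doc };   # new anonymous hash, copied from %doc

$doc{name}    = "Seattle";
$doc{content} = "I like to go to seattle.";
$dochash{SeattleDocTitle} = { %doc };    # another independent copy

# Each key now has its own hash, so the first document survives.
print "$_ => $dochash{$_}{name}\n" for sort keys %dochash;
```

That should print "SeahawksDocTitle => Seahawks" and then
"SeattleDocTitle => Seattle".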
-Ken
On Aug 4, 2005, at 7:26 PM, Bill W. wrote:
Hello perl-ai!
I've been playing with AI::Categorizer for a week or two now, and am
having difficulties creating a collection object using the InMemory
module. I'm new to Perl, OOP, and programming for that matter, but
I've managed to get the functionality I'm looking for from
AI::Categorizer using Collection::Files. However, it would be much
more useful and efficient if I could create the collection from
memory. It seems that the collection is created, and I can load it
into a knowledge set. I can even train NaiveBayes on the knowledge set
and categorize documents (although I'm not sure that it's doing so
properly). However, it doesn't seem to acknowledge all of the
categories included in the collection's documents; it seems to
recognize only one document's category set as the set for the
collection. The main error I'm getting is when I try to generate a
stats_table using:
my $mem_experiment = $l_mem->categorize_collection( collection =>
$c_mem_test );
print $mem_experiment->stats_table;
Can't take log of 0 at
/usr/local/share/perl/5.8.4/Statistics/Contingency.pm line 183.
Can anyone tell me where I'm going wrong? I very much appreciate help
from anyone who has gotten this working. And thanks to Ken for
creating this great tool.
-Bill
---------code snippet--------
my %doc;
my %dochash;
my $cars = AI::Categorizer::Category->by_name(name => "cars");
my $trucks = AI::Categorizer::Category->by_name(name => "trucks");
my $baseball = AI::Categorizer::Category->by_name(name => "baseball");
my $seattle = AI::Categorizer::Category->by_name(name => "seattle");
push(my @seahawks_categories,$cars,$trucks);
push(my @seattle_categories,$seattle,$baseball);
$doc{name} = "Seahawks";
$doc{content} = "The Seahawks are a pretty good team. I enjoy watching
them, and going to Seattle to see them";
$doc{categories} = \@seahawks_categories;
$dochash{SeahawksDocTitle} = \%doc;
$doc{name} = "Seattle";
$doc{content} = "I like to go to seattle and watch the mariners and
stuff";
$doc{categories} = \@seattle_categories;
$dochash{SeattleDocTitle} = \%doc;
my $collection = AI::Categorizer::Collection::InMemory->new(
  data => \%dochash );
return($collection);