Aha, yes.
AI::Categorizer lets you customize tokenization however you like by
subclassing the Document class and overriding its tokenize() method.
You could do something like this:
{
package My::Documents;
our @ISA = qw(AI::Categorizer::Document::Text);
sub tokenize {
Thanks for all the good feedback; I'll certainly be following up on it.
I did find one reason why I wasn't getting good matches: when I
looked more carefully at the Perl data structure, I found that the
'features' hash contained only alphabetic characters. So, for example,
in the string 'WARRIO
On Feb 5, 2005, at 1:26 AM, Richard Jelinek wrote:
True, true. And while this is true, the reports about nonfunctional SVM
are also true. At least I can confirm them, and I mentioned them
here some time ago.
What can/will "we" do about this?
Oh yes, sorry I forgot to address this in my mes
Hi Ken,
On Fri, Feb 04, 2005 at 08:36:10PM -0600, Ken Williams wrote:
> What this means is that in order to use AI::Categorizer in the obvious
> way for this project, you're going to have to get your hands on some
> training data that has the same statistical properties as what you'll
> see at
ueries" are the noisy strings you're trying to clean up. Sometimes
that works pretty well.
Or you could try the Levenshtein edit distance that Samy suggested. Or
you could try something else that you invent. =)
-Ken
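For readers following the edit-distance suggestion above, here is a minimal sketch of matching a noisy string against a list of valid descriptions by Levenshtein distance. It is written in Python rather than Perl purely for illustration; the function names (levenshtein, closest) and the sample VALID list are hypothetical stand-ins, not anything from the original posts or from AI::Categorizer.

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row we store as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def closest(query: str, candidates: list[str]) -> tuple[str, int]:
    """Return (candidate, distance) for the candidate nearest to the query."""
    normalized = query.strip().upper()  # assumption: case is not significant
    scored = ((c, levenshtein(normalized, c.upper())) for c in candidates)
    return min(scored, key=lambda pair: pair[1])


# Hypothetical stand-ins for the real list of valid vehicle descriptions.
VALID = ["TOYOTA COROLLA 1.6", "TOYOTA CAMRY 2.4", "FORD FIESTA 1.4"]

if __name__ == "__main__":
    print(closest("toyta corola 1.6", VALID))
```

A linear scan like this is probably fine for a few thousand candidates; in practice you would likely also want a distance threshold, so that a string far from every valid description is flagged for review rather than silently "corrected".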
On Feb 4, 2005, at 4:18 AM, Jason Armstrong wrote:
Perhaps someo
It is not in Perl, but SVMlight (http://svmlight.joachims.org/) offers a
very efficient (in my experience) C implementation of support vector
machines, and, being a command-line tool, it's easy to interface with
Perl.
Regards,
Marco
--
Marco Baroni
SSLMIT, University of Bologna
http://sslm
Jason Armstrong wrote:
...
I've been looking at AI::Categorizer. I have a list of all valid vehicle
descriptions (about 8200). I create for each of these a knowledge set,
with the content the same as the category:
Briefly:
my $c = AI::Categorizer->new(
knowledge_set => AI::Categorizer:
wrote:
Perhaps someone on this list has some good advice for me. I am working
on a project that imports vehicle descriptions. Very often, the data
capturers give invalid information, or mistyped data. I am looking for
a way to intelligently reformat the data, and add the mistyped entry for
future
Perhaps someone on this list has some good advice for me. I am working
on a project that imports vehicle descriptions. Very often, the data
capturers give invalid information, or mistyped data. I am looking for a
way to intelligently reformat the data, and add the mistyped entry for
future use.
I