[EMAIL PROTECTED] (Tom Fawcett) wrote:
>Ken Williams <[EMAIL PROTECTED]> wrote:
>> [EMAIL PROTECTED] (Tom Fawcett) wrote:
>> >It would be nice to have a numeric discretization module as well so
>> >these would work with mixed numerics and text, but that's probably
>> >asking for too much...
>>
>> I'm not sure what the term 'discretization' means - is it a conversion
>> of numerics to some other form, or a lumping, or something like that?
>
>Yes, it's basically taking a continuous valued attribute and creating
>appropriate bins for the value.  So instead of C in [1,100] you have
>C_prime in {low,medium,high}.  This is necessary for techniques like
>Naive Bayes which can't handle continuous attributes naturally.
>Figuring out the number of bins and their ranges is the trick.  I guess
>there are some straightforward entropy based methods that are pretty
>easy to write.  I'll implement one when I get some, um, spare time.

Cool.  Keep us apprised.

>> By the way, the best place to discuss this work is on the perl-AI list,
>> at [EMAIL PROTECTED] .  That's where I'm trying to coax discussions to
>> take place.
>
>OK, I've joined it.  I've cc'd this message there too.

  -------------------                            -------------------
  Ken Williams                             Last Bastion of Euclidity
  [EMAIL PROTECTED]                            The Math Forum
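
For anyone reading the thread later, the binning Tom describes (C in [1,100] becoming C_prime in {low,medium,high}) might look roughly like the sketch below. It uses plain equal-width bins, which is the simplest possible choice and is not the entropy-based method he mentions. The function name, the default range, and the labels are all made up for illustration and are not taken from any module discussed here.

```python
def equal_width_bin(value, lo=1.0, hi=100.0, labels=("low", "medium", "high")):
    """Map a continuous value in [lo, hi] to a label using equal-width bins.

    Hypothetical helper: the name, range, and labels are assumptions.
    """
    if not lo <= value <= hi:
        raise ValueError(f"value {value} is outside [{lo}, {hi}]")
    # Each bin covers the same share of the range.
    width = (hi - lo) / len(labels)
    # value == hi would index one past the last bin, so clamp it.
    index = min(int((value - lo) / width), len(labels) - 1)
    return labels[index]


if __name__ == "__main__":
    for c in (10, 50, 100):
        print(c, equal_width_bin(c))
```

Equal-width bins ignore the class labels entirely, which is why picking the number of bins and their ranges "is the trick": a supervised method would instead place the cut points where they best separate the classes.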