Yeah, you'd have to pass in a vector listing which distribution to use for
each element of the feature vector. Weka might have a way around this, but
I'll have to try it to see what the interface is like. They reference
a paper that estimates the distribution of each feature using KDE:
http://www.cs.iastate.edu/~jtian/cs573/Papers/John-UAI-95.pdf
I guess then you wouldn't have to specify the distributions, but it seems
strange to estimate the distribution of a feature you know is Bernoulli,
for instance.
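
To make the "vector of distributions" idea concrete, here is a rough sketch
of what I have in mind. It is not Weka's interface or anything that exists in
scikit-learn; the class name MixedNaiveBayes and the parameters
distributions, alpha and var_floor are all made up for illustration, and it
only handles Gaussian and Bernoulli features in plain Python:

```python
import math
from collections import defaultdict


class MixedNaiveBayes:
    """Naive Bayes where each feature gets its own event model.

    `distributions` is a hypothetical per-feature spec, one entry per
    column: either "gaussian" or "bernoulli".
    """

    def __init__(self, distributions, alpha=1.0, var_floor=1e-9):
        self.distributions = list(distributions)
        self.alpha = alpha          # Laplace smoothing for Bernoulli features
        self.var_floor = var_floor  # keeps Gaussian variances strictly positive

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.log_prior_ = {}
        self.params_ = {}
        for label, rows in rows_by_class.items():
            self.log_prior_[label] = math.log(len(rows) / len(y))
            params = []
            for j, dist in enumerate(self.distributions):
                col = [r[j] for r in rows]
                if dist == "gaussian":
                    mean = sum(col) / len(col)
                    var = sum((v - mean) ** 2 for v in col) / len(col)
                    params.append((mean, var + self.var_floor))
                elif dist == "bernoulli":
                    p = (sum(col) + self.alpha) / (len(col) + 2 * self.alpha)
                    params.append((p,))
                else:
                    raise ValueError("unknown distribution: %r" % dist)
            self.params_[label] = params
        return self

    def _joint_log_likelihood(self, row):
        # log P(class) + sum over features of log P(feature | class)
        scores = {}
        for label, params in self.params_.items():
            total = self.log_prior_[label]
            for value, dist, theta in zip(row, self.distributions, params):
                if dist == "gaussian":
                    mean, var = theta
                    total += -0.5 * math.log(2 * math.pi * var)
                    total -= (value - mean) ** 2 / (2 * var)
                else:
                    (p,) = theta
                    total += math.log(p if value else 1.0 - p)
            scores[label] = total
        return scores

    def predict(self, X):
        predictions = []
        for row in X:
            scores = self._joint_log_likelihood(row)
            predictions.append(max(scores, key=scores.get))
        return predictions


# Toy data: column 0 is numerical, column 1 is binary.
X = [[1.0, 1], [1.2, 1], [0.9, 1], [3.0, 0], [3.2, 0], [2.9, 1]]
y = ["a", "a", "a", "b", "b", "b"]
clf = MixedNaiveBayes(["gaussian", "bernoulli"]).fit(X, y)
print(clf.predict([[1.1, 1], [3.1, 0]]))
```

The only extra burden on the caller is the list handed to the constructor,
which is the part that would need a sensible API if this went anywhere.
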
On Wed, Jun 11, 2014 at 3:52 PM, Lars Buitinck <[email protected]> wrote:
> 2014-06-11 15:54 GMT+02:00 Gavin Gray <[email protected]>:
> > I need to use Naive Bayes for mixed categorical and numerical data and was
> > thinking of implementing a flexible Naive Bayes algorithm similar to Weka's
> > instead of hacking my way around by converting the numerical to categorical
> > or similar. Is there a good reason I shouldn't do this? Is anyone else
> > interested in having this functionality? Or does anyone have any other
> > comments?
>
> I've thought about such a FrankensteinNB but never really found it
> worthwhile. The API becomes complicated because you have to specify
> which features follow which event model (and what would the model
> attributes look like?). When dealing with mixed event models, I just
> switch to discriminative classifiers, i.e. LinearSVC or
> LogisticRegression.
>
>
> ------------------------------------------------------------------------------
> HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
> Find What Matters Most in Your Big Data with HPCC Systems
> Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
> Leverages Graph Analysis for Fast Processing & Easy Data Exploration
> http://p.sf.net/sfu/hpccsystems
> _______________________________________________
> Scikit-learn-general mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.