Hi all,

A Bayesian network (BN) is a directed acyclic graph (DAG) whose nodes represent
random variables X1, ..., Xn. Each node Xi carries a conditional
probability distribution P(Xi | parents(Xi)), conditioned on that node's
parents. Together these define a joint probability distribution over all the
nodes, which factorizes as P(X1, ..., Xn) = P(X1 | parents(X1)) * ... *
P(Xn | parents(Xn)).
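
To make the factorization concrete: the joint probability of a full assignment is the product of the per-node conditionals, P(X1, ..., Xn) = product over i of P(Xi | parents(Xi)). Below is a minimal sketch, assuming binary variables and a hypothetical three-node chain Smoking -> Cancer -> XRay whose CPT numbers are made up for illustration. The names `parents`, `cpt` and `joint` are mine, not from any library.

```python
from itertools import product

# Hypothetical chain Smoking -> Cancer -> XRay; all variables binary.
# Each CPT maps a tuple of parent values to P(node = True | parents);
# the numbers are made up for illustration only.
parents = {"Smoking": (), "Cancer": ("Smoking",), "XRay": ("Cancer",)}
cpt = {
    "Smoking": {(): 0.3},
    "Cancer": {(True,): 0.1, (False,): 0.01},
    "XRay": {(True,): 0.9, (False,): 0.05},
}


def joint(assignment):
    """P(X1, ..., Xn) as the product of P(Xi | parents(Xi))."""
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(assignment[q] for q in pa)]
        p *= p_true if assignment[node] else 1.0 - p_true
    return p


# Summing the factorized joint over all 2**3 assignments should give 1.
nodes = list(parents)
total = sum(joint(dict(zip(nodes, vals)))
            for vals in product([False, True], repeat=len(nodes)))
print(total)  # 1.0 up to floating-point rounding
```

For binary variables this stores one number per parent configuration per node, instead of the 2**n - 1 numbers a full joint table would need.
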

Let me illustrate a sample use-case of a BN, referring to this figure (sample
bayes-net image):
<http://www.google.com.sg/imgres?hl=en&client=ubuntu&channel=cs&biw=990&bih=514&tbm=isch&tbnid=kAkJBvTcc28fvM:&imgrefurl=http://www.bayesnets.com/&docid=4zInkuApqHON-M&imgurl=http://www.bayesnets.com/index_files/image027.jpg&w=325&h=182&ei=Y5NhT-vOLtCxrAeB2PWcCA&zoom=1&iact=rc&dur=167&sig=117339824432212531378&page=5&tbnh=106&tbnw=190&start=46&ndsp=13&ved=1t:429,r:7,s:46&tx=64&ty=78>
A history of smoking might cause chronic bronchitis, lung cancer, or both.
Lung cancer might have two effects: a mass seen on X-ray, or fatigue. On
the other hand, chronic bronchitis has a single effect: fatigue. We can now
ask questions like: "Given that a patient has a history of smoking and is
experiencing fatigue, what is the probability that he has cancer?"
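
As a rough sketch of how such a question can be answered once structure and parameters are known: sum the factorized joint over every unobserved variable, then normalize. The structure below follows the smoking example above, but all CPT numbers are hypothetical, and brute-force enumeration is exponential in the number of hidden variables. Real implementations would use something like variable elimination, junction trees or sampling instead. `query` is an illustrative name, not an existing API.

```python
from itertools import product

# Structure from the smoking example; all variables are binary.
parents = {
    "Smoking": (),
    "Bronchitis": ("Smoking",),
    "Cancer": ("Smoking",),
    "XRay": ("Cancer",),
    "Fatigue": ("Bronchitis", "Cancer"),
}
# P(node = True | parent values). These numbers are hypothetical.
cpt = {
    "Smoking": {(): 0.2},
    "Bronchitis": {(True,): 0.25, (False,): 0.05},
    "Cancer": {(True,): 0.003, (False,): 0.00005},
    "XRay": {(True,): 0.6, (False,): 0.02},
    # keys are (Bronchitis, Cancer)
    "Fatigue": {(True, True): 0.75, (True, False): 0.10,
                (False, True): 0.50, (False, False): 0.05},
}


def joint(a):
    """Product of P(node | parents) over all nodes for a full assignment a."""
    p = 1.0
    for node, pa in parents.items():
        p_true = cpt[node][tuple(a[q] for q in pa)]
        p *= p_true if a[node] else 1.0 - p_true
    return p


def query(target, evidence):
    """P(target = True | evidence) by enumerating all hidden variables."""
    hidden = [v for v in parents if v != target and v not in evidence]
    weight = {True: 0.0, False: 0.0}
    for t in (True, False):
        for vals in product([False, True], repeat=len(hidden)):
            a = {**evidence, **dict(zip(hidden, vals)), target: t}
            weight[t] += joint(a)
    return weight[True] / (weight[True] + weight[False])


p = query("Cancer", {"Smoking": True, "Fatigue": True})
print(round(p, 4))  # 0.0264 with these made-up numbers
```

Note that XRay is unobserved here, so it simply sums out to 1 and does not affect the answer.
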

The following distinct situations arise in a BN context:
1) If both the structure and the parameters of a BN are known, we can answer
inference questions by computing marginal or posterior probabilities, or by
finding the most likely assignment of the unobserved variables.
2) The structure of the BN might be specified by a human expert, with unknown
parameters that are learnt from data.
3) The harder case is when both the structure and the parameters are
unknown. Learning the BN then amounts to a search over the space of graphs,
usually a heuristic local search (e.g. greedy hill-climbing) or a Markov
chain Monte Carlo search, with each candidate graph scored against the data.
4) Finally, in the hybrid case a human expert specifies constraints on the
graph, reducing the search space. For example, given data on hypertension,
cholesterol, smoking (yes/no), sex, and age, we can disallow any arrows into
sex and age, since none of the other variables can influence sex or age.
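
For case 2 (known structure, complete discrete data), the maximum-likelihood parameters are just normalized counts: P(X = x | parents = u) = N(u, x) / N(u). Here is a minimal sketch, assuming rows are dicts of discrete values. The `alpha` pseudocount is an optional Laplace/Dirichlet smoothing I added, and `fit_cpt` and the toy data are hypothetical, not real patient data.

```python
from collections import Counter


def fit_cpt(data, node, parent_names, alpha=0.0):
    """Estimate P(node | parents) from complete discrete data by counting.

    alpha=0 gives the maximum-likelihood estimate N(u, x) / N(u);
    alpha > 0 adds a pseudocount to every cell (Laplace smoothing).
    Returns {parent_values: {node_value: probability}}.
    """
    values = sorted({row[node] for row in data})
    joint_counts = Counter(
        (tuple(row[p] for p in parent_names), row[node]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parent_names) for row in data)
    cpt = {}
    for u, n_u in parent_counts.items():
        denom = n_u + alpha * len(values)
        cpt[u] = {x: (joint_counts[(u, x)] + alpha) / denom for x in values}
    return cpt


# Tiny hypothetical data set.
data = [
    {"Smoking": 1, "Cancer": 1},
    {"Smoking": 1, "Cancer": 0},
    {"Smoking": 1, "Cancer": 0},
    {"Smoking": 0, "Cancer": 0},
]
cancer_cpt = fit_cpt(data, "Cancer", ["Smoking"])
print(cancer_cpt)  # P(Cancer | Smoking=1) = {0: 2/3, 1: 1/3}; Smoking=0 -> {0: 1.0, 1: 0.0}
```

Parent configurations that never occur in the data get no entry at all, and the pure MLE happily assigns probability 0 to unseen values. Those two gaps are the usual reason to add a prior such as `alpha > 0`.
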

I propose implementing all 4 cases, in that order.

Alexandre, here is an excellent reference for learning the structure of
BNs: http://research.microsoft.com/apps/pubs/default.aspx?id=65088

There is also an R package called "deal" for learning BNs:
http://www.jstatsoft.org/v08/i20

As Xinfan pointed out, there are many other implementations, including
some in Python: http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html

My thinking is that implementing Bayesian networks would be a first step
towards more general graphical-modelling capability in scikit-learn.
Future extensions could include dynamic BNs, which model variables evolving
over time, Markov networks, etc.

Xinfan, efficiency is indeed an issue: the number of possible DAGs grows
super-exponentially with the number of nodes, and finding the best-scoring
structure is NP-hard in general. Hence most algorithms use a heuristic
(greedy or Monte Carlo) search and settle for a locally-best structure.
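
One common local search is greedy hill-climbing: start from the empty graph, repeatedly apply the single edge addition or removal that most improves a score, and stop when no move helps. Below is a minimal sketch under several assumptions of mine: complete discrete data, the BIC score (BDe and others are also common), no edge-reversal moves, and a `blacklist` of forbidden edges as one way to encode the case-4 constraints (e.g. no arrows into sex or age). All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import permutations


def family_bic(data, node, parents):
    """BIC score of one node given a parent set, for complete discrete data."""
    n = len(data)
    r = len({row[node] for row in data})  # number of observed states of node
    # q = number of parent configurations (product of parent cardinalities)
    q = math.prod(len({row[p] for row in data}) for p in parents)
    counts = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized log-likelihood: sum of N(u, x) * log(N(u, x) / N(u))
    loglik = sum(c * math.log(c / parent_counts[u]) for (u, _), c in counts.items())
    return loglik - 0.5 * math.log(n) * q * (r - 1)


def creates_cycle(edges, u, v):
    """Adding u -> v closes a cycle iff v can already reach u."""
    stack, seen = [v], set()
    while stack:
        x = stack.pop()
        if x == u:
            return True
        if x not in seen:
            seen.add(x)
            stack.extend(b for (a, b) in edges if a == x)
    return False


def hill_climb(data, nodes, blacklist=frozenset()):
    """Greedy search over DAGs: repeatedly apply the single edge addition or
    removal that most improves the total BIC; stop at a local optimum."""

    def total_score(es):
        return sum(
            family_bic(data, v, tuple(sorted(a for (a, b) in es if b == v)))
            for v in nodes
        )

    edges = set()
    current = total_score(edges)
    while True:
        best, best_edges = current, None
        for u, v in permutations(nodes, 2):
            if (u, v) in edges:
                candidate = edges - {(u, v)}      # try removing the edge
            elif (u, v) in blacklist or creates_cycle(edges, u, v):
                continue                          # forbidden or would be cyclic
            else:
                candidate = edges | {(u, v)}      # try adding the edge
            s = total_score(candidate)
            if s > best + 1e-9:
                best, best_edges = s, candidate
        if best_edges is None:
            return edges
        edges, current = best_edges, best


# Toy data: B is an exact copy of A, C is independent of both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1)] * 25
nodes = ["A", "B", "C"]
# Case-4-style constraint: no arrows into A (think "sex" or "age").
blacklist = {(u, "A") for u in nodes if u != "A"}
print(hill_climb(data, nodes, blacklist))  # {('A', 'B')}
```

Rescoring the whole graph after every move keeps the sketch short but is wasteful. Because BIC decomposes over families, a real implementation would only rescore the node whose parent set changed.
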
Andreas, currently I'm not considering hidden nodes, but I hope it is clear
how BNs are useful even without them.

Thank you for your time :)

regards,
shankar.



On Thu, Mar 15, 2012 at 8:37 AM, xinfan meng <[email protected]> wrote:

> FYR, here is a pretty good comparison of existing graphical-model software:
> http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html
> However, I am really concerned about the efficiency issue.
>
> On Thu, Mar 15, 2012 at 6:03 AM, Andreas Mueller <[email protected]> wrote:
>
>>  On 03/14/2012 10:52 PM, Alexandre Gramfort wrote:
>>
>> hi shankar,
>>
>> that sounds interesting to me. Can you come up with a few references
>> and are you aware of existing implementations?
>>
>> I think the best reference is Bishop, "Pattern Recognition and Machine
>> Learning", though BNs are not used much for classification there.
>>
>> Bayesian Network is another word for "directed graphical model".
>>
>> This is a very wide area and I'm not sure this is easy to implement in
>> a generic way.
>> Are you thinking about discrete models? Then libDAI
>> <http://cs.ru.nl/%7Ejorism/libDAI/> is the all-in-one solution that is
>> often used, afaik. Not sure if it can learn structure, too, though.
>>
>> What kind of structure would you like to learn? I would
>> guess you'd restrict yourself to DAG.
>>
>> I'm a bit tired now, so I might be missing the obvious, but I don't think
>> I really understand the proposal.
>> What should be the input and what the output? Are there hidden states?
>> If your inputs are deterministic, I'm not sure what you would gain by
>> having a directed graphical model - if you don't have hidden states.
>> And if you do have hidden states, then I'm not quite sure what the
>> structure learning does...
>> Can you elaborate a bit more?
>>
>> Thanks,
>> Andy
>>
>>
>> _______________________________________________
>> Scikit-learn-general mailing list
>> [email protected]
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>
>
> --
> Best Wishes
> --------------------------------------------
> Meng Xinfan(蒙新泛)
> Institute of Computational Linguistics
> Department of Computer Science & Technology
> School of Electronic Engineering & Computer Science
> Peking University
> Beijing, 100871
> China