Hi Andreas,

I agree that missing data is not specific to MLP.
We dealt with it quite simply, as you mentioned, by taking the mean over the
dataset for continuous-valued attributes.
Another thing that I feel is not adequately explored in the scikit
implementations is discrete attributes.
Classification problems with discrete input features, or a mix of discrete
and continuous features, are not handled well. Many UCI datasets have a
mix of discrete and continuous attributes.
For discrete attributes, we treat missing values as one more discrete
value, namely 'UNKNOWN'.
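
To make the two rules above concrete, here is a minimal sketch using plain
NumPy. The function names (impute_continuous, encode_discrete) are only
illustrative and are not part of scikit-learn; I am also assuming that missing
continuous values are stored as NaN and missing discrete values as None.

```python
import numpy as np


def impute_continuous(X):
    """Replace NaN entries with the mean of the observed values in that column.

    Assumption: missing continuous values are encoded as NaN.
    A column with no observed values at all would stay NaN.
    """
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)      # per-column mean, ignoring NaN
    rows, cols = np.where(np.isnan(X))     # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X


def encode_discrete(values):
    """One-hot encode a discrete column, with 'UNKNOWN' as an extra category.

    Assumption: missing discrete values are encoded as None.
    """
    filled = ["UNKNOWN" if v is None else v for v in values]
    categories = sorted(set(filled))
    index = {c: i for i, c in enumerate(categories)}
    onehot = np.zeros((len(filled), len(categories)))
    for row, v in enumerate(filled):
        onehot[row, index[v]] = 1.0
    return categories, onehot


X_cont = impute_continuous([[1.0, np.nan], [3.0, 4.0]])
cats, X_disc = encode_discrete(["red", None, "blue"])
```

The encoded discrete columns can then be stacked next to the imputed
continuous ones (for example with np.hstack) before feeding them to the
network.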

I mentioned allowing multiple hidden layers because it is simply a
flexibility we would like to give to more advanced users of MLP, who might
like to experiment with different numbers of hidden units on difficult
problems.
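
As a rough illustration of that flexibility, the sketch below builds a
network from a list of hidden layer sizes and runs a forward pass. It is
only a sketch under my own assumptions (tanh hidden units, a linear output
layer, hypothetical names init_mlp and forward); it is not the code in
David's branch or any scikit-learn API.

```python
import numpy as np


def init_mlp(n_inputs, hidden_sizes, n_outputs, seed=0):
    """Create one (weights, bias) pair per layer for any number of hidden layers."""
    rng = np.random.RandomState(seed)
    sizes = [n_inputs] + list(hidden_sizes) + [n_outputs]
    return [
        (rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, X):
    """Forward pass: tanh on every hidden layer, linear output layer."""
    a = np.asarray(X, dtype=float)
    for W, b in params[:-1]:
        a = np.tanh(a @ W + b)
    W, b = params[-1]
    return a @ W + b


# Two hidden layers with 5 and 3 units; an empty tuple would give no hidden layer.
params = init_mlp(n_inputs=4, hidden_sizes=(5, 3), n_outputs=2)
out = forward(params, np.zeros((7, 4)))
```

Changing hidden_sizes is the only thing a user would need to touch to try a
different architecture.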

Thanks,
Vandana

On Thu, Jun 7, 2012 at 10:16 AM, eat <e.antero.ta...@gmail.com> wrote:

> Hi,
>
> On Thu, Jun 7, 2012 at 6:09 PM, LI Wei <li...@ee.cuhk.edu.hk> wrote:
>
>> Intuitively maybe we can set the missing values using the average over
>> the nearest neighbors calculated using these existing features? Not sure
>> whether it is the correct way to do it :-)
>
> I think the key question is: how reliably can one estimate the mean
> (and variance) here?
>
> With data sets containing both missing values and outliers, I doubt that
> there exists any simple, generally accepted way to both detect outliers
> (so that their impact on mean and variance is accounted for) and at the
> same time impute missing values.
>
> However, it might be possible to incorporate some domain-specific
> knowledge in order to move on. So, in summary, what kinds of schemes exist
> for adding (ad hoc) domain-specific knowledge to the modeling process in a
> systematic manner?
>
>
> My 2 cents,
> -eat
>
>>
>> Cheers,
>> LI, Wei
>>
>>
>> On Thu, Jun 7, 2012 at 12:25 PM, Andreas Mueller <
>> amuel...@ais.uni-bonn.de> wrote:
>>
>>>  Hi everybody!
>>> David, it's your project, I'm just trying to help along ;)
>>> About 2): Afaik there is nothing in sklearn at the moment
>>> that can deal with missing variables and I feel the MLP
>>> is one of the estimators where dealing with missing values
>>> is hardest.
>>> @David: I wouldn't keep you from trying but it seems a bit
>>> out of the scope of the MLP. I think the idea for missing data
>>> was to provide an additional mask as input that says
>>> which values are missing. Dealing with this is much more natural
>>> in naive Bayes or tree based methods than in the MLP I think.
>>>
>>> @Vandana: For dealing with missing data, one easy way is to
>>> set the missing variables to their mean over the dataset.
>>> Usually for MLPs the input should be zero mean, unit variance.
>>> So the missing variable would be just set to 0.
>>> Do you know of any better way of dealing with missing values
>>> in MLPs?
>>>
>>> Cheers,
>>> Andy
>>>
>>>
>>>
>>> On 06/05/2012 07:51 PM, David Marek wrote:
>>>
>>> I think you sent this mail only to me; please send all mails to the
>>> mailing list. Btw. Andreas is my mentor, so he is the one in charge here :-)
>>>
>>> Ad 1) Afaik all you need is one hidden layer, it's certainly possible to
>>> add the possibility, but I think we decided that it's not a priority.
>>>
>>> Ad 2) Good idea
>>>
>>> David
>>>
>>> ---------- Forwarded message ----------
>>> From: Vandana Bachani <vandana....@gmail.com>
>>> Date: Tue, Jun 5, 2012 at 6:59 PM
>>> Subject: Re: [Scikit-learn-general] Contributing to scikit-learn
>>> To: h4wk...@gmail.com
>>>
>>>
>>> Hi David,
>>> I think we can add the following also to the to do list:
>>> 1. Any number of hidden layers and hidden units should be supported.
>>> 2. Missing data should be handled (several UCI datasets have missing
>>> data).
>>>
>>>  I will look at the code and then send you a mail about my thoughts on
>>> the same.
>>>
>>>  If you would like to have a look at my project report, I am attaching
>>> the same.
>>>
>>>  Thanks,
>>> Vandana
>>>
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> Live Security Virtual Conference
>>> Exclusive live event will cover all the ways today's security and
>>> threat landscape has changed and how IT managers can respond. Discussions
>>> will include endpoint security, mobile security and the latest in malware
>>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>>
>>>
>>>
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>>
>
>
>
>


-- 
Vandana Bachani
Graduate Student, MSCE
Computer Science & Engineering Department
Texas A&M University, College Station