Have you tried using a One-Class SVM to learn the minority class? I read
somewhere that it can give better results than training on both classes
when the class proportions are heavily unbalanced.
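
In case it helps, here is a minimal sketch of that idea. It assumes
scikit-learn's OneClassSVM and uses synthetic data; the nu and gamma values
are illustrative guesses of mine, not tuned settings, and nothing here comes
from your actual dataset.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(0)

# Synthetic stand-in data (an assumption): about 3% positives clustered
# around (3, 3), negatives clustered around the origin.
X_neg = rng.normal(loc=0.0, scale=1.0, size=(970, 2))
X_pos = rng.normal(loc=3.0, scale=0.5, size=(30, 2))

# Fit on the minority (positive) class only.
# nu and gamma are illustrative values and would need tuning on real data.
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma=0.5)
ocsvm.fit(X_pos)

# predict() returns +1 for points that resemble the training class
# and -1 for everything else.
pred_pos = ocsvm.predict(X_pos)
pred_neg = ocsvm.predict(X_neg)

print("positives accepted:", np.mean(pred_pos == 1))
print("negatives rejected:", np.mean(pred_neg == -1))
```

On real data the two rates should of course be measured on held-out samples,
not on the points the model was fitted on.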
Albert
2014-09-09 16:55 GMT+02:00 ZORAIDA HIDALGO SANCHEZ <
zoraida.hidalgosanc...@telefonica.com>:
> I already did, but when I evaluate on my (unbalanced) test dataset I get
> very poor results.
>
> From: Eustache DIEMERT <eusta...@diemert.fr>
> Reply-To: "scikit-learn-general@lists.sourceforge.net" <
> scikit-learn-general@lists.sourceforge.net>
> Date: Tuesday, 9 September 2014 16:33
> To: "scikit-learn-general@lists.sourceforge.net" <
> scikit-learn-general@lists.sourceforge.net>
> Subject: Re: [Scikit-learn-general] SVC and unbalanced dataset
>
> Besides class weights, you may try downsampling your negative examples.
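
A minimal sketch of that downsampling step, assuming NumPy arrays with labels
0 (negative) and 1 (positive). The helper name downsample_negatives and its
ratio parameter are hypothetical, made up for this illustration.

```python
import numpy as np

def downsample_negatives(X, y, ratio=1.0, seed=0):
    """Keep all positives (y == 1) and a random subset of negatives (y == 0).

    ratio is the number of negatives kept per positive.
    """
    rng = np.random.RandomState(seed)
    pos_idx = np.flatnonzero(y == 1)
    neg_idx = np.flatnonzero(y == 0)
    n_keep = min(len(neg_idx), int(ratio * len(pos_idx)))
    kept_neg = rng.choice(neg_idx, size=n_keep, replace=False)
    # Sort so the surviving samples keep their original order.
    idx = np.sort(np.concatenate([pos_idx, kept_neg]))
    return X[idx], y[idx]

# Toy data: 3 positives out of 100 samples.
X = np.arange(200).reshape(100, 2)
y = np.zeros(100, dtype=int)
y[[5, 50, 95]] = 1

X_small, y_small = downsample_negatives(X, y, ratio=1.0)
print(np.bincount(y_small))  # prints [3 3]
```

Only the training split should be downsampled; the test set should keep its
original class proportions so the scores reflect the real distribution.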
>
> E/
>
> 2014-09-09 14:32 GMT+02:00 ZORAIDA HIDALGO SANCHEZ <
> zoraida.hidalgosanc...@telefonica.com>:
>
>> Dear all,
>>
>> I am trying to classify a dataset with a binary target. Positive
>> instances make up only 3% of the total. I have tried SVC with neither
>> class_weight nor sample_weight, and the confusion matrix shows that all
>> instances are classified as negative. However, if I use either
>> class_weight='auto' or sample_weight (with each instance weighted
>> inversely proportional to the frequency of its class), then the confusion
>> matrix is the other way around (that is, all instances are classified as
>> positive).
>>
>> What am I doing wrong?
>>
>> This is how I have made the calls:
>>
>> 1) with no additional parameters:
>> SVC(probability=True, max_iter=1000, verbose=5)
>>
>> 2) with class_weight:
>> SVC(class_weight='auto', probability=True, max_iter=1000, verbose=5)
>>
>> 3) with sample_weight:
>> classifier = SVC(probability=True, max_iter=1000, verbose=5)
>>
>> and later:
>>
>> sample_weight = np.asarray(compute_sample_weight(np.unique(y_train),
>>                                                  y_train))
>> classifier.fit(X_train, y_train, sample_weight=sample_weight)
>>
>>
>> import numpy as np
>> from sklearn.preprocessing import LabelEncoder
>>
>> def compute_sample_weight(classes, y_train):
>>     # Encode the labels in y_train as integers 0..n_classes-1.
>>     le = LabelEncoder()
>>     y_ind = le.fit_transform(y_train)
>>     if not all(np.isin(classes, le.classes_)):
>>         raise ValueError("classes should have valid labels that are in y")
>>
>>     # Class weights are inversely proportional to the number of samples
>>     # in the class, normalized by the mean inverse frequency.
>>     recip_freq = 1. / np.bincount(y_ind)
>>     weight = recip_freq[le.transform(classes)] / np.mean(recip_freq)
>>     # Map each label in `classes` to its weight, then look up every sample.
>>     weight_by_class = dict(zip(classes, weight))
>>     y_sample_weight = [weight_by_class[e] for e in y_train]
>>     return y_sample_weight
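
To make the weighting concrete, here is a small sketch of the same arithmetic
on a 3%/97% split. The toy labels are made up for illustration; only NumPy is
assumed.

```python
import numpy as np

# Toy labels (an assumption): 3 positives and 97 negatives.
y = np.array([1] * 3 + [0] * 97)

counts = np.bincount(y)                        # samples per class: [97, 3]
recip_freq = 1.0 / counts                      # inverse class frequencies
class_weight = recip_freq / recip_freq.mean()  # normalized so the mean weight is 1

# One weight per sample, looked up by label.
sample_weight = class_weight[y]

# Total weight carried by each class.
total_pos = sample_weight[y == 1].sum()
total_neg = sample_weight[y == 0].sum()

print(class_weight)
print(total_pos, total_neg)
```

With this scheme each positive sample weighs about 32 times as much as a
negative one (1.94 versus 0.06), so both classes carry the same total weight.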
>>
>>
>> Thanks.
>>
>>
>>
>> Z.-
>>
>>
>> ________________________________
>>
>> The information contained in this transmission is privileged and
>> confidential information intended only for the use of the individual or
>> entity named above. If the reader of this message is not the intended
>> recipient, you are hereby notified that any dissemination, distribution or
>> copying of this communication is strictly prohibited. If you have received
>> this transmission in error, do not read it. Please immediately reply to the
>> sender that you have received this communication in error and then delete
>> it.
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>