Besides class weights, you may try to downsample your negative examples.
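
A minimal sketch of what I mean by downsampling, using only the standard library. The function name, the `ratio` parameter and the toy labels are my own illustration, not anything from scikit-learn: keep every positive and a random subset of the negatives.

```python
import random


def downsample_negative_indices(y, negative_label=0, ratio=1.0, seed=0):
    """Return sorted indices that keep all positives and a random subset of negatives.

    ratio is the number of negatives kept per positive (illustrative parameter).
    """
    pos = [i for i, label in enumerate(y) if label != negative_label]
    neg = [i for i, label in enumerate(y) if label == negative_label]
    # Never ask for more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    kept_neg = rng.sample(neg, n_keep)
    return sorted(pos + kept_neg)


# Toy labels with the 3% positive rate described in the question.
y = [1] * 3 + [0] * 97
idx = downsample_negative_indices(y, negative_label=0, ratio=1.0, seed=0)
y_small = [y[i] for i in idx]
```

With NumPy arrays, `X_train[idx]` and `y_train[idx]` would give the reduced training set. I would downsample only the training data and keep the evaluation set at its original class ratio, so the confusion matrix still reflects the real distribution.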
E/
2014-09-09 14:32 GMT+02:00 ZORAIDA HIDALGO SANCHEZ <
zoraida.hidalgosanc...@telefonica.com>:
> Dear all,
>
> I am trying to classify a dataset with a binary target. Positive instances
> make up only 3% of the total. I have tried SVC with neither class_weight
> nor sample_weight, and the confusion matrix shows that all instances are
> classified as negative. However, if I use either class_weight='auto' or
> sample_weight (computing each instance's weight from the percentage of its
> target class), then the confusion matrix is the other way around (that is,
> all instances are classified as positive).
>
> What am I doing wrong?
>
> This is how I have made the calls:
>
> 1) with no additional parameters:
> SVC(probability=True, max_iter=1000, verbose=5)
>
> 2) with class_weight:
> SVC(class_weight='auto', probability=True, max_iter=1000, verbose=5)
>
> 3) with sample_weight:
> classifier = SVC(probability=True, max_iter=1000, verbose=5)
>
> and later:
>
> sample_weight = np.asarray(compute_sample_weight(np.unique(y_train),
> y_train))
> classifier.fit(X_train, y_train, sample_weight=sample_weight)
>
>
> def compute_sample_weight(classes, y_train):
>     # Find the weight of each class as present in y.
>     le = LabelEncoder()
>     y_ind = le.fit_transform(y_train)
>     if not all(np.in1d(classes, le.classes_)):
>         raise ValueError("classes should have valid labels that are in y")
>
>     # inversely proportional to the number of samples in the class
>     recip_freq = 1. / np.bincount(y_ind)
>     weight = recip_freq[le.transform(classes)] / np.mean(recip_freq)
>     weight_by_class = dict(zip(le.classes_, weight))
>     y_sample_weight = [weight_by_class[e] for e in y_train]
>     return y_sample_weight
>
>
> Thanks.
>
>
>
> Z.-
>
>
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>