Thanks for all the suggestions. I figured out what was happening: in R we were training on all the data (train and test) and then evaluating on the test set :) Sorry. Now the results are very similar (0.73 vs 0.74). Andy, for the results I am using clf.predict_proba.
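For anyone who finds this thread later, here is a minimal sketch of why the input to the AUC computation matters (continuous scores such as those from clf.predict_proba or clf.decision_function versus hard labels from clf.predict). It is plain Python with no scikit-learn dependency; the helper name pairwise_auc, the toy labels, the toy probabilities and the 0.5 threshold are all made up for illustration and are not taken from my dataset.

```python
from itertools import product


def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities for the positive class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
proba = [0.10, 0.20, 0.30, 0.60, 0.40, 0.70, 0.80, 0.90]

# Hard 0/1 labels, as a predict()-style call would return them
# (a 0.5 threshold is assumed here).
hard = [1 if p >= 0.5 else 0 for p in proba]

auc_from_proba = pairwise_auc(y_true, proba)
auc_from_hard = pairwise_auc(y_true, hard)

print(auc_from_proba)  # 0.9375
print(auc_from_hard)   # 0.75
```

Thresholding collapses the ranking into two groups, so most pairs become ties and the AUC drops. With scikit-learn the corresponding call would be roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]), where X_test and y_test stand for a held-out split that the model never saw during fitting.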
Thanks.

On 06/10/14 20:13, "Andy" <[email protected]> wrote:

> Hi Zoraida.
>
> I am not an expert in R glms, but I think the glm call just does
> logistic regression. For the binary case, this is the same as
> sklearn.linear_model.LogisticRegression.
>
> Just a wild guess: did you use the clf.decision_function results as
> input to roc_auc_score? If you use the clf.predict results, your score
> will be much lower than it should be. In newer versions of
> scikit-learn, this is done automatically if you use GridSearchCV or
> cross_val_score for scoring your model and you use the "scoring"
> parameter.
>
> I don't understand the last part of your question. What do you find
> hard to follow with scikit-learn? Indeed, the implementation of
> LogisticRegression is a bit tricky as it calls LibLinear, but I'm not
> sure you are asking about the code.
>
> Cheers,
> Andy
>
> On 10/06/2014 03:10 PM, ZORAIDA HIDALGO SANCHEZ wrote:
>> Hi all,
>>
>> I know the subject is ugly but I don't really know what to call it.
>>
>> I am a newbie with all these machine learning techniques, and what I
>> do most of the time is follow a "trial and error" approach. I know
>> this method has some drawbacks, but for now it is what I am able to
>> do.
>>
>> I am working with text on a classification problem. My pipeline is:
>> TfidfVectorizer, feature selection with f_classif/Chi, and the final
>> classifier (I have tried a lot of different classifiers).
>> Unfortunately, the results that I am getting are very poor. The
>> measurement that I am using is the AUC. The best result has been an
>> AUC of 0.62 (I have tried without doing feature selection too).
>>
>> Using the same dataset but using R, I have obtained an AUC of 0.90.
>> In the process, I am using frequencies obtained with Scikit (I
>> process the frequencies using TfidfVectorizer and later I store the
>> resulting dataset in a csv). No feature selection is used and the
>> classifier is a logistic regression:
>>
>> out.glm.1 <- glm(equat, data=dataset[,c(input, target)],
>>                  family=binomial(link="logit"))
>>
>> Is there someone who could tell me how to "replicate" this with
>> Scikit? Also, does someone know of any "easy to follow" resource
>> where I can understand the underlying implementation in both
>> libraries? In general, I found that Scikit has links to the source
>> of the implementation (I mean, the original papers). On the other
>> hand, I found the R documentation very difficult to follow
>> (parameter explanations), and there aren't many details on the
>> implementation.
>>
>> Thanks in advance.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
