Olivier Grisel <olivier.grisel@...> writes:

> 
> 2012/9/6 Ark <ark_antos@...>:
> >
> >> And how large in bytes? It seems that it should be small enough to be
> >> able to use sklearn.linear_model.LogisticRegression despite the data
> >> copy in memory.
> >>
> >
> > Right now it's not even 100M, but it will grow to at least 1G.
> 
> Alright, have you tried sklearn.linear_model.LogisticRegression? (As
> always, you need to do a basic grid search for the value of C.)
> 
> On datasets of 1G or more, the memory copy induced by the liblinear
> data layout might be an issue if you only have 4GB of RAM on your
> workstation.
> 
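A minimal sketch of the grid search over C suggested above, assuming a recent scikit-learn where GridSearchCV lives in sklearn.model_selection (in 2012 it was sklearn.grid_search). The make_classification data is only a placeholder for the real corpus, and the C grid is an arbitrary log-spaced choice.

```python
# Minimal sketch: grid search over C for LogisticRegression.
# Assumes a recent scikit-learn; make_classification is placeholder data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=50, random_state=0)

# Log-spaced candidate values for the inverse regularization strength C.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# solver="liblinear" uses the liblinear backend discussed in this thread.
search = GridSearchCV(LogisticRegression(solver="liblinear"), param_grid, cv=5)
search.fit(X, y)

print("best C:", search.best_params_["C"])
print("mean CV accuracy: %.3f" % search.best_score_)
```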

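To answer "how large in bytes?" for sparse data, one option is to sum the sizes of the three arrays backing a SciPy CSR matrix. The shape and density below are made-up placeholders. Doubling the result to account for liblinear's copy is only a rough assumption: liblinear builds its own index/value layout, so the real overhead may differ.

```python
# Rough in-memory size of a sparse CSR matrix (placeholder dimensions).
import numpy as np
import scipy.sparse as sp


def csr_nbytes(X):
    """Bytes held by the data, indices and indptr arrays of a CSR matrix."""
    X = sp.csr_matrix(X)
    return X.data.nbytes + X.indices.nbytes + X.indptr.nbytes


# Hypothetical corpus: 2,000 documents x 50,000 features, 0.1% nonzero.
X = sp.random(2_000, 50_000, density=0.001, format="csr",
              dtype=np.float64, random_state=0)

size = csr_nbytes(X)
print("CSR arrays: %.2f MB" % (size / 1e6))
# Assumption: fitting makes one more copy of roughly the same size.
print("with one extra copy: %.2f MB" % (2 * size / 1e6))
```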
Well, logistic regression seems to work fine, though with somewhat lower
accuracy. I am fiddling with SelectKBest for feature selection and GridSearchCV
to tune parameters, to see if that improves the results. On another note, I was
pointed to the liblinear FAQ entry "Q: Why you support probability outputs for
logistic regression only?". I tweaked the liblinear source mentioned there (and
LinearSVC) to get probability estimates for LinearSVC. It seemed to work, but I
am not sure whether it is a correct solution. If you have any thoughts on how I
could verify it, please do comment.
Thanks.
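One way to combine SelectKBest and GridSearchCV, sketched under the assumption of a recent scikit-learn: put both steps in a Pipeline so feature selection is refit inside each cross-validation fold, and search over k and C together. f_classif is used here because the synthetic features can be negative; chi2 is a common choice for non-negative term counts.

```python
# Sketch: feature selection and C tuned jointly inside one Pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Placeholder data: 100 features, only 10 of them informative.
X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(solver="liblinear")),
])

# Step-name prefixes ("select__", "clf__") route parameters to each step.
param_grid = {
    "select__k": [10, 20, 50],
    "clf__C": [0.1, 1.0, 10.0],
}

# The selector is refit on each training fold, so the held-out fold
# never influences which features are kept.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, "%.3f" % search.best_score_)
```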

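As far as I understand the liblinear FAQ, enabling probability output for the SVM solvers reuses the logistic-regression mapping: each decision value f goes through 1/(1+exp(-f)), and in the multiclass one-vs-rest case each row is renormalized to sum to 1. The same mapping can be applied in Python to LinearSVC.decision_function without patching C code. The helper liblinear_style_proba is hypothetical. The outputs are squashed margins rather than probabilities fitted to the data, since the SVM loss is not a log-likelihood.

```python
# Sketch: liblinear-FAQ-style "probabilities" from LinearSVC decision values.
import numpy as np
from scipy.special import expit  # numerically stable 1 / (1 + exp(-x))
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC


def liblinear_style_proba(decision):
    """Hypothetical helper: squash decision values with the logistic function.

    Binary input has shape (n_samples,) and scores classes_[1]; multiclass
    input has shape (n_samples, n_classes) from one-vs-rest models.
    """
    decision = np.asarray(decision, dtype=float)
    if decision.ndim == 1:
        p1 = expit(decision)
        return np.column_stack([1.0 - p1, p1])
    p = expit(decision)
    # One-vs-rest scores do not sum to 1, so renormalize each row.
    return p / p.sum(axis=1, keepdims=True)


X, y = make_classification(n_samples=500, n_features=20, random_state=0)
svc = LinearSVC(C=1.0, dual=False).fit(X, y)
proba = liblinear_style_proba(svc.decision_function(X))
print(proba[:3])
```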

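A possible way to check such estimates is to score them on held-out data with proper scoring rules (log loss, Brier score) and a reliability curve, and to compare against models whose probabilities are actually fitted: LogisticRegression, and LinearSVC wrapped in CalibratedClassifierCV with Platt-style sigmoid calibration. This is a suggested check, not a method from the thread, and the data is synthetic.

```python
# Sketch: compare probability estimates on held-out data.
import numpy as np
from scipy.special import expit
from sklearn.calibration import CalibratedClassifierCV, calibration_curve
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, log_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Probability of the positive class from each candidate model.
svc = LinearSVC(dual=False).fit(X_tr, y_tr)
platt = CalibratedClassifierCV(LinearSVC(dual=False), method="sigmoid", cv=5)
lr = LogisticRegression()
candidates = {
    "sigmoid(LinearSVC)": expit(svc.decision_function(X_te)),
    "Platt-calibrated LinearSVC": platt.fit(X_tr, y_tr).predict_proba(X_te)[:, 1],
    "LogisticRegression": lr.fit(X_tr, y_tr).predict_proba(X_te)[:, 1],
}

results = {}
for name, p in candidates.items():
    # Reliability curve: observed positive rate vs. mean predicted value per bin.
    frac_pos, mean_pred = calibration_curve(y_te, p, n_bins=10)
    results[name] = {
        "log_loss": float(log_loss(y_te, p)),
        "brier": float(brier_score_loss(y_te, p)),
        "max_calib_gap": float(np.max(np.abs(frac_pos - mean_pred))),
    }
    print(name, {k: round(v, 3) for k, v in results[name].items()})
```

Lower is better for all three numbers. A clearly worse log loss for the raw sigmoid mapping would suggest its outputs rank samples reasonably but are not well-calibrated probabilities.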

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
