Olivier Grisel <olivier.grisel@...> writes:
>
> 2012/9/6 Ark <ark_antos@...>:
> >
> >> And how large in bytes? It seems that it should be small enough to be
> >> able to use sklearn.linear_model.LogisticRegression despite the data
> >> copy in memory.
> >>
> >
> > Right now it's not even 100M, but it will extend to 1G at least.
>
> Alright, have you tried sklearn.linear_model.LogisticRegression (as
> always you need to do a basic grid search for the value of C).
>
> On datasets of the order of more than 1G the memory copy induced by
> liblinear data layout might be an issue if you have 4GB on your
> workstation.
>
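
For reference, the "basic grid search for the value of C" suggested above could look roughly like the sketch below. It is only an illustration: the data is synthetic, the grid of C values is an arbitrary choice, and the import path `sklearn.model_selection` is the one used by recent scikit-learn releases (older releases exposed the same class elsewhere).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real dataset (assumption: binary labels).
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Coarse logarithmic grid for the regularization strength C (arbitrary choice).
param_grid = {"C": [0.01, 0.1, 1.0, 10.0, 100.0]}

# The liblinear solver matches the backend discussed in this thread.
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid,
    cv=3,
)
search.fit(X, y)

print("best C:", search.best_params_["C"])
print("mean cross-validated accuracy:", search.best_score_)
```

A common follow-up is to refine the grid around the best value once the coarse search has located the right order of magnitude.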
Well, logistic regression seems to work fine, though with lower accuracy.
I am fiddling with SelectKBest for feature selection and GridSearchCV to
tune the parameters, to see if that improves the results.

On another note, I was pointed to the liblinear FAQ, at the question
"Q: Why you support probability outputs for logistic regression only?".
I tweaked the liblinear source mentioned there (and LinearSVC) to get
probability estimates for LinearSVC. It seemed to work, but I am not sure
whether it is a correct solution. If you have any thoughts on how I could
verify it, please do comment.

Thanks.

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
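
A minimal sketch of combining SelectKBest with GridSearchCV, as described in the message above, is to put both steps in a Pipeline so that the number of features and C are tuned together. The scoring function `f_classif`, the candidate values, and the synthetic data are all assumptions made for illustration; the import paths are those of recent scikit-learn releases.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the real dataset: 40 features, 8 of them informative.
X, y = make_classification(
    n_samples=300, n_features=40, n_informative=8, random_state=0
)

# Feature selection runs inside each cross-validation fold,
# so the held-out fold never influences which features are kept.
pipe = Pipeline([
    ("select", SelectKBest(score_func=f_classif)),
    ("clf", LogisticRegression(solver="liblinear")),
])

# Step parameters are addressed as <step name>__<parameter name>.
param_grid = {
    "select__k": [5, 10, 20, 40],
    "clf__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("mean cross-validated accuracy:", search.best_score_)
```

Keeping the selector inside the pipeline avoids the optimistic bias that appears when features are selected on the full dataset before cross-validation.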

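
On the question of how to verify probability estimates from a tweaked LinearSVC, one possible sanity check is sketched below. It assumes (this is an assumption, not something confirmed in the thread) that the tweak amounts to passing the SVM decision value through the logistic function, which is emulated here in plain Python rather than by patching liblinear. The checks are that thresholding the estimates at 0.5 reproduces the predicted labels, and that the Brier score on held-out data is in the same range as that of LogisticRegression.

```python
import numpy as np
from scipy.special import expit
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in for the real dataset (binary labels 0/1).
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)

svc = LinearSVC(C=1.0).fit(X_train, y_train)

# Assumed form of the tweak: logistic function applied to the decision value.
svc_proba = expit(svc.decision_function(X_test))

# Reference model with genuine probability outputs.
lr = LogisticRegression(solver="liblinear", C=1.0).fit(X_train, y_train)
lr_proba = lr.predict_proba(X_test)[:, 1]

# Check 1: estimates above 0.5 should coincide with a predicted label of 1.
svc_labels = (svc_proba > 0.5).astype(int)
agreement = np.mean(svc_labels == svc.predict(X_test))

# Check 2: Brier score (mean squared error of the probabilities, lower is better).
svc_brier = brier_score_loss(y_test, svc_proba)
lr_brier = brier_score_loss(y_test, lr_proba)

print("label agreement:", agreement)
print("Brier score, LinearSVC + logistic transform:", svc_brier)
print("Brier score, LogisticRegression:", lr_brier)
```

Passing both checks does not prove the estimates are well calibrated, since hinge-loss decision values are not trained to be log-odds. A noticeably worse Brier score than LogisticRegression would suggest fitting a separate calibration step on held-out data instead.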