Re: [Scikit-learn-general] Announce: scikit-learn v0.11

2012-05-08 Thread Gael Varoquaux
On Tue, May 08, 2012 at 08:36:10PM -0400, Satrajit Ghosh wrote: >thanks! this is a nice and helpful snapshot! perhaps it should be included >in the documentation with future releases? We do have a "what's new" section. For this release, we also tried to have a 'highlights' part of the "wha

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Yaroslav Halchenko
On Wed, 09 May 2012, Olivier Grisel wrote: > > so if it fails for some specific seed, I could check if it gets > > replicated by running the same test again with the same seed. > > if it doesn't -- I know **for sure** that it is not related to having > > random data but smth more fun, worth valg

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Olivier Grisel
2012/5/9 Yaroslav Halchenko : > > > so if it fails for some specific seed, I could check if it gets > replicated by running the same test again with the same seed. > > if it doesn't -- I know **for sure** that it is not related to having > random data but smth more fun, worth valgrinding for decisi

Re: [Scikit-learn-general] Announce: scikit-learn v0.11

2012-05-08 Thread Satrajit Ghosh
hi gael, thanks! this is a nice and helpful snapshot! perhaps it should be included in the documentation with future releases? cheers, satra On Tue, May 8, 2012 at 7:24 PM, Gael Varoquaux wrote: > For communication purposes, I have summarized a bit the latest

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Darren Govoni
On Wed, 2012-05-09 at 01:45 +0200, Olivier Grisel wrote: > 2012/5/8 Darren Govoni : > > Still assessing the best models/algorithms to use, but primarily > > unsupervised learning ones. The models will come from 100's of millions > > of data points. We're looking at learned bayesian networks, predic

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Olivier Grisel
2012/5/8 Darren Govoni : > Still assessing the best models/algorithms to use, but primarily > unsupervised learning ones. The models will come from 100's of millions > of data points. We're looking at learned bayesian networks, predictive > analysis, multivariate analysis and clustering approaches

Re: [Scikit-learn-general] Announce: scikit-learn v0.11

2012-05-08 Thread Gael Varoquaux
For communication purposes, I have summarized a bit the latest developments in the scikit, trying to make it interesting to the experts and non-experts: http://gael-varoquaux.info/blog/?p=165 https://twitter.com/#!/GaelVaroquaux/status/22808966168576 Gael

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Yaroslav Halchenko
On Tue, 08 May 2012, Yaroslav Halchenko wrote: > tests ... but @seed'ing every test is somewhat a burden (probably a > wise solution would be to come up with a nose plugin or smth to seed > RNG before running every test) a slight offtopic since nose-dev usergroup seems to be dead silent [1] (mi
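The idea of seeding the RNG before every test, mentioned in the message above, can be sketched without a nose plugin. The following is only an illustration under stated assumptions: it swaps in a plain `unittest` base class, it seeds the standard-library `random` module rather than `numpy.random`, and the names `SeededTestCase` and `seed` are hypothetical, not part of nose or scikit-learn.

```python
import random
import unittest


class SeededTestCase(unittest.TestCase):
    """Hypothetical base class that reseeds the global RNG before each test."""

    seed = 0  # assumed fixed seed; any known integer would do

    def setUp(self):
        # Runs before every test method, so each test starts from the same state.
        random.seed(self.seed)
        # With numpy installed, numpy.random.seed(self.seed) could be added here.


class ExampleTest(SeededTestCase):
    def test_draw_is_reproducible(self):
        first = random.random()
        # Reseeding with the same value must reproduce the same draw.
        random.seed(self.seed)
        self.assertEqual(first, random.random())
```

Tests that inherit from such a base class no longer need a per-test seeding decorator, at the cost of every test seeing the same random stream.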

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Yaroslav Halchenko
On Tue, 08 May 2012, Tiziano Zito wrote: > > > is there a point where generic numpy.random gets explicitly seeded > > > upon sklearn import? > > No, and I don't think that this is desirable: it would be a weird side > > effect of importing the scikit. It might be interesting to seed the > > global

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Tiziano Zito
> > is there a point where generic numpy.random gets explicitly seeded > > upon sklearn import? > > No, and I don't think that this is desirable: it would be a weird side > effect of importing the scikit. It might be interesting to seed the > global RNG in the tests, but I have found such an appro

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Darren Govoni
Still assessing the best models/algorithms to use, but primarily unsupervised learning ones. The models will come from 100's of millions of data points. We're looking at learned bayesian networks, predictive analysis, multivariate analysis and clustering approaches over distributed data. On Tue, 2

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Gael Varoquaux
On Tue, May 08, 2012 at 08:24:53PM +0200, Gael Varoquaux wrote: > > I guess then such global seeding would be of great help ;) > Don't think that it is related to RNGs. Specifically, in the codebase explored by the test that we are discussing, the only unseeded RNG that I can see is the C-level RN

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Gael Varoquaux
On Tue, May 08, 2012 at 08:24:53PM +0200, Gael Varoquaux wrote: > For the specific situation of ICA, there are indeed some unseeded RNGs, > which I am going to fix right now. Done in f6d7f45 G

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Gael Varoquaux
On Tue, May 08, 2012 at 02:19:31PM -0400, Yaroslav Halchenko wrote: > is there a point where generic numpy.random gets explicitly seeded > upon sklearn import? No, and I don't think that this is desirable: it would be a weird side effect of importing the scikit. It might be interesting to seed the

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Yaroslav Halchenko
On Tue, 08 May 2012, Gael Varoquaux wrote: > > I wonder what is the current approach to reproduce random failures of > > the tests battery? > There shouldn't be any :(. I concur! ;) > > would it be feasible to suggest (PR) for sklearn/test_setup.py (I guess) > > to seed RNGs with some random but

Re: [Scikit-learn-general] reproducing test failures

2012-05-08 Thread Gael Varoquaux
On Tue, May 08, 2012 at 02:09:58PM -0400, Yaroslav Halchenko wrote: > I wonder what is the current approach to reproduce random failures of > the tests battery? There shouldn't be any :(. > would it be feasible to suggest (PR) for sklearn/test_setup.py (I guess) > to seed RNGs with some random bu

[Scikit-learn-general] reproducing test failures

2012-05-08 Thread Yaroslav Halchenko
I wonder what is the current approach to reproduce random failures of the tests battery? would it be feasible to suggest (PR) for sklearn/test_setup.py (I guess) to seed RNGs with some random but known seed and print it out (ideally if nose's verbosity is exposed then only in verbose mode) so exac
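The proposal above (seed the RNGs with a random but known seed and print it, so that a failing run can be repeated) might look roughly like the sketch below. It is an illustration only: the function name `seed_rng_for_tests` and the environment variable `TEST_SEED` are hypothetical, and the standard-library `random` module stands in for `numpy.random`.

```python
import os
import random
import sys


def seed_rng_for_tests(env_var="TEST_SEED"):
    """Seed the global RNG with a known value and report that value.

    `env_var` is a hypothetical name, not an existing sklearn or nose setting.
    """
    raw = os.environ.get(env_var)
    if raw is not None:
        # Reuse the seed supplied by the user to replay an earlier run.
        seed = int(raw)
    else:
        # Otherwise draw a fresh seed from the operating system.
        seed = random.SystemRandom().randrange(2**32)
    random.seed(seed)
    # With numpy installed, numpy.random.seed(seed) could be called here too.
    sys.stderr.write("RNG seed: %d\n" % seed)
    return seed
```

Running the tests once prints the seed; exporting the same value in `TEST_SEED` and rerunning should then reproduce any failure that really depends on the random data.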

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Olivier Grisel
2012/5/8 Darren Govoni : > > Now, for my problem space, the data models _will not_ fit into memory on > a single CPU. So therein lies a problem. I suspect, as with most > engineering solutions, the tradeoff one is confronted with concerns > resources. One might be willing to trade off time for data

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Darren Govoni
Great explanation. Thanks for that. And I totally agree about the limitations of map/reduce. Many efforts seem to want to shoehorn all kinds of problems on map/reduce. It's curious to me how/why they want to do that. Now, for my problem space, the data models _will not_ fit into memory on a single

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Olivier Grisel
2012/5/7 Darren Govoni : > Good point. I'm no expert in the details of the algorithms per se, but I > wonder how the Apache Mahout folks are doing it using map/reduce. Is > there a data model in scikit that would be suitable for a map/reduce > algorithm approach? > > I know ipython can do map/reduc

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Darren Govoni
That's great. I'll wait and keep an eye on this. On Tue, 2012-05-08 at 18:39 +0200, Olivier Grisel wrote: > I also started some ipython parallel integration work during the pycon > 2012 sprint with Min RK, one of the main authors of ipython.parallel. > > Here is the code: > > https://github.com/

Re: [Scikit-learn-general] iPython integration?

2012-05-08 Thread Olivier Grisel
I also started some ipython parallel integration work during the pycon 2012 sprint with Min RK, one of the main authors of ipython.parallel. Here is the code: https://github.com/ogrisel/pycon-pydata-sprint I am planning to clean up and package the resulting experiments from this sprint as a new p

Re: [Scikit-learn-general] Announce: scikit-learn v0.11

2012-05-08 Thread Jaques Grobler
Congratulations everyone! 2012/5/8 bthirion > Congratulations ! Thank you for the nice (and intense !) work. > > Bertrand > > On 05/08/2012 01:08 AM, Andreas Mueller wrote: > > Dear all, > I am happy to announce the 0.11 release of scikit-learn. > > This release includes some major new features

Re: [Scikit-learn-general] Announce: scikit-learn v0.11

2012-05-08 Thread bthirion
Congratulations ! Thank you for the nice (and intense !) work. Bertrand On 05/08/2012 01:08 AM, Andreas Mueller wrote: Dear all, I am happy to announce the 0.11 release of scikit-learn. This release includes some major new features such as randomized sparse models, gradient boosted regression

Re: [Scikit-learn-general] dbscan: labels as numpy array

2012-05-08 Thread Andreas Mueller
On 05/08/2012 10:41 AM, Gael Varoquaux wrote: > On Mon, May 07, 2012 at 03:50:00PM -0400, Félix-Antoine Fortin wrote: >> I presumed there are valid reasons for using a numpy array, and represent >> error points as -1. > Reasons for using a numpy array are that it is faster and more memory > effici

Re: [Scikit-learn-general] dbscan: labels as numpy array

2012-05-08 Thread Gael Varoquaux
On Mon, May 07, 2012 at 03:50:00PM -0400, Félix-Antoine Fortin wrote: > I presumed there are valid reasons for using a numpy array, and represent > error points as -1. Reasons for using a numpy array are that it is faster and more memory efficient than a list. Gael

[Scikit-learn-general] dbscan: labels as numpy array

2012-05-08 Thread Félix-Antoine Fortin
Hi, I have recently used the DBSCAN implementation of scikit-learn, and I have a "quick" question. Currently, noise points are labelled as -1 in a numpy array. From my point of view, clustering labels can be used, for example, as an index of a sequence. However, in Python -1 is still a valid ind
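The concern about -1 being a valid Python index can be shown with a small sketch. The data here is made up for illustration: a plain list stands in for the numpy array of labels, and `cluster_names` is a hypothetical per-cluster sequence.

```python
NOISE = -1  # label used for noise points, as described in the message above

# Hypothetical data: one label per sample and one name per cluster.
labels = [0, 1, NOISE, 0]
cluster_names = ["cluster A", "cluster B"]

# Pitfall: -1 silently selects the last element instead of raising an error.
naive = [cluster_names[label] for label in labels]

# Safer: treat the noise label explicitly before using it as an index.
safe = [cluster_names[label] if label != NOISE else None for label in labels]
```

In the naive version the noise point is reported as belonging to the last cluster, which is the kind of silent mistake the question is worried about.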