On Tue, May 08, 2012 at 08:36:10PM -0400, Satrajit Ghosh wrote:
>thanks! this is a nice and helpful snapshot! perhaps it should be included
>in the documentation with future releases?
We do have a "what's new" section. For this release, we also tried to have
a 'highlights' part of the "what's new".
On Wed, 09 May 2012, Olivier Grisel wrote:
> > so if it fails for some specific seed, I could check if it gets
> > replicated by running the same test again with the same seed.
> > if it doesn't -- I know **for sure** that it is not related to having
> > random data but smth more fun, worth valgrinding for…
2012/5/9 Yaroslav Halchenko :
>
>
> so if it fails for some specific seed, I could check if it gets
> replicated by running the same test again with the same seed.
>
> if it doesn't -- I know **for sure** that it is not related to having
> random data but smth more fun, worth valgrinding for…
hi gael,
thanks! this is a nice and helpful snapshot! perhaps it should be included
in the documentation with future releases?
cheers,
satra
On Tue, May 8, 2012 at 7:24 PM, Gael Varoquaux <
[email protected]> wrote:
> For communication purposes, I have summarized a bit the latest
On Wed, 2012-05-09 at 01:45 +0200, Olivier Grisel wrote:
> 2012/5/8 Darren Govoni :
> > Still assessing the best models/algorithms to use, but primarily
> > unsupervised learning ones. The models will come from 100's of millions
> > of data points. We're looking at learned bayesian networks, predictive…
2012/5/8 Darren Govoni :
> Still assessing the best models/algorithms to use, but primarily
> unsupervised learning ones. The models will come from 100's of millions
> of data points. We're looking at learned bayesian networks, predictive
> analysis, multivariate analysis and clustering approaches
For communication purposes, I have summarized a bit the latest
developments in the scikit, trying to make it interesting to the experts
and non-experts:
http://gael-varoquaux.info/blog/?p=165
https://twitter.com/#!/GaelVaroquaux/status/22808966168576
Gael
On Tue, 08 May 2012, Yaroslav Halchenko wrote:
> tests ... but @seed'ing every test is somewhat a burden (probably a
> wise solution would be to come up with a nose plugin or smth to seed
> RNG before running every test)
a slight offtopic since nose-dev usergroup seems to be dead silent [1]
(mi
On Tue, 08 May 2012, Tiziano Zito wrote:
> > > is there a point where generic numpy.random gets explicitly seeded
> > > upon sklearn import?
> > No, and I don't think that this is desirable: it would be a weird side
> > effect of importing the scikit. It might be interesting to seed the
> > global
> > is there a point where generic numpy.random gets explicitly seeded
> > upon sklearn import?
>
> No, and I don't think that this is desirable: it would be a weird side
> effect of importing the scikit. It might be interesting to seed the
> global RNG in the tests, but I have found such an approach…
Still assessing the best models/algorithms to use, but primarily
unsupervised learning ones. The models will come from 100's of millions
of data points. We're looking at learned bayesian networks, predictive
analysis, multivariate analysis and clustering approaches over
distributed data.
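For data that will not fit into memory on a single machine, one option scikit-learn offers is out-of-core (mini-batch) learning: stream the data in chunks and update the model incrementally with `partial_fit`. A minimal sketch with `MiniBatchKMeans` — the chunk source and sizes here are made up purely for illustration:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Incremental clustering: each chunk stands in for a slice of a dataset
# far too large to load at once (e.g. read from disk or a database).
km = MiniBatchKMeans(n_clusters=3, n_init=1, random_state=0)
rng = np.random.RandomState(0)
for _ in range(20):
    chunk = rng.rand(1000, 5)   # pretend this was streamed from storage
    km.partial_fit(chunk)

labels = km.predict(rng.rand(10, 5))  # cluster assignments for new points
```

The same pattern works for other estimators that expose `partial_fit` (e.g. `SGDClassifier`), which is the usual answer when the data, but not the model, exceeds memory.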
On Tue, May 08, 2012 at 08:24:53PM +0200, Gael Varoquaux wrote:
> > I guess then such global seeding would be of great help ;)
> Don't think that it is related to RNGs.
Specifically, in the codebase explored by the test that we are discussing,
the only unseeded RNG that I can see is the C-level RNG…
On Tue, May 08, 2012 at 08:24:53PM +0200, Gael Varoquaux wrote:
> For the specific situation of ICA, there are indeed some unseeded RNGs,
> which I am going to fix right now.
Done in f6d7f45
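For reference, the way this kind of fix surfaces for users is the estimator's `random_state` parameter. A small sketch, assuming the current `sklearn.decomposition.FastICA` API, showing that a fixed seed makes repeated fits reproducible:

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.RandomState(0)
X = rng.rand(200, 4)  # toy data standing in for mixed signals

# Two fits with the same seed should yield the same unmixing matrix,
# which is exactly what makes test failures replayable.
ica_a = FastICA(n_components=2, random_state=0).fit(X)
ica_b = FastICA(n_components=2, random_state=0).fit(X)
```

With `random_state=None` the two fits could legitimately differ, which is the class of test flakiness discussed in this thread.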
Gael
On Tue, May 08, 2012 at 02:19:31PM -0400, Yaroslav Halchenko wrote:
> is there a point where generic numpy.random gets explicitly seeded
> upon sklearn import?
No, and I don't think that this is desirable: it would be a weird side
effect of importing the scikit. It might be interesting to seed the global RNG in the tests, but I have found such an approach…
On Tue, 08 May 2012, Gael Varoquaux wrote:
> > I wonder what is the current approach to reproduce random failures of
> > the tests battery?
> There shouldn't be any :(.
I concur! ;)
> > would it be feasible to suggest (PR) for sklearn/test_setup.py (I guess)
> to seed RNGs with some random but known seed and print it out…
On Tue, May 08, 2012 at 02:09:58PM -0400, Yaroslav Halchenko wrote:
> I wonder what is the current approach to reproduce random failures of
> the tests battery?
There shouldn't be any :(.
> would it be feasible to suggest (PR) for sklearn/test_setup.py (I guess)
> to seed RNGs with some random but known seed…
I wonder what is the current approach to reproduce random failures of
the tests battery?
would it be feasible to suggest (PR) for sklearn/test_setup.py (I guess)
to seed RNGs with some random but known seed and print it out (ideally if
nose's verbosity is exposed then only in verbose mode) so exact…
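The idea above could look something like this — a sketch only; the module name `sklearn/test_setup.py` and the `NUMPY_SEED` environment variable are guesses from this thread, not an existing API:

```python
import os
import numpy as np

def setup_rng():
    """Seed numpy's global RNG with a random-but-reported seed.

    If NUMPY_SEED is set in the environment, reuse it, so that a
    failure observed with a printed seed can be replayed exactly.
    """
    seed = int(os.environ.get("NUMPY_SEED", np.random.randint(0, 2**31 - 1)))
    print("seeding numpy.random with seed=%d" % seed)
    np.random.seed(seed)
    return seed
```

A failing run would then be reproduced with `NUMPY_SEED=<printed seed> nosetests …`, which is the replay property Yaroslav asks for.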
2012/5/8 Darren Govoni :
>
> Now, for my problem space, the data models _will not_ fit into memory on
> a single CPU. So therein lies a problem. I suspect, as with most
> engineering solutions, the tradeoff one is confronted with concerns
> resources. One might be willing to trade off time for data
Great explanation. Thanks for that. And I totally agree about the
limitations of map/reduce. Many efforts seem to want to shoehorn all
kinds of problems onto map/reduce. It's curious to me how/why they want to
do that.
Now, for my problem space, the data models _will not_ fit into memory on
a single CPU…
2012/5/7 Darren Govoni :
> Good point. I'm no expert in the details of the algorithms per se, but I
> wonder how the Apache Mahout folks are doing it using map/reduce. Is
> there a data model in scikit that would be suitable for a map/reduce
> algorithm approach?
>
> I know ipython can do map/reduce…
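To make the pattern under discussion concrete, here is a toy map/reduce sketch using only the Python standard library — nothing scikit-learn- or Mahout-specific, just a stand-in for what those frameworks do at scale: each mapper summarizes one data partition, and the reducer combines the partial summaries into a global mean.

```python
from multiprocessing import Pool

def mapper(partition):
    # emit (sum, count) so partial results can be combined exactly
    return (sum(partition), len(partition))

def reducer(partials):
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

if __name__ == "__main__":
    # three partitions standing in for shards of a distributed dataset
    partitions = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
    with Pool(2) as pool:
        partials = pool.map(mapper, partitions)
    print(reducer(partials))  # global mean of 1..9: 5.0
```

The key design point is that mappers emit sufficient statistics (sum, count) rather than local means, so the reduction is exact regardless of partition sizes — the same trick that makes mini-batch estimators work.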
That's great. I'll wait and keep an eye on this.
On Tue, 2012-05-08 at 18:39 +0200, Olivier Grisel wrote:
> I also started some ipython parallel integration work during the pycon
> 2012 sprint with Min RK, one of the main authors of ipython.parallel.
>
> Here is the code:
>
> https://github.com/ogrisel/pycon-pydata-sprint
I also started some ipython parallel integration work during the pycon
2012 sprint with Min RK, one of the main authors of ipython.parallel.
Here is the code:
https://github.com/ogrisel/pycon-pydata-sprint
I am planning to cleanup and package the resulting experiments from
this sprint as a new…
Congratulations everyone!
2012/5/8 bthirion:
> Congratulations ! Thank you for the nice (and intense !) work.
>
> Bertrand
>
> On 05/08/2012 01:08 AM, Andreas Mueller wrote:
>
> Dear all,
> I am happy to announce the 0.11 release of scikit-learn.
>
> This release includes some major new features
Congratulations ! Thank you for the nice (and intense !) work.
Bertrand
On 05/08/2012 01:08 AM, Andreas Mueller wrote:
Dear all,
I am happy to announce the 0.11 release of scikit-learn.
This release includes some major new features such as
randomized sparse models, gradient boosted regression trees…
On 05/08/2012 10:41 AM, Gael Varoquaux wrote:
> On Mon, May 07, 2012 at 03:50:00PM -0400, Félix-Antoine Fortin wrote:
>> I presumed there are valid reasons for using a numpy array, and represent
>> error points as -1.
> Reasons for using a numpy array are that it is faster and more memory
> efficient than a list.
On Mon, May 07, 2012 at 03:50:00PM -0400, Félix-Antoine Fortin wrote:
> I presumed there are valid reasons for using a numpy array, and represent
> error points as -1.
Reasons for using a numpy array are that it is faster and more memory
efficient than a list.
Gael
Hi,
I have recently used the DBSCAN implementation of scikit-learn, and I have a
"quick" question.
Currently, noise points are labelled as -1 in a numpy array.
From my point of view, clustering labels can be used, for example, as indices into a sequence.
However, in Python -1 is still a valid index…
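A small sketch of the pitfall being described — the labels array here is made up; it just mimics what `DBSCAN(...).fit(X).labels_` can look like:

```python
import numpy as np

# labels as DBSCAN would return them: -1 marks noise points
labels = np.array([0, 0, 1, -1, 1, -1])

# pitfall: using labels directly as indices silently maps noise (-1)
# to the *last* element, because -1 is a valid index in Python/NumPy
colors = np.array(["red", "blue", "noise?"])
print(colors[labels])        # noise points get colors[-1], with no error

# usual workaround: mask the noise points out first
core = labels != -1
print(colors[labels[core]])  # only real cluster labels are used as indices
```

This is why -1 was chosen as the noise marker: it can never collide with a real cluster label, but it does oblige callers to mask before fancy-indexing.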