Hi Youssef,
You're trying to do exactly what I did. The first thing to note is that
the Microsoft guys don't precompute the features; rather, they compute
them on the fly. That means that they only need enough memory to store
the depth images, and since they have a 1000-core cluster, computing the
features
At the moment your three options are:
1) get more memory
2) do feature selection - 400k features on 200k samples seems to me to
contain a lot of redundant information or irrelevant features (see the
sketch after this list)
3) submit a PR to support dense matrices - this is going to be a lot of
work and I doubt it's worth it.
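For option 2, something along these lines might be a starting point - a
rough sketch only, with random stand-ins for your data, ranking features
by importance with a small forest and keeping the top k:

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Random stand-ins for the real 200k x 400k problem
X = np.random.rand(1000, 5000).astype(np.float32)
y = np.random.randint(0, 2, 1000)

# A small, shallow forest is enough for a rough importance ranking
forest = ExtraTreesClassifier(n_estimators=20, max_depth=8)
forest.fit(X, y)

k = 500  # keep only the k most important features
top_k = np.argsort(forest.feature_importances_)[-k:]
X_reduced = X[:, top_k]  # train the real model on this reduced matrix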
All t
Congratulations Olivier!
On Apr 17, 2013 7:13 AM, "Gilles Louppe" wrote:
> Congratulations are in order :-)
>
>
> On 17 April 2013 08:06, Peter Prettenhofer
> wrote:
>
>> That's great - congratulations Olivier!
>>
>> Definitely, no pressure ;-)
>>
>>
>> 2013/4/17 Ronnie Ghose
>>
>>> wow :O cong
As Gilles says, the scanning-window approach is pretty common for object
(and face) detection. Have you looked at the Viola-Jones paper? It's the
standard for face detection, and now that we have AdaBoost classifiers
you should be able to knock up an example quite quickly; a bare-bones
skeleton follows below. scikit-image might be qui
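For what it's worth, a sliding-window skeleton - not Viola-Jones itself
(no integral images, no cascade), and with random placeholders for the
training patches and the image - might look like:

import numpy as np
from sklearn.ensemble import AdaBoostClassifier

patch = 24  # window size in pixels

# Placeholder training data; real code would use face/non-face patches
clf = AdaBoostClassifier(n_estimators=50)
clf.fit(np.random.rand(200, patch * patch), np.random.randint(0, 2, 200))

image = np.random.rand(120, 160)  # placeholder grayscale image
detections = []
for r in range(0, image.shape[0] - patch, 4):  # slide with a stride of 4
    for c in range(0, image.shape[1] - patch, 4):
        window = image[r:r + patch, c:c + patch].ravel()
        if clf.predict(window.reshape(1, -1))[0] == 1:
            detections.append((r, c))

A real detector would also rescan at multiple scales and merge
overlapping detections.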
Unfortunately I recently moved to Ubuntu so I'm not going to be of much
help right now...
On Mar 15, 2013 11:48 AM, "george manus" wrote:
> Brian Holt writes:
>
> >
> >
> > Up until very recently I was working on windows 7 64bit without any
> troub
Up until very recently I was working on Windows 7 64-bit without any
trouble.
Are you using the Enthought Python Distribution or Python(x,y), or are
you building scikit-learn yourself?
On Mar 14, 2013 9:46 PM, "george manus" wrote:
>
>
> Leon Palafox writes:
>
> >
> >
> > What is the issue you'v
Is it any one of these?
acronyms.thefreedictionary.com/LOF
On Jan 30, 2013 2:21 PM, "Andreas Mueller" wrote:
> On 01/30/2013 03:15 PM, Oğuz Yarımtepe wrote:
> > I haven't seen any LOF implementation at the library. Any further
> > plans about it or a way to implement it?
> >
> >
> What is LOF? T
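If it is the Local Outlier Factor of Breunig et al. (2000), the core
computation is small enough to sketch on top of NearestNeighbors (my own
unoptimised code, not a library API):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def local_outlier_factor(X, k=5):
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]  # drop each point itself
    k_dist = dist[:, -1]                 # distance to the k-th neighbour
    # reachability distance of p from o: max(k-distance(o), d(p, o))
    reach = np.maximum(k_dist[idx], dist)
    lrd = 1.0 / reach.mean(axis=1)       # local reachability density
    return lrd[idx].mean(axis=1) / lrd   # ~1 for inliers, >>1 for outliers

X = np.vstack([np.random.randn(100, 2), [[8.0, 8.0]]])  # one far outlier
print(local_outlier_factor(X)[-1])  # noticeably greater than 1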
I'm with GIGO. The name of the model (classifier or regressor) should be
enough of a clue to the user as to which they should use for their problem.
On Oct 24, 2012 5:59 PM, "Andreas Mueller" wrote:
> On 24.10.2012 18:53, Mathieu Blondel wrote:
>
>
>
> On Thu, Oct 25, 2012 at 1:39 AM, Gael Varoquaux <
>
If you want rules, you can create an exporter similar to the graphviz
one; a rough sketch follows below. But just to be clear, this tree
implementation is CART, not C4.5, so you shouldn't expect the tree to
store rules in your format.
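Something like this, walking the same fitted-tree arrays the graphviz
exporter uses and printing nested if/else rules (the formatting is yours
to change):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2).fit(iris.data, iris.target)

def export_rules(tree, node=0, depth=0):
    pad = "  " * depth
    if tree.children_left[node] == -1:  # leaf node
        print(pad + "return class %d" % tree.value[node].argmax())
        return
    print(pad + "if X[%d] <= %.3f:"
          % (tree.feature[node], tree.threshold[node]))
    export_rules(tree, tree.children_left[node], depth + 1)
    print(pad + "else:")
    export_rules(tree, tree.children_right[node], depth + 1)

export_rules(clf.tree_)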
Brian
On Oct 24, 2012 5:19 PM, "Didier Vila" wrote:
> >>>Ok - then that's the problem
> Just to make it clear: adding a dependency on make or cmake is just not an
> option. These tools are not part of the standard Python build chain.
Are you sure? We already use make in scikit-learn...
On 15 October 2012 07:45, Andreas Mueller wrote:
> On 15.10.2012 08:36, Mathieu Blondel wrote:
If we wanted to support MSVC then I'd strongly suggest using CMake; in
fact, I'd recommend CMake anyway and just generate makefiles.
Make is bundled with Cygwin, so I see no reason why it wouldn't work
under Windows.
Or (1000[L], 200[L])? The ellipses are a bit general in that they can match
anything.
The latest build still has the L-suffix doctest failures and still has
the FastMCD bad n_trials exception. However, the spectral tests are
looking a bit better, with 1 new failure:
======================================================================
ERROR: Tests the FastMCD algorithm implementation
Gael,
Your idea of using `print`, which calls str(), does actually work on
longs, as does calling int():
In [9]: int(2000L)
Out[9]: 2000
In [10]: str(2000L)
Out[10]: '2000'
However, it doesn't have the desired effect on a tuple of longs:
In [11]: str( (1000L,200L) )
Out[11]: '(1000L, 200L)'
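Converting the elements before forming the tuple does give the plain
form, though (Python 2 session; the L literal no longer exists in
Python 3):

In [12]: str(tuple(int(x) for x in (1000L, 200L)))
Out[12]: '(1000, 200)'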
So
Hi Gael,
I'm not sure it's what you want to hear:
In [3]: import sklearn.datasets
In [4]: digits = sklearn.datasets.load_digits()
In [5]: digits.data.shape
Out[5]: (1797L, 64L)
In [6]: print digits.data.shape
(1797L, 64L)
On 7 October 2012 15:57, Gael Varoquaux wrote:
> Thanks a lot Brian,
>
Sorry guys, I've had loads of stuff on. I might still have a chance to
look at it tonight, but don't bank on it...
On Oct 7, 2012 8:12 PM, "Andreas Müller" wrote:
>
> >
> > Can you reproduce the docstring issues? I cannot. I think that they
> > can
> > be solved simply by adding a 'print' in th
Doctest failures:
======================================================================
FAIL: Doctest: sklearn.datasets.base.load_boston
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Python27\lib\doctest.py", line 2201, in run
It seems that 0.12.X fixes these 2 errors that are present in master
without introducing others:
======================================================================
ERROR: test_locally_linear.test_lle_manifold
----------------------------------------------------------------------
Traceback (most recent call last):
Hi Gael,
Here are the results of the Win7 64-bit build:
EPD 64-bit 7.1.3, Cygwin, numpy 1.6.1
Ran 1294 tests in 110.393s
FAILED (SKIP=11, errors=3, failures=9)
The 9 failures are all doctest failures where integers suffixed with
'L' on 64-bit machines fail string comparisons against the number
without an 'L
I can help with the windows build...
Brian
On Sep 30, 2012 4:18 PM, "Gael Varoquaux"
wrote:
> Hey list,
>
> Next week end Andy and I are going to release an 0.12.1 bugfix release.
> This will be a bug fix release: no additional feature compared to the
> 0.12.
>
> If you want to help us, you can
> n_estimators=10
> clf = RandomForestClassifier(n_estimators=10, oob_score=True)
> clf.fit(X,y)
> print clf.oob_score_
>
> clf.oob_score_ will give oob accuracy.
>
> But I would also like to know what percent of data is used to calculate
> this score?
>
>
>
>
> On Wed,
You're absolutely right: you can simply use the OOB estimate as your
measure of generalisability. No need for GridSearchCV...
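On the quoted question of how much data the OOB score uses: each tree's
bootstrap sample misses a given row with probability (1 - 1/n)**n, which
tends to 1/e, so roughly a third of the data is out-of-bag per tree, and
each sample gets its OOB prediction from roughly a third of the trees.
A quick check:

n = 200000
print((1.0 - 1.0 / n) ** n)  # ~0.3679, i.e. about 36.8% out-of-bag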
On Sep 12, 2012 12:09 PM, "Sheila the angel" wrote:
> Hello all,
> I want to optimize n_estimators and max_features for ensemble methods (say
> for RandomForestClassifier)
Hi Aliabbas,
By coincidence, I've just spent the last 2 hours debugging my Windows
build and finally got it sorted, so I can empathise with you!
May I suggest that you download the Enthought 64-bit distribution? It
comes with sklearn 0.11 already and works out of the box. You'll need
to s
Hi Marcos,
The easiest option is always to uninstall version 0.11. Failing that, try
putting the new location at the beginning of your PYTHONPATH.
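If in doubt about which copy actually gets imported afterwards, a
two-line check (the version string is whatever you installed):

import sklearn
print(sklearn.__version__)  # e.g. 0.12 if the new install won
print(sklearn.__file__)     # the directory it was imported from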
Cheers
Brian
On Sep 1, 2012 3:36 AM, "Marcos Wolff" wrote:
> for compiling yes:
>
> git clone git://github.com/scikit-learn/scikit-learn.git
> cd sc
Woohoo! I might be a bit biased, though :)
Well done, Emanuele, and well done scikit-learn for being such an
awesome project!
On 30 August 2012 16:10, Alexandre Gramfort wrote:
>> Congrats indeed! Which of the 2 competitions did you / he won?
>
> the first and guess with what? ... Random forest ...
Thanks Jim,
I'm on numpy 1.3.0, which might be the problem.
It's not a show-stopper for me; I think I've found a way to avoid
ending up with this case.
Regards
Brian
On 2 August 2012 15:54, Jim Vickroy wrote:
> On 8/2/2012 8:27 AM, Brian Holt wrote:
>> Thanks Jim,
>>
>>
Thanks Jim,
Could you try it again with
X = np.array([[0]])
Note the double "[" bracket - this is what causes the problem for me.
Cheers
Brian
On 2 August 2012 15:23, Jim Vickroy wrote:
> On 8/2/2012 6:05 AM, Brian Holt wrote:
>> Hi list,
>>
>> I'm refa
Hi list,
I'm refactoring the tree module to introduce lazy argsorting, and my
unit tests are failing with:
Exception ValueError: ValueError(u'ndarray is not Fortran
contiguous',) in 'sklearn.tree._tree.Tree.recursive_partition' ignored
I think I've pinned down the problem to this minimal sample
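For anyone hitting the same exception, the contiguity mismatch is easy
to see in isolation (a minimal illustration of my own, not the actual
failing tree code):

import numpy as np

X = np.arange(6, dtype=np.float32).reshape(2, 3)  # C order by default
print(X.flags['F_CONTIGUOUS'])   # False
Xf = np.asfortranarray(X)        # copy into Fortran (column-major) order
print(Xf.flags['F_CONTIGUOUS'])  # True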
0.4441, 0.011 , 0.046 , 0.4921, 0.078 ],
> dtype=float32)
>
> Also, Y has some values = -1.0.
>
> regards
> shankar.
>
>
>
>
>
>
> On Thu, Jul 19, 2012 at 4:58 PM, Brian Holt wrote:
>>
>> Hi Shankar,
>>
>>
Hi Shankar,
Can you paste a small snippet of your data (X_train, Y_train) that
reproduces this behaviour?
Cheers
Brian
Hi Randy,
You're right that the current implementation doesn't support non-numeric
types (for efficiency and for compatibility with the other sklearn
classifiers), but you're also right that trees can theoretically support
any input type so long as the < operator is defined for it. I'm not sure
whether
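In the meantime, one workaround is to map each category to an integer
before fitting - a sketch with invented data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

colours = np.array(['red', 'green', 'blue', 'green', 'red'])
levels, encoded = np.unique(colours, return_inverse=True)  # str -> int

X = encoded.reshape(-1, 1).astype(np.float32)
y = np.array([0, 1, 1, 1, 0])
clf = DecisionTreeClassifier().fit(X, y)

The caveat is that this imposes an arbitrary ordering on the categories,
which the < splits then exploit, so it isn't equivalent to a true
categorical split.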
Decision trees tend to overfit, so they are most often used (unpruned) in a
forest. That said, I think it would be a useful contribution to our offering.
Brian
-----Original Message-----
From: Charanpal Dhanjal
Date: Tue, 13 Mar 2012 11:20:45
To:
Reply-To: scikit-learn-general@lists.sourcefo
http://research.microsoft.com/pubs/12/decisionForests_MSR_TR_2011_114.pdf
Hi Andy,
The best way to understand the min_density parameter is to think of it
as 'the minimum subset population density'. The idea is that when the
density of the samples remaining in a subset falls below this parameter,
the program should copy those points and proceed to split using the
copied subset.
As an example, assume that the
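As a hypothetical illustration with invented numbers:

import numpy as np

n_total = 100000
X_all = np.random.rand(n_total, 10)
sample_mask = np.zeros(n_total, dtype=bool)
sample_mask[:500] = True  # only 500 samples reach this node

density = sample_mask.sum() / float(n_total)  # 0.005
if density < 0.1:  # below the min_density threshold
    X_node = X_all[sample_mask]  # copy into a small, dense array
    # ...and continue splitting on the compact copy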
As a follow up, I found a description of the parallel tree training
algorithm [2] that MSR used.
Regards,
Brian
[2] http://budiu.info/work/budiu-biglearn11.pdf
For those who might be interested, there was a very interesting tutorial
on decision trees [1] presented by Antonio Criminisi and Jamie Shotton
(the guys at MSR behind the human pose estimation algorithm for the
Kinect) at ICCV last week.
Their approach differs from the implementation that exists in
>I have myself made a lot of changes in tree.py and _tree.pyx in a lot
of places in the code. Wouldn't it be easier for you to merge your
code into my files? As I see in [1, 2] your changes are localized, and
hence it would be quicker for you to merge them into my files than for
me merging all my c
@pprett: Thanks for doing the hard work to change the tree into a numpy
representation. I have been thinking a lot about it, and I was just about
to implement it, but you've got there first. I have a few suggestions after
looking at your code that I'd like to try out, so I might make a clone.
---
> Right, but it seems to me that this is exactly what we want to test the
> hypothesis. Maybe I am being dense, as I'm a bit rushing through my mail, but
> it seems to me that if you keep a reference to a, then you compensate for
> the difference that was pointed out in the discussion below, i
>Still, almost 4 minutes just to extend the Python heap and reallocate
>a bunch of already allocated objects seems unlikely. Also I don't
>understand why the Python interpreter would need to "move" allocated
>objects: it can just grow the heap, reallocate a larger buffer list (if
>needed, with just
> Interesting. This hypothesis should be testable, for instance by keeping a
> reference on 'a', appending it to a list. I'd be interested in the results,
> if you don't mind trying it out, Brian.
I'm not sure I understand. I thought that by appending to a list I am
keeping a reference to the object.
cPickle with HIGHEST_PROTOCOL is significantly faster: it averages 15
seconds to load the 10-tree forest, compared to 5 minutes without.
What still confuses me is why loading the forests and storing them in
a list should be any slower than loading them individually. In other
words, why should
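For reference, the whole difference is one argument to dump (Python 2
era, hence cPickle; the filename is invented):

import cPickle
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10).fit([[0], [1]], [0, 1])
with open('forest.pkl', 'wb') as f:
    cPickle.dump(clf, f, cPickle.HIGHEST_PROTOCOL)  # binary protocol
with open('forest.pkl', 'rb') as f:
    clf2 = cPickle.load(f)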
Firstly, thanks for all the helpful comments. I didn't know that the
protocol made such a big difference, so until now, in ignorance, I had
been using the default.
That said, I left a test running last night on one of our centre's
servers, and it took 8 hours to load 20 forests (each with 10 trees,
dep
Once a decision tree (or a forest) has been trained, I almost always
want to save the resulting classifier to disk and then load the
classifier at a later stage for testing.
My dataset is 5.2GB on disk: (690K * 2K) float32s. I can load this
into memory using `np.load('dataset.npy')` in 20 seconds
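One route worth trying is joblib (bundled at the time as
sklearn.externals.joblib), which stores the large numpy arrays inside a
model as separate .npy files instead of pickling them byte by byte; the
filename is invented:

from sklearn.externals import joblib
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10).fit([[0], [1]], [0, 1])
joblib.dump(clf, 'forest.joblib')  # writes forest.joblib plus .npy blobs
clf2 = joblib.load('forest.joblib')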
I'd like to cite this paper, but I can't find it anywhere on
www.jmlr.org. Does anyone have a link?
What is the difference between `asarray` and `asanyarray`?
The documentation for `asanyarray` says: Convert the input to an
ndarray, but pass ndarray subclasses through.
The documentation for `asarray` says: Convert the input to an array.
What I don't get is why `asanyarray` won't convert a `matrix`
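For what it's worth, np.matrix is itself an ndarray subclass, so
asanyarray passes it through by design, while asarray converts it down
to a base ndarray:

import numpy as np

m = np.matrix([[1, 2], [3, 4]])
print(type(np.asarray(m)))     # numpy.ndarray (converted down)
print(type(np.asanyarray(m)))  # numpy.matrix (passed through unchanged)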
> I vote for CONVERTING, and in addition we should implement a common
> test suite that checks the input types/shapes of our estimators (AFAIR
> this was proposed by Mathieu a while ago).
+1
On 20 October 2011 14:15, Peter Prettenhofer
wrote:
> Thanks for raising this issue Lars.
>
> I vote for CONVER
This is cross-posted from the scikits.image mailing list; it was so
interesting, I thought it a waste not to use the opportunity.
We've had a number of discussions on Cython types, and how we wish that
Cython would support some sort of templates. This would be very useful
for the `tree` module (t
+1, even though it's not as accurate. If the tests pass, then it's
accurate enough IMHO.
Is there a way to specify the number of cluster centres required by
MeanShift? From the documentation and a bit of playing around, it seems
like the algorithm decides how many cluster centres to discover...
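From what I can tell, MeanShift has no n_clusters parameter; the number
of centres falls out of the bandwidth, so that is the knob to turn - a
sketch with made-up data:

import numpy as np
from sklearn.cluster import MeanShift, estimate_bandwidth

X = np.random.rand(300, 2)
bw = estimate_bandwidth(X, quantile=0.2)  # smaller quantile -> more centres
ms = MeanShift(bandwidth=bw).fit(X)
print(len(ms.cluster_centers_))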
Hi, PR310 is nearly ready to be merged; if anyone has any further
comments, please let me know.
Link: https://github.com/scikit-learn/scikit-learn/pull/310
This pull request contains an implementation of Classification and
Regression Trees. This version is highly optimised and is significantly
faster
> As for the Bayesian inference setting, there is already PyMC, and I think the
> focus should be on improving that project rather than trying to make
> scikit-learn do everything.
Thanks David! I've spent hours looking for a package that does
inference in Python (hence this email), and PyMC looks
Does [Bayesian Inference](http://en.wikipedia.org/wiki/Bayesian_inference)
fall under the scope of scikit-learn? Probabilistic graphical models
are an exciting field in machine learning, with the theory going back
at least as far as 1982.
If it is of interest, then the obvious question is: do we r