On Thu, Aug 01, 2013 at 11:46:09AM +0800, Shuo Wang wrote:
> ImportError: [joblib] Attempting to do parallel computing without
> protecting your import on a system that does not support forking. To use
> parallel-computing in a script, you must protect your main loop using "if
> __name__ == '__main__
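A minimal sketch (mine, not from the thread) of the guard that error asks for; on platforms without fork(), such as Windows, joblib-backed parallelism has to run under a protected entry point:

# Sketch: the __main__ guard joblib requires on systems without
# fork(). The sqrt call is only illustrative.
from math import sqrt
from joblib import Parallel, delayed

if __name__ == '__main__':
    # Worker processes re-import this module; the guard keeps them
    # from re-executing the parallel call recursively.
    print(Parallel(n_jobs=2)(delayed(sqrt)(i) for i in range(10)))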
Hey Chris,
This is good news. The problems are fairly minor. Don't worry about the
issue. The failing tests are the numerically unstable ones. We'll see what
we can do about them, but they are not release blockers. The good news is
that we don't have a major build or linking problem.
Thanks a lot!
I think all those results correspond to the RBF kernel. You have far too
few samples to learn an RBF model, so it stores trivial coefficients
independent of C and gamma.
On Thu, Aug 1, 2013 at 1:56 PM, Josh Wasserstein wrote:
> Hi,
>
> I am noticing that for some models in my grid search I get
Hi,
I am noticing that for some models in my grid search I get virtually
identical results across 100 repetitions of CV. Is this normal? In case it
matters, I am working with ~30 data points (I know, it's a small dataset)
with ~5 dimensions.
Below are the details of the configuration that I
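A hedged illustration of the effect described in the reply above (assumed data and estimator, not the poster's actual code): with ~30 points, an RBF-kernel SVM can collapse to near-trivial models, so repeated CV barely moves across the (C, gamma) grid.

# Sketch: ~30 samples, 5 features, RBF SVM over a small grid.
import numpy as np
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV  # 0.13-era module path

rng = np.random.RandomState(0)
X, y = rng.randn(30, 5), rng.randint(0, 2, 30)

grid = GridSearchCV(SVC(kernel='rbf'),
                    {'C': [0.1, 1, 10], 'gamma': [0.01, 0.1, 1]},
                    cv=5)
grid.fit(X, y)
print(grid.grid_scores_)  # scores come out nearly flat across the grid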
Hi,
I am trying to run 4 jobs on windows xp with sklearn 0.13.1
model = RandomForestRegressor(n_estimators=500, compute_importances=True,
n_jobs=4)
I am receiving the following error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\multiprocessing\forking.py",
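The usual fix is the same __main__ guard the joblib error above spells out: on Windows, multiprocessing spawns fresh interpreters instead of forking, so any fit with n_jobs > 1 must be protected. A sketch (X and y stand in for the real data):

# Windows-safe pattern for parallel fits; the data here is made up.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def main():
    X, y = np.random.randn(100, 10), np.random.randn(100)
    model = RandomForestRegressor(n_estimators=500,
                                  compute_importances=True, n_jobs=4)
    model.fit(X, y)

if __name__ == '__main__':
    main()  # without the guard, spawned children re-run the fit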
Dear Christoph,
I am contacting you because you are the guy that rocks and provides
fantastically useful binaries of many scientific-computing packages under
Windows. We (the scikit-learn team) are going to release a new version of
scikit-learn. I have tagged the alpha release and uploaded the sourc
It's what they have done in the mulan library.
Arnaud
On 19 Jul 2013, at 13:24, Olivier Grisel wrote:
> 2013/7/19 Arnaud Joly :
>> You can probably average the precision recall curve
>> or use some ranking metrics [1].
>>
>> Arnaud
>>
>> [1] Mining Multi-label Data
>> http://lkm.fri.uni-lj.s
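One concrete reading of "average the precision recall curve" (a sketch under my own assumptions: per-label average precision, macro-averaged):

# Sketch: macro-average per-label average precision for multi-label
# output. Y_true and Y_score are made-up examples.
import numpy as np
from sklearn.metrics import average_precision_score

Y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])
Y_score = np.array([[0.9, 0.2, 0.6],
                    [0.1, 0.8, 0.3],
                    [0.7, 0.9, 0.2]])

per_label = [average_precision_score(Y_true[:, j], Y_score[:, j])
             for j in range(Y_true.shape[1])]
print(np.mean(per_label))  # macro-averaged average precision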
{"word": vocabulary[word], ...}
the training data is like [[0.0, 1.0, 'xxx', 'yyy', '13.0', ...], ]
so when I use DictVectorizer and run fit_transform it will create an
array something like
array([[ 1., 0.],
[ 0., 1.]])
with different shape and data. I am not sure how I will repla
2013/7/31 Oğuz Yarımtepe :
> How will I use DictVectorizer for the string values above?
It won't do categorical integer coding directly. You can keep a
separate dict of the string values, say vocabulary, then feed
DictVectorizer dicts of the form
{"word": vocabulary[word], ...}
--
Lars Buitinck
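A short sketch of that suggestion (variable names are illustrative): integer-code the strings first, so DictVectorizer sees numeric values and emits one column of codes rather than one boolean column per distinct string.

# Sketch of the vocabulary-dict approach described above.
from sklearn.feature_extraction import DictVectorizer

raw = [{"word": "xxx"}, {"word": "yyy"}, {"word": "xxx"}]

# build the separate vocabulary dict mapping strings to integers
vocabulary = {}
for row in raw:
    vocabulary.setdefault(row["word"], len(vocabulary))

encoded = [{"word": vocabulary[row["word"]]} for row in raw]
X = DictVectorizer().fit_transform(encoded)
# X has a single "word" column holding the codes 0, 1, 0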
On Mon, Jul 29, 2013 at 12:19 AM, Ross Boucher wrote:
> Interesting, I've been using DictVectorizer (and one hot coded categorical
> data) with Random Forests and getting decent results. Is this just
> coincidental, and will I see better results if I combine the categorical
> data into a single c
Hi,
> What you get from DictVectorizer is a sparse matrix containing one-hot
> coded categorical values (booleans). Random forests don't support
> those, but fortunately they (should) handle categorical values without
> one-hot coding, so you do something like
>
>
I tried with string values and
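The code in the quoted message was cut off in the archive; a rough sketch of the idea it describes (my reconstruction, not the original) is to hand the forest dense integer category codes instead of one-hot booleans:

# Sketch (reconstruction): integer-code categories and fit on a
# dense array rather than DictVectorizer's one-hot booleans.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

colors = ["red", "green", "red", "blue"]
codes = {c: i for i, c in enumerate(sorted(set(colors)))}
X = np.array([[codes[c]] for c in colors], dtype=np.float64)
y = np.array([0, 1, 0, 1])

clf = RandomForestClassifier(n_estimators=10).fit(X, y)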
Makes sense to me to deprecate here. +1
2013/7/31 Olivier Grisel
> +1 for deprecating boolean mask for CV as well.
+1 for deprecating boolean mask for CV as well.
On Wed, Jul 31, 2013 at 09:14:15AM +1000, Joel Nothman wrote:
> What is the intention behind indices=False;
Old design oversight (aka historical reasons).
> why not deprecate it and simplify the API and code? (And speed up
> indexing by using np.take.)
+1! Making things simpler is always better.
hi,
indeed we could stick to indices and use np.take whenever possible.
In [33]: A = np.random.randn(500, 500)
In [34]: idx = np.unique(np.random.randint(0, 499, 400))
In [35]: mask = np.zeros(500, dtype=np.bool)
In [36]: mask[idx] = True
In [37]: %timeit A[idx]
1000 loops, best of 3: 1.79 ms per loop
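For comparison, a small sketch (not from the original mail) of the np.take variant mentioned above:

# np.take returns the same rows as fancy indexing, typically a bit
# faster, which is the argument for indices + np.take over masks.
import numpy as np

A = np.random.randn(500, 500)
idx = np.unique(np.random.randint(0, 499, 400))

assert np.array_equal(A[idx], np.take(A, idx, axis=0))
# In IPython: %timeit np.take(A, idx, axis=0)  vs  %timeit A[idx]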