On Sun, Jun 2, 2013 at 4:43 PM, Mathieu Blondel wrote:
>
>
>> Sounds good to me. Only I would like some confirmation on whether
>> deprecating support for sequences of sequences is sensible.
>>
>
> Sequences of sequences and arrays of sets are both iterables of iterables,
> right? So, since it only
On Sun, Jun 2, 2013 at 4:26 PM, Joel Nothman
wrote:
>
> That's only true if users know they are required to pass binarized input
> to cross-validation routines such as GridSearchCV and cross_val_score, or
> else they might land up with a 2d array of ints instead of a 1d array of
> objects.
>
I ha
On Sun, Jun 2, 2013 at 6:08 PM, Mathieu Blondel wrote:
>
>
> On Sun, Jun 2, 2013 at 4:26 PM, Joel Nothman wrote:
>
>>
>> That's only true if users know they are required to pass binarized input
>> to cross-validation routines such as GridSearchCV and cross_val_score, or
>> else they might land
On 06/01/2013 11:43 PM, o m wrote:
> Andy, on reading your tip, and reflecting on what I do, I'm tempted to
> claim
> that standardization is very important, regardless ...
>
> Assume x0 is very important but has a tiny range (-1/100, 1/100)
I think that something with a tiny range can be more "i
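
The tiny-range scenario quoted above can be sketched in a few lines. This is a minimal pure-Python illustration of z-score standardization, not scikit-learn's own scaler; the helper name `standardize` and the sample values are made up for the example.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return the z-scores of a list of numbers (zero mean, unit variance)."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(v - mu) / sigma for v in column]


# Hypothetical data: x0 lives in (-1/100, 1/100), x1 in (-100, 100).
x0 = [-0.01, -0.005, 0.0, 0.005, 0.01]
x1 = [-100.0, -50.0, 0.0, 50.0, 100.0]

z0 = standardize(x0)
z1 = standardize(x1)

# After scaling, both columns span the same range despite the
# 10,000-fold difference in their original units.
print(z0)
print(z1)
```

Whether this rescaling matters depends on the estimator: models that compare or penalize raw feature magnitudes are sensitive to it, while threshold-based tree splits are not affected by a monotonic rescaling of a single feature.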
On Sun, Jun 2, 2013 at 6:34 PM, Joel Nothman
wrote:
> On Sun, Jun 2, 2013 at 6:08 PM, Mathieu Blondel wrote:
>
>>
>>
>> On Sun, Jun 2, 2013 at 4:26 PM, Joel Nothman <
>> jnoth...@student.usyd.edu.au> wrote:
>>
>>>
>>> That's only true if users know they are required to pass binarized input
>>> to
I got very good results on text century dating using random forests on
very few (20-ish) bag-of-words tf-idf features selected by chi2. It
depends on the problem.
Cheers,
Vlad
On Sat, Jun 1, 2013 at 9:01 PM, Andreas Mueller
wrote:
> On 06/01/2013 08:30 PM, Christian Jauvin wrote:
>> Hi,
>>
>> I
2013/6/1 Harold Nguyen:
> I was wondering if anyone can point me to a tutorial on clustering text
> documents, but then also displaying the results in a graph? I see some
> examples on clustering text documents, but I'd like to be able to visualize
> the clusters.
You'll need dimensionality redu
Hi Lars,
Thank you very much for this response. Please excuse my questions since I'm
new.
From here the document on TfidfVectorizer here:
http://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
Does TfidfVectorizer take a sequence of filenames, where
or it invokes svm implementation from libsvm?
Hi Andreas,
> Btw, you do encode the categorical variables using one-hot, right?
> The sklearn trees don't really support categorical variables.
I'm rather perplexed by this.. I assumed that sklearn's RF only
required its input to be numerical, so I only used a LabelEncoder up
to now.
My assumpt
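
The difference between the two encodings discussed here can be made concrete with a small sketch. It is written in plain Python to stay self-contained; the helpers `label_encode` and `one_hot_encode` are hypothetical stand-ins, not the scikit-learn `LabelEncoder` or one-hot classes, and the colour data is invented.

```python
def label_encode(values):
    """Map each category to an integer code (alphabetical order)."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    return [index[v] for v in values], categories


def one_hot_encode(values):
    """Map each category to a row with a single 1 in its own column."""
    codes, categories = label_encode(values)
    rows = [[1 if code == j else 0 for j in range(len(categories))]
            for code in codes]
    return rows, categories


colors = ["red", "green", "blue", "green"]

codes, categories = label_encode(colors)
rows, _ = one_hot_encode(colors)

print(categories)  # ['blue', 'green', 'red']
print(codes)       # [2, 1, 0, 1]
print(rows)        # [[0, 0, 1], [0, 1, 0], [1, 0, 0], [0, 1, 0]]
```

With integer codes, a split such as `code <= 0.5` groups categories by their arbitrary alphabetical order, so a tree treats "blue < green < red" as if it were meaningful. With one-hot columns, each split isolates a single category instead.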
On 06/02/2013 10:18 PM, mike wrote:
> or it invokes svm implementation from libsvm?
Yes it does, as it says in the docs:
http://scikit-learn.org/dev/modules/svm.html#implementation-details
Maybe we should put this into a more prominent place?
(in particular libsvm and liblinear are mentioned above
On 06/02/2013 10:53 PM, Christian Jauvin wrote:
> Hi Andreas,
>
>> Btw, you do encode the categorical variables using one-hot, right?
>> The sklearn trees don't really support categorical variables.
> I'm rather perplexed by this.. I assumed that sklearn's RF only
> required its input to be numeric
> Sklearn does not implement any special treatment for categorical variables.
> You can feed any float. The question is if it would work / what it does.
I think I'm confused about a couple of aspects (that's what happens I
guess when you play with algorithms for which you don't have a
complete and
On Mon, Jun 3, 2013 at 12:41 PM, Christian Jauvin wrote:
> > Sklearn does not implement any special treatment for categorical
> variables.
> > You can feed any float. The question is if it would work / what it does.
>
> I think I'm confused about a couple of aspects (that's what happens I
> guess
With the right settings, SGDClassifier is a home cooked implementation
of SVM so there's that too.
Vlad
On Mon, Jun 3, 2013 at 12:23 AM, Andreas Mueller
wrote:
> On 06/02/2013 10:18 PM, mike wrote:
>> or it invokes svm implementation from libsvm?
> Yes it does, as it says in the docs:
> http://s
On 06/03/2013 06:41 AM, Vlad Niculae wrote:
> With the right settings, SGDClassifier is a home cooked implementation
> of SVM so there's that too.
>
That is true. Thinking about it, it is a bit weird that SGDClassifier is
in linear_model and LinearSVC is in svm, as they both solve the
same optimiz
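
To illustrate the point about SGD and linear SVMs sharing an objective, here is a rough pure-Python sketch of stochastic subgradient descent on the L2-regularized hinge loss. It is not the scikit-learn implementation of `SGDClassifier` or `LinearSVC`; the function names, the fixed learning rate, and the toy data are all assumptions made for the example.

```python
def sgd_hinge(X, y, lr=0.1, alpha=0.01, epochs=50):
    """Fit a linear classifier by SGD on hinge loss plus an L2 penalty.

    X is a list of feature lists, y a list of labels in {-1, +1}.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Gradient step on the L2 penalty: shrink the weights.
            w = [wj * (1.0 - lr * alpha) for wj in w]
            # Subgradient step on the hinge loss, active only when the
            # sample is inside the margin or misclassified.
            if margin < 1.0:
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


# Toy, linearly separable data (hypothetical).
X = [[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]]
y = [1, 1, -1, -1]

w, b = sgd_hinge(X, y)
predictions = [predict(w, b, x) for x in X]
print(predictions)
```

The objective minimized here, hinge loss plus a squared-norm penalty, is the same one a linear SVM optimizes; only the solver differs.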
On 06/03/2013 05:19 AM, Joel Nothman wrote:
>
> However, in these last two cases, the number of possible splits at a
> single node is linear in the number of categories. Selecting an
> arbitrary partition allows exponentially many splits with respect to
> the number of categories (though there m
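
The linear-versus-exponential contrast in the quoted paragraph can be checked by counting. The snippet does not show which "last two cases" are meant, so this sketch assumes they are threshold splits on integer codes and splits on one-hot indicator columns; the function names are invented for the illustration.

```python
def threshold_splits(k):
    """Candidate splits when k categories are integer-coded:
    one threshold between each pair of consecutive codes."""
    return max(k - 1, 0)


def indicator_splits(k):
    """Candidate splits with one-hot columns: one test per indicator."""
    return k


def subset_partition_splits(k):
    """Distinct ways to divide k categories into two non-empty groups."""
    return 2 ** (k - 1) - 1 if k >= 1 else 0


for k in (2, 4, 10, 20):
    print(k, threshold_splits(k), indicator_splits(k),
          subset_partition_splits(k))
```

For 10 categories this gives 9 threshold splits, 10 indicator splits, and 511 arbitrary partitions; for 20 categories the last count already exceeds half a million.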
On 06/03/2013 04:41 AM, Christian Jauvin wrote:
>> Sklearn does not implement any special treatment for categorical variables.
>> You can feed any float. The question is if it would work / what it does.
> I think I'm confused about a couple of aspects (that's what happens I
> guess when you play wi
On 06/02/2013 08:48 PM, Harold Nguyen wrote:
Hi Lars,
Thank you very much for this response. Please excuse my questions
since I'm new.
From here the document on TfidfVectorizer here:
http://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html
Does Tfidf