They are defined in the beta release of version 0.15.
On 30 June 2014 02:53, Abijith Kp wrote:
> In which version of sklearn are the above-mentioned 'make_pipeline' and
> 'make_union' defined?
>
> When I read through some examples, the idea of using FeatureUnion and
> Pipeline seems easy, I guess.
In which version of sklearn are the above-mentioned 'make_pipeline' and
'make_union' defined?
When I read through some examples, the idea of using FeatureUnion and
Pipeline seems easy, I guess. The former chains the features obtained from
each individual estimator given as input, whereas the latter u
Actually, it is a little easier with `make_pipeline` and `make_union`, which
weren't around at the time. I think it's a little more abstract than most
people who come across this problem would be comfortable implementing.
Still, it needs an example.
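A minimal sketch of what such an example might look like, assuming records arrive as dicts shaped like the ones in this thread. `ItemSelector` and its `as_number` flag are hypothetical helpers written for this illustration, not part of scikit-learn; `make_pipeline`, `make_union`, `CountVectorizer` and `DictVectorizer` are the library's own.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.feature_extraction import DictVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline, make_union


class ItemSelector(TransformerMixin, BaseEstimator):
    """Hypothetical helper: pull one key out of each record (a dict)."""

    def __init__(self, key, as_number=False):
        self.key = key
        self.as_number = as_number

    def fit(self, X, y=None):
        return self  # stateless, nothing to learn

    def transform(self, X):
        if self.as_number:
            # Wrap the numeric value in a dict so DictVectorizer accepts it.
            return [{self.key: float(rec[self.key])} for rec in X]
        return [rec[self.key] for rec in X]


# Made-up sample data for illustration only.
records = [
    {"a": "3", "b": "alice smith", "d": "quarterly report"},
    {"a": "7", "b": "bob jones", "d": "report draft"},
    {"a": "1", "b": "alice jones", "d": "meeting notes"},
]

# One branch per field; make_union stacks the branch outputs side by side.
features = make_union(
    make_pipeline(ItemSelector("a", as_number=True), DictVectorizer()),
    make_pipeline(ItemSelector("b"), CountVectorizer()),
    make_pipeline(ItemSelector("d"), CountVectorizer()),
)

X = features.fit_transform(records)
print(X.shape)
```

Each text field gets its own vocabulary here, and the numeric field passes through as a single column. The resulting sparse matrix can be handed to a clusterer or classifier as usual.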
On 22 June 2014 15:31, Andy wrote:
>
Yeah that is exactly what I was thinking about.
Though I would disagree that it is not simple to write and lengthy ;)
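from sklearn.base import TransformerMixin

class GetItemTransformer(TransformerMixin):
    def __init__(self, field):
        self.field = field

    # fit() does nothing here; TransformerMixin does not supply one
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.field]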
It is possible to do what you want, but it is not simple to write.
Scikit-learn could definitely benefit from an example showing this sort of
thing, or from a better API to help the user do it, as suggested at
https://github.com/scikit-learn/scikit-learn/issues/2034. There you will
find a lengthy c
What would be the advantage of using a shared vocabulary for
CountVectorizer?
When I read about FeatureUnion, what I understood was that each transformer
in the given list would process the whole data set. Could we use
it to selectively process different features? Or is my understandin
Yes, you can use CountVectorizer.
Do you want the different features to share the same vocabulary?
To use the CountVectorizer, you probably have to either get all the
values (for a shared vocabulary)
or learn one CountVectorizer per key (you could use FeatureUnion for that).
So there is a litt
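The two options mentioned above (a shared vocabulary versus one CountVectorizer per key) could be sketched roughly as follows. The records and key names are made up for illustration; only `CountVectorizer` itself comes from scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Made-up sample data for illustration only.
records = [
    {"b": "alice smith", "d": "mail from alice"},
    {"b": "bob jones", "d": "notes for bob"},
]
keys = ["b", "d"]

# Option 1: shared vocabulary, fitted once on the values of every key.
shared = CountVectorizer()
shared.fit([rec[k] for rec in records for k in keys])
shared_b = shared.transform([rec["b"] for rec in records])
shared_d = shared.transform([rec["d"] for rec in records])

# Option 2: one CountVectorizer per key, each with its own vocabulary.
per_key = {
    k: CountVectorizer().fit([rec[k] for rec in records]) for k in keys
}

print(sorted(shared.vocabulary_))
print({k: sorted(v.vocabulary_) for k, v in per_key.items()})
```

With the shared vocabulary, "alice" maps to the same column whether it appears under "b" or "d", so the two matrices are directly comparable. With one vectorizer per key, the same word gets a separate column for each field.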
Hi,
Initially, one of my feature lists looks like: {"a":"3", "b":"random1",
"c":"", "d":"random2 text"}.
The random text contains names of people, email ids, some descriptions,
numbers, and so on.
When I used DictVectorizer, I could not get an accurate clustering.
I wanted to know if I could get a
Hi Abijith.
It depends on how you want to interpret the strings.
If they are texts and you want to interpret them based on their content,
Brian's suggestion is the right one.
If you want to consider each possible string as a distinct feature, the
OneHotEncoder would be the right choice.
Could
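To make the difference between the two interpretations concrete, here is a small sketch using string values like those quoted in this thread. It assumes a scikit-learn release recent enough for `OneHotEncoder` to accept string input directly, which was not the case for all older versions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import OneHotEncoder

# Made-up sample values for illustration only.
values = ["random1", "random2 text", "random1"]

# Interpretation 1: treat each string as text and count its words.
counts = CountVectorizer().fit_transform(values)

# Interpretation 2: treat each whole string as one categorical value.
# OneHotEncoder expects a 2-D input: one row per sample, one column here.
onehot = OneHotEncoder().fit_transform([[v] for v in values])

print(counts.shape)  # one column per distinct word
print(onehot.shape)  # one column per distinct string
```

The word-count view lets "random2 text" share columns with any other string containing the same words. The one-hot view only tells whether two strings are exactly equal.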
Hi Abijith,
This should get you started:
http://scikit-learn.org/dev/tutorial/text_analytics/working_with_text_data.html
Brian
On 6/20/14, 12:05 PM, Abijith Kp wrote:
> Can anyone help me with the problem of dealing with features which are
> both strings of varying length (say from 0 to 100-150 c
Can anyone help me with the problem of dealing with features which are both
strings of varying length (say from 0 to 100-150 characters) and numbers?
What are the most widely used techniques in this kind of situation?
And can it be solved using only scikit-learn?
PS: Initially I have to conver