[Scikit-learn-general] scikit-learn-0.11

Andres Soto Tue, 07 Aug 2012 09:30:45 -0700

According to
http://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction
(6.2.2.3. Common Vectorizer usage),
I did:
>>> from sklearn.feature_extraction.text import CountVectorizer
>>> vectorizer = CountVectorizer()
but I get


>>> vectorizer
CountVectorizer(analyzer=WordNGramAnalyzer(charset='utf-8', max_n=1, min_n=1,
         preprocessor=RomanPreprocessor(),
         stop_words=set(['all', 'six', 'less', 'being', 'indeed', 'over', 
'move', 'anyway', 'four', 'not', 'own', 'through', 'yourselves', 'fify', 
'where', 'mill', 'only', 'find', 'before', 'one'...without', 'so', 'five', 
'the', 'first', 'whereas', 'once']),
         token_pattern='\\b\\w\\w+\\b'),
        dtype=<type 'long'>, max_df=1.0, max_features=None,
        vocabulary=None)
instead of

>>> vectorizer
CountVectorizer(analyzer='word', binary=False, charset='utf-8',
        charset_error='strict', dtype=<type 'long'>, input='content',
        lowercase=True, max_df=1.0, max_features=None, max_n=1, min_n=1,
        preprocessor=None, stop_words=None, strip_accents=None,
        token_pattern=u'\\b\\w\\w+\\b', tokenizer=None, vocabulary=None)
as shown on that web page.
Regards,


Prof. Dr. Andrés Soto
DES DACI
UNACAR
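
The output above shows WordNGramAnalyzer, and the quoted message below mentions installing sklearn 0.9, so one possible explanation (an assumption, not something confirmed in this thread) is that the interpreter is importing the older release rather than 0.11. A minimal sketch for checking which installation is actually picked up; the helper name describe_install is hypothetical:

```python
import importlib


def describe_install(module_name="sklearn"):
    """Return (version, file path) for the module Python actually imports.

    Returns None when the module cannot be imported at all.
    describe_install is a hypothetical helper, not part of scikit-learn.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    version = getattr(module, "__version__", "unknown")
    location = getattr(module, "__file__", "unknown")
    return version, location


info = describe_install()
if info is None:
    print("sklearn is not importable from this interpreter")
else:
    print("sklearn %s loaded from %s" % info)
```

If the reported version is not the one the documentation page describes, the printed repr and the available attributes can differ from the examples on that page.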



>________________________________
> From: Robert Layton <[email protected]>
>To: Andres Soto <[email protected]>; 
>[email protected] 
>Sent: Monday, August 6, 2012 7:24 PM
>Subject: Re: [Scikit-learn-general] scikit-learn-0.11 vs scikit-learn-0.9
> 
>
>On 7 August 2012 03:18, Andres Soto <[email protected]> wrote:
>
>>Hi
>>I am using python-2.7.3, numpy-1.6.2-win32-superpack-python2.7,
>>scipy-0.11.0rc1-win32-superpack-python2.7, scikit-learn-0.11.win32-py2.7
>>I tried the following
>>
>>>>> train_set = ("The sky is blue.", "The sun is bright.")
>>>>> test_set = ("The sun in the sky is bright.",
>>    "We can see the shining sun, the bright sun.")
>>>>> from sklearn.feature_extraction.text import CountVectorizer
>>>>> vectorizer = CountVectorizer()
>>>>> print vectorizer
>>CountVectorizer(analyzer=word, binary=False, charset=utf-8,
>>        charset_error=strict, dtype=<type 'long'>, input=content,
>>        lowercase=True, max_df=1.0, max_features=None, max_n=1, min_n=1,
>>        preprocessor=None, stop_words=None, strip_accents=None,
>>        token_pattern=\b\w\w+\b, tokenizer=None, vocabulary=None)
>>>>> vectorizer.fit_transform(train_set)
>><2x6 sparse matrix of type '<type 'numpy.int64'>'
>>            with 8 stored elements in COOrdinate format>
>>>>> print vectorizer.vocabulary
>>
>>Traceback (most recent call last):
>>  File "<pyshell#6>", line 1, in <module>
>>    print vectorizer.vocabulary
>>AttributeError: 'CountVectorizer' object has no attribute 'vocabulary'
>>>>>
>>
>>I tried to fix the parameters of CountVectorizer (analyzer =
>>WordNGramAnalyzer, vocabulary = dict) but it didn’t work. Therefore I
>>decided to install sklearn 0.9 and it works, so we could say that
>>everything is OK but I still would like to know what is wrong with
>>version sklearn 0.11
>>Andrés Soto
>>
>>------------------------------------------------------------------------------
>>Live Security Virtual Conference
>>Exclusive live event will cover all the ways today's security and
>>threat landscape has changed and how IT managers can respond. Discussions
>>will include endpoint security, mobile security and the latest in malware
>>threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>>_______________________________________________
>>Scikit-learn-general mailing list
>>[email protected]
>>https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>
>>
>Hi Andres,
>
>
>Short answer: vocabulary_
>>>> print vectorizer.vocabulary_
>{u'blue': 0, u'sun': 4, u'is': 2, u'sky': 3, u'bright': 1, u'the': 5}
>
>
>(You can view all the methods and attributes by using dir(vectorizer).)
>
>
>Longer answer:
>The interface for this part of scikits.learn has been significantly changed in 
>the time between those releases.
>The changes make the section easier to use and maintain, which is why they 
>were updated, despite breaking compatibility.
>The updated documentation can be found 
>here: http://scikit-learn.org/stable/modules/feature_extraction.html#text-feature-extraction
>
>
>Hope that helps,
>
>
>Robert
>
>
>-- 
>
>Public key at: http://pgp.mit.edu/ Search for this email address and select 
>the key from "2011-08-19" (key id: 54BA8735)
>
>
>
>
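
Robert's short answer can be sketched as a small helper that works with either attribute name. This is only an illustration under stated assumptions: the name fitted_vocabulary is hypothetical, the fallback to the bare vocabulary attribute relies on the original post's report that it worked under 0.9, and the usage part runs only if scikit-learn is installed.

```python
def fitted_vocabulary(vectorizer):
    """Return the term -> column-index mapping learned by fit().

    Tries the trailing-underscore name from Robert's reply first, then the
    older bare name. fitted_vocabulary is a hypothetical helper, not a
    scikit-learn function.
    """
    for name in ("vocabulary_", "vocabulary"):
        mapping = getattr(vectorizer, name, None)
        if mapping:  # skip a missing attribute or an unset (None) value
            return mapping
    raise AttributeError(
        "no fitted vocabulary found; call fit() or fit_transform() first")


try:
    from sklearn.feature_extraction.text import CountVectorizer
except ImportError:
    CountVectorizer = None  # scikit-learn not installed; helper still usable

if CountVectorizer is not None:
    train_set = ("The sky is blue.", "The sun is bright.")
    vectorizer = CountVectorizer()
    vectorizer.fit_transform(train_set)
    # Print the learned terms with their column indices, sorted by term.
    print(sorted(fitted_vocabulary(vectorizer).items()))
```

Listing dir(vectorizer), as suggested above, shows which of the two attribute names a given installation exposes after fitting.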

