[scikit-learn] Text classification of large dataset

2017-12-19 Thread Ranjana Girish
Hi all, I am doing text classification. I have around 10 million records to be classified into around 7k categories. Below is the code I am using:
# Importing the libraries
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from

Re: [scikit-learn] Text classification of large dataset

2017-12-27 Thread Ranjana Girish
Hi all, thank you for your suggestions. But I am still getting a memory error while doing feature selection:
fs = feature_selection.SelectPercentile(feature_selection.chi2, percentile=20)
documenttermmatrix1 = fs.fit_transform(documenttermmatrix, y1)
documenttermmatrix will be of shape
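
For memory errors at this scale, one option sometimes used is out-of-core learning: hash the text into a fixed-width sparse matrix and train incrementally in batches. The sketch below is a different technique from the SelectPercentile/chi2 step quoted above, not a fix to it. The toy texts, labels, batch size and n_features value are all assumptions made for illustration; the poster's real data and pipeline are not shown in the message.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Toy stand-in data (hypothetical; the real 10-million-row dataset is not shown).
texts = [
    "switch for air conditioner", "control transformer", "cooling pad",
    "drilling machine", "hair smoothing", "air conditioner switch",
    "transformer control unit", "pad for cooling",
]
labels = np.array([0, 1, 2, 3, 4, 0, 1, 2])

# partial_fit must be told the complete set of classes on the first call.
all_classes = np.unique(labels)

# HashingVectorizer is stateless: it stores no vocabulary, so its memory use
# does not grow with the number of distinct tokens in the corpus.
vectorizer = HashingVectorizer(n_features=2**16, alternate_sign=False)
clf = SGDClassifier(random_state=0)

batch_size = 4  # assumed value; in practice sized to fit available RAM
for start in range(0, len(texts), batch_size):
    X_batch = vectorizer.transform(texts[start:start + batch_size])
    y_batch = labels[start:start + batch_size]
    clf.partial_fit(X_batch, y_batch, classes=all_classes)

predictions = clf.predict(vectorizer.transform(["cooling pad"]))
```

Note that a linear model keeps a dense coefficient matrix of shape (n_classes, n_features), so with around 7k categories the choice of n_features still has to be weighed against available memory.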

[scikit-learn] help-Renaming features in scikit-learn's CountVectorizer()

2018-03-05 Thread Ranjana Girish
Hi all, I have a very large pandas dataframe. Below is a sample:
Id  description
1   switvch for air conditioner transformer..
2   control tfrmr...
3   coling pad.
4   DRLG machine
5   hair smothing
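
The message is cut off here, so the actual question is not shown. Assuming the goal is to map misspelled tokens such as "switvch" onto one canonical feature name before vectorizing, a minimal sketch using only the standard library could look like the following. The canonical term list, the cutoff value and both function names are hypothetical and do not come from the original post.

```python
import difflib

# Hypothetical canonical vocabulary; the original message does not provide one.
CANONICAL_TERMS = ["switch", "transformer", "cooling", "smoothing"]


def normalize_token(token, cutoff=0.8):
    """Return the closest canonical term, or the lowercased token if none is close."""
    token = token.lower()
    matches = difflib.get_close_matches(token, CANONICAL_TERMS, n=1, cutoff=cutoff)
    return matches[0] if matches else token


def normalize_description(text):
    """Normalize every whitespace-separated token in a description."""
    return " ".join(normalize_token(tok) for tok in text.split())


cleaned = normalize_description("switvch for air conditioner transformer")
```

A function like normalize_description could be passed as the preprocessor argument of CountVectorizer, so that variant spellings end up in the same column. Heavy abbreviations such as "tfrmr" fall below the similarity cutoff and would need an explicit lookup table instead.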