To clarify:
You have 2.3M samples.
How many features?
How many active features per sample, on average?
Are the 7k classes multiclass or multilabel?
Have you tried limiting the depth of the forest? Have you tried embedding
your feature space into a smaller vector (pre-trained embeddings, hashing,
LDA, ...)?
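To illustrate the hashing suggestion above, here is a minimal sketch (assuming scikit-learn is installed; the toy documents and `n_features` value are placeholders): `HashingVectorizer` maps text into a fixed-width sparse vector space, so the dimensionality stays bounded no matter how large the vocabulary grows.

```python
# Feature hashing: project an arbitrarily large vocabulary into a
# fixed-size vector space (sketch with toy data, not the real corpus).
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["the cat sat on the mat", "dogs and cats", "text classification example"]

# n_features bounds the output dimensionality regardless of vocabulary size
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
X = vectorizer.transform(docs)

print(X.shape)  # sparse matrix of shape (3, 262144)
```

Because the vectorizer is stateless (no vocabulary to fit), it also combines naturally with out-of-core training.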
Ranjana,
have a look at this example
http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html
Since you have a lot of RAM, you may not need to make the whole
classification pipeline out-of-core; a start with your current code
could be to write a generator.
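A minimal sketch of that generator idea (assuming scikit-learn; the file layout, batch size, and toy data are hypothetical stand-ins for the real 10M-document stream): yield minibatches of `(texts, labels)` and train incrementally with `HashingVectorizer` plus `SGDClassifier.partial_fit`, in the spirit of the linked example.

```python
# Out-of-core training sketch: stream minibatches through partial_fit.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

def iter_minibatches(pairs, batch_size):
    """Yield (texts, labels) batches from any iterable of (text, label)."""
    texts, labels = [], []
    for text, label in pairs:
        texts.append(text)
        labels.append(label)
        if len(texts) == batch_size:
            yield texts, labels
            texts, labels = [], []
    if texts:
        yield texts, labels

# toy data standing in for the real document stream
data = [("good movie", "pos"), ("bad film", "neg")] * 50
all_classes = ["pos", "neg"]  # partial_fit needs the full label set up front

vec = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier()
for texts, labels in iter_minibatches(data, batch_size=20):
    clf.partial_fit(vec.transform(texts), labels, classes=all_classes)
```

Only one minibatch is ever held in memory, so the same loop works whether the pairs come from a list, a CSV reader, or a database cursor.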
Hi all,
I am doing text classification. I have around 10 million documents to be
classified into around 7k categories.
Below is the code I am using:
# Importing the libraries
import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from