Re: [scikit-learn] Text classification of large dataet

2017-12-20 Thread Joel Nothman
To clarify: you have 2.3M samples. How many features? How many active features on average per sample? In 7k classes: multiclass or multilabel? Have you tried limiting the depth of the forest? Have you tried embedding your feature space into a smaller vector (pre-trained embeddings, hashing, LDA, PC
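
A minimal sketch of two of the suggestions above: embedding a sparse text feature space into a smaller dense vector, and limiting the depth of the forest. The toy corpus, the labels, and every parameter value are assumptions made for illustration; `TruncatedSVD` is used here as one possible reduction step and is not necessarily the method the message had in mind.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.pipeline import make_pipeline

# Toy corpus standing in for the real dataset (hypothetical data).
docs = [
    "cheap flights to paris",
    "book a hotel in rome",
    "train tickets to berlin",
    "gradient descent converges slowly",
    "random forests overfit deep trees",
    "neural networks need more data",
]
labels = ["travel", "travel", "travel", "ml", "ml", "ml"]

# Hash tokens into a fixed-width sparse matrix, then project it onto a few
# dense components. Both sizes are illustrative, not tuned.
reducer = make_pipeline(
    HashingVectorizer(n_features=2**10, alternate_sign=False),
    TruncatedSVD(n_components=4, random_state=0),
)
X_small = reducer.fit_transform(docs)

# max_depth caps how deep each tree may grow, which bounds memory per tree.
forest = RandomForestClassifier(n_estimators=20, max_depth=5, random_state=0)
forest.fit(X_small, labels)

print(X_small.shape)
```

Whether this helps on 2.3M samples and 7k classes depends on the answers to the questions above, in particular on how sparse the features are.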

Re: [scikit-learn] Text classification of large dataet

2017-12-20 Thread Roman Yurchak
Ranjana, have a look at this example: http://scikit-learn.org/stable/auto_examples/applications/plot_out_of_core_classification.html Since you have a lot of RAM, you may not need to make the whole classification pipeline out-of-core; a start with your current code could be to write a generator
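
A rough sketch of the generator idea, in the spirit of the linked out-of-core example. The message is cut off before it says what the generator should yield, so the `iter_minibatches` helper, the toy data, and the batch size are assumptions. It relies on `HashingVectorizer` being stateless and on `SGDClassifier.partial_fit` accepting one mini-batch at a time.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier


def iter_minibatches(docs, labels, batch_size):
    """Yield (texts, targets) chunks so only one batch is vectorized at a time.

    Hypothetical helper: in practice it could read from disk or a database.
    """
    for start in range(0, len(docs), batch_size):
        stop = start + batch_size
        yield docs[start:stop], labels[start:stop]


# Toy data standing in for the real corpus (hypothetical).
docs = [
    "cheap flights to paris",
    "gradient descent converges slowly",
    "book a hotel in rome",
    "random forests overfit deep trees",
    "train tickets to berlin",
    "neural networks need more data",
]
labels = ["travel", "ml", "travel", "ml", "travel", "ml"]

# partial_fit needs the full list of classes up front, because a single
# batch may not contain every class.
all_classes = sorted(set(labels))

# The hashing vectorizer needs no fit step, so it can transform each batch
# independently.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = SGDClassifier(random_state=0)

for texts, targets in iter_minibatches(docs, labels, batch_size=4):
    X_batch = vectorizer.transform(texts)
    clf.partial_fit(X_batch, targets, classes=all_classes)

predictions = clf.predict(vectorizer.transform(docs))
```

Only the raw text of the current batch and its sparse matrix are held in memory at once; the classifier keeps just its weight matrix between batches.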

Re: [scikit-learn] Parallel MLP version

2017-12-20 Thread Johnson, Jeremiah
For neural network training, try one of TensorFlow, PyTorch, Chainer, or MXNet. They'll all parallelize the computations and can run them on Nvidia GPUs with CUDA. Best regards, Jeremiah On Dec 20, 2017, at 11:45, Raphael C <drr...@gmail.com> wrote: I

Re: [scikit-learn] Parallel MLP version

2017-12-20 Thread Raphael C
I believe TensorFlow will do what you want. Raphael On 20 Dec 2017 16:43, "Luigi Lomasto" wrote: > Hi all, > > I have a computational problem training my neural network. Can you > tell me whether a parallel version of the MLP library exists? > > > __

[scikit-learn] Parallel MLP version

2017-12-20 Thread Luigi Lomasto
Hi all, I have a computational problem training my neural network. Can you tell me whether a parallel version of the MLP library exists? ___ scikit-learn mailing list scikit-learn@python.org https://mail.python.org/mailman/listinfo/scikit-learn

Re: [scikit-learn] Any plans on generalizing Pipeline and transformers?

2017-12-20 Thread Manuel Castejón Limas
Thank you all for your interest! To clarify the case, allow me to try to synthesize the spirit of what I'd like to put into the pipeline using this sequence of steps:

#%%
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklear