[Scikit-learn-general] Speed up transformation step with multiple 1 vs rest binary text classifiers.

2015-07-02 Thread nmura...@masonlive.gmu.edu
Hello, I have a text classification problem where I have about 50 classes and have 50 binary classifiers (1 per topic). The training set used to train each topic classifier is different (some instances might overlap). Each instance consists of a text snippet which is transformed using tf-idf
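One possible way to cut the transformation cost in a setup like this is to fit a single tf-idf vectorizer on the union of all training texts, so each incoming snippet is transformed once and the same sparse matrix is passed to every binary classifier. The sketch below is only an illustration under that assumption. The toy data, the `topic_data` and `classifiers` names, and the choice of `LogisticRegression` are hypothetical and do not come from the original message.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one (texts, labels) pair per topic; sets may overlap.
topic_data = {
    "sports": (
        ["the team won the match", "stock prices fell sharply",
         "a late goal decided the game", "the bank raised interest rates"],
        [1, 0, 1, 0],
    ),
    "finance": (
        ["stock prices fell sharply", "the team won the match",
         "the bank raised interest rates", "a new album was released"],
        [1, 0, 1, 0],
    ),
}

# Fit ONE vectorizer on the union of all training texts so that every
# classifier shares the same feature space (assumption of this sketch).
all_texts = sorted({t for texts, _ in topic_data.values() for t in texts})
vectorizer = TfidfVectorizer().fit(all_texts)

# Train one binary classifier per topic on its own training subset.
classifiers = {}
for topic, (texts, labels) in topic_data.items():
    clf = LogisticRegression()
    clf.fit(vectorizer.transform(texts), labels)
    classifiers[topic] = clf

# At prediction time the new snippets are transformed a single time ...
new_docs = ["the match ended with a late goal", "interest rates and stock prices"]
X_new = vectorizer.transform(new_docs)

# ... and the same sparse matrix is reused by every classifier.
predictions = {topic: clf.predict(X_new) for topic, clf in classifiers.items()}
```

The trade-off is that the IDF weights now come from the pooled corpus rather than from each topic's own training set, so results may differ from a setup with one vectorizer per topic.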

Re: [Scikit-learn-general] K-Fold-Cross-validation in Scikit-Learn

2015-04-29 Thread nmura...@masonlive.gmu.edu
stian On Apr 28, 2015, at 10:40 PM, nmura...@masonlive.gmu.edu wrote: Hello, I am very new to scikit-learn and am trying to run cross-validation on a data frame consisting of text features, classification class. I am trying to perform text data classific

[Scikit-learn-general] K-Fold-Cross-validation in Scikit-Learn

2015-04-28 Thread nmura...@masonlive.gmu.edu
Hello, I am very new to scikit-learn and am trying to run cross-validation on a data frame consisting of text features, classification class. I am trying to perform text data classification. It is a 2-class classification problem where the distribution between positive and negative instances i
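For a two-class text problem with an uneven class distribution, one common approach is stratified k-fold cross-validation over a pipeline, so that tf-idf is refit inside each training fold and every fold keeps roughly the same class ratio. The following is a minimal sketch assuming a current scikit-learn release (`sklearn.model_selection`; 2015-era releases used a different module). The toy texts, the choice of `LogisticRegression`, and the F1 scoring are illustrative assumptions, not details from the original message.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical imbalanced toy data: 4 positive and 12 negative snippets.
positives = [f"urgent offer number {i} claim your prize" for i in range(4)]
negatives = [f"meeting notes for project {i} are attached" for i in range(12)]
texts = positives + negatives
labels = [1] * len(positives) + [0] * len(negatives)

# Putting the vectorizer inside the pipeline means tf-idf is fit only on
# the training part of each fold, which avoids leaking test-fold statistics.
pipeline = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(class_weight="balanced"),
)

# Stratification keeps the positive/negative ratio similar in every fold.
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)

# F1 on the positive class is often more informative than accuracy
# when the classes are imbalanced.
scores = cross_val_score(pipeline, texts, labels, cv=cv, scoring="f1")
print("per-fold F1:", scores)
print("mean F1:", scores.mean())
```

If the data lives in a pandas data frame, the text column and the label column can be passed in place of `texts` and `labels`.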

Re: [Scikit-learn-general] Contributing in a New Topic : Recommender Systems

2014-02-02 Thread nmura...@masonlive.gmu.edu
This is in response to the thread on recommender system implementation in scikit-learn. I would also like to know if any of the scikit-learn contributors are willing to mentor a project which implements basic recommender system algorithms - collaborative filtering (user-based/item-based/matrix

Re: [Scikit-learn-general] Advice for where to start

2014-01-21 Thread nmura...@masonlive.gmu.edu
First, you need to preprocess your data; a good tool for that is pandas. That is 60% of any machine learning task, as you will see. What is the goal you are trying to achieve? If you don't have labelled data (again, I only glanced at your post), unsupervised learning is a good way to go, in which c

Re: [Scikit-learn-general] Google Summer of Code 2014

2014-01-16 Thread nmura...@masonlive.gmu.edu
I agree that sparse matrices need to be supported, as one of the main properties inherent to the user/item rating matrix in recommender systems is its sparsity. This sparsity is what has given rise to such a large body of research in the field. Hence this property would have to be taken advanta

[Scikit-learn-general] twentynewsgroups dataset fetch not working

2013-12-09 Thread nmura...@masonlive.gmu.edu
I am running Python 2.7.3 with Enthought Canopy and am having issues fetching the twenty newsgroups dataset. It reports an empty file when I try the code provided in the example on the following page: http://scikit-learn.org/stable/datasets/twenty_newsgroups.html The first two lines of