[Scikit-learn-general] Calculating statistical significance between two values

2015-05-15 Thread Jack Alan
Hi folks, I have a question I could not find a proper answer to. Suppose I have two different systems, A and B, applied to the same dataset and using different algorithms. Each system achieves a specific F-measure (F1), such as: System A: 88% F1, System B: 89.6% F1. I want to see if the diffe
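
One common way to approach the question above is a paired approximate randomization test on the two systems' per-document predictions. The following is a minimal sketch, not a method taken from the thread: it assumes binary labels with 1 as the positive class, and the function names (f1_binary, randomization_test) and parameters (n_rounds, seed) are hypothetical. It needs the per-document predictions of both systems, because the two aggregate F1 values alone are not enough to test significance.

```python
import random


def f1_binary(y_true, y_pred, positive=1):
    """F1 for one positive class, computed from raw counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def randomization_test(y_true, pred_a, pred_b, n_rounds=2000, seed=0):
    """Return (observed |F1 difference|, estimated two-sided p-value).

    Under the null hypothesis the two systems are interchangeable, so for
    each document the pair of predictions is swapped with probability 0.5.
    """
    rng = random.Random(seed)
    observed = abs(f1_binary(y_true, pred_a) - f1_binary(y_true, pred_b))
    at_least_as_extreme = 0
    for _ in range(n_rounds):
        shuf_a, shuf_b = [], []
        for a, b in zip(pred_a, pred_b):
            if rng.random() < 0.5:
                a, b = b, a
            shuf_a.append(a)
            shuf_b.append(b)
        diff = abs(f1_binary(y_true, shuf_a) - f1_binary(y_true, shuf_b))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one smoothing keeps the estimated p-value strictly above zero.
    return observed, (at_least_as_extreme + 1) / (n_rounds + 1)
```

A small p-value suggests the observed F1 gap is unlikely to arise from chance alone on this test set; a paired bootstrap or McNemar's test are alternatives that rest on different assumptions.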

[Scikit-learn-general] Micro and Macro F-measure for text classification

2015-04-09 Thread Jack Alan
Hi folks, regarding the classification of text documents example available at: http://scikit-learn.org/stable/auto_examples/text/mlcomp_sparse_document_classification.html#example-text-mlcomp-sparse-document-classification-py Which sort of F-measure has been used? Is it micro or macro? And how to chan
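
To make the micro/macro distinction concrete, here is a minimal pure-Python sketch for single-label multiclass predictions. The helper name micro_macro_f1 is hypothetical and is not part of scikit-learn; in scikit-learn itself the averaging mode is selected with the average parameter of sklearn.metrics.f1_score (for example average="micro" or average="macro").

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multiclass predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores, so every class weighs the same.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts over all classes first, then compute one F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


y_true = ["a", "a", "a", "a", "b", "c"]
y_pred = ["a", "a", "a", "a", "c", "c"]
micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro={micro:.3f} macro={macro:.3f}")
```

In this toy data the rare class "b" is never predicted, which pulls the macro average well below the micro average; that gap is the usual reason to report both.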

[Scikit-learn-general] Latent Semantic Indexing (LSI) with Sklearn

2013-01-03 Thread Jack Alan
Hi all, I'm working on document classification and I wonder if there is a way of having the feature vector calculated based on Latent Semantic Indexing (LSI) instead of tf or tf-idf. As you know, LSI and Latent Dirichlet Allocation (LDA) capture semantic features. I found an online Pytho

[Scikit-learn-general] Text classification and file names output

2012-08-11 Thread Jack Alan
Hi everyone, I'm working on text classification, following the provided tutorial: document_classification_20newsgroups.py I wonder how I can print a list of the names of the documents in the test folder together with their predicted classes after the classification process. The output wanted is someho
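
A minimal sketch of pairing file names with predicted class names follows. It assumes the test set exposes a list of file paths (in scikit-learn, the Bunch returned by load_files or fetch_20newsgroups has a filenames attribute and a target_names list) and that the classifier returns integer class indices. The helper name predictions_by_file and the sample paths are hypothetical.

```python
import os


def predictions_by_file(filenames, predicted, target_names):
    """Pair each document's base file name with its predicted class name."""
    return [
        (os.path.basename(path), target_names[idx])
        for path, idx in zip(filenames, predicted)
    ]


# Hypothetical stand-ins for data_test.filenames, clf.predict(X_test)
# and data_test.target_names.
filenames = ["test/sci.space/60001", "test/rec.autos/10201"]
predicted = [1, 0]
target_names = ["rec.autos", "sci.space"]

for name, label in predictions_by_file(filenames, predicted, target_names):
    print(f"{name}\t{label}")
```

This relies on the predictions being in the same order as the file list, which holds as long as the feature matrix was built from the documents in that order without shuffling.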