[Scikit-learn-general] some points on the documentation

2016-01-26 Thread Panos Louridas
Hello, A few points on the documentation / examples in the scikit-learn site: * In the example that plots the decision surface of a decision tree on the Iris dataset (http://scikit-learn.org/stable/auto_examples/tree/plot_iris.html#example-tree-plot-iris-py), the dataset is initially shuffled

Re: [Scikit-learn-general] Building sklearn for different python versions (development)

2016-01-26 Thread WENDLINGER Antoine
Well, I did not use Travis because I thought it was a little cumbersome to have to push every little change I made to my GitHub repo, plus a Travis build takes ~15 min. I was looking for a way to keep the binaries for both versions with the same source directory (I don't edit Cython files for

[Scikit-learn-general] Latent Dirichlet Allocation

2016-01-26 Thread Rockenkamm, Christian
Hello, I have a question concerning the Latent Dirichlet Allocation. The results I get from using it are a bit confusing. At first I use about 3000 documents. In the preparation with the CountVectorizer I use the following parameters: max_df=0.95 and min_df=0.05. For the LDA fit I use the batch
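
For readers unfamiliar with the two thresholds mentioned above, here is a minimal sketch in plain Python (not scikit-learn's own code) of what fractional max_df and min_df values do: a term is kept only if the number of documents containing it lies between min_df * n_docs and max_df * n_docs. The function name `document_frequency_filter` and the toy corpus are hypothetical and exist only for illustration.

```python
from collections import Counter


def document_frequency_filter(docs, min_df=0.05, max_df=0.95):
    """Return the sorted terms whose document frequency lies within the bounds.

    Illustrative stand-in for the vocabulary pruning that fractional
    min_df / max_df values request; whitespace tokenization is an assumption.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        # Count each term at most once per document.
        df.update(set(doc.lower().split()))
    low, high = min_df * n_docs, max_df * n_docs
    return sorted(term for term, count in df.items() if low <= count <= high)


# Toy corpus: "the" occurs in every document, "rare" and "axolotl" in only one.
docs = ["the cat sat"] * 20 + ["the dog ran"] * 19 + ["the rare axolotl"]
kept = document_frequency_filter(docs)
print(kept)
```

With thresholds this strict, both very common and very rare terms are dropped, so the vocabulary passed on to LDA can be much smaller than the number of distinct words in the raw documents.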

Re: [Scikit-learn-general] Building sklearn for different python versions (development)

2016-01-26 Thread Jacob Vanderplas
> I don't see an easy way to maintain the changes in two different directories. If both directories are Git repositories linked to a common remote, you could commit the changes on a branch and then sync them that way. Jake VanderPlas Senior Data Science Fellow Director of Research in Physical

Re: [Scikit-learn-general] some points on the documentation

2016-01-26 Thread Andreas Mueller
On 01/26/2016 07:17 AM, Panos Louridas wrote: > Hello, > > A few points on the documentation / examples in the scikit-learn site: > > * In the example that plots the decision surface of a decision tree on the > Iris dataset >

Re: [Scikit-learn-general] Latent Dirichlet Allocation

2016-01-26 Thread Andreas Mueller
Hi Christian. Can you provide the data and code to reproduce? Best, Andy On 01/26/2016 08:21 AM, Rockenkamm, Christian wrote: Hello, I have a question concerning the Latent Dirichlet Allocation. The results I get from using it are a bit confusing. At first I use about 3000 documents. In the

Re: [Scikit-learn-general] Latent Dirichlet Allocation

2016-01-26 Thread Joel Nothman
How many distinct words are in your dataset? On 27 January 2016 at 00:21, Rockenkamm, Christian < c.rockenk...@stud.uni-goettingen.de> wrote: > Hello, > > > I have a question concerning the Latent Dirichlet Allocation. The results I > get from using it are a bit confusing. > > At first I use about

Re: [Scikit-learn-general] Latent Dirichlet Allocation

2016-01-26 Thread Rockenkamm, Christian
I used several datasets, ranging from 2200 to 3500 distinct words in the tf, for training the LDA. These data are preprocessed with lemmatization before CountVectorizer. From: Joel Nothman [joel.noth...@gmail.com] Sent: Tuesday, 26 January 2016 23:35 To: