[Scikit-learn-general] Introducing spark-sklearn, a scikit-learn integration package for Spark

2016-02-10 Thread Tim Hunter
learn-with-spark.html Let us know if you have any questions. Also, documentation or code contributions are very welcome (Apache 2.0 license). Cheers, Tim and Joseph

[Scikit-learn-general] Feature selection and ranking for different feature types?

2015-05-16 Thread Tim
categorical or ordered features? The above three ways all return measurements and ranking of the features. But I wonder whether the results are reliable given the different feature types. How would you suggest I do feature selection and feature ranking for my problem? Thanks, Tim

Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-16 Thread Tim
: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn? To: scikit-learn-general@lists.sourceforge.net Date: Wednesday, April 15, 2015, 1:48 PM Hi. Yes, run "make latexpdf" in the "doc" folder. Best, Andy On 04/15/2015 01:11 PM,

Re: [Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-15 Thread Tim
@lists.sourceforge.net Date: Wednesday, April 15, 2015, 12:55 PM Hi Tim. There are pdfs for 0.16.0 and 0.16.1 up now at http://sourceforge.net/projects/scikit-learn/files/documentation/ Let us know if there are issues with it. Cheers, Andy On 04/15/2015 12:08 PM, Tim wrote: > He

[Scikit-learn-general] Is there a pdf documentation for the latest stable scikit-learn?

2015-04-15 Thread Tim
Hello, I am looking for a PDF of the documentation for the latest stable scikit-learn, i.e. 0.16.1. I followed http://scikit-learn.org/stable/support.html#documentation-resources, which leads me to http://sourceforge.net/projects/scikit-learn/files/documentation/, but the pdf files are f

Re: [Scikit-learn-general] CV scores vs scores on a manual split

2015-02-19 Thread Tim Head
the output of decision_function() works. Which makes sense. Presumably if you used only the last column of the predict_proba() output it would also work. > This came up already quite a bit, not sure how we can avoid people making > this mistake. > > Not sure either, as soon as I read it
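The snippet above is truncated and does not name the metric, so the following is only a sketch under the assumption that a ranking metric such as ROC AUC is being discussed. It uses a small hand-written `roc_auc` helper and made-up scores (both hypothetical, not taken from the thread) to show why a continuous score, or the positive-class probability alone, ranks samples in the same way, while hard class labels do not. The sigmoid mapping from decision values to probabilities is also an assumption; it holds for models such as logistic regression but not for every estimator.

```python
import math


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a correct pair."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and decision values, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
decision = [-2.0, -0.5, -0.1, 0.3, 0.8, 2.5]

# Assumed sigmoid link: probability of the positive class (the "last column").
proba_pos = [1.0 / (1.0 + math.exp(-d)) for d in decision]

# Hard labels, as a thresholded prediction would return them.
hard_labels = [1 if d > 0 else 0 for d in decision]

auc_decision = roc_auc(y_true, decision)
auc_proba = roc_auc(y_true, proba_pos)
auc_labels = roc_auc(y_true, hard_labels)
```

With these numbers the two continuous inputs agree (8 of 9 pairs ranked correctly), because a monotone transform preserves the ranking, while the thresholded labels introduce ties and give a lower value (6 of 9).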

Re: [Scikit-learn-general] CV scores vs scores on a manual split

2015-02-19 Thread Tim Head
Hi Gilles, On Thu Feb 19 2015 at 8:35:35 AM Gilles Louppe wrote: > Hi Tim, > > By default, cross_val_score uses StratifiedKFold(shuffle=False) to > create the train/test folds while train_test_split uses ShuffleSplit. > The discrepancy you observe might therefore come from eit
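One way to check whether the splitting strategy explains such a gap is to run the same estimator under unshuffled folds, shuffled folds, and a single manual split. The sketch below is an illustration, not the code from the thread: the synthetic dataset, the `LogisticRegression` estimator, and all parameter values are assumptions, and it uses `sklearn.model_selection`, where these utilities live in current releases, rather than the older `cross_validation` module.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)

# Synthetic data standing in for the poster's X_dev / y_dev (assumption).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

# Stratified folds taken in the order the samples are stored.
unshuffled = StratifiedKFold(n_splits=5, shuffle=False)
scores_unshuffled = cross_val_score(clf, X, y, cv=unshuffled)

# Same estimator, but samples are shuffled before the folds are formed.
shuffled = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores_shuffled = cross_val_score(clf, X, y, cv=shuffled)

# A single manual split, which shuffles by default.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
manual_score = clf.fit(X_train, y_train).score(X_test, y_test)

print(scores_unshuffled.mean(), scores_shuffled.mean(), manual_score)
```

If the samples are stored in a meaningful order, the unshuffled scores may differ noticeably from the other two; on randomly ordered data all three should be close.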

[Scikit-learn-general] CV scores vs scores on a manual split

2015-02-18 Thread Tim Head
Hello, I was comparing scores from CV with a score obtained from training on a subset of the data used in the CV and get very different answers. This surprised me; should it? If not, how do I understand how/why this happens? I run: scores = cross_validation.cross_val_score(clf, X_dev, y_dev, sc

Re: [Scikit-learn-general] Manual categories/separate classifiers

2014-05-31 Thread Tim Head
Hi Gilles, On 23 May 2014 15:06, Gilles Louppe wrote: > Hi Tim, > > In principle, what you describe exactly corresponds to the decision tree > algorithm. You partition the input space into smaller subspaces, on which > you recursively build sub-decision trees. > Exactly. Wh

[Scikit-learn-general] Manual categories/separate classifiers

2014-05-23 Thread Tim Head
there something like this in scikit-learn already? What I am looking for is something to help with the bookkeeping of which classifier to use when, etc. If it doesn't exist I will try my hand at writing something ;) Tim -- http://betatim.gith
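The bookkeeping asked about here can be sketched as a thin wrapper that trains one estimator per manually defined category and routes each sample to the matching one at prediction time. This is only an illustration under stated assumptions: `PerCategoryClassifier`, `MajorityClassifier`, and the `category_of` callback are hypothetical names invented for this sketch, not scikit-learn classes, and the stand-in estimator merely mimics the `fit`/`predict` interface.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Stand-in estimator with fit/predict (hypothetical, for illustration):
    always predicts the most common training label."""

    def fit(self, X, y):
        self.label_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


class PerCategoryClassifier:
    """Trains one estimator per category and dispatches samples to it."""

    def __init__(self, category_of, make_estimator):
        self.category_of = category_of        # maps a sample to its category
        self.make_estimator = make_estimator  # builds a fresh estimator

    def fit(self, X, y):
        # Group the training samples and labels by category.
        groups = defaultdict(lambda: ([], []))
        for row, label in zip(X, y):
            rows, labels = groups[self.category_of(row)]
            rows.append(row)
            labels.append(label)
        # Fit a separate estimator on each group.
        self.estimators_ = {
            cat: self.make_estimator().fit(rows, labels)
            for cat, (rows, labels) in groups.items()
        }
        return self

    def predict(self, X):
        # A category unseen during fit raises KeyError in this sketch.
        return [
            self.estimators_[self.category_of(row)].predict([row])[0]
            for row in X
        ]


# Made-up data: the first element of each sample is its manual category.
X = [("a", 1.0), ("a", 2.0), ("b", 3.0), ("b", 4.0), ("b", 5.0)]
y = [0, 0, 1, 1, 0]
model = PerCategoryClassifier(lambda row: row[0], MajorityClassifier).fit(X, y)
predictions = model.predict([("a", 9.0), ("b", 9.0)])
```

Any object exposing `fit` and `predict` could be passed as `make_estimator`, so the wrapper only handles the routing and leaves the choice of model per category open.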

[Scikit-learn-general] KModes

2014-02-06 Thread tim pierson
Hi, Forgive the general inquiry, but I've been trying to find a Python implementation of k-modes clustering (for nominal/categorical data). Does anyone know of one in existence? (Would this be something the scikit-learn community would be interested in?) Thanks,

Re: [Scikit-learn-general] User Survey

2013-02-04 Thread Tim Head
tick several boxes? The percentages are given as: replies / 306 which looks a bit odd at first glance as it means they sum to >100% Tim -- http://j.mp/timhead