Re: [Scikit-learn-general] SO question for the tree growers

2013-04-05 Thread Gilles Louppe
Hi Paul,
> sorry to jump into that discussion, but it raised my interest.
> In the R RandomForest package, MeanDecreaseGini can be calculated.
>
> Does scikit-learn somehow scale MeanDecreaseGini to the percentage scale?

Yes, in randomForest R package there is basically no scaling or normaliz
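
A minimal sketch of the scaling question raised above, assuming a current scikit-learn release (the 2013 API may have differed). In current releases `feature_importances_` is normalized to sum to 1, so multiplying by 100 gives a percentage scale. The synthetic dataset and all parameter values are illustrative assumptions, not taken from the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration (assumption, not from the thread).
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=3, random_state=0
)

forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = forest.feature_importances_

# Rescale to percentages for easier comparison across features.
percent = 100.0 * importances

for index, value in enumerate(percent):
    print(f"feature {index}: {value:.1f}%")
```

Because of this normalization, the values are not directly comparable with an unscaled MeanDecreaseGini; only the relative ranking of the features can be compared.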

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Andreas Mueller
On 04/05/2013 01:23 PM, Rafael Calsaverini wrote:
> If you have data in the form of a list of dictionaries like this:
>
> data = [{'target': 0 , 'featureVector' : [...]}, {'target': 1,
> 'featureVector': [...]}, ... ]
>
> You can use pandas to easily convert them into something that
> scikit-lear

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Andreas Mueller
On 04/05/2013 01:05 PM, Bill Power wrote:
> Lars: must have missed your response earlier. i guess i was hoping for
> convenient instead of good :-)
>
> i don't concede to some of your points though. that validation is
> significantly complicated is not true as presumably you just need to
> check

Re: [Scikit-learn-general] SO question for the tree growers

2013-04-05 Thread Paul . Czodrowski
Dear Gilles, sorry to jump into that discussion, but it raised my interest. In the R RandomForest package, MeanDecreaseGini can be calculated. Does scikit-learn somehow scale MeanDecreaseGini to the percentage scale? Please find attached the variable importance as computed by scikit-learn's RF

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Rafael Calsaverini
If you have data in the form of a list of dictionaries like this:

data = [{'target': 0 , 'featureVector' : [...]}, {'target': 1, 'featureVector': [...]}, ... ]

You can use pandas to easily convert them into something that scikit-learn would accept:

In [18]: import pandas
In [19]: from sklearn im
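
The rest of this message is cut off in the archive, so the following is only a sketch of one way the conversion could go, not a reconstruction of the original session. It assumes every `featureVector` has the same length; the sample values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up records in the shape described in the message.
data = [
    {"target": 0, "featureVector": [-1.0, 0.0]},
    {"target": 0, "featureVector": [0.0, -1.0]},
    {"target": 1, "featureVector": [1.0, 0.0]},
    {"target": 1, "featureVector": [0.0, 1.0]},
]

df = pd.DataFrame(data)

# Stack the per-row lists into a 2-D array of shape (n_samples, n_features).
X = np.array(df["featureVector"].tolist())

# Labels as a 1-D array of shape (n_samples,).
y = df["target"].to_numpy()

print(X.shape, y.shape)
```

The resulting `X` and `y` have the array shapes that scikit-learn estimators expect in `fit(X, y)`.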

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Bill Power
Lars: must have missed your response earlier. i guess i was hoping for convenient instead of good :-)

i don't concede to some of your points though. that validation is significantly complicated is not true as presumably you just need to check for the feature dimension of each class. what's that? a

Re: [Scikit-learn-general] misleading example for DBSCAN?

2013-04-05 Thread Johannes Knopp
On 05.04.2013 12:13, Lars Buitinck wrote:
> 2013/4/5 Lars Buitinck:
>> 2013/4/4 Andreas Mueller:
>>> I think the example is just wrong. Can someone confirm this?
>>
>> The actual DBSCAN algorithm wants distances, so the example is off.
>> However, just feeding D instead of S to the algorithm brea

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Andreas Mueller
On 04/05/2013 12:19 PM, Bill Power wrote:
> I think you misunderstood me. I meant something (more efficiently
> written) along the lines of below.
>
> import numpy as np
>
> X0 = [[-1, 0], [0,-1]]
> X1 = [[ 1, 0], [0, 1]]
>
> trData = { 0: X0, 1: X1 }
>
> X = np.array( [v for v in trData.values()]

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Bill Power
I think you misunderstood me. I meant something (more efficiently written) along the lines of below.

import numpy as np

X0 = [[-1, 0], [0,-1]]
X1 = [[ 1, 0], [0, 1]]

trData = { 0: X0, 1: X1 }

X = np.array( [v for v in trData.values()] ).reshape( -1, 2 )
Y = np.array( [np.ones( len(v) ) * k for
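
The snippet above is truncated in the archive. As a sketch of the same idea, a dictionary mapping each label to its samples can be flattened into the `X` and `y` arrays that `fit` expects. The helper name `dict_to_xy` is hypothetical and is not part of scikit-learn.

```python
import numpy as np


def dict_to_xy(data_by_label):
    """Hypothetical helper: flatten {label: samples} into (X, y) arrays."""
    # Sort the labels so the row order does not depend on dict ordering.
    labels = sorted(data_by_label)
    # Stack all samples row-wise; classes may have different sample counts.
    X = np.vstack([np.asarray(data_by_label[k], dtype=float) for k in labels])
    # Repeat each label once per sample belonging to that class.
    y = np.concatenate([np.full(len(data_by_label[k]), k) for k in labels])
    return X, y


tr_data = {0: [[-1, 0], [0, -1]], 1: [[1, 0], [0, 1]]}
X, y = dict_to_xy(tr_data)
print(X.shape, y)
```

Building `X` and `y` from the same sorted key list keeps rows and labels aligned, which separate loops over the dictionary would not guarantee on older Python versions.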

Re: [Scikit-learn-general] misleading example for DBSCAN?

2013-04-05 Thread Lars Buitinck
2013/4/5 Lars Buitinck:
> 2013/4/4 Andreas Mueller:
>> I think the example is just wrong. Can someone confirm this?
>
> The actual DBSCAN algorithm wants distances, so the example is off.
> However, just feeding D instead of S to the algorithm breaks the
> script.

Never mind, I see you've alread

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Lars Buitinck
2013/4/5 Bill Power:
> i know this is going to sound a little silly, but I was thinking there that
> it might be nice to be able to do this with scikit learn
>
> clf = sklearn.anyClassifier()
> clf.fit( { 0: dataWithLabel0,
>            1: dataWithLabel1 } )
>
> instead of having to separate the d

Re: [Scikit-learn-general] Fit functions

2013-04-05 Thread Philipp Singer
Dictionaries do not have duplicate keys (labels). You could only make a list of datawithLabelX for each key label. But what is the benefit of this?

Philipp

On 05.04.2013 11:37, Bill Power wrote:
> i know this is going to sound a little silly, but I was thinking there
> that it might be nice to

[Scikit-learn-general] Fit functions

2013-04-05 Thread Bill Power
i know this is going to sound a little silly, but I was thinking there that it might be nice to be able to do this with scikit learn

clf = sklearn.anyClassifier()
clf.fit( { 0: dataWithLabel0,
           1: dataWithLabel1 } )

instead of having to separate the data/labels manually. i guess fit wou

Re: [Scikit-learn-general] misleading example for DBSCAN?

2013-04-05 Thread Lars Buitinck
2013/4/4 Andreas Mueller:
> I think the example is just wrong. Can someone confirm this?

The actual DBSCAN algorithm wants distances, so the example is off. However, just feeding D instead of S to the algorithm breaks the script.

--
Lars Buitinck
Scientific programmer, ILPS
University of Amster
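
A minimal sketch of the point that DBSCAN works on distances rather than similarities, assuming a current scikit-learn release in which `DBSCAN` accepts `metric="precomputed"` (the 2013 API and the example script discussed here may have differed). The toy points and the `eps` and `min_samples` values are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import pairwise_distances

# Two small, well-separated groups of points (made-up data).
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
])

# D holds pairwise distances: small values mean "close".
# A similarity matrix S would have the opposite meaning and must not be
# passed where distances are expected.
D = pairwise_distances(X, metric="euclidean")

# eps is a distance threshold, so it is interpreted on the scale of D.
labels = DBSCAN(eps=0.5, min_samples=2, metric="precomputed").fit(D).labels_

print(labels)
```

With a distance matrix, `eps` keeps its usual meaning as a neighborhood radius, which is why passing a similarity matrix in its place gives misleading clusters.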