Re: [Scikit-learn-general] BIRCH - Testing datasets

2015-12-06 Thread Dženan Softić
Thanks.

That makes sense. Actually, I am also trying to make the threshold
dynamic, but I still have to test my approach.
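
A minimal sketch of the scale dependence discussed in this thread. It uses
synthetic blobs rather than the original datasets; the data, the 1e4 factor
and the helper name n_subclusters are illustrative assumptions, not taken from
anyone's script. Scaling the data without scaling the threshold makes BIRCH
produce many more subclusters:

```python
from sklearn.cluster import Birch
from sklearn.datasets import make_blobs

# Synthetic 2-D blobs on a small scale (assumption: stands in for real data).
X, _ = make_blobs(n_samples=2000, centers=10, cluster_std=1.0, random_state=0)
X_big = X * 1e4  # same geometry, coordinates on the 1e4 - 1e5 scale


def n_subclusters(data, threshold):
    """Fit BIRCH without the final global clustering step and count leaves."""
    model = Birch(threshold=threshold, n_clusters=None)
    model.fit(data)
    return len(model.subcluster_centers_)


n_small_scale = n_subclusters(X, 1.0)
n_same_threshold = n_subclusters(X_big, 1.0)    # threshold far too small
n_scaled_threshold = n_subclusters(X_big, 1e4)  # threshold scaled with data

print(n_small_scale, n_same_threshold, n_scaled_threshold)
```

Scaling the threshold by the same factor as the data should give back roughly
the small-scale behaviour, which is why a threshold above 10 000 is not
surprising for coordinates in the 1e4 - 1e5 range.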

Best,

On Mon, Nov 30, 2015 at 9:15 PM, Manoj Kumar  wrote:

> Ah well, the right value for the threshold depends on your data.
>
> If your data is on the scale of 1e4 - 1e5, you should expect to need a
> really high threshold, because the sample distances are on the same scale.
>
> We are trying to produce heuristics for an optimal "auto" threshold
> parameter here (https://github.com/scikit-learn/scikit-learn/pull/5593),
> but it is still in progress.
>
> On Mon, Nov 30, 2015 at 3:10 PM, Manoj Kumar <
> manojkumarsivaraj...@gmail.com> wrote:
>
>> Hi,
>>
>> Can you provide your script for testing?
>>
>> Thanks !
>>
>>
>>
>> On Mon, Nov 30, 2015 at 3:06 PM, Dženan Softić  wrote:
>>
>>> Hi,
>>>
>>> I am trying to test BIRCH with the original datasets found here:
>>> https://cs.joensuu.fi/sipu/datasets/
>>> (100K points, 100 clusters)
>>>
>>> The problem is setting the threshold. I need to set it above 10 000 to
>>> get decent results. That is very weird, because the BIRCH example (
>>> http://scikit-learn.org/stable/auto_examples/cluster/plot_birch_vs_minibatchkmeans.html)
>>> generates a similar dataset, and there a threshold between 0.0 and 2.0
>>> gives normal results.
>>>
>>> I thought there was something wrong with the dataset itself, but then I
>>> found in the BIRCH issues that it was actually used for testing during
>>> development: (https://gist.github.com/MechCoder/16f121698ccd50568c2a)
>>>
>>> Am I doing something wrong here?
>>>
>>> Thanks,
>>> Dzeno
>>>
>>>
>>> --
>>> Go from Idea to Many App Stores Faster with Intel(R) XDK
>>> Give your users amazing mobile app experiences with Intel(R) XDK.
>>> Use one codebase in this all-in-one HTML5 development environment.
>>> Design, debug & build mobile apps & 2D/3D high-impact games for multiple
>>> OSs.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=254741911=/4140
>>> ___
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>>>
>>>
>>
>>
>> --
>> Manoj Kumar,
>> http://manojbits.wordpress.com
>> 
>> http://github.com/MechCoder
>>
>
>
>
> --
> Godspeed,
> Manoj Kumar,
> http://manojbits.wordpress.com
> 
> http://github.com/MechCoder
>
>


Re: [Scikit-learn-general] Import error for Robust scaler

2015-12-06 Thread Andy
On 12/02/2015 05:19 AM, Sumedh Arani wrote:
>
> Greetings!!
>
> The problem still arises: I still get an import error for
> RobustScaler.
>
> I am also reading the 0.16 documentation for reference. When I
> ran one of the examples in the examples folder, a file named
> plot_robust_scaling.py, it resulted in an import error.
>
The 0.16 examples don't contain RobustScaler:
http://scikit-learn.org/0.16/auto_examples/index.html
And the 0.16 api docs don't contain RobustScaler:
http://scikit-learn.org/0.16/modules/classes.html

If you want to use RobustScaler, you need to install 0.17 as said above.



[Scikit-learn-general] Dynamic Time Warping Contribution

2015-12-06 Thread Dan Shiebler
Hello,

I’m not sure if this is the correct place to send this. If it is not, could
you please direct me to the best place? Thank you.

I’d like to add a dynamic time warping metric to
sklearn.neighbors.DistanceMetric.
Dynamic time warping is one of the most used distance metrics for time
series, and it would be very convenient for users if it were integrated
into the module.

Right now users can use an existing implementation of dynamic time warping
as a custom metric for any of the nearest neighbors classes. However, this
requires users to find a good existing implementation. In addition, users
cannot take advantage of the LB Keogh lower bound of dynamic time warping,
which can dramatically speed up the nearest neighbors search.
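
For reference, a minimal sketch of the custom-metric route described above.
The dtw_distance function is a plain, unoptimized dynamic-programming
implementation written for illustration (absolute difference as local cost, no
warping window, no LB Keogh bound). It is not an existing scikit-learn API,
and the toy series and labels are made up:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def dtw_distance(a, b):
    """Accumulated dynamic time warping cost between two 1-D series."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = best cost of aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i, j] = local + min(cost[i - 1, j],      # repeat b[j-1]
                                     cost[i, j - 1],      # repeat a[i-1]
                                     cost[i - 1, j - 1])  # advance both
    return cost[n, m]


# Toy data: class 0 series contain a bump, class 1 series contain a dip.
X_train = np.array([[0, 0, 1, 1, 0],
                    [0, 1, 1, 0, 0],
                    [1, 1, 0, 0, 1],
                    [1, 0, 0, 1, 1]], dtype=float)
y_train = np.array([0, 0, 1, 1])

# A callable metric requires brute-force search here; tree-based search
# cannot exploit DTW because it is not a true metric.
knn = KNeighborsClassifier(n_neighbors=1, metric=dtw_distance,
                           algorithm="brute")
knn.fit(X_train, y_train)
prediction = knn.predict(np.array([[0, 0, 0, 1, 0]], dtype=float))
print(prediction)
```

Because the metric is a Python callable evaluated pair by pair, this is slow
on large datasets, which is part of the motivation for the proposal.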

I propose that first a “dtw” metric be added to the DistanceMetric class.
After this integration is successful, I propose that the LB Keogh lower
bound optimization be added to the NearestNeighbors class.

Please let me know your thoughts on this. I would be happy to work on it
if it would improve the scikit-learn module.

Thank you,
Dan


Re: [Scikit-learn-general] speceficity metric for Crossvalidation

2015-12-06 Thread Andy

There is no specificity metric, but it is easy to implement.
What have you tried and what exactly was the error you got when using 
``make_scorer``?


You can either write a callable that takes "trained estimator, X_test,
y_test", or you can write a function that takes y_test, y_pred and call
make_scorer on it.
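
A minimal sketch of the second option, assuming a binary problem with labels
0 and 1. The function name specificity, the synthetic data and the choice of
LogisticRegression are illustrative assumptions. cross_val_score is imported
from sklearn.model_selection, which is where it lives in recent releases; in
the 0.17 series it was under sklearn.cross_validation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, make_scorer
from sklearn.model_selection import cross_val_score


def specificity(y_true, y_pred):
    """True negative rate: TN / (TN + FP), for labels 0 (neg) and 1 (pos)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    if tn + fp == 0:
        return 0.0  # no negative samples in this fold
    return tn / (tn + fp)


# make_scorer wraps a (y_true, y_pred) function into a scorer object.
specificity_scorer = make_scorer(specificity)

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, X, y, cv=5, scoring=specificity_scorer)
print(scores.mean())
```

Note that y should be a 1-D array of class labels; passing a column vector or
a one-hot matrix is a common source of "shape" errors with custom scorers.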

On 11/26/2015 07:53 AM, H Schulz wrote:

Hello all,

I want to validate my classifiers with the

cross_validation.cross_val_score(clf, X, y, cv=5, scoring='accuracy')

function. I found every metric I need except specificity.
I found a "make_scorer" function, but it doesn't work because of some
"shape" errors for the y vector. (I also reshaped the vector, but that
doesn't work either.)
Is there another method for validating the classifier with CV and the
specificity metric?

I found precision, accuracy, roc_auc and sensitivity.
I would be grateful for any help.




Re: [Scikit-learn-general] Import error for Robust scaler

2015-12-06 Thread Sumedh Arani
Greetings!!

My negligence!! Thanks for the reply!!:-)

Yours sincerely,
Sumedh Arani,
PES University.
On 7 Dec 2015 00:56, "Andy"  wrote:

> On 12/02/2015 05:19 AM, Sumedh Arani wrote:
> >
> > Greetings!!
> >
> > The problem still arises: I still get an import error for
> > RobustScaler.
> >
> > I am also reading the 0.16 documentation for reference. When I
> > ran one of the examples in the examples folder, a file named
> > plot_robust_scaling.py, it resulted in an import error.
> >
> The 0.16 examples don't contain RobustScaler:
> http://scikit-learn.org/0.16/auto_examples/index.html
> And the 0.16 api docs don't contain RobustScaler:
> http://scikit-learn.org/0.16/modules/classes.html
>
> If you want to use RobustScaler, you need to install 0.17 as said above.
>
>


Re: [Scikit-learn-general] Jeff Levesque: '.predict_proba()' method for smaller datasets

2015-12-06 Thread Gilles Louppe
Hi Jeff,

In general, most implementations of predict_proba are some proxy for the
conditional probability p(y|x). Some of them model this
quantity quite well (e.g., Gaussian processes), while for others it
is closer to a heuristic than to the actual p(y|x) (e.g., with linear
models).

If it is important for your application to have an accurate
estimate of p(y|x), I would recommend using an algorithm that
explicitly computes this quantity and/or resorting to calibration.
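
As a rough sketch of the calibration option, one possibility is
CalibratedClassifierCV from sklearn.calibration. The synthetic data, the
choice of LinearSVC as the uncalibrated model and the "sigmoid" method are
assumptions made for illustration, not a recommendation for any particular
dataset:

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# LinearSVC has no predict_proba of its own; the wrapper fits a sigmoid
# on held-out decision values (cross-validated) to map them to probabilities.
calibrated = CalibratedClassifierCV(LinearSVC(max_iter=10000),
                                    method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

proba = calibrated.predict_proba(X_test)
print(proba[:3])
```

With very small datasets the calibration step itself has little data to learn
from, so the resulting probabilities should still be treated with caution.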

Hope this helps,
Gilles

On 26 November 2015 at 22:00, Jeff Levesque  wrote:
> Hey all,
>
> I have a specific question: how do I ensure that the '.predict_proba()'
> method of sklearn classifiers accurately provides the
> probability that a provided value belongs to one of the predefined classes:
>
> https://github.com/jeff1evesque/machine-learning/issues/1924#issuecomment-159491052
>
> There seems to be a level of error present for small datasets.  Is this 
> normal?
>
>
> Jeffrey Levesque
> https://github.com/jeff1evesque/
> (603) 969-5363
>
> Sent from my iPhone


Re: [Scikit-learn-general] Jeff Levesque: '.predict_proba()' method for smaller datasets

2015-12-06 Thread Andy
On 12/01/2015 11:28 PM, Jeff Levesque wrote:
> Is there a way to determine if the data used with the SVC class, used to 
> generate an SVM model, would generate a poor model, or confidence percentage 
> (or 'decision_function', if that's preferred)?
>
>
I don't understand the question.



Re: [Scikit-learn-general] sklearn.cross_decomposition.PLSRegression: how to re-scale my prediction?

2015-12-06 Thread Andy

Hi Ola.
Can you please raise an issue on the issue tracker, preferably with code 
and data to reproduce?
If you can't share your data, see if you can reproduce the issue with 
synthetic data.


Best,
Andy

On 12/02/2015 10:34 AM, Ola Pawluczyk wrote:

Hello all,

I'm doing a PLS regression on spectral data [x.train] with respect to 
concentrations of a few substances [y.train], and find that 
centering/scaling my data gives me good stats.  However, my 
y_pred_train is always scaled, and I cannot figure out how to get it 
back to the unscaled space.


To make it a bit clearer:

x_train.shape
(40, 904)
y_train.shape
(40, 7)
x_test.shape
(10, 904)
y_test.shape
(10, 7)

pls2 = PLSRegression(copy=True, max_iter=500, n_components=7,
                     scale=True, tol=1e-06)

pls2.fit(x_train, y_train)
y_pred_train = pls2.predict(x_train)


y_pred_train is always several orders of magnitude larger than 
y_train.  I tried scaling y_pred_train by pls2.y_mean_ and pls2.y_std_ 
but I cannot figure things out.  Any help would be greatly appreciated!



Thank you,


Ola





Re: [Scikit-learn-general] Multi Label classification using OneVsRest Classifier

2015-12-06 Thread Startup Hire
Hi all,

Hope you are doing well.

I was able to successfully complete multi-label classification using
SGDClassifier inside OneVsRestClassifier.


Something peculiar is happening:

When I use the classifier to predict on new data, the predicted
probability is 1 for two particular columns, while it is always zero for
everything else.

Though this is theoretically correct, this wasn't the case before.

The input to the classifier is a sparse matrix.

The only difference from the previous implementation is the use of
DictVectorizer for encoding instead of one-hot encoding.
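
One difference between the two encodings that may be worth checking is
sketched here: DictVectorizer one-hot encodes string values but passes numeric
values through as a single numeric column. The column names are borrowed from
this thread; the values ("JFK", "LHR", 3, 7) are made up for illustration.

```python
from sklearn.feature_extraction import DictVectorizer

# Two toy rows: one string-valued feature and one integer-coded feature.
rows = [
    {"departureairport_or_point": "JFK", "locationid": 3},
    {"departureairport_or_point": "LHR", "locationid": 7},
]

vec = DictVectorizer()
X = vec.fit_transform(rows)          # scipy sparse matrix
names = vec.feature_names_           # one name per output column
dense = X.toarray()

# Casting the integer codes to strings makes DictVectorizer one-hot encode
# them as well, one column per distinct value.
rows_str = [{key: str(value) for key, value in row.items()} for row in rows]
vec_str = DictVectorizer()
X_str = vec_str.fit_transform(rows_str)

print(names)
print(vec_str.feature_names_)
```

If integer-coded categories were previously one-hot encoded and are now kept
as raw numbers, the features reaching the classifier are on a very different
scale, which could plausibly change the predicted probabilities.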


Let me know how this can be resolved. Should I make
any upstream changes?


Regards,

Sanant


On Wed, Dec 2, 2015 at 12:29 PM, Startup Hire 
wrote:

> Hi,
>
> I guess the error was due to the fact that I was one-hot encoding
> a data frame which includes strings.
>
> Currently, I have started using DictVectorizer to encode both my categorical
> variables stored as integers and my categorical variables stored as strings.
>
> It seems to be working fine.
>
> My Y is as follows
>
> import scipy.sparse as sps
> from sklearn.feature_extraction import DictVectorizer
>
> vec = DictVectorizer()
>
> # Select the feature columns from the pandas DataFrame
> train_df = df_modified[['locationid', 'dep_departtime',
>                         'arr_arrivetime',
>                         'arr_departtime',
>                         'dep_arrivetime',
>                         'departureairport_or_point',
>                         'destinationairport_or_point',
>                         'bookeddate',
>                         'departuredate']]
>
> # Convert the DataFrame to a list of dicts, one dict per row
> train_dict = train_df.to_dict(orient='records')
>
> # Fit the vectorizer, then transform the same data
>
> b = vec.fit(train_dict)
> a = b.transform(train_dict)
>
>
>
> I hope I am working in the right direction. Let me know your thoughts
>
> Regards,
> Sanant
>
>
>
>
> Subject: Re: [Scikit-learn-general] Multi Label classification using
> OneVsRest Classifier
> To: scikit-learn-general@lists.sourceforge.net
> Message-ID: <565e7223.3090...@gmail.com>
> Content-Type: text/plain; charset="windows-1252"
>
> Please provide the full traceback.
> What is the type of y here, and what are its entries?
>
>
> On 11/30/2015 07:45 PM, Startup Hire wrote:
> > Hi Pypers,
> >
> > Hope you are doing well.
> >
> > I am doing multi label classification in which my X and Y are sparse
> > matrices with Y properly binarized.
> >
> > Though my Y has multi-labels properly binarized, I am getting the
> > following error:
> >
> > Value Error: Multioutput target data is not supported with label binarization
> >
> >
> > The Classifier I am using is as follows:
> >
> > Classifier =
> > OneVsRestClassifier(SGDClassifier(random_state=0,
> > loss='log', alpha=0.1, penalty='elasticnet')).fit(Finaldata, y)
> >
> > Let me know how this can be resolved. Should I make
> > any upstream changes?
> > Regards,
> > Sanant
>