Re: [scikit-learn] Retracting model from the 'blackbox' SVM (Sebastian Raschka)

2018-05-04 Thread David Burns

Hi Sebastian,

If you are looking to reduce the feature space for your model, I suggest 
you look at the scikit-learn page on doing just that


http://scikit-learn.org/stable/modules/feature_selection.html
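
For example, a minimal sketch using univariate selection (SelectKBest and
k=20 are just illustrative choices here - RFE or SelectFromModel from the
same page would work similarly):

from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# toy stand-in for the real data: 100 features, as in the original question
X, y = make_classification(n_samples=500, n_features=100, random_state=0)

# keep the 20 features with the strongest univariate ANOVA F-score
X_reduced = SelectKBest(f_classif, k=20).fit_transform(X, y)
print(X_reduced.shape)  # (500, 20)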

David


On 2018-05-04 12:00 PM, scikit-learn-requ...@python.org wrote:



Today's Topics:

1. Re: Retracting model from the 'blackbox' SVM (Sebastian Raschka)


--

Message: 1
Date: Fri, 4 May 2018 05:51:26 -0400
From: Sebastian Raschka 
To: Scikit-learn mailing list 
Subject: Re: [scikit-learn] Retracting model from the 'blackbox' SVM
Message-ID:
<5331a676-d6c6-4f01-8a4d-edde9318e...@sebastianraschka.com>
Content-Type: text/plain;   charset=us-ascii

Dear Wouter,

for the SVM, scikit-learn wraps LIBSVM and LIBLINEAR. I think the
scikit-learn class SVC uses LIBSVM for every kernel. Since you are using the
linear kernel, you could use the more efficient LinearSVC scikit-learn class
to get similar results; I guess this in turn is easier to handle in terms of
your question:

  Is there a way to get the underlying formula for the model out of scikit
  instead of having it as a 'blackbox' in my svm function?

More specifically, LinearSVC uses the _fit_liblinear code available here: 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py

And more info on the LIBLINEAR library it is using can be found here: 
https://www.csie.ntu.edu.tw/~cjlin/liblinear/ (they have links to technical 
reports and implementation details there)
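
As an aside, for a linear kernel the 'formula' is just the weight vector and
bias of the decision function f(x) = w . x + b, which scikit-learn exposes as
coef_ and intercept_ on the fitted estimator. A minimal sketch (the toy data
here is made up, not Wouter's):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
clf = LinearSVC().fit(X, y)

w = clf.coef_.ravel()  # one weight per input feature
b = clf.intercept_[0]  # bias term

# the decision function is exactly the linear formula w . x + b
x = X[0]
print(np.dot(w, x) + b)
print(clf.decision_function([x])[0])  # same value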

Best,
Sebastian


On May 4, 2018, at 5:12 AM, Wouter Verduin  wrote:

Dear developers of Scikit,

I am working on a scientific paper on a prediction model predicting
complications in major abdominal resections. I have been using scikit to
create that model and got good results (score of 0.94). This makes us want to
see what the model that scikit produces actually looks like.

As for now we have 100 input variables, but logically these aren't all as
useful as the others, and we want to reduce this number to about 20 and see
what the effect on the score is.

My question: is there a way to get the underlying formula for the model out
of scikit instead of having it as a 'blackbox' in my svm function?

At this moment I am predicting a dichotomous variable from 100 variables
(continuous, ordinal and binary).

My code:

import numpy as np
from numpy import *
import pandas as pd
from sklearn import tree, svm, linear_model, metrics, preprocessing
import datetime
from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
from time import gmtime, strftime

# open and prepare the database
file = "/home/wouter/scikit/DB_SCIKIT.csv"
DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
DBT = DB
print "Shape of the DB: ", DB.shape

target = []
for i in range(len(DB[:,-1])):
    target.append(DB[i,-1])
DB = delete(DB, s_[-1], 1)  # remove the last column

AantalOutcome = target.count(1)
print "Number of outcomes:", AantalOutcome
print "Number of patients:", len(target)

A = DB
b = target

print len(DBT)

svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
indices = np.random.permutation(len(DBT))

rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
scores = cross_val_score(svc, A, b, cv=rs)
A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print A

X_train = DBT[indices[:-302]]
y_train = []
for i in range(len(X_train[:,-1])):
    y_train.append(X_train[i,-1])
X_train = delete(X_train, s_[-1], 1)  # remove the last column

X_test = DBT[indices[-302:]]
y_test = []
for i in range(len(X_test[:,-1])):
    y_test.append(X_test[i,-1])
X_test = delete(X_test, s_[-1], 1)  # remove the last column

model = svc.fit(X_train, y_train)
print model

uitkomst = model.score(X_test, y_test)
print uitkomst

voorspel = model.predict(X_test)
print voorspel
And output:

Shape of the DB:  (2011, 101)
Number of outcomes: 128
Number of patients: 2011
2011
Accuracy: 0.94 (+/- 0.01)
SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
0.927152317881
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 

[scikit-learn] pipeline for modifying target and number of samples

2018-08-01 Thread David Burns

Hi,

I posted a while back about this, and am reposting now since I have made 
progress on this topic. As you are probably aware, the sklearn Pipeline 
only supports transformers for X, and the number of samples must stay 
the same.


I work with time series where the learning pipeline relies on 
transformations like resampling, segmentation, etc. that change the 
target and the number of samples in the data set. To address this, 
I created an sklearn-compatible pipeline that handles transformers that 
alter X, y, and sample_weight together. It supports model selection 
using the sklearn tools, and integrates with all the sklearn 
transformers and estimators. It also has some new options for setting 
hyper-parameters with callables and in reference to other parameters.


The implementation is in my time series package seglearn:

https://github.com/dmbee/seglearn
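
For anyone who wants to see the shape of the API, here is a minimal sketch
(the toy data is made up; see the repo docs for the current API):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from seglearn.pipe import Pype
from seglearn.transform import SegmentX, FeatureRep

# toy data: 5 multivariate time series with one (contextual) label each
X = [np.random.rand(1000, 3) for _ in range(5)]
y = np.random.randint(0, 2, 5)

# SegmentX slides a window over each series and replicates y per segment,
# so the number of samples changes inside the pipeline
pipe = Pype([('seg', SegmentX(width=100, overlap=0.5)),
             ('ftr', FeatureRep()),
             ('scale', StandardScaler()),
             ('clf', LogisticRegression())])
pipe.fit(X, y)
print(pipe.score(X, y))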

- Best

David Burns




[scikit-learn] seglearn: package for time series and sequence learning

2018-03-13 Thread David Burns
I implemented a meta-estimator and transformers for time series / 
sequence learning with sliding window segmentation. It can be used for 
classification, regression, or forecasting - supporting multivariate 
time series / sequences and contextual (time-independent) data. It can 
learn time series or contextual targets.


It is (mostly) compatible with the sklearn model evaluation and 
selection tools - despite changing the number of samples and the target 
vector mid-pipeline (during segmentation).


I've created a pull request on related_projects.rst - but thought I 
would share it here for those of you interested in this area.


https://github.com/dmbee/seglearn

Cheers,

David Burns



Re: [scikit-learn] New Transformer (Guillaume Lemaître)

2018-02-28 Thread David Burns

Thanks everyone for your suggestions.

I will have a look at PipeGraph - which might be a suitable option for 
us, as Guillaume suggested.

If it works out, I will share it here.

Thanks

David


On 02/28/2018 08:29 AM, scikit-learn-requ...@python.org wrote:



Today's Topics:

1. New Transformer (David Burns)
2. Re: New Transformer (Guillaume Lemaître)
3. Re: New Transformer (Manuel Castejón Limas)


--

Message: 1
Date: Tue, 27 Feb 2018 12:02:27 -0500
From: David Burns <david.mo.bu...@gmail.com>
To: scikit-learn@python.org
Subject: [scikit-learn] New Transformer
Message-ID: <726f2e70-63eb-783f-b470-5ea45af93...@gmail.com>
Content-Type: text/plain; charset="utf-8"; Format="flowed"

First post on this mailing list.

I have been working with time series data for a project, and thought I
could contribute a new transformer to segment time series data using a
sliding window, with variable overlap. I have attached a demonstration of
how this would fit in the existing framework. The only challenge for me
here is that the transformer needs to transform both the X and y variables
in order to perform the segmentation, and I am not sure from the
documentation how to implement this in the framework.
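
To make the idea concrete, here is a minimal sketch of the segmentation in
plain numpy (not the attached file; the width and overlap values are just
illustrative):

import numpy as np

def segment(X, y, width=100, overlap=0.5):
    # slide a window of `width` samples over the series X, stepping by
    # width * (1 - overlap), and replicate the series label y for each
    # segment - so both X and the number of samples/targets change
    step = int(width * (1.0 - overlap))
    starts = range(0, len(X) - width + 1, step)
    X_seg = np.stack([X[s:s + width] for s in starts])
    y_seg = np.full(len(X_seg), y)
    return X_seg, y_seg

X = np.random.rand(1000, 3)  # one multivariate time series
X_seg, y_seg = segment(X, y=1)
print(X_seg.shape, y_seg.shape)  # (19, 100, 3) (19,)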

Overlapping segments are a great way to boost performance for time series
classifiers, so this may be a worthwhile contribution for some in this
area of ML. Ultimately, model_selection.TimeSeriesSplit would need to
be modified to support overlapping segments, or a new class created to
enable validation for this.

Please let me know if this would be a worthwhile contribution, and if so
how to go about transforming the target vector y in the framework /
pipeline?

Thanks!

David Burns



-- next part --
A non-text attachment was scrubbed...
Name: TimeSeriesSegment.py
Type: text/x-python
Size: 3336 bytes
Desc: not available
URL: 
<http://mail.python.org/pipermail/scikit-learn/attachments/20180227/143ced86/attachment-0001.py>

--

Message: 2
Date: Tue, 27 Feb 2018 19:42:52 +0100
From: Guillaume Lemaître <g.lemaitr...@gmail.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] New Transformer
Message-ID:
<cacdxx9gy91jwt+xjfgtnub_5wvmv279dgums6autzffsnfe...@mail.gmail.com>
Content-Type: text/plain; charset="utf-8"

Transforming y is a big deal :)
You can refer to
https://github.com/scikit-learn/enhancement_proposals/pull/2
and the associated issues/PRs to see what is going on. This is probably an
additional use case to think about when designing estimators which will be
modifying y.

Regarding the pipeline, I assume that your strategy would be to resample at
fit and do nothing at predict, is that right?

NB: you could actually implement this sampling in a FunctionSampler of
imblearn:
http://contrib.scikit-learn.org/imbalanced-learn/dev/generated/imblearn.FunctionSampler.html#imblearn.FunctionSampler
and then use the imblearn pipeline, which would apply the transform at fit
time but not at predict.
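
Something along these lines, roughly (a sketch against the FunctionSampler
API linked above; the resample function is a made-up placeholder for the
real segmentation logic):

import numpy as np
from imblearn import FunctionSampler
from imblearn.pipeline import Pipeline
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def resample(X, y):
    # placeholder fit-time transformation that changes the number of
    # samples - here it just drops every second row for illustration
    return X[::2], y[::2]

X, y = make_classification(n_samples=200, random_state=0)

# imblearn's Pipeline applies the sampler at fit time only; predict and
# score see the incoming data unchanged
pipe = Pipeline([('sampler', FunctionSampler(func=resample)),
                 ('clf', LogisticRegression())])
pipe.fit(X, y)
print(pipe.score(X, y))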



Re: [scikit-learn] Inclusion of an LSTM Classifier

2019-02-17 Thread David Burns
There is an sklearn wrapper for Keras models in the Keras library. That's
an easy way to use an LSTM in sklearn. Also, the sklearn estimator API is
pretty easy to figure out if you want to roll your own wrapper for any
model, really.
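
For example, a rough sketch of that wrapper (layer sizes, input shape, and
epochs are arbitrary here):

import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score

def build_lstm():
    # binary classifier over sequences of 100 timesteps x 3 channels
    model = Sequential()
    model.add(LSTM(32, input_shape=(100, 3)))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam',
                  metrics=['accuracy'])
    return model

X = np.random.rand(60, 100, 3)
y = np.random.randint(0, 2, 60)

# KerasClassifier exposes fit/predict/score, so the sklearn tools just work
clf = KerasClassifier(build_fn=build_lstm, epochs=2, verbose=0)
print(cross_val_score(clf, X, y, cv=3))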