Re: [scikit-learn] Retracting model from the 'blackbox' SVM (Sebastian Raschka)

David Burns Fri, 04 May 2018 09:49:28 -0700

Hi Sebastian,

If you are looking to reduce the feature space for your model, I suggest you look at the scikit-learn page on doing just that:


http://scikit-learn.org/stable/modules/feature_selection.html
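As a rough illustration of one approach from that page, here is a minimal sketch of recursive feature elimination (`RFE`) that ranks features by the weights of a linear SVM and keeps 20 of 100. The data are synthetic stand-ins from `make_classification`; the sample count, class weights, and `step=5` are assumptions chosen only to resemble the setup described further down, not the original dataset.

```python
# Minimal sketch: recursive feature elimination (RFE) down to 20 features,
# ranking features by the absolute weights of a linear SVM.
# Assumption: synthetic data stands in for the real 100-variable dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Hypothetical stand-in data: 100 features, roughly 94% negative class
X, y = make_classification(n_samples=2000, n_features=100, n_informative=10,
                           weights=[0.94], random_state=0)
X = StandardScaler().fit_transform(X)  # put features on a common scale

# Drop the 5 lowest-|weight| features per round until 20 remain
selector = RFE(estimator=LinearSVC(dual=False), n_features_to_select=20, step=5)
selector.fit(X, y)

kept = [i for i, keep in enumerate(selector.support_) if keep]
print("Kept feature indices:", kept)
print("Reduced shape:", selector.transform(X).shape)  # (2000, 20)
```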

David


On 2018-05-04 12:00 PM, scikit-learn-requ...@python.org wrote:


----------------------------------------------------------------------

Message: 1
Date: Fri, 4 May 2018 05:51:26 -0400
From: Sebastian Raschka <m...@sebastianraschka.com>
To: Scikit-learn mailing list <scikit-learn@python.org>
Subject: Re: [scikit-learn] Retracting model from the 'blackbox' SVM
Message-ID:
        <5331a676-d6c6-4f01-8a4d-edde9318e...@sebastianraschka.com>
Content-Type: text/plain;       charset=us-ascii

Dear Wouter,

for the SVM, scikit-learn wraps the LIBSVM and LIBLINEAR libraries. I think the
scikit-learn class SVC uses LIBSVM for every kernel. Since you are using the
linear kernel, you could use the more efficient LinearSVC scikit-learn class to
get similar results. I guess this, in turn, is easier to handle in terms of your
question:

> Is there a way to get the underlying formula for the model out of scikit
> instead of having it as a 'blackbox' in my svm function.
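For a linear kernel the fitted model is a weighted sum of the inputs, f(x) = w . x + b, and scikit-learn exposes w and b as the `coef_` and `intercept_` attributes (on `SVC`, `coef_` is only available when `kernel='linear'`). The sketch below uses synthetic data with 5 features as an assumption in place of the real dataset, and checks that this explicit formula reproduces `decision_function`.

```python
# Minimal sketch: read the explicit linear formula out of a fitted SVC.
# Assumption: synthetic data with 5 features stands in for the real dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_.ravel()   # one weight per input variable
b = clf.intercept_[0]   # bias (intercept) term

# The model is f(x) = w . x + b; it predicts clf.classes_[1] when f(x) > 0
f = X @ w + b
print(np.allclose(f, clf.decision_function(X)))  # True

for i, wi in enumerate(w):
    print(f"x{i}: {wi:+.4f}")
print(f"intercept: {b:+.4f}")
```

With inputs on very different scales, standardizing them first (for example with `StandardScaler`) makes the sizes of the weights easier to compare.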

More specifically, LinearSVC uses the _fit_liblinear code available here: 
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py

And more info on the LIBLINEAR library it is using can be found here: 
https://www.csie.ntu.edu.tw/~cjlin/liblinear/ (they have links to technical 
reports and implementation details there)
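`LinearSVC` exposes the same `coef_` and `intercept_` attributes, but its defaults differ from `SVC(kernel='linear')`: it uses the squared hinge loss and regularizes the intercept, so the weights come out similar rather than identical. A minimal sketch comparing the two, again on synthetic stand-in data (an assumption; agreement on the real data may differ):

```python
# Minimal sketch: LinearSVC (LIBLINEAR) versus SVC(kernel="linear") (LIBSVM).
# Assumption: synthetic stand-in data; agreement will vary on real data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC, LinearSVC

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

svc = SVC(kernel="linear", C=1.0).fit(X, y)
lin = LinearSVC(C=1.0, dual=False).fit(X, y)  # squared hinge loss by default

# LinearSVC's decision function is also X @ coef_.T + intercept_
f_lin = X @ lin.coef_.ravel() + lin.intercept_[0]
print(np.allclose(f_lin, lin.decision_function(X)))  # True

print("SVC weights:      ", np.round(svc.coef_.ravel(), 3))
print("LinearSVC weights:", np.round(lin.coef_.ravel(), 3))
agreement = np.mean(svc.predict(X) == lin.predict(X))
print(f"Prediction agreement: {agreement:.3f}")
```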

Best,
Sebastian

On May 4, 2018, at 5:12 AM, Wouter Verduin <wouterverd...@gmail.com> wrote:

Dear developers of Scikit,

I am working on a scientific paper on a prediction model predicting
complications in major abdominal resections. I have been using scikit to create
that model and got good results (a score of 0.94). This makes us want to see
what the model that scikit built actually looks like.

At the moment we have 100 input variables, but logically they are not all
equally useful, and we want to reduce this number to about 20 and see what the
effects on the score are.
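To measure the effect on the score without an optimistic bias, the selection step has to be refit inside each cross-validation split, which a `Pipeline` handles automatically. A minimal sketch, assuming univariate selection with `SelectKBest(f_classif, k=20)` (one of several options on the feature-selection page) and synthetic stand-in data rather than the real dataset:

```python
# Minimal sketch: compare cross-validated accuracy with all 100 features
# versus the 20 best, with selection refit inside every CV split.
# Assumptions: synthetic stand-in data; SelectKBest is an illustrative choice.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=100, n_informative=10,
                           weights=[0.94], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

full = make_pipeline(StandardScaler(), LinearSVC(dual=False))
reduced = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20),
                        LinearSVC(dual=False))

scores_full = cross_val_score(full, X, y, cv=cv)
scores_reduced = cross_val_score(reduced, X, y, cv=cv)
print(f"100 features: {scores_full.mean():.3f} (+/- {scores_full.std() * 2:.3f})")
print(f" 20 features: {scores_reduced.mean():.3f} (+/- {scores_reduced.std() * 2:.3f})")
```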

My question: Is there a way to get the underlying formula for the model out of 
scikit instead of having it as a 'blackbox' in my svm function.

At the moment I am predicting a dichotomous variable from 100 variables
(continuous, ordinal and binary).

My code:

import datetime
from time import gmtime, strftime

import numpy as np
import pandas as pd
from sklearn import tree, svm, linear_model, metrics, preprocessing
from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV

# Open and prepare the database
file = "/home/wouter/scikit/DB_SCIKIT.csv"
DB = pd.read_csv(file, sep=";", header=0, decimal=",").to_numpy()
DBT = DB

print("Vorm van de DB: ", DB.shape)          # "Shape of the DB"
target = list(DB[:, -1])                     # the outcome is the last column
DB = np.delete(DB, np.s_[-1], 1)             # remove the last column
AantalOutcome = target.count(1)              # number of patients with the outcome
print("Aantal outcome:", AantalOutcome)      # "Number with outcome"
print("Aantal patienten:", len(target))      # "Number of patients"

A = DB
b = target

print(len(DBT))

svc = svm.SVC(kernel='linear', cache_size=500, probability=True)

indices = np.random.permutation(len(DBT))

# Five random 85/15 train/test splits for cross-validation
rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
scores = cross_val_score(svc, A, b, cv=rs)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

# Hold out the last 302 shuffled rows as a test set
X_train = DBT[indices[:-302]]
y_train = list(X_train[:, -1])
X_train = np.delete(X_train, np.s_[-1], 1)  # remove the last column

X_test = DBT[indices[-302:]]
y_test = list(X_test[:, -1])
X_test = np.delete(X_test, np.s_[-1], 1)    # remove the last column

model = svc.fit(X_train, y_train)
print(model)

uitkomst = model.score(X_test, y_test)      # test-set accuracy
print(uitkomst)

voorspel = model.predict(X_test)            # predicted classes for the test set
print(voorspel)
And the output:

Vorm van de DB:  (2011, 101)
Aantal outcome: 128
Aantal patienten: 2011
2011
Accuracy: 0.94 (+/- 0.01)
SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
0.927152317881
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
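For context when reading these numbers: 128 of the 2011 patients have the outcome, so a model that always predicts 0 is already right for 1883/2011 (about 0.936) of patients, close to the reported 0.94, and every test prediction shown above is 0. A minimal sketch of that majority-class baseline with scikit-learn's `DummyClassifier`; only the label counts come from the output, and the all-zero feature matrix is a placeholder assumption (the dummy model ignores features).

```python
# Minimal sketch: majority-class baseline for the class counts in the output.
# Assumption: only the label counts (128 of 2011) come from the output; the
# features are all-zero placeholders because DummyClassifier ignores them.
import numpy as np
from sklearn.dummy import DummyClassifier

n_patients, n_outcome = 2011, 128
y = np.array([1] * n_outcome + [0] * (n_patients - n_outcome))
X = np.zeros((n_patients, 1))

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
acc = baseline.score(X, y)
print(f"Always predicting 0: accuracy {acc:.3f}")  # 0.936
```

With this imbalance, metrics such as `balanced_accuracy_score` or `roc_auc_score` from `sklearn.metrics`, or `class_weight='balanced'` on the SVM, may say more about the model than plain accuracy.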

Thanks in advance!

with kind regards,

Wouter Verduin

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn




------------------------------

End of scikit-learn Digest, Vol 26, Issue 5
*******************************************

