On May 4, 2018, at 5:12 AM, Wouter Verduin <wouterverd...@gmail.com> wrote:
Dear developers of Scikit,
I am working on a scientific paper about a prediction model for
complications in major abdominal resections. I have been using scikit-learn to
build that model and got good results (a score of 0.94). This makes us want to
see what the model built by scikit-learn actually looks like.
At the moment we have 100 input variables, but logically these are not all
equally useful, so we want to reduce the number to about 20 and see what effect
that has on the score.
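One common way to cut 100 inputs down to about 20 is recursive feature elimination (RFE). The sketch below is only an illustration on synthetic data from `make_classification`, not the model from this mail. The sample size, `step=5` and the 5-fold cross-validation are assumptions made for the example. Putting the selector inside a pipeline keeps the selection step within each cross-validation fold, so the reduced score is not inflated by features chosen with knowledge of the held-out data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for the real data: 100 features, imbalanced binary target.
X, y = make_classification(n_samples=400, n_features=100, n_informative=10,
                           weights=[0.9, 0.1], random_state=0)

# RFE refits the linear SVM repeatedly and drops the 5 features with the
# smallest absolute coefficients each round, until 20 remain.
selector = RFE(SVC(kernel="linear"), n_features_to_select=20, step=5)
selector.fit(X, y)

kept = np.flatnonzero(selector.support_)  # column indices of the kept features
X_reduced = selector.transform(X)         # shape (400, 20)
print("kept columns:", kept)

# Compare cross-validated accuracy with all features and with selection + SVM.
full_scores = cross_val_score(SVC(kernel="linear"), X, y, cv=5)
pipe = make_pipeline(
    RFE(SVC(kernel="linear"), n_features_to_select=20, step=5),
    SVC(kernel="linear"),
)
reduced_scores = cross_val_score(pipe, X, y, cv=5)
print("all 100 features: %.3f" % full_scores.mean())
print("20 kept features: %.3f" % reduced_scores.mean())
```

With real data, the kept column indices can be mapped back to the CSV header to see which clinical variables survive.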
My question: is there a way to get the underlying formula for the model out of
scikit-learn, instead of having it as a 'black box' in my svm function?
At the moment I am predicting a dichotomous variable from 100 variables
(continuous, ordinal and binary).
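For an `SVC` with `kernel='linear'`, the fitted model is a linear function of the inputs, and scikit-learn exposes its weights as `coef_` and its offset as `intercept_`. The sketch below uses synthetic data (an assumption, not the data from this mail) and checks that the hand-written formula `w . x + b` reproduces `decision_function` and `predict`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Synthetic binary problem standing in for the real data.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

svc = SVC(kernel="linear").fit(X, y)

w = svc.coef_[0]       # one weight per input variable, shape (10,)
b = svc.intercept_[0]  # scalar offset

# Decision value f(x) = w . x + b, computed by hand for every row.
manual = X.dot(w) + b
matches_decision = np.allclose(manual, svc.decision_function(X))

# The second class in classes_ is predicted when f(x) > 0.
manual_pred = np.where(manual > 0, svc.classes_[1], svc.classes_[0])
matches_predict = bool((manual_pred == svc.predict(X)).all())

print("formula matches decision_function:", matches_decision)
print("formula matches predict:", matches_predict)
```

The size of each weight is only comparable across variables if the inputs are on a similar scale, so standardizing the columns before fitting is usually needed before reading the weights as importances. `coef_` is not available for non-linear kernels.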
My code:
import numpy as np
from numpy import *
import pandas as pd
from sklearn import tree, svm, linear_model, metrics, preprocessing
import datetime
from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
from time import gmtime, strftime
# Open and prepare the database
file = "/home/wouter/scikit/DB_SCIKIT.csv"
DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
DBT = DB  # keep the full table, including the outcome column
print "Vorm van de DB: ", DB.shape  # shape of the DB
target = []
for i in range(len(DB[:,-1])):
    target.append(DB[i,-1])
DB = delete(DB,s_[-1],1)  # Remove the last column
AantalOutcome = target.count(1)
print "Aantal outcome:", AantalOutcome  # number of positive outcomes
print "Aantal patienten:", len(target)  # number of patients
A = DB
b = target
print len(DBT)
svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
indices = np.random.permutation(len(DBT))
rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
scores = cross_val_score(svc, A, b, cv=rs)
A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
print A
X_train = DBT[indices[:-302]]
y_train = []
for i in range(len(X_train[:,-1])):
    y_train.append(X_train[i,-1])
X_train = delete(X_train,s_[-1],1)  # Remove the last column
X_test = DBT[indices[-302:]]
y_test = []
for i in range(len(X_test[:,-1])):
    y_test.append(X_test[i,-1])
X_test = delete(X_test,s_[-1],1)  # Remove the last column
model = svc.fit(X_train,y_train)
print model
uitkomst = model.score(X_test, y_test)
print uitkomst
voorspel = model.predict(X_test)
print voorspel
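The manual shuffle and column slicing above can also be written with `train_test_split`. The following is a minimal sketch, not a drop-in replacement: the array is a randomly generated stand-in with the same shape and number of positive outcomes as reported in this mail, and `stratify=y` and `random_state=0` are assumptions added so that the rare outcome is represented proportionally in both parts and the split is repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.RandomState(0)

# Hypothetical stand-in for DB_SCIKIT.csv: 2011 rows, 100 inputs plus one
# outcome column containing 128 positive cases.
features = rng.normal(size=(2011, 100))
outcome = np.zeros(2011)
outcome[:128] = 1
data = np.column_stack([features, outcome])

X = data[:, :-1]  # all columns except the last
y = data[:, -1]   # the last column is the outcome

# Hold out 302 rows; stratify so both parts keep roughly 6% positives.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=302, stratify=y, random_state=0)

print(X_train.shape, X_test.shape)
print("positives in training set:", int(y_train.sum()))
print("positives in test set:", int(y_test.sum()))
```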
And output:
Vorm van de DB: (2011, 101)
Aantal outcome: 128
Aantal patienten: 2011
2011
Accuracy: 0.94 (+/- 0.01)
SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)
0.927152317881
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
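The numbers printed above allow a quick sanity check of the 0.94 score. The sketch below uses only values from this output and assumes the printed prediction array is complete (302 values, all 0). The order of the reconstructed `y_true` list is arbitrary, since only the counts matter. It suggests that the accuracy is close to what a model predicting "no complication" for every patient would reach.

```python
# Values taken from the output above.
n_patients = 2011
n_positive = 128

# Accuracy of always predicting the majority class (0).
baseline_accuracy = (n_patients - n_positive) / float(n_patients)
print("majority-class baseline: %.3f" % baseline_accuracy)  # 0.936

n_test = 302
reported_accuracy = 0.927152317881
n_correct = int(round(reported_accuracy * n_test))  # 280

# Every prediction is 0, so every error is a missed positive case.
n_positive_test = n_test - n_correct  # 22

y_pred = [0] * n_test
y_true = [1] * n_positive_test + [0] * n_correct  # arbitrary order

true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
sensitivity = true_pos / float(n_positive_test)
accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / float(n_test)

print("positives in test set: %d" % n_positive_test)
print("sensitivity: %.2f" % sensitivity)  # 0.00
print("accuracy: %.4f" % accuracy)        # 0.9272
```

With an outcome this rare, measures such as sensitivity, specificity or ROC AUC are likely to say more about the model than accuracy alone.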
Thanks in advance!
With kind regards,
Wouter Verduin
_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn