Dear Wouter,

For the SVM, scikit-learn wraps LIBSVM and LIBLINEAR. As far as I know, the scikit-learn class SVC uses LIBSVM for every kernel. Since you are using a linear kernel, you could use the more efficient LinearSVC class and should get similar results. This in turn should be easier to handle in terms of your question:

> Is there a way to get the underlying formula for the model out of scikit
> instead of having it as a 'blackbox' in my svm function.
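For a linear SVM there is no hidden formula: the model is just a weighted sum of the input variables plus a bias, and the fitted weights are exposed as the coef_ and intercept_ attributes of the estimator. A minimal sketch of the LinearSVC swap (with random placeholder data standing in for your CSV, so the shapes and names here are only illustrative):

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(200, 100)        # placeholder: 200 patients, 100 input variables
y = rng.randint(0, 2, 200)     # placeholder dichotomous outcome

clf = LinearSVC(C=1.0)         # the linear kernel is implicit in LinearSVC
clf.fit(X, y)

print(clf.coef_.shape)         # (1, 100): one weight per input variable
print(clf.intercept_)          # the bias term

With those two arrays you can write the model out explicitly: the decision value for a patient x is coef_ * x + intercept_, and the predicted class is the sign of that value.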
More specifically, LinearSVC uses the _fit_liblinear code available here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py

More information on the LIBLINEAR library it wraps can be found here (they have links to technical reports and implementation details there):
https://www.csie.ntu.edu.tw/~cjlin/liblinear/
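And to convince yourself that nothing is left in a black box, you can recompute the decision values by hand and compare them to decision_function. Continuing the placeholder sketch above (the ranking heuristic at the end is just one simple suggestion, not something scikit-learn prescribes):

scores_manual = np.dot(X, clf.coef_.T).ravel() + clf.intercept_   # w*x + b per sample
scores_sklearn = clf.decision_function(X)
print(np.allclose(scores_manual, scores_sklearn))                 # True

# One simple heuristic for trimming 100 variables down to ~20:
# rank them by the absolute value of their fitted weight.
ranking = np.argsort(np.abs(clf.coef_).ravel())[::-1]
print(ranking[:20])   # indices of the 20 variables with the largest |weight|

Note that comparing weight magnitudes is only meaningful if the variables are on comparable scales; since yours mix continuous, ordinal, and binary measurements, standardizing them first would be important.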
Best,
Sebastian

> On May 4, 2018, at 5:12 AM, Wouter Verduin <wouterverd...@gmail.com> wrote:
>
> Dear developers of Scikit,
>
> I am working on a scientific paper about a prediction model that predicts
> complications after major abdominal resections. I have been using scikit to
> create that model and got good results (a score of 0.94). This makes us want
> to see what the model that scikit builds looks like.
>
> As of now we have 100 input variables, but logically these aren't all equally
> useful, and we want to reduce that number to about 20 and see what the effect
> on the score is.
>
> My question: Is there a way to get the underlying formula for the model out
> of scikit instead of having it as a 'blackbox' in my svm function?
>
> At this moment I am predicting a dichotomous variable from 100 variables
> (continuous, ordinal and binary).
>
> My code:
>
> import numpy as np
> from numpy import *
> import pandas as pd
> from sklearn import tree, svm, linear_model, metrics, preprocessing
> import datetime
> from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
> from time import gmtime, strftime
>
> # open and prepare the database
> file = "/home/wouter/scikit/DB_SCIKIT.csv"
> DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
> DBT = DB
> print "Vorm van de DB: ", DB.shape      # shape of the DB
> target = []
> for i in range(len(DB[:,-1])):
>     target.append(DB[i,-1])
> DB = delete(DB, s_[-1], 1)              # remove the last column
> AantalOutcome = target.count(1)
> print "Aantal outcome:", AantalOutcome  # number of positive outcomes
> print "Aantal patienten:", len(target)  # number of patients
>
> A = DB
> b = target
>
> print len(DBT)
>
> svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
> indices = np.random.permutation(len(DBT))
>
> rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
> scores = cross_val_score(svc, A, b, cv=rs)
> A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
> print A
>
> X_train = DBT[indices[:-302]]
> y_train = []
> for i in range(len(X_train[:,-1])):
>     y_train.append(X_train[i,-1])
> X_train = delete(X_train, s_[-1], 1)    # remove the last column
>
> X_test = DBT[indices[-302:]]
> y_test = []
> for i in range(len(X_test[:,-1])):
>     y_test.append(X_test[i,-1])
> X_test = delete(X_test, s_[-1], 1)      # remove the last column
>
> model = svc.fit(X_train, y_train)
> print model
> uitkomst = model.score(X_test, y_test)  # uitkomst = result
> print uitkomst
> voorspel = model.predict(X_test)        # voorspel = prediction
> print voorspel
>
> And the output:
>
> Vorm van de DB:  (2011, 101)
> Aantal outcome: 128
> Aantal patienten: 2011
> 2011
> Accuracy: 0.94 (+/- 0.01)
> SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
>   decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
>   max_iter=-1, probability=True, random_state=None, shrinking=True,
>   tol=0.001, verbose=False)
> 0.927152317881
> [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0.]
>
> Thanks in advance!
>
> With kind regards,
>
> Wouter Verduin

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn