Dear Wouter,

For the SVM, scikit-learn wraps LIBSVM and LIBLINEAR. As far as I know, the scikit-learn class SVC uses LIBSVM for every kernel. Since you are using a linear kernel, you could use the more efficient LinearSVC class and should get similar results. This in turn should be easier to handle in terms of your question:

> Is there a way to get the underlying formula for the model out of scikit
> instead of having it as a 'blackbox' in my svm function.
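For a linear SVM there is no hidden formula: the model is just a weighted sum of the input variables plus a bias, and the fitted weights are exposed as the coef_ and intercept_ attributes of the estimator. A minimal sketch of the LinearSVC swap (with random placeholder data standing in for your CSV, so the shapes and names here are only illustrative):

import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
X = rng.randn(200, 100)        # placeholder: 200 patients, 100 input variables
y = rng.randint(0, 2, 200)     # placeholder dichotomous outcome

clf = LinearSVC(C=1.0)         # the linear kernel is implicit in LinearSVC
clf.fit(X, y)

print(clf.coef_.shape)         # (1, 100): one weight per input variable
print(clf.intercept_)          # the bias term

With those two arrays you can write the model out explicitly: the decision value for a patient x is coef_ * x + intercept_, and the predicted class is the sign of that value.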
More specifically, LinearSVC uses the _fit_liblinear code available here:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/base.py

More information on the LIBLINEAR library it wraps can be found here (they have links to technical reports and implementation details there):
https://www.csie.ntu.edu.tw/~cjlin/liblinear/
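And to convince yourself that nothing is left in a black box, you can recompute the decision values by hand and compare them to decision_function. Continuing the placeholder sketch above (the ranking heuristic at the end is just one simple suggestion, not something scikit-learn prescribes):

scores_manual = np.dot(X, clf.coef_.T).ravel() + clf.intercept_   # w*x + b per sample
scores_sklearn = clf.decision_function(X)
print(np.allclose(scores_manual, scores_sklearn))                 # True

# One simple heuristic for trimming 100 variables down to ~20:
# rank them by the absolute value of their fitted weight.
ranking = np.argsort(np.abs(clf.coef_).ravel())[::-1]
print(ranking[:20])   # indices of the 20 variables with the largest |weight|

Note that comparing weight magnitudes is only meaningful if the variables are on comparable scales; since yours mix continuous, ordinal, and binary measurements, standardizing them first would be important.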
Best,
Sebastian

> On May 4, 2018, at 5:12 AM, Wouter Verduin <wouterverd...@gmail.com> wrote:
>
> Dear developers of Scikit,
>
> I am working on a scientific paper about a prediction model that predicts
> complications after major abdominal resections. I have been using scikit to
> create that model and got good results (a score of 0.94). This makes us want
> to see what the model that scikit builds looks like.
>
> As of now we have 100 input variables, but logically these aren't all equally
> useful, and we want to reduce that number to about 20 and see what the effect
> on the score is.
>
> My question: Is there a way to get the underlying formula for the model out
> of scikit instead of having it as a 'blackbox' in my svm function?
>
> At this moment I am predicting a dichotomous variable from 100 variables
> (continuous, ordinal and binary).
>
> My code:
>
> import numpy as np
> from numpy import *
> import pandas as pd
> from sklearn import tree, svm, linear_model, metrics, preprocessing
> import datetime
> from sklearn.model_selection import KFold, cross_val_score, ShuffleSplit, GridSearchCV
> from time import gmtime, strftime
>
> # open and prepare the database
> file = "/home/wouter/scikit/DB_SCIKIT.csv"
> DB = pd.read_csv(file, sep=";", header=0, decimal=',').as_matrix()
> DBT = DB
> print "Vorm van de DB: ", DB.shape      # shape of the DB
> target = []
> for i in range(len(DB[:,-1])):
>     target.append(DB[i,-1])
> DB = delete(DB, s_[-1], 1)              # remove the last column
> AantalOutcome = target.count(1)
> print "Aantal outcome:", AantalOutcome  # number of positive outcomes
> print "Aantal patienten:", len(target)  # number of patients
>
> A = DB
> b = target
>
> print len(DBT)
>
> svc = svm.SVC(kernel='linear', cache_size=500, probability=True)
> indices = np.random.permutation(len(DBT))
>
> rs = ShuffleSplit(n_splits=5, test_size=.15, random_state=None)
> scores = cross_val_score(svc, A, b, cv=rs)
> A = ("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
> print A
>
> X_train = DBT[indices[:-302]]
> y_train = []
> for i in range(len(X_train[:,-1])):
>     y_train.append(X_train[i,-1])
> X_train = delete(X_train, s_[-1], 1)    # remove the last column
>
> X_test = DBT[indices[-302:]]
> y_test = []
> for i in range(len(X_test[:,-1])):
>     y_test.append(X_test[i,-1])
> X_test = delete(X_test, s_[-1], 1)      # remove the last column
>
> model = svc.fit(X_train, y_train)
> print model
> uitkomst = model.score(X_test, y_test)  # uitkomst = result
> print uitkomst
> voorspel = model.predict(X_test)        # voorspel = prediction
> print voorspel
>
> And the output:
>
> Vorm van de DB:  (2011, 101)
> Aantal outcome: 128
> Aantal patienten: 2011
> 2011
> Accuracy: 0.94 (+/- 0.01)
> SVC(C=1.0, cache_size=500, class_weight=None, coef0=0.0,
>   decision_function_shape='ovr', degree=3, gamma='auto', kernel='linear',
>   max_iter=-1, probability=True, random_state=None, shrinking=True,
>   tol=0.001, verbose=False)
> 0.927152317881
> [0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
>  0. 0.]
>
> Thanks in advance!
>
> With kind regards,
>
> Wouter Verduin

_______________________________________________
scikit-learn mailing list
scikit-learn@python.org
https://mail.python.org/mailman/listinfo/scikit-learn