Re: [Scikit-learn-general] Cross validation with a pre-computed kernel

Morgan Hoffman Tue, 06 Jan 2015 10:23:10 -0800

Hi Andy,
Thanks for your help. Is there something in the scikit-learn documentation (or
any other resource) that explains why the kernel matrix at test time needs to
be the kernel between the test data and the training data? I am quite new to
machine learning. Why do we do this, and how do we obtain a kernel matrix
between the test and the training data?
I applied the MinMaxScaler to the Gram matrix to rescale its values. Right now
the entries of my Gram matrix range from 0.7 to 1, and I want to map this range
onto 0 to 1, so that 0.7 becomes 0.
Thanks!


Date: Tue, 6 Jan 2015 12:45:06 -0500
From: [email protected]
To: [email protected]
Subject: Re: [Scikit-learn-general] Cross validation with a pre-computed kernel

The kernel matrix at test time needs to be the kernel between the
test data and the training data.

I guess that is not what get_gram_matrix does.
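To see why the test kernel has to be taken against the training data: a fitted SVC predicts with a decision function of the form f(x) = sum_i dual_coef_[i] * k(x_i, x) + intercept_, where the x_i are the support vectors, i.e. a subset of the training samples. A test point therefore only ever needs kernel values against training points, which is exactly what a test-vs-train matrix provides. The sketch below checks this using SVC's support_, dual_coef_ and intercept_ attributes; the toy data from make_classification and rbf_kernel standing in for get_gram_matrix are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Toy binary problem standing in for the real data (assumption).
X, y = make_classification(n_samples=60, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Training kernel: every training sample against every training sample.
K_train = rbf_kernel(X_train, X_train)   # shape (n_train, n_train)
# Test kernel: every test sample against every *training* sample.
K_test = rbf_kernel(X_test, X_train)     # shape (n_test, n_train)

clf = SVC(kernel="precomputed").fit(K_train, y_train)

# Rebuild the decision function by hand: only the columns of K_test that
# correspond to support vectors (which are training samples) are used.
manual = K_test[:, clf.support_] @ clf.dual_coef_.ravel() + clf.intercept_[0]

print(K_train.shape, K_test.shape)
print(np.allclose(manual, clf.decision_function(K_test)))
```

So the "test Gram matrix" is rectangular: one row per test sample and one column per training sample.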

Why are you applying the MinMaxScaler to the Gram matrix? I'm not
sure that makes sense...
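One likely reason it does not make sense: MinMaxScaler treats each column of its input as a separate feature with its own minimum and maximum, so applied to a Gram matrix it rescales every column differently and the result is generally no longer symmetric, which a kernel matrix must be. The sketch below shows this on a small random RBF kernel (data and kernel are assumptions for illustration), next to a single global min-max rescaling, which does stay symmetric. Even the global version (e.g. mapping 0.7 to 0 and 1 to 1) is not guaranteed to keep the matrix positive semi-definite, so it is not automatically a valid kernel either.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.preprocessing import MinMaxScaler

rng = np.random.RandomState(0)
X = rng.rand(8, 3)              # hypothetical data: 8 samples, 3 features
K = rbf_kernel(X, gamma=0.5)    # symmetric Gram matrix, ones on the diagonal

# MinMaxScaler rescales column by column, each with its own min and max.
K_cols = MinMaxScaler().fit_transform(K)

# A single global min-max rescaling applies the same map to every entry.
K_global = (K - K.min()) / (K.max() - K.min())

print("raw kernel symmetric:      ", np.allclose(K, K.T))
print("per-column scaled symmetric:", np.allclose(K_cols, K_cols.T))
print("global scaled symmetric:   ", np.allclose(K_global, K_global.T))
```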

Without the scaler you could just do

    print(cross_val_score(SVC(kernel='precomputed'), get_gram_matrix(X), Y))
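A sketch of what cross_val_score does here: because an SVC with kernel='precomputed' is flagged as taking a kernel matrix, cross_val_score slices the full Gram matrix per fold, training on the square block K[train, train] and scoring on the rectangular block K[test, train]. The check below compares it with an explicit loop on the iris data. It assumes a recent scikit-learn, where cross_val_score lives in sklearn.model_selection (in 2015 it was sklearn.cross_validation), and uses rbf_kernel as a stand-in for get_gram_matrix.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
K = rbf_kernel(X)   # full (n_samples, n_samples) Gram matrix, stand-in for get_gram_matrix

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auto_scores = cross_val_score(SVC(kernel="precomputed"), K, y, cv=cv)

# The same computation written out by hand, using the same folds.
manual_scores = []
for train, test in cv.split(K, y):
    K_train = K[np.ix_(train, train)]   # training samples vs training samples
    K_test = K[np.ix_(test, train)]     # test samples vs training samples
    clf = SVC(kernel="precomputed").fit(K_train, y[train])
    manual_scores.append(clf.score(K_test, y[test]))

print(auto_scores)
print(np.allclose(auto_scores, manual_scores))
```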

With the MinMaxScaler you can do

    pipe = make_pipeline(MinMaxScaler(), SVC(kernel='precomputed'))
    print(cross_val_score(pipe, get_gram_matrix(X), Y))

which is a bit shorter than your code and removes the need to worry
about the Gram matrix ;)

On 01/06/2015 12:27 PM, Morgan Hoffman wrote:

Hi,

I am trying to do a k-fold cross validation with a precomputed kernel.
However, I end up with an error message that looks like this:

    Traceback (most recent call last):
      File "kfold_simple_data.py", line 64, in <module>
        score = clf.score(test_gram_matrix, test_labels)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/base.py", line 291, in score
        return accuracy_score(y, self.predict(X), sample_weight=sample_weight)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 467, in predict
        y = super(BaseSVC, self).predict(X)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 283, in predict
        X = self._validate_for_predict(X)
      File "/usr/local/lib/python2.7/dist-packages/sklearn/svm/base.py", line 401, in _validate_for_predict
        (X.shape[1], self.shape_fit_[0]))
    ValueError: X.shape[1] = 2 should be equal to 6, the number of samples at training time
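The numbers in the error fit a fold with 6 training and 2 test samples: the model was fitted on a 6 x 6 Gram matrix, so at predict time it expects one column per training sample (6), but get_gram_matrix(test_data) presumably builds the kernel among the 2 test samples only. A minimal reproduction, with 8 hypothetical samples and linear_kernel standing in for get_gram_matrix (the exact error text differs between scikit-learn versions):

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(8, 3)                           # 8 hypothetical samples, 3 features
y = np.array([0, 1] * 4)                     # hypothetical binary labels
train, test = np.arange(6), np.arange(6, 8)  # one fold: 6 train, 2 test

clf = SVC(kernel="precomputed")
clf.fit(linear_kernel(X[train], X[train]), y[train])  # 6 x 6 training kernel

# Wrong: kernel among the test samples only (2 x 2), as in the question.
raised = False
try:
    clf.predict(linear_kernel(X[test], X[test]))
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Right: kernel between the test and the training samples (2 x 6).
pred = clf.predict(linear_kernel(X[test], X[train]))
print(pred)
```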
        
This is what my code looks like:

    def cross_validate(data, folds, is_scaled):
        X = data["values"]
        Y = data["labels"]

        kf = KFold(len(Y), folds, indices=False)

        scores = []

        for train, test in kf:
            scaler = preprocessing.MinMaxScaler()
            X_train, X_test, y_train, y_test = X[train], X[test], Y[train], Y[test]

            training_data = OrderedDict()
            for i in range(len(X_train)):
                training_data[X_train[i]] = y_train[i]

            train_gram_matrix = get_gram_matrix(training_data)
            train_gram_matrix = scaler.fit_transform(train_gram_matrix)
            train_labels = get_label_array(training_data)

            test_data = OrderedDict()
            for i in range(len(X_test)):
                test_data[X_test[i]] = y_test[i]

            test_gram_matrix = get_gram_matrix(test_data)
            test_gram_matrix = scaler.transform(test_gram_matrix)
            test_labels = get_label_array(test_data)

            clf = svm.SVC(kernel='precomputed')
            clf.fit(train_gram_matrix, train_labels)

            print "Score:"
            score = clf.score(test_gram_matrix, test_labels)
            scores.append(score)
            print score

Does anyone have an idea of what I may be doing wrong? Any help is
appreciated.

Thanks!
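A possible restructuring of the loop above, as a sketch under assumptions: the whole data set is summarised by one precomputed n x n Gram matrix K (for example built by a get_gram_matrix-style helper, whose real signature is not shown in the thread), labels are a NumPy array, and the optional rescaling is a single global min-max fitted on each training block rather than MinMaxScaler, which scales column by column. The key change is that each fold scores on K[test, train], the kernel between test and training samples. It deliberately avoids the make_pipeline version from the reply: as far as I know, recent scikit-learn releases treat a Pipeline as expecting a precomputed kernel only when its first step does, and MinMaxScaler does not, so cross_val_score may hand that pipeline row slices of K instead of square kernel blocks. The function name cross_validate_precomputed is hypothetical, and iris with rbf_kernel only stands in for the real data.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.model_selection import KFold
from sklearn.svm import SVC


def cross_validate_precomputed(K, y, folds, is_scaled=False):
    """K-fold CV of an SVC on a precomputed (n x n) Gram matrix K (hypothetical helper)."""
    scores = []
    kf = KFold(n_splits=folds, shuffle=True, random_state=0)
    for train, test in kf.split(K):
        K_train = K[np.ix_(train, train)]  # training vs training samples
        K_test = K[np.ix_(test, train)]    # test vs training samples, not test vs test
        if is_scaled:
            # One global min-max rescale, fitted on the training block only
            # and applied unchanged to the test block (assumes hi > lo).
            lo, hi = K_train.min(), K_train.max()
            K_train = (K_train - lo) / (hi - lo)
            K_test = (K_test - lo) / (hi - lo)
        clf = SVC(kernel="precomputed").fit(K_train, y[train])
        scores.append(clf.score(K_test, y[test]))
    return scores


# Example usage on a stand-in data set (iris + RBF kernel, both assumptions).
X, y = load_iris(return_X_y=True)
K = rbf_kernel(X)
scores = cross_validate_precomputed(K, y, folds=5)
scaled_scores = cross_validate_precomputed(K, y, folds=5, is_scaled=True)
print(scores)
print(scaled_scores)
```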
      
------------------------------------------------------------------------------
Dive into the World of Parallel Programming! The Go Parallel Website,
sponsored by Intel and developed in partnership with Slashdot Media, is your
hub for all things parallel software development, from weekly thought
leadership blogs to news, videos, case studies, tutorials and more. Take a
look and join the conversation now. http://goparallel.sourceforge.net
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

    
    

  

