On Wed, Apr 24, 2013 at 5:03 PM, John Richey <[email protected]> wrote:

> Hello,
>
> I am having difficulty with a cross validation problem, and any help would
> be much appreciated.
>
> I have a large number of research subjects from 15 different data
> collection sites. I want to assess whether "site" has any influence on the
> data.
>
> It occurred to me that one way to do this would be to perform a
> cross-validation, via stratified k folds (stratified, because some sites
> have a larger number of subjects than others).  Unless I am mistaken, the
> results of this analysis should reveal whether "site" has an influence on
> the data.  However, I am running into a problem because my training set is
> a different shape than the test data, which causes the analysis to fail.
>
> My data structure is pretty simple.
>
> X is a 3 by 1000 matrix of datapoints (that is, 3 datapoints per subject)
> y is a 1 by 1000 matrix indicating the site (expressed as an integer
> ranging between 1 and 15).
>
>
Hi John,

On top of what Lars mentioned, I think your data is transposed. The
scikit-learn convention is (n_samples, n_features), one row per subject,
so you would want X.shape = (1000, 3) and y.shape = (1000,).
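
A minimal sketch of that reshaping with NumPy, in case it helps. The
random arrays are only placeholders standing in for the real data (the
names X_raw and y_raw are hypothetical); the shapes are the ones
described in the question.

```python
import numpy as np

# Placeholder data in the layout described in the question (assumption:
# the real arrays have these shapes; the values here are random).
rng = np.random.RandomState(0)
X_raw = rng.randn(3, 1000)                  # 3 datapoints x 1000 subjects
y_raw = rng.randint(1, 16, size=(1, 1000))  # site labels 1..15, as a row

# Transpose so that each row is one subject and each column one feature.
X = X_raw.T

# Flatten the label matrix into a 1-D vector with one label per subject.
y = y_raw.ravel()

print(X.shape, y.shape)
```

With X and y in this layout, the train and test splits produced by the
cross-validation iterators index subjects (rows), so both parts keep
the same number of columns.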

Fabian.