Hi everyone

First time posting to the list. I am quite new to developing with scikit-learn,
and from the documentation it seems like the right module for analyzing our
data. I have what may be a very simple question, possibly due to my lack of
experience with numpy, and it might not be strictly related to scikit-learn. We
have a quite large file that we want to run some analysis on. The file looks
like this (with more columns and rows):

Phenotype       Sample# Spots - Spot Threshold Compactness 50% SER-Spot - Mean per Well Spots - Spot Axial Len
Drp1    1       1.99833747092   0.281688858081  100.117830248   0.147604050353  1.78982437209   0.399409918413
Drp1    2       1.97424267297   0.264998073022  107.198146362   0.146706013852  1.78831244082   0.388041138106
Drp1    3       1.98316867489   0.25965267354   91.0449868081   0.149716780149  1.7810567071    0.381995225259
Drp1    4       1.97584425304   0.268130140138  92.3474907176   0.148656816321  1.78800597873   0.369574449618
Drp1    41      1.91457014241   0.251069883932  103.0949514     0.147634342712  1.75424087726   0.391473420716  0.
Drp1    42      1.93245846682   0.2526078789    100.088149597   0.147115751667  1.76189001016   0.386142583259
NegCtrl 43      1.97502923055   0.252300534919  109.71335396    0.126942535041  1.86754950025   0.34520675119
NegCtrl 44      1.94486591029   0.241857838192  116.021885973   0.126606286994  1.86573201019   0.32008954027
NegCtrl 45      1.97536753582   0.250341610391  115.529211267   0.128220539644  1.87312348462   0.313735873529
NegCtrl 46      1.96729727595   0.248706740137  113.099560031   0.126860402492  1.87935900983   0.31268657512
NegCtrl 47      1.96715230854   0.250441972713  116.117905066   0.127405872829  1.87358350323   0.317959591222
NegCtrl 48      1.98540587615   0.259804089622  111.072286475   0.127788656033  1.86490413633   0.355506113683
NegCtrl 49      1.95986181972   0.247510314134  116.172522829   0.128591691834  1.86579218026   0.333907111824
NegCtrl 50      1.99267766563   0.258771125617  114.191838173   0.127972359662  1.87990793259   0.337497512941
NegCtrl 51      1.90929902172   0.22761604482   118.350113814   0.125654159901  1.84688625036   0.334283734292
(…)

We want to use every row not named NegCtrl as our data and the rows named
NegCtrl as our learning set. I am having trouble transforming the data in the
flat file into a numpy array. My first attempt was to create a list of lists
based on the row names and then just use

data_array = np.array(data)

to create the numpy array. But when I do this I get a ValueError saying that X
and y have incompatible shapes.
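
One thing that might be worth checking here, as a guess rather than a confirmed
diagnosis: if some rows in the list of lists have a different number of fields
(for example a stray trailing value), np.array cannot build a regular 2-D
array. Depending on the NumPy version it either raises an error or produces a
1-D object array, and in both cases the number of rows in X may no longer match
the number of labels in y. The sketch below uses made-up toy values and a
hypothetical helper, find_ragged_rows, to locate such rows before converting.

```python
import numpy as np


def find_ragged_rows(rows):
    """Return the indices of rows whose length differs from the first row's."""
    expected = len(rows[0])
    return [i for i, row in enumerate(rows) if len(row) != expected]


# Toy values for illustration only; the last row has one extra field.
rows = [
    [1.99, 0.28, 100.1],
    [1.97, 0.26, 107.2],
    [1.91, 0.25, 103.0, 0.0],
]
labels = ["Drp1", "Drp1", "Drp1"]

bad = find_ragged_rows(rows)

# Keep only the well-formed rows, and drop the matching labels as well,
# so that X and y end up with the same number of samples.
keep = [i for i in range(len(rows)) if i not in bad]
X = np.array([rows[i] for i in keep], dtype=float)
y = np.array([labels[i] for i in keep])

print(bad, X.shape, y.shape)
```

Printing X.shape and y.shape just before fitting is a cheap way to see which of
the two arrays has the unexpected size.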

My main question is, what is the best way to transform a flat file into a numpy 
array that is suitable for scikit-learn? 
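
For reference, a minimal sketch of one way this could be done with plain Python
and numpy. It assumes the file is whitespace- or tab-delimited, that the first
two fields of each data row are the phenotype and the sample number, and that
every remaining field is numeric. The helper name load_flat_file is made up for
this example, and the header line is left out of the sample because its column
names contain spaces and are cut off above.

```python
import io

import numpy as np

# Four rows copied from the sample above (header line omitted).
sample = """\
Drp1 1 1.99833747092 0.281688858081 100.117830248 0.147604050353 1.78982437209 0.399409918413
Drp1 2 1.97424267297 0.264998073022 107.198146362 0.146706013852 1.78831244082 0.388041138106
NegCtrl 43 1.97502923055 0.252300534919 109.71335396 0.126942535041 1.86754950025 0.34520675119
NegCtrl 44 1.94486591029 0.241857838192 116.021885973 0.126606286994 1.86573201019 0.32008954027
"""


def load_flat_file(handle):
    """Parse data rows into (phenotypes, sample_ids, X) numpy arrays."""
    phenotypes, sample_ids, rows = [], [], []
    for line in handle:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        phenotypes.append(fields[0])
        sample_ids.append(int(fields[1]))
        rows.append([float(value) for value in fields[2:]])
    # dtype=float makes numpy fail loudly if the rows have unequal lengths.
    return np.array(phenotypes), np.array(sample_ids), np.array(rows, dtype=float)


# With a real file this would be: with open(path) as handle: ...
phenotypes, sample_ids, X = load_flat_file(io.StringIO(sample))

# Boolean mask: NegCtrl rows form the learning set, everything else is data.
is_ctrl = phenotypes == "NegCtrl"
X_learn = X[is_ctrl]
X_data = X[~is_ctrl]

print(X.shape, X_learn.shape, X_data.shape)
```

The result is a 2-D float array with one row per sample and one column per
measurement, which is the (n_samples, n_features) layout scikit-learn
estimators expect. numpy.genfromtxt or pandas.read_csv could probably do the
parsing in fewer lines, but the explicit loop makes it easier to see where a
malformed row comes from.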

Any help is appreciated.

Thanks
Paulo
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
