Hi everyone,

This is my first time posting to the list. I am quite new to developing with scikit-learn, and from the documentation it seems like the right module for analyzing our data. I have what may be a very simple question, possibly due to my lack of experience with numpy, and it may not be strictly related to scikit-learn.

We have a fairly large file that we want to run some analysis on. The file looks like this (with more columns and rows):
Phenotype Sample# Spots - Spot Threshold Compactness 50% SER-Spot - Mean per Well Spots - Spot Axial Len
Drp1 1 1.99833747092 0.281688858081 100.117830248 0.147604050353 1.78982437209 0.399409918413
Drp1 2 1.97424267297 0.264998073022 107.198146362 0.146706013852 1.78831244082 0.388041138106
Drp1 3 1.98316867489 0.25965267354 91.0449868081 0.149716780149 1.7810567071 0.381995225259
Drp1 4 1.97584425304 0.268130140138 92.3474907176 0.148656816321 1.78800597873 0.369574449618
Drp1 41 1.91457014241 0.251069883932 103.0949514 0.147634342712 1.75424087726 0.391473420716 0.
Drp1 42 1.93245846682 0.2526078789 100.088149597 0.147115751667 1.76189001016 0.386142583259
NegCtrl 43 1.97502923055 0.252300534919 109.71335396 0.126942535041 1.86754950025 0.34520675119
NegCtrl 44 1.94486591029 0.241857838192 116.021885973 0.126606286994 1.86573201019 0.32008954027
NegCtrl 45 1.97536753582 0.250341610391 115.529211267 0.128220539644 1.87312348462 0.313735873529
NegCtrl 46 1.96729727595 0.248706740137 113.099560031 0.126860402492 1.87935900983 0.31268657512
NegCtrl 47 1.96715230854 0.250441972713 116.117905066 0.127405872829 1.87358350323 0.317959591222
NegCtrl 48 1.98540587615 0.259804089622 111.072286475 0.127788656033 1.86490413633 0.355506113683
NegCtrl 49 1.95986181972 0.247510314134 116.172522829 0.128591691834 1.86579218026 0.333907111824
NegCtrl 50 1.99267766563 0.258771125617 114.191838173 0.127972359662 1.87990793259 0.337497512941
NegCtrl 51 1.90929902172 0.22761604482 118.350113814 0.125654159901 1.84688625036 0.334283734292
(…)

We want to use every row not named NegCtrl as our data and the rows named NegCtrl as our learning set. I am having trouble transforming the data in the flat file into a numpy array. My first attempt was to create a list of lists based on the row names and then use data_array = np.array(data) to create the numpy array. But when I do this I get a ValueError saying that X and y have incompatible shapes.
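To make the goal concrete, here is a minimal sketch of the kind of loading and splitting we are after. It is only an illustration under assumptions, not necessarily the idiomatic approach: it assumes whitespace-separated columns, with the phenotype label first, the sample number second, and six numeric features after that. The names load_flat_file, n_features and SAMPLE are made up for this example, and the rows are a small subset copied from the listing above.

```python
import io
import numpy as np

# A few rows copied from the listing above; the real file has more rows and columns.
SAMPLE = """\
Drp1 1 1.99833747092 0.281688858081 100.117830248 0.147604050353 1.78982437209 0.399409918413
Drp1 2 1.97424267297 0.264998073022 107.198146362 0.146706013852 1.78831244082 0.388041138106
NegCtrl 43 1.97502923055 0.252300534919 109.71335396 0.126942535041 1.86754950025 0.34520675119
NegCtrl 44 1.94486591029 0.241857838192 116.021885973 0.126606286994 1.86573201019 0.32008954027
"""


def load_flat_file(handle, n_features=6):
    """Return (labels, X) with one label and one row of floats per data line.

    Hypothetical helper: assumes label, sample number, then n_features floats.
    """
    labels, rows = [], []
    for line in handle:
        fields = line.split()
        # Skip blank lines, the header, and any row with an unexpected
        # number of columns, so that every kept row has the same length.
        if len(fields) != n_features + 2:
            continue
        try:
            values = [float(v) for v in fields[2:]]
        except ValueError:
            # Non-numeric content (for example a header line); skip it.
            continue
        labels.append(fields[0])
        rows.append(values)
    return np.array(labels), np.array(rows, dtype=float)


labels, X = load_flat_file(io.StringIO(SAMPLE))

# Boolean mask selecting the NegCtrl rows; everything else is "our data".
is_ctrl = labels == "NegCtrl"
X_ctrl = X[is_ctrl]
X_data = X[~is_ctrl]

print(X.shape, labels.shape, X_ctrl.shape, X_data.shape)
```

Because the labels and the feature rows are appended together, the label array and X always have the same number of rows. Rows of unequal length are skipped here rather than repaired, which may or may not be what we want for the real file.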
My main question is: what is the best way to transform a flat file into a numpy array that is suitable for scikit-learn?

Any help is appreciated.

Thanks,
Paulo

_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
