On Dec 22, 11:12 am, Luca <nioski...@yahoo.it> wrote:
> Dear all, excuse me if I post a simple question. I am trying to find
> a software/algorithm that can "cluster" simple data on an Excel sheet.
>
> Example:
>          Variable a   Variable b   Variable c
> Case 1       1            0            0
> Case 2       0            1            1
> Case 3       1            0            0
> Case 4       1            1            0
> Case 5       0            1            1
>
> The system recognizes that there are 3 possible clusters:
>
> the first with cases that have Variable a as true,
> the second has Variables b and c,
> the third is "all the rest"
>
>          Variable a   Variable b   Variable c
>
> Case 1       1            0            0
> Case 3       1            0            0
>
> Case 2       0            1            1
> Case 5       0            1            1
>
> Case 4       1            1            0
>
> Thank you in advance

If you haven't already, download and install xlrd from
http://www.python-excel.org for a library that can read Excel workbooks
(but not Excel 2007 files yet). Or export as CSV...

Then, using either the csv module or xlrd (both well documented), or any
other way of reading the data, you effectively want to end up with
something like this:

    rows = [
        # A        B  C  D
        ['Case 1', 1, 0, 0],
        ['Case 2', 0, 1, 1],
        ['Case 3', 1, 0, 0],
        ['Case 4', 1, 1, 0],
        ['Case 5', 0, 1, 1],
    ]

One approach is to sort 'rows' by B, C & D. This will bring the identical
elements adjacent to each other in the list. Then you need an iterator to
group them... take a look at itertools.groupby.

Another is to use a defaultdict(list), found in collections: just loop
over the rows, again with B, C & D as the key and A being appended to
the list.

hth

Jon.
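
P.S. A minimal sketch of both approaches, assuming the data has already been read into 'rows' in the shape described in this reply. The names 'key', 'by_sort' and 'by_dict' are made up for illustration, and the 'name, *flags' unpacking assumes Python 3. Treat it as a starting point rather than a finished solution.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter

rows = [
    # A        B  C  D
    ['Case 1', 1, 0, 0],
    ['Case 2', 0, 1, 1],
    ['Case 3', 1, 0, 0],
    ['Case 4', 1, 1, 0],
    ['Case 5', 0, 1, 1],
]

# Key function: pick out columns B, C and D as a tuple.
key = itemgetter(1, 2, 3)

# Approach 1: sort so identical keys are adjacent, then group them.
# groupby only merges *consecutive* items, hence the sort first.
by_sort = {
    k: [row[0] for row in group]
    for k, group in groupby(sorted(rows, key=key), key=key)
}

# Approach 2: no sorting needed; append each case name under its key.
by_dict = defaultdict(list)
for name, *flags in rows:
    by_dict[tuple(flags)].append(name)

for k, names in by_sort.items():
    print(k, names)
```

Both give the same grouping: cases 2 and 5 together, cases 1 and 3 together, and case 4 on its own. Note that this only groups rows whose B, C & D values are identical, which happens to match the example; it is not a general clustering algorithm.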