[email protected] wrote:

> Dear All,
> Sorry for the poor presentation of my problem!
> I have this type of input:
> a file with a long list of genes and the sample each one occurs in:
> 
> gene  Samples
> FUS   SampleA
> TP53  SampleA
> ATF4  SampleB
> ATF3  SampleC
> ATF4  SampleD
> FUS   SampleE
> RORA  SampleE
> RORA  SampleC
> 
> What I want to obtain is a matrix with the occurrences per sample:
>       SampleA SampleB SampleC SampleD SampleE
> FUS   1       0       0       0       1
> TP53  1       0       0       0       0
> ATF4  0       1       0       1       0
> ATF3  0       0       1       0       0
> RORA  0       0       1       0       1
> 
> That way I can count the occurrences quickly!
> 
> At the moment I am only able to build the list of the row names and the
> sample names. Unfortunately I don't know how to create this matrix.
> Could you help me?
> Thanks for the patience and the help

Open the file, skip the first line and convert the remaining lines into 
(gene, sample) tuples. I assume that you know enough Python to do that.
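
In case it helps, here is a minimal sketch of that first step. It assumes the
file is whitespace-separated with exactly one header line; the name
read_pairs and the io.StringIO stand-in for a real file are only
illustrative, not part of your data.

```python
import io


def read_pairs(lines):
    """Yield (gene, sample) tuples, skipping the header line."""
    lines = iter(lines)
    next(lines, None)  # skip the "gene  Samples" header
    for line in lines:
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            yield tuple(parts)


# Stand-in for an opened file; with real data you would pass the file
# object returned by open(...) instead.
sample = io.StringIO(
    "gene  Samples\n"
    "FUS   SampleA\n"
    "TP53  SampleA\n"
    "ATF4  SampleB\n"
)
pairs = list(read_pairs(sample))
print(pairs)
```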

Then build a dict that maps (gene, sample) tuples to the number of occurrences:

pivot = {
   ("FUS", "SampleA"): 1,
   ...
   ("RORA", "SampleC"): 1,
}

Remember to handle both cases: the tuple is already in the dict, or it is 
not in the dict yet. (Once you have done that successfully, have a look at 
the collections.Counter class.)
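
One possible way to write the counting loop, shown next to the Counter
shortcut. The pairs list is typed in by hand from the example input above;
the names pairs and counted are just for this sketch.

```python
from collections import Counter

pairs = [
    ("FUS", "SampleA"), ("TP53", "SampleA"), ("ATF4", "SampleB"),
    ("ATF3", "SampleC"), ("ATF4", "SampleD"), ("FUS", "SampleE"),
    ("RORA", "SampleE"), ("RORA", "SampleC"),
]

# Manual version: treat "already in the dict" and "not yet in the dict"
# as two separate cases.
pivot = {}
for key in pairs:
    if key in pivot:
        pivot[key] += 1
    else:
        pivot[key] = 1

# Counter does the same bookkeeping in one call.
counted = Counter(pairs)

print(pivot[("FUS", "SampleA")])
print(dict(counted) == pivot)
```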

Now you need the row/column labels. You can extract them from the dict with

# use set(...) to avoid duplicates
rows = sorted(set(row for row, column in pivot))
columns = ... # something very similar

You can then print the table with

print([""] + columns)
for row in rows:
    print([row] + [pivot.get((row, column), 0) for column in columns])

Use the str.format() method on the table cells to prettify the output and 
you're done.
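
As a rough sketch of that last step: the function name format_table and the
column width of 8 are arbitrary choices for illustration, the pivot dict is
typed in from the example data, and the columns line is only one way to read
"something very similar".

```python
pivot = {
    ("FUS", "SampleA"): 1, ("TP53", "SampleA"): 1, ("ATF4", "SampleB"): 1,
    ("ATF3", "SampleC"): 1, ("ATF4", "SampleD"): 1, ("FUS", "SampleE"): 1,
    ("RORA", "SampleE"): 1, ("RORA", "SampleC"): 1,
}
rows = sorted(set(row for row, column in pivot))
columns = sorted(set(column for row, column in pivot))


def format_table(pivot, rows, columns, width=8):
    """Return the table as text, every cell right-aligned to `width`."""
    cell = "{:>" + str(width) + "}"
    # Header row: an empty corner cell followed by the column labels.
    lines = ["".join(cell.format(c) for c in [""] + columns)]
    for row in rows:
        # Missing (row, column) combinations count as 0 occurrences.
        cells = [row] + [pivot.get((row, column), 0) for column in columns]
        lines.append("".join(cell.format(c) for c in cells))
    return "\n".join(lines)


print(format_table(pivot, rows, columns))
```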

_______________________________________________
Tutor maillist  -  [email protected]
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
