Dear all,

I have a simple question regarding gene set enrichment analysis. Say we have a simple numerator and denominator, so the hypergeometric test looks like:
p = phyper(white - 1, total white, total black, drawn)

However, I have a question regarding database size. Say my denominator (the total number of genes on the array) is 10000, but the database (say, the GO database) covers only 8000 of these 10000 genes. Should I exclude the genes that are not in the database from every value passed to phyper? In other words:

Original call: phyper(50, 200, 9800, 500)
After removing genes that are not in the database, for example: phyper(50, 180, 7700, 400)

Should I correct my gene lists against the database records? Which way is correct?

Thank you in advance for the replies.
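
To make the two alternatives concrete, here is a minimal sketch in R using the example numbers from my question. It assumes the quantity of interest is the upper-tail probability of drawing more than 50 white balls, which phyper returns only with lower.tail = FALSE (the default is the lower tail). The variable names p_array and p_db are mine, for illustration only, and the counts are made-up examples rather than real data.

```r
# phyper(q, m, n, k): q = white balls drawn (minus 1 for an enrichment test),
# m = total white, n = total black, k = number of balls drawn.
# lower.tail = FALSE gives P(X > q), i.e. P(X >= q + 1).

# Alternative 1: background = all 10000 genes on the array
p_array <- phyper(50, 200, 9800, 500, lower.tail = FALSE)

# Alternative 2: background restricted to genes present in the database
# (illustrative counts copied from the question)
p_db <- phyper(50, 180, 7700, 400, lower.tail = FALSE)

print(c(array_background = p_array, database_background = p_db))
```

The only difference between the two calls is which genes are counted in the background and in the gene lists, which is exactly the choice I am asking about.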