Fred G wrote:
Hey guys,

I want to write a short script that takes from an input excel file w/ a
bunch of rows and columns.  The two columns I'm interested in are "high
gains" and "genes."  I want the user to write:

Which genes are associated with gains over 20%?


Depending on your requirements, the simplest data structure would be just a sorted list of (high gain, genes):

data = [ (1, "GAT"),
         (3, "CCT"),
         ...
         (99, "TTG")]

Then you can answer questions like:

"Which genes are associated with gains between 17% and 48%?"

with a simple helper function:

# untested
def extract_genes(data, low=0, high=100):
    """Get the genes between low% and high% high gain."""
    for gain, genes in data:
        if gain < low: continue
        if gain > high: break
        yield genes


genes = list(extract_genes(data, 17, 48)


and then I want the script to search through the excel file, find in the
"high gains" column when the number is greater than 20%, and then store the
corresponding genes in that row (but in the "genes" column).

Unless you have hundreds of megabytes of data, you certainly don't want to be reading from the Excel file every single time you do a query. That will be very slow.

Does it have to be an .xls file? If so, you will need to install and use a third-part package to read data from the file. Google on "python excel" for more:

https://duckduckgo.com/html/?q=python+excel

Otherwise, a simple CSV file is 100% Excel compatible: use Excel to do a Save As and choose Comma Separated Values. You can use the csv module to read the data into a list, and then do all your processing in memory instead of on disk, which is like a thousand times faster.

To get the data out of the CSV file into a list, something like this (again, untested) will work:

import csv

f = open("my data file.csv", "r")
reader = csv.reader(f)
# Skip the first line, if it is a header.
header = next(reader)  # or f.next() if using Python 2.5
data = []
for row in reader:
    # I don't know the format of your data file, so I'll make it up.
    # Ignore everything except high_gain and genes.
    colour, truth, beauty, strangeness, low_gain, high_gain, genes = row
    pair = convert(high_gain), genes
    data.append(pair)

f.close()
data.sort()


You have to write your own "convert" function to turn the gains into numbers. Something like this might do it:

def convert(s):
    """Convert strings like '0.42' or '42%' to floats."""
    if s.endswith("%"):
        num = float(s[:-1])
        return num/100
    return float(s)




--
Steven


_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

Reply via email to