Greetings Chelsea, >So what I am trying to do is take in a csv file and the format of >the csv file is:
While I do not have a suggestion to you about the Python you have sent to the list, I have a few suggestions about how to approach the problem differently. I hope that this may make your task easier. >something, something1, something2,something3, something4, >something5, something6, something7. Then in the csv file I want to >search for a certain range of dates like 1/3/2016 - 2/3/2016. I can >get individual dates but not a range of dates. Option 1 for date handling: Consider converting your date column to a datetime object, which then can be used in comparisons (and sorting). >>> d0 = datetime.datetime.strptime('1/3/2016', '%m/%d/%Y') >>> d1 = datetime.datetime.strptime('2/3/2016', '%m/%d/%Y') >>> d0 > d1 False >>> d0 < d1 True This is good for general date and time handling. If you are only handling date,s then you may wish to use the strptime() function to parse your date and then choose to convert it to a date. See also below my point about serialized (on disk) representation and memory representation. Option 2 for date handling: Consider converting your date to a time struct, which can then by used in comparisons and sorting. >>> t0 = time.strptime('1/3/2016', '%m/%d/%Y') >>> t1 = time.strptime('2/3/2016', '%m/%d/%Y') >>> t0 > t1 False >>> t0 < t1 True The time library is good, too. I like it because I have, historically, preferred the Epoch timestamp, which is how several classes of computers expose their time-tracking to applications. >>> time.time() 1455044423.592147 >I have an if elif statement to read row5 which is the date row. My >if statement is the initial pass at returning the values within the >date range and then my elif is the parameter and if there is no >parameter passed in then it returns all data. >I am having some trouble trying to pass in a date range parameter. >The piece of code is under the def populate_dict function the >date_choice part. Here is the code: Even though I would separate the selection of record from the reading and conversion of the data (see more below), I think you want to do something more like this (to handle a range of time): def populate_dict(self, filename, date_choice, key_choice): # -- date_choice must be a list/tuple of datetime types (datetime, datetime) with open(filename, 'rb') as f: reader = csv.reader(f) next(reader, None) # -- skipping the header, eh? for row in reader: eventdate = datetime.datetime.strptime(row[5], '%m/%d/%Y') if date_choice is None: self.dict[row[2]].append(row[3]) elif date_choice[0] <= eventdate <= date_choice[1]: self.dict[row[2]].append(row[3]) if key_choice == row[3]: self.dict[row[2]].append(row[3]) elif key_choice == "none": self.dict[row[2]].append(row[3]) Handling time ranges is always a wee bit tricky. If you don't convert the data into some sort of convenient format before trying to apply the range criteria, you'll slowly go crazy trying to figure out what is wrong. How much data do you have? I'm assuming it is small(er than half a million rows). If so, then, there's no harm in loading it all into memory and then filtering in (or out) the stuff you want. But, most importantly, I have a general suggestion about separating the serialized (on-disk) format from the format you are using in memory. I will make reference to another data handling library in Python, but mostly to illustrate the idea. You can read CSV data sets with a tool called pandas [0]. There are many reasons it is popular, but I'll examine one feature. If you load a CSV file using pandas, it tries to apply a bit of inference logic to each data type as it reads a file. You said that the sixth column of your data is a date-formatted field. If the pandas library can determine that the sixth field in every row can be converted to a datetime() type (or an int() or a float(), etc.), it will automatically convert that field. This is useful because the serialized form is a string that looks like '1/3/2016' or '2016-02-09 11:13:06' or 'Tue Feb 9 19:13:24 UTC 2016'. But, you as a human are not thinking "this is a string". You are thinking of it as a date. So, simply put: Serialized form: a string representation that contains a date Form in memory: a date, datetime, or time type (of some kind) Again, if you are working with a small amount of data (like less than 50MB), don't be afraid to load it into memory. You can make things more efficient later, if need be. If you have a larger data set or you want to perform all sorts of different computations, sorting and grouping, you may find a SQLite database a good solution (as Joel Goldstick has just posted). So, finally, if you are merely reading a CSV file, I might suggest that you convert the serialized representation of the data (a string) into an object or datatype in memory that allows you to perform the desired sorting, computation, calculation or range-finding. Also, if you have a smaller block of code and data, we may find it easier to make specific suggestions. And, one last general comment.... Unless you have a specific reason why not to do so, it's best to convert and decode inputs into a canonical memory representation as soon as possible when reading from disk/network. Good luck, -Martin [0] http://pandas.pydata.org/ -- Martin A. Brown http://linux-ip.net/ _______________________________________________ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: https://mail.python.org/mailman/listinfo/tutor