@ Steven,

Thank you for the answers. I appreciate your understanding and patience; I understand that my question was (unintentionally) confusing, and probably irritating to the seasoned tutor list members.
Your examples helped greatly, and were the push I needed.

Happy Friday, and thanks again,

Mike

On 02/14/2013 05:48 PM, Steven D'Aprano wrote:
> On 15/02/13 07:55, Michael McConachie wrote:
>
>> Essentially:
>>
>> 1. I have a list of numbers that already exist in a file. I
>> generate this file by parsing info from logs.
>> 2. Each line contains an integer on it (corresponding to the number
>> of milliseconds that it takes to complete a certain repeated task).
>> 3. There are over a million entries in this file, one per line; at
>> any given time it can be just a few thousand, or more than a million.
>>
>> Example:
>> -------
>> 173
>> 1685
>> 1152
>> 253
>> 1623
>
>
> A million entries sounds like a lot to you or me, but to your
> computer, it's not. When you start talking tens or hundreds of
> millions, that's possibly a lot.
>
> Do you know how to read those numbers into a Python list? Here is the
> "baby step" way to do so:
>
>
> data = []  # Start with an empty list.
> f = open("filename")  # Obviously you have to use the actual file name.
> for line in f:  # Read the file one line at a time.
>     num = int(line)  # Convert each line into an integer (whole number)
>     data.append(num)  # and append it to the end of the list.
> f.close()  # Close the file when done.
>
>
> Here's a more concise way to do it:
>
> with open("filename") as f:
>     data = [int(line) for line in f]
>
>
>
> Once you have that list of numbers, you can sum the whole lot:
>
> sum(data)
>
>
> or just a range of the items:
>
> sum(data[:100])  # The first 100 items.
>
> sum(data[100:200])  # The second 100 items.
>
> sum(data[-50:])  # The last 50 items.
>
> sum(data[1000:])  # Item 1001 to the end. (See below.)
>
> sum(data[5:99:3])  # Every third item, starting at index 5 and ending
>                    # at index 98.
>
>
>
> This is called "slicing", and it is perhaps the most powerful and
> useful technique that Python gives you for dealing with lists. The
> rules are not necessarily the most intuitive, though.
>
>
> A slice is either a pair of numbers separated with a colon, inside the
> square brackets:
>
> data[start:end]
>
> or a triple:
>
> data[start:end:step]
>
> Any of these three numbers can be left out. The default values are:
>
> start=0
> end=length of the sequence being sliced
> step=1
>
> They can also be negative. If start or end are negative, they are
> interpreted as "from the end" rather than "from the beginning".
>
> Item positions are counted from 0, which will be very familiar to C
> programmers. The start index is included in the slice, the end
> position is excluded.
>
> The model that you should think of is to imagine the sequence of items
> labelled with their index, starting from zero, and with a vertical
> line *between* each position. Here is a sequence of 26 items, showing
> the index in the first line and the value in the second:
>
>
> |0|1|2|3|4|5|6|7|8|9| ... |25|
> |a|b|c|d|e|f|g|h|i|j| ... |z |
>
> When you take a slice, the items are always cut at the left. So, if
> the above is called "letters", we have:
>
> letters[0:4]  # returns "abcd"
>
> letters[2:8]  # returns "cdefgh"
>
> letters[2:8:2]  # returns "ceg"
>
> letters[-3:]  # returns "xyz"
>
>
>
>> Eventually what I'll need to do is:
>>
>> 1. Index the file and/or count the lines, as to identify each line's
>> positional relevance so that it can average any range of numbers that
>> are sequential; one to one another.
>
>
> No need. Python already does that, automatically, when you read the
> data into a list.
>
>
>
>> 2. Calculate the difference between any given (x) range. In order
>> to be able to ask the program to average every 5, 10, 100, 100, or
>> 10,000 etc. --> until completion. This includes the need to deal
>> with stray remainders at the end of the file that aren't divisible by
>> that initial requested range.
>
> I don't quite understand you here. First you say "difference", then
> you say "average".
> Can you show a sample of data, say, 10 values, and
> the sorts of typical calculations you want to perform, with the
> answers you expect to get?
>
>
> For example, here's 10 numbers:
>
>
> 103, 104, 105, 109, 111, 112, 115, 120, 123, 128
>
>
> Here are the running averages of 3 values:
>
> (103+104+105)/3
>
> (104+105+109)/3
>
> (105+109+111)/3
>
> (109+111+112)/3
>
> (111+112+115)/3
>
> (112+115+120)/3
>
> (115+120+123)/3
>
> (120+123+128)/3
>
>
> Is that what you mean? If so, then Python can deal with this
> trivially, using slicing. With your data stored in list "data", as
> above, I can say:
>
>
> for i in range(len(data) - 2):  # Stop at the last full group of three.
>     print(sum(data[i:i+3]))
>
>
> to print the running sums taking three items at a time.
>
>
>
> The rest of your post just confuses me. Until you explain exactly what
> calculations you are trying to perform, I can't tell you how to
> perform them :-)
>
>
>
>
_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
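
For anyone finding this thread later, here is a minimal sketch of the two calculations discussed above, assuming the numbers have already been read into a list as Steven showed. It is only one reading of the original request, not a confirmed answer to it. The function names running_averages and chunk_averages are made up for illustration, and the sample data is the 10-value list from Steven's reply. The first function averages overlapping groups (the "running average"); the second averages consecutive blocks of a requested size and averages any short leftover block over the items it actually contains.

```python
def running_averages(data, window):
    """Average of each run of `window` consecutive items (overlapping)."""
    # There are len(data) - window + 1 full windows; if the list is
    # shorter than the window, range() is empty and we return [].
    return [sum(data[i:i + window]) / float(window)
            for i in range(len(data) - window + 1)]


def chunk_averages(data, size):
    """Average of each consecutive block of `size` items (no overlap).

    The last block may hold fewer than `size` items when len(data) is
    not a multiple of `size`; it is averaged over what it contains.
    """
    averages = []
    for start in range(0, len(data), size):
        chunk = data[start:start + size]  # Slicing past the end is safe.
        averages.append(sum(chunk) / float(len(chunk)))
    return averages


# Sample values taken from the quoted reply above.
data = [103, 104, 105, 109, 111, 112, 115, 120, 123, 128]

print(running_averages(data, 3))  # 8 averages, one per group of three
print(chunk_averages(data, 4))    # [105.25, 114.5, 125.5]
```

The float() calls are there so the division is not truncated under Python 2; under Python 3 they are harmless. With a block size of 4, the last block here holds only the two leftover values 123 and 128, which is the "stray remainder" case from the original question.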