[Tutor] python clusters
Hi,

I have 30 variables in a text file. I want to read this text file and create 10 clusters based on 18 of the variables. I then want to read another text file and find the closest match using these clusters. Could you pls. help me with this?

Thanks,
Sree.
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Need help with dates in Python
Thanks for your help Francesco. This works.

Sree.

--- On Fri, 3/11/11, Francesco Loffredo f...@libero.it wrote:

From: Francesco Loffredo f...@libero.it
Subject: Re: [Tutor] Need help with dates in Python
To: tutor@python.org
Date: Friday, March 11, 2011, 1:05 AM

On 09/03/2011 9.21, nookasree ponamala wrote:

Hi, I need help in finding the minimum date and maximum date in a file. Here is my test file:

s.no:  dt1         amt      id1     id2
452    2010-02-20  $23.26   059542  06107
452    2010-02-05  $20.78   059542  06107
451    2010-02-24  $5.99    059542  20151
452    2010-02-12  $114.25  839745  98101
452    2010-02-06  $28.00   839745  06032
451    2010-02-12  $57.00   839745  06269

I want to get the minimum and maximum dt1 for each id1.

Required result:

id1     mindate     maxdate
059542  2010-02-24  2010-02-20
839745  2010-02-06  2010-02-12

Code: The code I tried. It doesn't work though.

I noticed that your dates are formatted in a way that makes it easy to compare them as strings. This allows you not only to do without splitting dates into year, month and day, but also to do without the datetime module. I'm also, AFAIK, the first one to address your need for the min and max date FOR EACH ID1, not in the whole file.

    ids = {}  # create an empty dictionary to store results
    for L in open("test.txt", "r"):
        S = L.split()  # allow direct access to fields
        if S[3] in ids:
            mindate, maxdate = ids[S[3]]  # current stored minimum and maximum date
            if S[1] < mindate:
                mindate = S[1]
            if S[1] > maxdate:
                maxdate = S[1]
            ids[S[3]] = (mindate, maxdate)  # new stored min and max
        else:
            ids[S[3]] = (S[1], S[1])  # initialize storage for the current id1, with min and max in a tuple
    # leave print formatting as an exercise to the reader (but you can do without it!)
    print ids

Hope this helps...
Francesco
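Francesco's string-comparison idea can be sketched in Python 3 as follows. This is only an illustration under stated assumptions: the function name `date_range_by_id` and the inline `SAMPLE` text are hypothetical stand-ins (the thread reads from a file), and the header row is skipped explicitly so that it does not end up as an entry of its own.

```python
# Sample rows copied from the test file in the thread; a real script
# would read them from a file instead (assumption for illustration).
SAMPLE = """\
s.no: dt1 amt id1 id2
452 2010-02-20 $23.26 059542 06107
452 2010-02-05 $20.78 059542 06107
451 2010-02-24 $5.99 059542 20151
452 2010-02-12 $114.25 839745 98101
452 2010-02-06 $28.00 839745 06032
451 2010-02-12 $57.00 839745 06269
"""


def date_range_by_id(lines):
    """Return {id1: (mindate, maxdate)}, comparing ISO dates as plain strings."""
    ranges = {}
    rows = iter(lines)
    next(rows, None)  # skip the header row
    for line in rows:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or short lines
        date, id1 = fields[1], fields[3]
        # The first date seen for an id is both its minimum and its maximum.
        lo, hi = ranges.get(id1, (date, date))
        ranges[id1] = (min(lo, date), max(hi, date))
    return ranges


result = date_range_by_id(SAMPLE.splitlines())
for id1, (lo, hi) in sorted(result.items()):
    print(id1, lo, hi)
```

String comparison is enough here only because the dates are zero-padded and ordered year, month, day. For the sample rows this gives 2010-02-05 and 2010-02-24 for id1 059542, which differs from the "Required result" line in the question; that line may simply contain a typo.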
Re: [Tutor] Need help with dates in Python
Hi All:

Thanks for all of your answers. I could solve this problem using strptime (datetime.datetime.strptime(dt1, '%Y-%m-%d')) and correct indentation. As I am new to Python programming, I don't know how to use classes yet; I'm still learning.

Thanks,
Sree.

--- On Thu, 3/10/11, James Reynolds eire1...@gmail.com wrote:

From: James Reynolds eire1...@gmail.com
Subject: Re: [Tutor] Need help with dates in Python
To: nookasree ponamala nookas...@yahoo.com
Cc: Andre Engels andreeng...@gmail.com, tutor@python.org
Date: Thursday, March 10, 2011, 2:26 AM

On Wed, Mar 9, 2011 at 1:34 PM, nookasree ponamala nookas...@yahoo.com wrote:

Hi, I'm new to Python programming. I've changed the code to below, but it is still not working. Could you pls. make the corrections in my code.

    import datetime
    t = ()
    tot = []
    min = datetime.date(2008, 1, 1)
    max = datetime.date(2012, 12, 31)
    for line in open ('test2.txt','r'):
        data = line.rstrip().split()
        a = data[9]
        b = data[4]
        (year, month, day) = b.split('-')
        year = int(year)
        month = int(month)
        day = int(day)
        t = (year,month,day)
        if t > max:
            maxyr = max
        if t < min:
            minyr = min
        t = (a,b,maxyr,minyr)
        tot.append(t)
        print t

Thanks
Sree.

--- On Wed, 3/9/11, Andre Engels andreeng...@gmail.com wrote:

From: Andre Engels andreeng...@gmail.com
Subject: Re: [Tutor] Need help with dates in Python
To: nookasree ponamala nookas...@yahoo.com
Cc: tutor@python.org
Date: Wednesday, March 9, 2011, 2:16 PM

On Wed, Mar 9, 2011 at 9:21 AM, nookasree ponamala nookas...@yahoo.com wrote:

Hi, I need help in finding the minimum date and maximum date in a file.
Here is my test file:

s.no:  dt1         amt      id1     id2
452    2010-02-20  $23.26   059542  06107
452    2010-02-05  $20.78   059542  06107
451    2010-02-24  $5.99    059542  20151
452    2010-02-12  $114.25  839745  98101
452    2010-02-06  $28.00   839745  06032
451    2010-02-12  $57.00   839745  06269

I want to get the minimum and maximum dt1 for each id1.

Required result:

id1     mindate     maxdate
059542  2010-02-24  2010-02-20
839745  2010-02-06  2010-02-12

Code: The code I tried. It doesn't work though.

    import sys
    import os
    t = ()
    tot = []
    maxyr = 2012
    minyr = 2008
    maxday = 31
    minday = 1
    maxmon = 12
    minmon = 1
    for line in open ('test2.txt','r'):
        data = line.rstrip().split()
        a = data[3]
        b = data[1]
        (year, month, day) = b.split('-')
        year = int(year)
        month = int(month)
        day = int(day)
    if year > maxyr:
        maxyr = year
    elif year < minyr:
        minyr = year
    if month > maxmon:
        maxmon = month
        elif month < minmon:
        minmon = month
        if day > maxday:
        maxday = day
        elif day < minday:
        minday = day
        max = (maxyr,maxmon,maxday)
        min = (minyr,minmon,minday)
        t = (a,b,max,min)
        tot.append(t)
        print t

Could you pls. help me with this.

I see several things go wrong. Here is a list, which may well not be complete:

* You want the mindate and maxdate for each id1, but you remember only a single minyr, maxyr etcetera. There's no way that that is going to work.
* You initialize minyr etcetera to a date before the first date you will see, and maxyr etcetera to a date after the last date. This means that you will never find an earlier or later one, respectively, so they would never be changed. You should do it exactly the other way around - minyr etcetera should be _later_ than any date that may occur, maxyr etcetera _earlier_.
* You move "if year > maxyr" back to the left. This means that it is not part of the loop, but is executed (only) once _after_ the loop has been gone through.
* "year < minyear" should be if, not elif: it is possible that the new date is both the first _and_ the last date that has been found (this will be the case with the first date).
* You change maxyear, maxmonth and maxday independently. That is not what you are trying to do - you want the last date, not the highest year, highest month and highest day (if the dates were 2001-12-01, 2011-11-03 and 2005-05-30, you want the maximum date to be 2011-11-03, not 2011-12-30). You should thus find a way to compare the *complete date* and then, if it is later than the maxdate or earlier than the mindate, change the *complete date*.
* At the end you show (well, in this case you don't because it is under "if month > maxmon") a quadruple consisting of id1, current date, lowest date and highest date - EACH time. You want only the triple and only after the last date of some value of id1 has been parsed (best to do that after all lines have been parsed)
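The strptime approach Sree mentions could look roughly like the sketch below (Python 3). The `ROWS` list, `parse_date` and `date_ranges` are hypothetical names introduced for illustration; only `datetime.strptime` with the `'%Y-%m-%d'` format comes from the message itself.

```python
from datetime import datetime

# (id1, dt1) pairs taken from the sample file in the thread; a real script
# would build these while reading the file (assumption for illustration).
ROWS = [
    ("059542", "2010-02-20"),
    ("059542", "2010-02-05"),
    ("059542", "2010-02-24"),
    ("839745", "2010-02-12"),
    ("839745", "2010-02-06"),
    ("839745", "2010-02-12"),
]


def parse_date(text):
    """Parse 'YYYY-MM-DD' into a datetime.date; raises ValueError if invalid."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def date_ranges(rows):
    """Return {id1: (earliest, latest)} using real date objects."""
    ranges = {}
    for id1, dt1 in rows:
        day = parse_date(dt1)
        if id1 not in ranges:
            ranges[id1] = (day, day)  # first date is both min and max
        else:
            lo, hi = ranges[id1]
            ranges[id1] = (min(lo, day), max(hi, day))
    return ranges


ranges = date_ranges(ROWS)
for id1, (lo, hi) in sorted(ranges.items()):
    print(id1, lo.isoformat(), hi.isoformat())
```

Compared with comparing the raw strings, parsing costs a little more but rejects malformed input such as 2010-02-30 with a ValueError instead of silently sorting it.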
[Tutor] Need help with dates in Python
Hi,

I need help in finding the minimum date and maximum date in a file. Here is my test file:

s.no:  dt1         amt      id1     id2
452    2010-02-20  $23.26   059542  06107
452    2010-02-05  $20.78   059542  06107
451    2010-02-24  $5.99    059542  20151
452    2010-02-12  $114.25  839745  98101
452    2010-02-06  $28.00   839745  06032
451    2010-02-12  $57.00   839745  06269

I want to get the minimum and maximum dt1 for each id1.

Required result:

id1     mindate     maxdate
059542  2010-02-24  2010-02-20
839745  2010-02-06  2010-02-12

Code: The code I tried. It doesn't work though.

    import sys
    import os
    t = ()
    tot = []
    maxyr = 2012
    minyr = 2008
    maxday = 31
    minday = 1
    maxmon = 12
    minmon = 1
    for line in open ('test2.txt','r'):
        data = line.rstrip().split()
        a = data[3]
        b = data[1]
        (year, month, day) = b.split('-')
        year = int(year)
        month = int(month)
        day = int(day)
    if year > maxyr:
        maxyr = year
    elif year < minyr:
        minyr = year
    if month > maxmon:
        maxmon = month
        elif month < minmon:
        minmon = month
        if day > maxday:
        maxday = day
        elif day < minday:
        minday = day
        max = (maxyr,maxmon,maxday)
        min = (minyr,minmon,minday)
        t = (a,b,max,min)
        tot.append(t)
        print t

Could you pls. help me with this.

Thanks
Sree.
Re: [Tutor] Need help with dates in Python
Hi,

I'm new to Python programming. I've changed the code to below, but it is still not working. Could you pls. make the corrections in my code.

    import datetime
    t = ()
    tot = []
    min = datetime.date(2008, 1, 1)
    max = datetime.date(2012, 12, 31)
    for line in open ('test2.txt','r'):
        data = line.rstrip().split()
        a = data[9]
        b = data[4]
        (year, month, day) = b.split('-')
        year = int(year)
        month = int(month)
        day = int(day)
        t = (year,month,day)
        if t > max:
            maxyr = max
        if t < min:
            minyr = min
        t = (a,b,maxyr,minyr)
        tot.append(t)
        print t

Thanks
Sree.

--- On Wed, 3/9/11, Andre Engels andreeng...@gmail.com wrote:

From: Andre Engels andreeng...@gmail.com
Subject: Re: [Tutor] Need help with dates in Python
To: nookasree ponamala nookas...@yahoo.com
Cc: tutor@python.org
Date: Wednesday, March 9, 2011, 2:16 PM

On Wed, Mar 9, 2011 at 9:21 AM, nookasree ponamala nookas...@yahoo.com wrote:

Hi, I need help in finding the minimum date and maximum date in a file. Here is my test file:

s.no:  dt1         amt      id1     id2
452    2010-02-20  $23.26   059542  06107
452    2010-02-05  $20.78   059542  06107
451    2010-02-24  $5.99    059542  20151
452    2010-02-12  $114.25  839745  98101
452    2010-02-06  $28.00   839745  06032
451    2010-02-12  $57.00   839745  06269

I want to get the minimum and maximum dt1 for each id1.

Required result:

id1     mindate     maxdate
059542  2010-02-24  2010-02-20
839745  2010-02-06  2010-02-12

Code: The code I tried. It doesn't work though.

    import sys
    import os
    t = ()
    tot = []
    maxyr = 2012
    minyr = 2008
    maxday = 31
    minday = 1
    maxmon = 12
    minmon = 1
    for line in open ('test2.txt','r'):
        data = line.rstrip().split()
        a = data[3]
        b = data[1]
        (year, month, day) = b.split('-')
        year = int(year)
        month = int(month)
        day = int(day)
    if year > maxyr:
        maxyr = year
    elif year < minyr:
        minyr = year
    if month > maxmon:
        maxmon = month
        elif month < minmon:
        minmon = month
        if day > maxday:
        maxday = day
        elif day < minday:
        minday = day
        max = (maxyr,maxmon,maxday)
        min = (minyr,minmon,minday)
        t = (a,b,max,min)
        tot.append(t)
        print t

Could you pls. help me with this.
I see several things go wrong. Here is a list, which may well not be complete:

* You want the mindate and maxdate for each id1, but you remember only a single minyr, maxyr etcetera. There's no way that that is going to work.
* You initialize minyr etcetera to a date before the first date you will see, and maxyr etcetera to a date after the last date. This means that you will never find an earlier or later one, respectively, so they would never be changed. You should do it exactly the other way around - minyr etcetera should be _later_ than any date that may occur, maxyr etcetera _earlier_.
* You move "if year > maxyr" back to the left. This means that it is not part of the loop, but is executed (only) once _after_ the loop has been gone through.
* "year < minyear" should be if, not elif: it is possible that the new date is both the first _and_ the last date that has been found (this will be the case with the first date).
* You change maxyear, maxmonth and maxday independently. That is not what you are trying to do - you want the last date, not the highest year, highest month and highest day (if the dates were 2001-12-01, 2011-11-03 and 2005-05-30, you want the maximum date to be 2011-11-03, not 2011-12-30). You should thus find a way to compare the *complete date* and then, if it is later than the maxdate or earlier than the mindate, change the *complete date*.
* At the end you show (well, in this case you don't because it is under "if month > maxmon") a quadruple consisting of id1, current date, lowest date and highest date - EACH time. You want only the triple and only after the last date of some value of id1 has been parsed (best to do that after all lines have been parsed).
* The code as shown will lead to a syntax error anyway because you did not indent extra after "elif month < minmon:", "if day > maxday:" and "elif day < minday:".
-- André Engels, andreeng...@gmail.com
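Two of André's points, comparing the complete date and using if rather than elif, can be illustrated with a small Python 3 sketch. The dates are the three from his example; the function name `min_max` and the use of None as the starting value (instead of an artificially early or late date) are choices made here for illustration, not something the thread prescribes.

```python
# The three dates from the example above, as (year, month, day) tuples.
dates = [(2001, 12, 1), (2011, 11, 3), (2005, 5, 30)]

# Tuples compare element by element, so max() picks the latest complete date.
latest = max(dates)

# Tracking year, month and day independently mixes parts of different dates.
piecewise = (
    max(y for y, m, d in dates),
    max(m for y, m, d in dates),
    max(d for y, m, d in dates),
)


def min_max(all_dates):
    """Return (earliest, latest), starting from None instead of a guessed bound."""
    earliest = last = None
    for current in all_dates:
        # Two separate ifs: the first date seen is both the earliest and the latest.
        if earliest is None or current < earliest:
            earliest = current
        if last is None or current > last:
            last = current
    return earliest, last


print(latest)          # the latest complete date
print(piecewise)       # a date that never occurred in the input
print(min_max(dates))
```

Starting from None sidesteps the question of which sentinel dates are "early enough" or "late enough"; the first real date seen initializes both bounds.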
[Tutor] calculate the sum of a variable - python
Hi:

I'm a Senior SAS Analyst. I'm trying to learn Python. I would appreciate it if anybody could help me with this. It works fine if I give input instead of reading a text file. I don't understand where I'm going wrong.

I'm trying to read a text file and find out the following:

1. Sum of amt for each id
2. Count of id
3. minimum of date1
4. maximum of date1

Here is the sample text file:

test.txt file:

bin1  cd1  date1       amt      cd  id          cd2
452   2    2010-02-20  $23.26   0   8100059542  06107
452   2    2010-02-20  $20.78   0   8100059542  06107
452   2    2010-02-24  $5.99    2   8100839745  20151
452   2    2010-02-12  $114.25  7   8100839745  98101
452   2    2010-02-06  $28.00   0   8101142362  06032
452   2    2010-02-09  $15.01   0   8100274453  06040
452   18   2010-02-13  $113.24  0   8100274453  06040
452   2    2010-02-13  $31.80   0   8100274453  06040

Here is the code I've tried out to calculate sum of amt by id:

    import sys
    from itertools import groupby
    from operator import itemgetter
    t = ()
    tot = []
    for line in open ('test.txt','r'):
        aline = line.rstrip().split()
        a = aline[5]
        b = (aline[3].strip('$'))
        t = (a,b)
        t1 = str(t)
        tot.append(t1)
    print tot
    def summary(data, key=itemgetter(0), value=itemgetter(1)):
        for k, group in groupby(data, key):
            yield (k, sum(value(row) for row in group))
    if __name__ == "__main__":
        for id, tot_spend in summary(tot, key=itemgetter(0), value=itemgetter(1)):
            print id, tot_spend

Error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 3, in summary
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Thanks,
Sree.
Re: [Tutor] calculate the sum of a variable - python
Thanks for the reply Wayne, but it is still not working. When I used int, it throws the below error:

  File "<stdin>", line 2, in <module>
  File "<stdin>", line 3, in summary
  File "<stdin>", line 3, in <genexpr>
ValueError: invalid literal for int() with base 10: "'"

I tried using float and the error is:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 3, in summary
  File "<stdin>", line 3, in <genexpr>
ValueError: invalid literal for float(): '

Thanks,
Sree.

--- On Mon, 3/7/11, Wayne Werner waynejwer...@gmail.com wrote:

From: Wayne Werner waynejwer...@gmail.com
Subject: Re: [Tutor] calculate the sum of a variable - python
To: nookasree ponamala nookas...@yahoo.com
Cc: tutor@python.org
Date: Monday, March 7, 2011, 9:14 AM

On Sun, Mar 6, 2011 at 9:31 PM, nookasree ponamala nookas...@yahoo.com wrote:

Hi: I'm a Senior SAS Analyst. I'm trying to learn Python. I would appreciate it if anybody could help me with this. It works fine if I give input instead of reading a text file. I don't understand where I'm going wrong.

I'm trying to read a text file and find out the following:

1. Sum of amt for each id
2. Count of id
3. minimum of date1
4. maximum of date1

Here is the sample text file:

test.txt file:

bin1  cd1  date1       amt      cd  id          cd2
452   2    2010-02-20  $23.26   0   8100059542  06107
452   2    2010-02-20  $20.78   0   8100059542  06107
452   2    2010-02-24  $5.99    2   8100839745  20151
452   2    2010-02-12  $114.25  7   8100839745  98101
452   2    2010-02-06  $28.00   0   8101142362  06032
452   2    2010-02-09  $15.01   0   8100274453  06040
452   18   2010-02-13  $113.24  0   8100274453  06040
452   2    2010-02-13  $31.80   0   8100274453  06040

Here is the code I've tried out to calculate sum of amt by id:

    import sys
    from itertools import groupby
    from operator import itemgetter
    t = ()
    tot = []
    for line in open ('test.txt','r'):
        aline = line.rstrip().split()
        a = aline[5]
        b = (aline[3].strip('$'))
        t = (a,b)
        t1 = str(t)
        tot.append(t1)
    print tot
    def summary(data, key=itemgetter(0), value=itemgetter(1)):
        for k, group in groupby(data, key):
            yield (k, sum(value(row) for row in group))
    if __name__ == "__main__":
        for id, tot_spend in summary(tot, key=itemgetter(0), value=itemgetter(1)):
            print id, tot_spend

Error:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 3, in summary
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Of course I first have to commend you for including the full traceback with the code, because it makes this entirely easy to answer. In general, the traceback tells you the most important stuff last, so I'll start with this line:

TypeError: unsupported operand type(s) for +: 'int' and 'str'

That tells us that the problem is you are trying to use + (addition) on an integer and a string - which you can't do because of the type mismatch (TypeError).

The next line

  File "<stdin>", line 3, in summary

tells us that the error occurred on line 3 in summary:

    1 | def summary(data, key=itemgetter(0), value=itemgetter(1)):
    2 |     for k, group in groupby(data, key):
    3 |         yield (k, sum(value(row) for row in group))

Well, there's no '+', but you do have 'sum', which uses addition under the hood. So how do you go about fixing it?
Well, you change the value getting passed to sum to an integer (or other number):

    sum(int(value(row)) for row in group)

That should either fix your problem, or throw a different error if you try to convert a string like 'Hello' to an integer. (Alternatively, use float if you're interested in decimals.)

HTH,
Wayne
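Wayne's point about sum can be reproduced in isolation with the short Python 3 sketch below. The `amounts` list holds two values from the sample file with the dollar sign already stripped; the variable names are made up for illustration.

```python
# Two amounts from the sample file, still stored as text.
amounts = ["23.26", "20.78"]

# sum() starts from the integer 0, so its first step is 0 + "23.26",
# which is the int-plus-str addition reported in the traceback.
try:
    sum(amounts)
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# Converting each value to a number first makes the addition well defined.
total = sum(float(a) for a in amounts)
print(round(total, 2))
```

float is used rather than int because the amounts carry decimals; for money, a stricter program might prefer the decimal module, but that goes beyond what the thread discusses.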
Re: [Tutor] calculate the sum of a variable - python
Thanks a lot Marc. This works now.

Sree.

--- On Mon, 3/7/11, Marc Tompkins marc.tompk...@gmail.com wrote:

From: Marc Tompkins marc.tompk...@gmail.com
Subject: Re: [Tutor] calculate the sum of a variable - python
To: nookasree ponamala nookas...@yahoo.com
Cc: Wayne Werner waynejwer...@gmail.com, tutor@python.org
Date: Monday, March 7, 2011, 10:54 AM

On Sun, Mar 6, 2011 at 8:46 PM, nookasree ponamala nookas...@yahoo.com wrote:

Thanks for the reply Wayne, but it is still not working. When I used int, it throws the below error:

  File "<stdin>", line 2, in <module>
  File "<stdin>", line 3, in summary
  File "<stdin>", line 3, in <genexpr>
ValueError: invalid literal for int() with base 10: "'"

I tried using float and the error is:

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 3, in summary
  File "<stdin>", line 3, in <genexpr>
ValueError: invalid literal for float(): '

Thanks,
Sree.

I played with it a bit and simplified things a (little) bit:

    b = (aline[3].strip('$'))
    t = (a, float(b))
    tot.append(t)
    print tot

You were converting the tuple to a string before adding it to the list; you don't need to do that, and it was concealing the real cause of your problem, which is that you either need to skip/get rid of the top line of your file, or write some error-handling code to deal with it. Currently, you're trying to convert the string 'amt' into a number, and you just can't do that.
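Putting Marc's two observations together, a Python 3 sketch of the whole task (sum, count, minimum and maximum date per id) might look like this. It is one possible arrangement, not the code anyone in the thread posted: `summarize`, the dictionary keys and the inline `SAMPLE` text are hypothetical, and a plain dictionary is used instead of groupby, which only groups rows that sit next to each other.

```python
# Why the earlier ValueError mentioned a lone quote: indexing the *string*
# form of a tuple yields characters, not fields.
row = str(("8100059542", "23.26"))
print(row[0], row[1])  # the first two characters of the text, not the id and amount

# Sample data from the thread; a real script would read test.txt instead.
SAMPLE = """\
bin1 cd1 date1 amt cd id cd2
452 2 2010-02-20 $23.26 0 8100059542 06107
452 2 2010-02-20 $20.78 0 8100059542 06107
452 2 2010-02-24 $5.99 2 8100839745 20151
452 2 2010-02-12 $114.25 7 8100839745 98101
452 2 2010-02-06 $28.00 0 8101142362 06032
452 2 2010-02-09 $15.01 0 8100274453 06040
452 18 2010-02-13 $113.24 0 8100274453 06040
452 2 2010-02-13 $31.80 0 8100274453 06040
"""


def summarize(lines):
    """Return {id: {'total', 'count', 'first', 'last'}} for the given rows."""
    stats = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or malformed line
        try:
            amt = float(fields[3].lstrip("$"))
        except ValueError:
            continue  # header row: 'amt' is not a number
        date, key = fields[2], fields[5]
        if key not in stats:
            stats[key] = {"total": 0.0, "count": 0, "first": date, "last": date}
        entry = stats[key]
        entry["total"] += amt
        entry["count"] += 1
        entry["first"] = min(entry["first"], date)  # ISO dates sort as strings
        entry["last"] = max(entry["last"], date)
    return stats


stats = summarize(SAMPLE.splitlines())
for key in sorted(stats):
    s = stats[key]
    print(key, "%.2f" % s["total"], s["count"], s["first"], s["last"])
```

Catching ValueError on the amount is the "error-handling" route Marc mentions; skipping the first line unconditionally would work as well when the file is known to start with a header.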