Re: [Tutor] understanding the behaviour of parameter 'key' in sort
it was a bit tricky, thanks :)

On Tue, Nov 24, 2009 at 9:00 AM, Lie Ryan lie.1...@gmail.com wrote:

> Shashwat Anand wrote:
>> I intended to sort a list according to a user-defined custom sorting order. For example: if the sorting order is zabc...wxy, then the output will be in lexicographically sorted order, but z will always be given priority over the others. As a test case I took the sorting order as the reverse of normal sorting, hence I defined user_key as string.ascii_lowercase reversed. It should sort in reverse, but I'm not getting the expected output. (It could have just been sorted with reverse=True, but my intention is to generalize it; this is just a test case.) I'm not able to find where the bug lies, nor am I exactly sure how the key function works, even though I use it regularly. Can you guys help me out?
>
> Your code is not wrong. It's your expected output (or your need) that's different from a typical definition of lexicographical sorting. In a typical lexicographical sorting 'a' comes before 'ab' since 'a' is shorter than 'ab'.
>
> So, if you want this:
>
>     expected output: ['cba', 'cab', 'abc', 'ab', 'aa', 'a']
>
> you must use a custom cmp= argument to reverse the shorter-substring case, like this:
>
>     import string
>
>     def my_cmp(s1, s2):
>         if s1.startswith(s2):
>             return -1
>         elif s2.startswith(s1):
>             return 1
>         else:
>             return cmp(s1, s2)
>
>     def userdef_sort(l, user_key):
>         table = string.maketrans("".join(sorted(user_key)), user_key)
>         trans = lambda x: x.translate(table)
>         return sorted(l, cmp=my_cmp, key=trans)
>
>     #user_key = raw_input()
>     user_key = string.ascii_lowercase[::-1]
>     l = ['a', 'aa', 'ab', 'abc', 'cba', 'cab']
>
>     print userdef_sort(l, user_key)

___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
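For readers on Python 3, where sorted() no longer accepts cmp=, here is a minimal sketch of one alternative. It is not the method from the thread: instead of a comparison function it builds a key from per-character ranks and appends a sentinel rank larger than any real one, so a string sorts after every longer string it is a prefix of. The names rank_key and END are made up for illustration.

```python
import string

def userdef_sort(words, user_key):
    """Sort words by a custom alphabet; longer strings come before their own prefixes."""
    rank = {ch: i for i, ch in enumerate(user_key)}
    END = len(user_key)  # sentinel: ranks higher than every real character

    def rank_key(word):
        # With a reversed alphabet, 'ab' becomes [25, 24, 26].
        return [rank[ch] for ch in word] + [END]

    return sorted(words, key=rank_key)

user_key = string.ascii_lowercase[::-1]
words = ['a', 'aa', 'ab', 'abc', 'cba', 'cab']
result = userdef_sort(words, user_key)
print(result)
```

Dropping the `+ [END]` part gives ordinary lexicographic order under the custom alphabet, where 'a' sorts before 'ab'.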
[Tutor] Alternatives to get IP address of a computer : which one should I use ?
I was going through a Python script, woof ( http://www.home.unix-ag.org/simon/woof ), and there was a portion of the code dedicated to getting the IP address:

    def find_ip ():
        if sys.platform == "cygwin":
            ipcfg = os.popen("ipconfig").readlines()
            for l in ipcfg:
                try:
                    candidat = l.split(":")[1].strip()
                    if candidat[0].isdigit():
                        break
                except:
                    pass
            return candidat

        os.environ["PATH"] = "/sbin:/usr/sbin:/usr/local/sbin:" + os.environ["PATH"]
        platform = os.uname()[0];

        if platform == "Linux":
            netstat = commands.getoutput ("LC_MESSAGES=C netstat -rn")
            defiface = [i.split ()[-1] for i in netstat.split ('\n')
                        if i.split ()[0] == "0.0.0.0"]
        elif platform in ("Darwin", "FreeBSD", "NetBSD"):
            netstat = commands.getoutput ("LC_MESSAGES=C netstat -rn")
            defiface = [i.split ()[-1] for i in netstat.split ('\n')
                        if len(i) > 2 and i.split ()[0] == "default"]
        elif platform == "SunOS":
            netstat = commands.getoutput ("LC_MESSAGES=C netstat -arn")
            defiface = [i.split ()[-1] for i in netstat.split ('\n')
                        if len(i) > 2 and i.split ()[0] == "0.0.0.0"]
        else:
            print >>sys.stderr, "Unsupported platform; please add support for your platform in find_ip().";
            return None

        if not defiface:
            return None

        if platform == "Linux":
            ifcfg = commands.getoutput ("LC_MESSAGES=C ifconfig " + defiface[0]).split ("inet addr:")
        elif platform in ("Darwin", "FreeBSD", "SunOS", "NetBSD"):
            ifcfg = commands.getoutput ("LC_MESSAGES=C ifconfig " + defiface[0]).split ("inet ")

        if len (ifcfg) != 2:
            return None
        ip_addr = ifcfg[1].split ()[0]

        # sanity check
        try:
            ints = [ i for i in ip_addr.split (".") if 0 <= int(i) <= 255]
            if len (ints) != 4:
                return None
        except ValueError:
            return None

        return ip_addr

It gets the OS name, runs netstat -rn, gets the default interface name from its output ('en' in my case, i.e. Darwin), then runs ifconfig, splits the output on 'inet ', takes the IP and does a sanity check. Nice!!

However, if I want to get my IP I can get it via:

    socket.gethostbyname(socket.gethostname())

I want to know why the above approach is followed. Is it because of a check via the network?
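The sanity check at the end of find_ip() can be looked at on its own. The sketch below is a hypothetical Python 3 helper (the name is_valid_ipv4 is not from woof) that is slightly stricter than the original: it requires exactly four parts, each in the range 0 to 255. It only validates the text; it does not say which way of obtaining the address is better.

```python
def is_valid_ipv4(text):
    """Return True if text looks like a dotted-quad IPv4 address."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    try:
        # int() raises ValueError for non-numeric parts such as 'abc' or ''.
        return all(0 <= int(part) <= 255 for part in parts)
    except ValueError:
        return False

for candidate in ("192.168.1.10", "256.1.1.1", "1.2.3", "a.b.c.d"):
    print(candidate, is_valid_ipv4(candidate))
```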
Re: [Tutor] Sorting Data in Databases
On Mon, Nov 23, 2009 at 11:26 PM, Ken G. beach...@insightbb.com wrote:

> I am getting more and more surprised of what Python can do. Very comprehensive.

It's a much safer bet to assume that Python can do ( http://xkcd.com/353/ ) anything ( http://xkcd.com/413/ ). You just have to install the right libraries ;)

-Wayne

--
To be considered stupid and to be told so is more painful than being called gluttonous, mendacious, violent, lascivious, lazy, cowardly: every weakness, every vice, has found its defenders, its rhetoric, its ennoblement and exaltation, but stupidity hasn’t. - Primo Levi
[Tutor] value of 'e'
Following this discussion on Hacker News ( http://news.ycombinator.com/item?id=958323 ) I checked this link ( http://www.isotf.org/?page_value=13223 ) and was wondering how to calculate the value of 'e' ( http://mathworld.wolfram.com/e.html ) to a large number of digits, as e = 1/0! + 1/1! + 1/2! and so on...

So I wrote this:

    >>> sum(1.0 / math.factorial(i) for i in range(100))
    2.7182818284590455

It was not giving the precision that I wanted, so I tried the decimal module, which I was not very familiar with:

    >>> decimal.getcontext().prec = 100
    >>> sum(decimal.Decimal(str(1./math.factorial(decimal.Decimal(i)))) for i in range(100))
    Decimal('2.718281828459409995603699925637255290043107782360218523330012825771122202286299367023903783933889309')

So far no error. I then compared the value of 'e' with the one from here ( http://dl.dropbox.com/u/59605/ten_million_e.txt ), which claims to have 10 million digits of e. The first few digits of 'e' from there don't match the result I got:

    2.71828182845904523536028747135266249775724709369995957496696762772407

So I tried:

    >>> sum(decimal.Decimal(str(1./math.factorial(decimal.Decimal(i)))) for i in range(1000))
    Traceback (most recent call last):
      File "<input>", line 1, in <module>
      File "<input>", line 2, in <genexpr>
    OverflowError: long int too large to convert to float

And then I went clueless!! How can it be done?
Re: [Tutor] value of 'e'
On Tue, Nov 24, 2009 at 5:47 AM, Shashwat Anand anand.shash...@gmail.com wrote:

> And then i went clueless !! How can it be done ?

Well, upon inspection it seems that "The math module consists mostly of thin wrappers around the platform C math library functions" - one would presume those are accurate, but I don't know to how many places. You might try writing your own factorial function that works with the decimal type and compare with the result you get from using the math library. If you find a discrepancy I'm sure there are places to file a bug report.

HTH,
Wayne

(Of course it's also possible that your source who claims to have the correct digits is faulty! See if you can verify by other sources)
[Tutor] Is pydoc the right API docs?
I'm not sure I'm using pydoc correctly. I only seem to get abbreviated help rather than full documentation. It happens often enough that I think I'm doing something wrong.

For example, I want to upgrade my scripts to use .format() instead of %.

    $ pydoc format
    Help on built-in function format in module __builtin__:

    format(...)
        format(value[, format_spec]) -> string

        Returns value.__format__(format_spec)
        format_spec defaults to ""

Well, that just tells me that there is an entity called format_spec. I want to know what format_spec actually IS so I can use it. I try:

    $ pydoc -k format_spec
    $

Nothing. I found the answer using Google, but that won't work if I'm offline. Am I using pydoc correctly? Or is there a more complete language spec?
Re: [Tutor] value of 'e'
Shashwat Anand wrote:

> How can it be done ?

    >>> import decimal, math
    >>> D = decimal.Decimal
    >>> decimal.getcontext().prec = 100
    >>> sum(D(1) / D(math.factorial(i)) for i in range(1000))
    Decimal('2.718281828459045235360287471352662497757247093699959574966967627724076630353547594571382178525166428')
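The same series can be summed without calling math.factorial() at all, by deriving each term from the previous one. This is a Python 3 sketch under a few assumptions of my own: the function name compute_e, the ten guard digits, and the default of 100 terms are illustrative choices, not something from the thread.

```python
from decimal import Decimal, localcontext

def compute_e(digits=100, terms=100):
    """Approximate e as the sum of 1/n! for n = 0 .. terms-1, rounded to `digits` digits."""
    with localcontext() as ctx:
        ctx.prec = digits + 10      # extra guard digits to absorb rounding in the sum
        total = Decimal(0)
        term = Decimal(1)           # 1/0!
        for n in range(1, terms + 1):
            total += term
            term /= n               # turns 1/(n-1)! into 1/n!
        ctx.prec = digits
        return +total               # unary plus rounds to the context precision

e_70 = compute_e(digits=70)
print(e_70)
```

Using localcontext() keeps the precision change from leaking into the rest of the program, which setting decimal.getcontext().prec directly would do.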
Re: [Tutor] value of 'e'
On Tue, Nov 24, 2009 at 6:47 AM, Shashwat Anand anand.shash...@gmail.com wrote:

> Followed by this discussion on Hacker News I checked this link and was wondering how to calculate value of 'e' to a large extent as e = 1/0! + 1/1! +1/2! and so on... so i wrote this:
>
>     >>> sum(1.0 / math.factorial(i) for i in range(100))
>     2.7182818284590455
>
> It was not giving the precision that I wanted so I tried decimal module of which I was not much aware of.
>
>     >>> decimal.getcontext().prec = 100
>     >>> sum(decimal.Decimal(str(1./math.factorial(decimal.Decimal(i)))) for i in range(100))

You are using floating point division here. The argument to math.factorial() is an integer, so the conversion of i to Decimal is not doing anything - it is converted back to an integer. Then you compute 1./some large integer. This will use floating point math and will fail when the factorial is too large to represent in floating point.

You should convert the result of factorial() to Decimal and compute Decimal(1)/Decimal(factorial). This should give you additional precision as well.

Kent
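The failure Kent describes is easy to reproduce in isolation. A small Python 3 sketch (the helper name is made up): 170! still fits in a double, 171! does not, so the float division raises OverflowError from that point on.

```python
import math

def reciprocal_factorial_as_float(n):
    """Return 1/n! as a float, or None if n! is too large to convert to a float."""
    try:
        return 1.0 / math.factorial(n)
    except OverflowError:
        return None

print(reciprocal_factorial_as_float(170))   # a tiny but representable float
print(reciprocal_factorial_as_float(171))   # None: 171! exceeds the float range
```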
Re: [Tutor] Is pydoc the right API docs?
On Tue, Nov 24, 2009 at 5:06 AM, Nick reply_to_is_va...@nowhere.com.invalid wrote:

> I'm not sure I'm using pydoc correctly. I only seem to get abbreviated help rather than full documentation. It happens often enough that I think I'm doing something wrong.
>
> I found the answer using google, but that won't work if I'm offline. Am I using pydoc correctly? Or is there a more complete language spec?

pydoc just shows the docstrings. It does not include the full text of the documentation. For that see http://python.org/doc/

You can download the docs for offline use, see the above link.

Kent
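For anyone who lands here with the same question about format_spec: it is a short string in the format specification mini-language described in the library reference. A few Python 3 examples, chosen only for illustration, give the flavour:

```python
examples = [
    format(3.14159, ".2f"),    # fixed point, two decimals
    format(255, "x"),          # lowercase hexadecimal
    format(42, "08d"),         # zero-padded to width 8
    format("hi", ">6"),        # right-aligned in a field of width 6
    "{0:.1%}".format(0.256),   # the same spec syntax inside str.format()
]
for text in examples:
    print(repr(text))
```

The part after the colon in a str.format() replacement field is the same format_spec that the built-in format() takes as its second argument.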
Re: [Tutor] value of 'e'
On Tue, Nov 24, 2009 at 7:01 AM, Wayne Werner waynejwer...@gmail.com wrote:

> Well, upon inspection it seems that "The math module consists mostly of thin wrappers around the platform C math library functions" - one would presume those are accurate, but I don't know to how many places. You might try writing your own factorial function that works with the decimal type and compare with the result you get from using the math library.

I don't think there is anything wrong with math.factorial(). The problem is that he is using floating-point (limited precision) division.

Kent
Re: [Tutor] value of 'e'
Wayne Werner wrote:

> You might try writing your own factorial function that works with the decimal type and compare with the result you get from using the math library.

There is no need for that; math.factorial will use Python int/long objects instead of the platform's integer as necessary.
[Tutor] Class understanding
Hi all...

Have been attempting to understand classes... Been getting along without them for a while now and feel it's time to jump in.

What I want to do is start a log with the logging module... I have this working without classes, but want to try... Here is a snippet of the code that I am hacking on:

    class logger():
        import logging
        logging.basicConfig(level=logging.INFO,
                            format='%(asctime)s - %(name)-12s - %(levelname)-8s - %(message)s',
                            #format='%(asctime)s %(levelname)s %(message)s',
                            filename='T2Notify.log',
                            filemode='a')
        logging_output = logging.getLogger('logging_output.core')
        print "Log set up"

        def write2log(log_info):
            logging_output.info("log started")
            print "written to log"
            return()

    logger()
    logger.write2log(log_info)

What I want to do is be able to set up the log, but have something outside the class be able to write log updates to write2log under the logger class... the logger.write2log() is not working :)...

Any ideas, encouragement, or pointers to good docs would be helpful... I've done a lot of searching via Google on classes, and it's all confusing to me...

-Joe
Re: [Tutor] Is pydoc the right API docs?
11/24 Kent Johnson ken...@tds.net:

> On Tue, Nov 24, 2009 at 5:06 AM, Nick reply_to_is_va...@nowhere.com.invalid wrote:
>> I'm not sure I'm using pydoc correctly. I only seem to get abbreviated help rather than full documentation. It happens often enough that I think I'm doing something wrong.
>>
>> I found the answer using google, but that won't work if I'm offline. Am I using pydoc correctly? Or is there a more complete language spec?
>
> pydoc just shows the docstrings. It does not include the full text of the documentation. For that see http://python.org/doc/
>
> You can download the docs for offline use, see the above link.
>
> Kent

Although, iirc the online docs are generated by pydoc.

It tells you that it calls value.__format__, so try

    $ pydoc some_value.__format__

For example:

    $ pydoc str.__format__

I don't have python installed here, so can't check it, but it might give you some more information. The same will happen if you try looking at the pydocs for, for example, str or repr: they are wrappers for magic methods on the object they are called with.

--
Rich "Roadie Rich" Lovely

There are 10 types of people in the world: those who know binary, those who do not, and those who are off by one.
Re: [Tutor] Is pydoc the right API docs?
On Tue, Nov 24, 2009 at 11:43 AM, Rich Lovely roadier...@googlemail.com wrote:

> 11/24 Kent Johnson ken...@tds.net:
>> pydoc just shows the docstrings. It does not include the full text of the documentation. For that see http://python.org/doc/
>
> Although, iirc the online docs are generated by pydoc.

No, they are created with Sphinx from reStructuredText source. Click the Show Source link in the sidebar of any docs page to see. The online docs do include much of the same text as the doc strings but they are separately written and generated.

Kent
Re: [Tutor] Sorting Data in Databases
> That is a surprise to me. I did not know that Python would work with SQLite.

Sure, as someone else said, Python comes with a LOT of libraries built right in when you download Python. This is known as "batteries included", that is, what comes with the standard distribution of Python.

> I will look into Alan's tutorial on DB.

It is of course thorough and will really provide understanding. But just to emphasize how simple creating a SQLite database in Python is, I recommend you do these 4 simple steps:

1. Download and install the very nice SQLite Database Browser application, from here: http://sourceforge.net/projects/sqlitebrowser/ It's simple and good.

2. Now open IDLE (which also comes with Python), do File > New Window, and paste this simple Python code into that window:

    #--
    #get SQLite into Python...it's that simple!
    import sqlite3

    #Make a connection to a database...if it doesn't exist yet, we'll create it.
    conn = sqlite3.connect('my_database.db')

    #Create a cursor, a kind of pen that writes into the database.
    cur = conn.cursor()

    #Write a table, called here MyTable, into the database, and give it two fields,
    # name and address.
    cur.execute('''CREATE TABLE if not exists MyTable (name, address)''')

    #Now actually write some data into the table you made:
    cur.execute('INSERT INTO MyTable VALUES(?,?)',('John','Chicago'))

    #Always have to commit any changes--or they don't stick!
    conn.commit()

    #You're done!
    #--

Without the comments (which explain a bit about why it is written as it is), this is just this small an amount of Python code, 6 lines:

    import sqlite3
    conn = sqlite3.connect('my_database.db')
    cur = conn.cursor()
    cur.execute('''CREATE TABLE if not exists MyTable (name, address)''')
    cur.execute('INSERT INTO MyTable VALUES(?,?)',('John','Chicago'))
    conn.commit()

3. Save your program to your Desktop and run it in IDLE (Run > Run Module...or just hit F5).

4. Now view your handiwork in the SQLite Database Browser. Open it and then do File > Open Database, then find a file on your Desktop called my_database.db. Open it. Now you are looking at the database you just made. Click on the Browse Data tab and you are now seeing that John lives in Chicago.

It's that simple to at least get started. Thanks, Python.

Che

> I am getting more and more surprised of what Python can do. Very comprehensive. Thanks all.
>
> Ken
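Since the thread subject is sorting data in databases, here is a short Python 3 follow-on to the steps above, as a sketch only. It uses an in-memory database so nothing is written to disk, and the extra rows (Alice, Zed) are invented sample data. The sorting is done by SQLite itself through ORDER BY.

```python
import sqlite3

# ':memory:' creates a throwaway database that disappears when the connection closes.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS MyTable (name, address)")

# Insert a few rows in no particular order.
cur.executemany(
    "INSERT INTO MyTable VALUES (?, ?)",
    [("John", "Chicago"), ("Zed", "Austin"), ("Alice", "Boston")],
)
conn.commit()

# Let the database do the sorting.
cur.execute("SELECT name, address FROM MyTable ORDER BY name")
rows = cur.fetchall()
for name, address in rows:
    print(name, address)

conn.close()
```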
Re: [Tutor] Class understanding
> Date: Tue, 24 Nov 2009 10:27:05 -0600
> From: jammer10...@gmail.com
> To: tutor@python.org
> Subject: [Tutor] Class understanding
>
> Hi all... Have been attempting to understand classes... Been getting along without them for a while now and feel it's time to jump in
>
> What I want to do it start a log with the logging module... I have this working without classes, but want to try... Here is a snippet of the code that I am hacking on:

I'm sure the better explainers will jump in presently, but let me try a few tips...

> class logger():

The convention in Python is to make class names capitalized. It is not necessary, but it is a good habit to get into, so class Logger().

> import logging

Imports are traditionally done at the top of a Python file, not within a class.

> logger()

This calls the class but doesn't create a name for an instance of the class, so you won't be able to access it later. Instead, try (assuming you rename logger() to Logger()):

    logger_instance = Logger()

Now you have a name for that instance of the class, and so can access the goodies inside the class.

> logger.write2log(log_info)

So that would now be:

    logger_instance.write2log(log_info)

> encouragement, or pointers to good docs would be helpful... I've done a lot of searching via Google on classes, and it's all confusing to me...

Keep trying. There have to be tons of good tutorials on classes. They fall under the heading of Object Oriented Programming. I tend to think of a class as a container that has all the stuff you will need to do a certain set of actions. It can contain data (facts) and it can contain methods (functions). You can create one or more instances of any class (a traditional example being that Dog() is a class whereas fluffy is an instance of a dog, and therefore has all the traditional dog methods, like bark(), wag(), etc.)

CM
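The Dog example can be spelled out in a few lines. This is an illustrative Python 3 sketch; the attribute name and the return strings are made up, and only the class/instance distinction comes from the message above.

```python
class Dog:
    """A class bundles data (name) with the methods that act on it."""

    def __init__(self, name):
        self.name = name            # data stored on this particular instance

    def bark(self):
        return self.name + " says woof"

    def wag(self):
        return self.name + " wags its tail"

# Two separate instances of the same class, each with its own data.
fluffy = Dog("Fluffy")
rex = Dog("Rex")
print(fluffy.bark())
print(rex.wag())
```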
[Tutor] the art of testing
Hi everyone,

The list recently discussed the virtues of unit testing, and I was hoping someone could offer some high-level advice and further resources as I try to apply the TDD methodology.

I'm trying to develop an application that regularly downloads some government data (in XML), parses the data and then updates a database. Simple enough in theory, but the problem I'm hitting is where to begin with tests on data that is ALL over the place.

The agency in question performs little data validation, so a given field can have a huge range of possible inputs (e.g. a Boolean field should be 0 or 1, but might be blank, have a negative number or even strings like the word 'None').

In such a case, should I be writing test cases for *expected* inputs and then coding the parser portion of my program to handle the myriad of possible bad data?

Or given the range of possible inputs, should I simply skip testing for valid data at the parser level, and instead worry about flagging (or otherwise handling) invalid input at the database-insertion layer (and therefore write tests at that layer)?

Or should I not be testing data values at all, but rather the results of actions performed on that data?

It seems like these questions must be a subset of the issues in the realm of testing. Can anyone recommend a resource on the types of tests that should be applied to the various tasks and stages of the development process?

A friend recommended The Art of Software Testing -- is that the type of book that covers these issues? If so, can anyone recommend a suitable alternative that costs less than $100?

As always, I appreciate the advice.

Regards,
Serdar
Re: [Tutor] (no subject)
OkaMthembo zebr...@gmail.com wrote

> When i started off i had pretty much the same questions. I think you need to start with the Python tutorial as it will show you the basics

Unfortunately it won't if the OP is a complete beginner - and from his email it sounds like he is. The standard tutorial assumes quite a lot of knowledge about programming, assuming you know at least one other language.

That's why there are several absolute beginners' tutorials - because for many Python programmers it is their first exposure to programming and the standard tutorial is not ideal for them.

OTOH, if you have ever done any programming before then the standard tutorial is excellent.

> keywords and how to define and use functions, classes, modules etc).

This is a good example. The standard tutorial assumes readers know what a function is and why you'd want to use one. The section on classes starts with a fairly detailed description of namespaces and scopes and the fine differences between them - completely meaningless to a complete beginner.

And of course it doesn't describe IDLE - which is what the OP says he has available.

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] the art of testing
On Tue, Nov 24, 2009 at 2:02 PM, Serdar Tumgoren zstumgo...@gmail.com wrote:

> Hi everyone,
> The list recently discussed the virtues of unit testing, and I was hoping someone could offer some high-level advice and further resources as I try to apply the TDD methodology.
>
> I'm trying to develop an application that regularly downloads some government data (in XML), parses the data and then updates a database. Simple enough in theory, but the problem I'm hitting is where to begin with tests on data that is ALL over the place.
>
> The agency in question performs little data validation, so a given field can have a huge range of possible inputs (e.g. - a Boolean field should be 0 or 1, but might be blank, have a negative number or even strings like the word 'None').
>
> In such a case, should I be writing test cases for *expected* inputs and then coding the parser portion of my program to handle the myriad of possible bad data?

Yes. The parser needs to handle the bad data in some appropriate way, unless you are filtering out the bad data before it reaches the parser. The tests should cover the range of expected inputs, both good and bad data. If you want to test the parser, you should write tests that ensure that it behaves appropriately for the full range of expected data. So your tests should include the full range of good data and some sample bad data.

The book Pragmatic Unit Testing has a lot of guidelines about what to test. The examples are in Java (or C#) but JUnit and Python's unittest are pretty similar and the ideas certainly apply to any language.

Kent
Re: [Tutor] the art of testing
Serdar Tumgoren wrote:

> Hi everyone,
> The list recently discussed the virtues of unit testing, and I was hoping someone could offer some high-level advice and further resources as I try to apply the TDD methodology.

TDD is different from data validation. TDD ensures program correctness. Data validation ensures input correctness.

> In such a case, should I be writing test cases for *expected* inputs and then coding the parser portion of my program to handle the myriad of possible bad data?

Yes, the parser should handle all bad data and respond in an appropriate manner (raise an error or flag it for a manual check by the programmer). Input should be sanitized as early as possible.

If you want to apply TDD here, you will be checking that the parser correctly normalizes all bad data into the proper form (e.g. every 0, None, False, or empty value in the boolean field is properly normalized to False; I assume there is no difference between the different representations of False?).
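To make the normalization idea concrete, here is a minimal Python 3 sketch. Everything specific in it is an assumption for illustration: the function name normalize_bool, the exact set of spellings treated as true or false, and the decision to raise ValueError for anything else (such as a negative number) so it can be flagged rather than silently coerced.

```python
def normalize_bool(raw):
    """Map the messy spellings of a boolean field onto True/False.

    Unrecognized values raise ValueError so they can be flagged for review.
    """
    if raw is None:
        return False
    text = str(raw).strip().lower()
    if text in ("1", "true"):
        return True
    if text in ("", "0", "none", "false"):
        return False
    raise ValueError("unrecognized boolean value: %r" % (raw,))

for sample in ("1", "0", "", "None", " FALSE "):
    print(repr(sample), "->", normalize_bool(sample))
```

In a TDD workflow, each row of that loop would become a test case written before the function body, plus one test asserting that a value such as "-3" is rejected.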
Re: [Tutor] the art of testing
Lie and Kent,

Thanks for the quick replies.

I've started writing some requirements, and combined with your advice, am starting to feel a bit more confident on how to approach this project.

Below is an excerpt of my requirements -- basically what I've learned from reviewing the raw data using ElementTree at the command line.

Are these the types of requirements that are appropriate for the problem at hand? Or am I not quite hitting the mark for the data validation angle?

I figured once I write down these low-level rules about my input, I can start coding up the test cases...Is that correct?

requirements snippet

- Root node of every XML file is PublicFiling
- Every PublicFiling node must contain at least one Filing node
- Every Filing must contain 'Type' attribute
- Every Filing must contain 'Year' attribute, etc.
- Filing node must be either a Registration or activity Report
- Filing is a Registration when 'Type' attribute equals 'Registration' or 'Registration Amendment'
- Registration must not have an 'Amount' attribute
- Registration must not have an 'is_state_or_local_attrib'

end requirements
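A few of the rules above translate almost directly into checks with ElementTree. The following Python 3 sketch is hypothetical: it assumes Filing nodes are direct children of the root and that 'Type', 'Year' and 'Amount' are XML attributes spelled exactly that way, none of which the thread confirms. The function name check_filings and the sample documents are invented.

```python
import xml.etree.ElementTree as ET

REGISTRATION_TYPES = ("Registration", "Registration Amendment")

def check_filings(xml_text):
    """Return a list of human-readable rule violations found in xml_text."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "PublicFiling":
        problems.append("root is %r, expected PublicFiling" % root.tag)
    filings = root.findall("Filing")
    if not filings:
        problems.append("no Filing nodes")
    for number, filing in enumerate(filings, 1):
        for attr in ("Type", "Year"):
            if attr not in filing.attrib:
                problems.append("Filing %d lacks %s" % (number, attr))
        if filing.get("Type") in REGISTRATION_TYPES and "Amount" in filing.attrib:
            problems.append("Filing %d: Registration has Amount" % number)
    return problems

good = '<PublicFiling><Filing Type="Registration" Year="2009"/></PublicFiling>'
bad = '<PublicFiling><Filing Type="Registration" Amount="100"/></PublicFiling>'
print(check_filings(good))
print(check_filings(bad))
```

Returning a list of problems rather than raising on the first one makes it easy to write one test per rule and to log everything wrong with a given file.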
Re: [Tutor] Class understanding
Che M wrote:

>> Date: Tue, 24 Nov 2009 10:27:05 -0600
>> From: jammer10...@gmail.com
>> To: tutor@python.org
>> Subject: [Tutor] Class understanding
>>
>> Hi all... Have been attempting to understand classes... Been getting along without them for a while now and feel it's time to jump in
>>
>> What I want to do it start a log with the logging module... I have this working without classes, but want to try... Here is a snippet of the code that I am hacking on:
>
> I'm sure the better explainers will jump in presently, but let me try a few tips...
>
>> class logger():
>
> The convention in Python is to make class names capitalized. It is not necessary, but it is a good habit to get into, so class Logger().
>
>> import logging
>
> Imports are traditionally done at the top of a Python file, not within a class.
>
>> logger()
>
> This calls the class but doesn't create a name for an instance of the class, so you won't be able to access it later. Instead, try (assuming you rename logger() to Logger() ),
>
>     logger_instance = Logger()
>
> Now you have a name for that instance of the class, and so can access the goodies inside the class.
>
>> logger.write2log(log_info)
>
> So that would now be:
>
>     logger_instance.write2log(log_info)
>
>> encouragement, or pointers to good docs would be helpful... I've done a lot of searching via Google on classes, and it's all confusing to me...
>
> Keep trying. There have to be tons of good tutorials on classes. They fall under the heading of Object Oriented Programming. I tend to think of a class as a container that has all the stuff you will need to do a certain set of actions. It can contain data (facts) and it can contain methods (functions). You can create one or more instances of any class (a traditional example being that Dog() is a class whereas fluffy is an instance of a dog, and therefore has all the traditional dog methods, like bark(), wag(), etc.)
>
> CM

For my first class, I'd have picked something self-contained, and probably something dumb simple, so as not to be confused between the stuff in the imports and the problems in understanding how class instances, methods, and attributes work. Anyway, you probably understand the logging module better than I; you certainly couldn't understand less.

Also, probably because you used tabs, the current code is heavily indented, and pretty hard to follow. The def line is indented about 26 columns, where I'd expect four.

CM has pointed out several important things.

In addition, I need to point out that you need a self parameter on your method(s). And that if you use the same name for the argument as you used in the parameter, you can get confused as to who is doing what.

Also, you want to derive new classes from object, for reasons that probably won't matter now, but when they do, it's easier if you've already got the habit.

And finally I don't think you were planning to return an empty tuple. Probably you used syntax from other languages. In Python, to return nothing, use one of three forms:

1) fall off the end of the function/method
2) return with no argument
3) return None

So your code would become:

    import logging

    class Logger(object):
        ... some initialization logic, which I don't know about...

        def write2log(self, log_msg):
            print "writing to log", log_msg
            ... some logging stuff...
            return

    inst = Logger()
    log_info = "This is first msg"
    inst.write2log(log_info)

I'm not sure why this is a class, unless you want to be able to have multiple loggers (instances of Logger). And in that case, you presumably would need another method, the Python constructor, which is called __init__().

DaveA
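Putting the advice from this thread together, one possible shape for the class in Python 3 is sketched below. It is not Joe's final code: the constructor arguments, the log format, and the use of an in-memory stream instead of the T2Notify.log file are assumptions made so the example runs without touching the disk.

```python
import io
import logging

class Logger:
    """Thin wrapper that sets up a named logger once and exposes write2log()."""

    def __init__(self, name, stream):
        self.log = logging.getLogger(name)
        self.log.setLevel(logging.INFO)
        self.log.propagate = False      # keep messages out of the root logger
        handler = logging.StreamHandler(stream)
        handler.setFormatter(
            logging.Formatter("%(name)s - %(levelname)s - %(message)s"))
        self.log.addHandler(handler)

    def write2log(self, log_msg):
        # 'self' gives the method access to the logger created in __init__.
        self.log.info(log_msg)

buffer = io.StringIO()                  # stands in for a log file here
inst = Logger("logging_output.core", buffer)
inst.write2log("This is first msg")
print(buffer.getvalue(), end="")
```

To log to a file as in the original snippet, logging.FileHandler('T2Notify.log') could replace the StreamHandler; in Python 3 every class already derives from object, so the explicit base class is unnecessary there.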
[Tutor] Difficulty with csv files - line breaks
What I'm trying to do is store a bunch of information in a .csv file. Each row will contain a date, webpage, etc. of a job application.

My difficulty is that something I am doing is not recording the line breaks. I've read that \r\n is the default line terminator in the csv module, but so far I cannot seem to use it successfully. So every time I enter a new list of data it just gets appended to the end of the last row in the .csv file.

It should read:

    date, company, title, site, site url, ref_ID
    date, company, title, site, site url, ref_ID

but instead it does this:

    date, company, title, site, site url, ref_IDdate, company, title, site, site url, ref_ID

and so forth.

Here's the segment of code responsible for my troubles.

    import csv

    date = raw_input("Enter the date applied: ")
    company = raw_input("Enter the company applied to: ")
    job_title = raw_input("Enter the job title: ")
    site = raw_input("Enter the website used: ")
    site_url = raw_input("Paste the URL here: ")
    ref_ID = raw_input("Enter the reference ID: ")
    entry_list = [date, company, job_title, site, site_url, ref_ID]
    print "Are you sure you want to add\n",
    for entry in entry_list:
        print entry
    print "to the file?"
    answer = yes_and_no()
    if answer == "y":
        append_file(entry_list,filename)

    def append_file(list,filename):
        text_file = open(filename, "a")
        writer = csv.writer(text_file, quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(list)
        text_file.close()
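The snippet alone does not show what is eating the line breaks, so the following is not a diagnosis. It is a small Python 3 sketch of the append pattern with the file opened the way the csv documentation recommends (newline=""; in Python 2 the equivalent advice is binary mode, "ab"). The sample rows, the file name and the temporary directory are invented so the example is self-contained.

```python
import csv
import os
import tempfile

def append_row(row, filename):
    """Append one record to a CSV file, letting the csv module write the line ending."""
    with open(filename, "a", newline="") as csv_file:
        writer = csv.writer(csv_file, quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(row)

path = os.path.join(tempfile.mkdtemp(), "applications.csv")
append_row(["2009-11-24", "Acme", "Developer", "Monster", "http://example.com/1", "A1"], path)
append_row(["2009-11-25", "Initech", "Tester", "Dice", "http://example.com/2", "B2"], path)

# Read the file back to confirm each call produced its own row.
with open(path, newline="") as csv_file:
    rows = list(csv.reader(csv_file))
print(rows)
```

Using a with block also guarantees the file is closed (and flushed) after every append, even if writerow() raises.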
Re: [Tutor] the art of testing
On Tue, Nov 24, 2009 at 3:41 PM, Serdar Tumgoren zstumgo...@gmail.com wrote:
> I've started writing some requirements, and combined with your advice, am starting to feel a bit more confident on how to approach this project. Below is an excerpt of my requirements -- basically what I've learned from reviewing the raw data using ElementTree at the command line.
>
> Are these the types of requirements that are appropriate for the problem at hand? Or am I not quite hitting the mark for the data validation angle?

I'm not really sure where you are going with this? This looks like a data specification, but you said the data is poorly specified and not under your control. So is this a specification of a data validator?

> I figured once I write down these low-level rules about my input, I can start coding up the test cases...Is that correct?

Yes...but I'm not really clear what it is you want to test. What does your code do? What if a Filing does not have a 'Type' attribute?

Kent

> requirements snippet
>
> Root node of every XML file is PublicFiling
> Every PublicFiling node must contain at least one Filing node
> Every Filing must contain 'Type' attribute
> Every Filing must contain 'Year' attribute, etc.
> Filing node must be either a Registration or activity Report
> Filing is a Registration when 'Type' attribute equals 'Registration' or 'Registration Amendment'
> Registration must not have an 'Amount' attribute
> Registration must not have an 'is_state_or_local_attrib'
>
> end requirements
Re: [Tutor] the art of testing
Serdar Tumgoren wrote:
> Lie and Kent,
>
> Thanks for the quick replies. I've started writing some requirements [...]
>
> I figured once I write down these low-level rules about my input, I can start coding up the test cases...Is that correct?
>
> requirements snippet [...] end requirements

That's a good start. You're missing one requirement that I think needs to be explicit. Presumably you're requiring that the XML be well-formed. This refers to things like matching <xxx> and </xxx> nodes, and proper use of quotes and escaping within strings. Most DOM parsers won't even give you a tree if the file isn't well-formed.

In addition, you want to state just how flexible each field is. You mentioned booleans could be 0, 1, blank, ... You might want ranges on numerics, or validation on specific fields such as Year, month, day, where if the month is 2, the day cannot be 30.

But most importantly, you can divide the rules into two groups: those that say "if the data looks like X, the file is rejected", versus those that say "if the data looks like Y, we'll pretend it's actually Z, and keep going". An example of that last might be what to do if somebody specifies March 35. You might just pretend March 31, and keep going.

Don't forget that errors and warnings for the input data need to be output in a parseable form, at least if you expect more than one or two cases per run.

DaveA
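The reject-versus-repair split can be sketched roughly as follows (Python 3). Which problems count as repairable is a policy choice; here an overlarge day is clamped and everything else is rejected, purely as an assumption for illustration. The exception class, the function name and the tab-separated warning layout are all hypothetical.

```python
import calendar
import datetime


class RejectedRecord(ValueError):
    """Raised when a value is too broken to repair."""


def normalize_date(year, month, day):
    """Return (date, warnings). Repairs an overlarge day, rejects the rest."""
    warnings = []
    if not 1 <= month <= 12:
        raise RejectedRecord("month %r is out of range" % (month,))
    if day < 1:
        raise RejectedRecord("day %r is out of range" % (day,))
    last_day = calendar.monthrange(year, month)[1]
    if day > last_day:
        # Repairable: pretend the last valid day was meant, but say so
        # in a tab-separated line that another program can parse.
        warnings.append("WARNING\tday\t%d\tclamped to %d" % (day, last_day))
        day = last_day
    return datetime.date(year, month, day), warnings


fixed, notes = normalize_date(2009, 3, 35)
```

Keeping the warnings as data, instead of printing them on the spot, leaves the caller free to collect them per file and write them out in one consistent format.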
Re: [Tutor] the art of testing
> I'm not really sure where you are going with this? This looks like a data specification, but you said the data is poorly specified and not under your control. So is this a specification of a data validator?

The short answer -- yes, these are specs for a data validator. And I should have been more specific about my problem domain.

I'm cobbling together a specification using various government manuals and a *very* limited data definition. For instance, the agency states that a lobbyist's status must be either active (0), terminated (1), administratively terminated (2) or undetermined (3). So I know the expected inputs for that field. However, the agency does not validate that data, and it's possible for that field to be blank or even to contain gobbledygook strings such as 'Enter Lobbyist Status' (residue from software built atop the agency's automated filing service).

In other cases, based on working with the raw data, I've ascertained that every Filing has at least a unique ID, and seems to have a Year, etc. So it's a mish-mash of pre-defined specs (as best as I can ascertain from the government), and patterns I'm discerning in the data.

> Yes...but I'm not really clear what it is you want to test. What does your code do? What if a Filing does not have a 'Type' attribute?

At this stage, I'm just trying to perform some basic validation on the input. 'Type' is an attribute that I'd expect every filing to contain (and if it does not, my code would have to log the record for human review).
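As a small illustration of validating the status field described above, here is a sketch in Python 3. The four codes come from the message; the function name and the (status, problem) return shape are hypothetical choices.

```python
# Codes as listed in the message; everything else is an assumption
# made for this sketch.
STATUS_CODES = {
    "0": "active",
    "1": "terminated",
    "2": "administratively terminated",
    "3": "undetermined",
}


def check_status(raw):
    """Return (status, problem); exactly one of the two is None."""
    value = (raw or "").strip()
    if value == "":
        return None, "blank status"
    if value not in STATUS_CODES:
        return None, "unrecognized status %r" % value
    return STATUS_CODES[value], None


results = [check_status(v) for v in ["0", " 2 ", "", "Enter Lobbyist Status"]]
```

A record whose problem is not None would be the one logged for human review.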
Re: [Tutor] the art of testing
> That's a good start. You're missing one requirement that I think needs to be explicit. Presumably you're requiring that the XML be well-formed. This refers to things like matching <xxx> and </xxx> nodes, and proper use of quotes and escaping within strings. Most DOM parsers won't even give you a tree if the file isn't well-formed.

I actually hadn't been checking for well-formedness on the assumption that ElementTree's parse method did that behind the scenes. Is that not correct? (I didn't see any specifics on that subject in the docs: http://docs.python.org/library/xml.etree.elementtree.html)

> But most importantly, you can divide the rules into two groups: those that say "if the data looks like X, the file is rejected", versus those that say "if the data looks like Y, we'll pretend it's actually Z, and keep going". An example of that last might be what to do if somebody specifies March 35. You might just pretend March 31, and keep going.

Ok, so if I'm understanding -- I should convert invalid data to sensible defaults where possible (like setting blank fields to 0); otherwise if the data is clearly invalid and the default is unknowable, I should flag the field for editing, deletion or some other type of handling.
Re: [Tutor] Difficulty with csv files - line breaks
Tim, I've checked your code and it seems to work as far as using newlines for the line terminator. The default line terminator is \r\n, which might not show up correctly in some text editors. Otherwise, try checking to see if you've specified a blank line for the line terminator. You can set it explicitly when you create your csv.writer: writer = csv.writer(text_file, quoting=csv.QUOTE_NONNUMERIC, lineterminator='\r\n') -Al You should check out my free beginner's Python book here: http://inventwithpython.com ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
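One way to check the terminators without relying on a text editor is to append two rows and look at the raw bytes. This is a sketch for Python 3, where the csv documentation recommends opening the file with newline=""; the file name and the sample rows are made up for the demonstration.

```python
import csv
import os
import tempfile


def append_row(row, filename):
    # newline="" lets the csv module write its own "\r\n" terminator
    # without the platform translating it a second time (Python 3).
    with open(filename, "a", newline="") as text_file:
        writer = csv.writer(text_file, quoting=csv.QUOTE_NONNUMERIC)
        writer.writerow(row)


# Hypothetical file name and sample rows, used only for this demonstration.
filename = os.path.join(tempfile.mkdtemp(), "applications.csv")
append_row(["2009-11-24", "Acme", "Tester", "Monster", "http://example.com/1", "A1"], filename)
append_row(["2009-11-25", "Initech", "Coder", "Dice", "http://example.com/2", "B2"], filename)

# Read the rows back through the csv module...
with open(filename, newline="") as text_file:
    rows = list(csv.reader(text_file))

# ...and also look at the raw bytes to count the line terminators.
with open(filename, "rb") as raw_file:
    raw_bytes = raw_file.read()
```

If raw_bytes contains one "\r\n" per row, the file is fine and the run-together lines are a display issue in whatever program is showing it.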
Re: [Tutor] the art of testing
Serdar Tumgoren zstumgo...@gmail.com wrote:
> Lie and Kent,
>
> Thanks for the quick replies. I've started writing some requirements [...]
>
> requirements snippet [...] end requirements

This is a semantic schema (see wikipedia), meaning the specification of data structures describing something meaningfully. It seems the major part of your program's task is checking the correctness of parsed data (semantic validation). Then the specification of your program should be the description of what it is supposed to do when processing valid and (most importantly) invalid data of all sorts. From this, you can directly write tests: in a sense, tests are a rewriting of a program's specification (*).

Denis

(*) This is the reason why, when a program is fully specified, one can write tests before starting coding. But afaik this works only for trivial apps, or inside a limited domain we know well. For we usually discover the app better as we develop it, which in turn changes its definition, often even dramatically.

la vita e estrany http://spir.wikidot.com/
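To make the "tests are a rewriting of the specification" idea concrete, here is a sketch (Python 3) that turns a few of the quoted requirements into checks. It assumes that Filing elements are direct children of the root and that 'Type', 'Year' and 'Amount' are XML attributes; the sample document is invented and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

REGISTRATION_TYPES = ("Registration", "Registration Amendment")


def check_filings(xml_text):
    """Return a list of human-readable rule violations (empty if none)."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "PublicFiling":
        problems.append("root node is %r, expected 'PublicFiling'" % root.tag)
    filings = root.findall("Filing")
    if not filings:
        problems.append("no Filing node found")
    for number, filing in enumerate(filings, 1):
        for required in ("Type", "Year"):
            if required not in filing.attrib:
                problems.append("Filing %d lacks %r" % (number, required))
        if filing.get("Type") in REGISTRATION_TYPES and "Amount" in filing.attrib:
            problems.append("Filing %d is a Registration with an 'Amount'" % number)
    return problems


# Invented sample: the first filing breaks the 'Amount' rule,
# the second one has no 'Type' attribute.
sample = """<PublicFiling>
  <Filing Type="Registration" Year="2009" Amount="100"/>
  <Filing Year="2008"/>
</PublicFiling>"""
found = check_filings(sample)
```

Each line of the requirements list maps to one condition, so a test case per requirement (one valid input, one violating input) follows almost mechanically.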
Re: [Tutor] Alternatives to get IP address of a computer : which one should I use ?
On Tuesday 24 November 2009, Shashwat Anand wrote: [...]

On my openSuse 11.0 machine your method doesn't work as intended:

    e...@lixie:~ python
    Python 2.5.2 (r252:60911, Dec 1 2008, 18:10:01)
    [GCC 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import socket
    >>> socket.gethostbyname(socket.gethostname())
    '127.0.0.2'

It's a valid IP of my computer, but not the one you wanted. You could have written this one from memory (well, I would have written 127.0.0.1, which is also valid).

Parsing the output from ifconfig would work for my computer.

Kind regards,
Eike.
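Another commonly used alternative, not mentioned in the message above, is to ask the operating system which local address it would use for outgoing traffic by connecting a UDP socket. This is only a sketch (Python 3): the probe address 192.0.2.1 is an arbitrary documentation address chosen for illustration, the function name and fallback value are hypothetical, and on a machine with several interfaces the result depends on the routing table.

```python
import socket


def guess_outbound_ip(probe_host="192.0.2.1", fallback="127.0.0.1"):
    """Best-effort guess at the address used for outgoing traffic."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket only selects a route; no packet is sent.
        sock.connect((probe_host, 80))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network at all).
        return fallback
    finally:
        sock.close()


address = guess_outbound_ip()
```

Unlike gethostbyname(gethostname()), this does not depend on how the host name is listed in /etc/hosts, which is what produces the 127.0.0.2 answer shown above.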
[Tutor] How to get new messages from maildir?
I'm using the standard mailbox module to read a maildir, but it seems to be quite difficult to do some simple things. Is there any way to identify a message as new, unread, unseen or something similar? What about finding the most recent message? My aim is to write a program that will print out the From: and Subject: headers of new (or unread, or unseen, whatever I can get) messages, in chronological order. Or failing that, just print out all messages in chronological order. As far as I can tell there's no way to do the first, and to do the second you would have to use the date strings in the messages, converting them to datetimes with strptime first, although on my system there doesn't seem to be a valid strftime format for python that matches the date strings in my emails. They end like + (GMT), which I believe is %z (%Z) in strftime, but python will not accept the %z in the strftime pattern. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
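A possible approach, sketched for Python 3: the mailbox module exposes each message's subdirectory (new or cur) and its Maildir flags, and email.utils can parse RFC 2822 date headers, including a trailing "(GMT)" comment, without strptime. Treating "in new/ or lacking the S (seen) flag" as unread is an assumption about what "new" should mean, and the demonstration maildir, addresses and dates below are made up.

```python
import email.utils
import mailbox
import os
import tempfile


def new_messages_in_order(path):
    """Return (From, Subject) pairs for unseen messages, oldest first."""
    box = mailbox.Maildir(path, factory=None, create=False)
    dated = []
    for key, msg in box.iteritems():
        # A message is treated as new if it sits in new/ or lacks the S flag.
        if msg.get_subdir() == "new" or "S" not in msg.get_flags():
            parsed = email.utils.parsedate_tz(msg["Date"] or "")
            # Undated or unparsable messages sort first.
            stamp = email.utils.mktime_tz(parsed) if parsed else 0
            dated.append((stamp, msg["From"], msg["Subject"]))
    dated.sort(key=lambda item: item[0])
    return [(sender, subject) for _, sender, subject in dated]


# Build a throwaway maildir with made-up messages for the demonstration.
demo_dir = os.path.join(tempfile.mkdtemp(), "Maildir")
demo = mailbox.Maildir(demo_dir, factory=None, create=True)
for date, subject, seen in [
    ("Tue, 24 Nov 2009 15:00:00 +0000 (GMT)", "second", False),
    ("Tue, 24 Nov 2009 09:00:00 +0000 (GMT)", "first", False),
    ("Mon, 23 Nov 2009 09:00:00 +0000 (GMT)", "already read", True),
]:
    text = "From: someone@example.com\nDate: %s\nSubject: %s\n\nbody\n" % (date, subject)
    msg = mailbox.MaildirMessage(text)
    if seen:
        msg.set_subdir("cur")
        msg.set_flags("S")
    demo.add(msg)

unread = new_messages_in_order(demo_dir)
```

Passing factory=None makes the mailbox hand back MaildirMessage objects, which is what provides get_subdir() and get_flags().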
Re: [Tutor] the art of testing
Serdar Tumgoren zstumgo...@gmail.com wrote

> Simple enough in theory, but the problem I'm hitting is where to begin with tests on data that is ALL over the place.

I've just spent the last 2 days in a workshop with 30 of our company's end to end test team. These guys are professional testers; all they do is test software systems. Mostly they work on large scale systems comprising many millions of lines of code on multiple OS and physical servers/networks. It was very interesting working with them and learning more about the techniques they use. Some of these involve very esoteric (and expensive!) tools. However, much of it is applicable to smaller systems.

Two key principles they apply:

1) Define the System Under Test (SUT) and treat it as a black box. This involves defining your test boundary and working out what all the inputs are and all the outputs. Then you create a matrix mapping every set of inputs (the input vector) to every corresponding set of outputs (the output vector). The set of functions which maps the input to output is known as the transfer function matrix, and if you can define it mathematically it becomes possible to automate a complete test cycle. Unfortunately it's virtually never definable in real terms, so we come to point 2...

2) Use risk based testing. This means look at what is most likely to break and focus effort on those areas. Common breakage areas are poor data quality and faulty interfaces. So test the inputs and outputs thoroughly.

> In such a case, should I be writing test cases for *expected* inputs and then coding the parser portion of my program to handle the myriad of possible bad data? Or given the range of possible inputs, should I simply skip testing for valid data at the parser level, and instead worry about flagging (or otherwise handling) invalid input at the database-insertion layer (and therefore write tests at that layer)?

The typical way of testing inputs with ranges is to test just below the lower boundary, just above the boundary, the mid point, just below the upper boundary, just above the boundary, known invalid values, and wildly implausible values. Thus for an input that can accept values between 1 and 100 you would test 0, 1, 50, 100, 101, -50 and 'five', say. It's not exhaustive but it covers a range of valid and invalid data points. You could also test very large data values such as

12165231862471893479073407147235801787789578917897

which will check for buffer and overflow type problems. But the point is you apply intelligence to determine the most likely forms of data error and test those values, not every possible value.

> Or should I not be testing data values at all, but rather the results of actions performed on that data?

Data errors are a high risk area and therefore should always be tested. Look to automate the testing if at all possible, and write a program to generate the data sets used and ideally to generate the expected output data too - but that's hard, since presumably you need the SUT to do that!

> It seems like these questions must be a subset of the issues in the realm of testing. Can anyone recommend a resource on the types of tests that should be applied to the various tasks and stages of the development process? A friend recommended The Art of Software Testing -- is that the type of book that covers these issues?

Yes, and it is one of the standard texts. But most general software engineering texts cover testing issues. For example try Sommerville, Pressman, McConnell etc.

> suitable alternative that costs less than $100?

Most of the mentioned authors have at least 1 chapter on testing.

HTH,

PS. It was also interesting to hear how testing has moved on from the days I spent as a tester at the beginning of my career in software engineering. In particular the challenges of Agile techniques for E2E testing and the move away from blanket testing to risk based testing, as well as the change in emphasis from "try to break it" - the dominant advice in my day - to "check it breaks cleanly under real loads".

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
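The boundary probes described above are easy to generate. A minimal sketch in Python 3 follows; the function `accepts` is a hypothetical stand-in for whatever input check is under test, not anything from the thread.

```python
def boundary_values(low, high):
    """Numeric probes around an inclusive range [low, high]."""
    return [low - 1, low, (low + high) // 2, high, high + 1]


def accepts(value, low=1, high=100):
    """Hypothetical system under test: True only for ints in range."""
    return isinstance(value, int) and low <= value <= high


# The range probes, plus an invalid number, a string, and a huge value.
probes = boundary_values(1, 100) + [
    -50,
    "five",
    12165231862471893479073407147235801787789578917897,
]
outcome = [(probe, accepts(probe)) for probe in probes]
```

For the range 1 to 100 the generated probes are 0, 1, 50, 100 and 101, matching the values listed in the message.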
[Tutor] Nested loop of I/O tasks
Dear Python

I am new to Python and have questions about its usage. Currently I have to read two .csv files, INCT and INMRI, which are similar to this:

INCT
    NONAME   121.57  34.71   14.81   1.35  0  0  1
    Cella    129.25  100.31  27.25   1.35  1  1  1
    Chiasm   130.3   98.49   26.05   1.35  1  1  1
    FMagnum  114.89  144.94  -15.74  1.35  1  1  1
    Iz       121.57  198.52  30.76   1.35  1  1  1
    LEAM     160.53  127.6   -1.14   1.35  1  1  1
    LEAM     55.2    124.66  12.32   1.35  1  1  1
    LPAF     180.67  128.26  -9.05   1.35  1  1  1
    LTM      77.44   124.17  15.95   1.35  1  1  1
    Leye     146.77  59.17   -2.63   1.35  1  0  0
    Nz       121.57  34.71   14.81   1.35  1  1  1
    Reye     91.04   57.59   6.98    1.35  0  1  0

INMRI
    NONAME   121.57  34.71   14.81   1.35  0  0  1
    Cella    129.25  100.31  27.25   1.35  1  1  1
    Chiasm   130.3   98.49   26.05   1.35  1  1  1
    FMagnum  114.89  144.94  -15.74  1.35  1  1  1
    Iz       121.57  198.52  30.76   1.35  1  1  1
    LEAM     160.53  127.6   -1.14   1.35  1  1  1
    LEAM     55.2    124.66  12.32   1.35  1  1  1
    LPAF     180.67  128.26  -9.05   1.35  1  1  1
    LTM      77.44   124.17  15.95   1.35  1  1  1
    Leye     146.77  59.17   -2.63   1.35  1  0  0

My job is to match the names in the two files and combine the first three attributes together. So far I have tried to read the two files, but when I try to match the pattern using a nested loop, Python stops after 1 iteration. Here is what I have so far.

    INCT = open(' *.csv')
    INMRI = open(' *.csv')
    for row in INCT:
        name, x, y, z, a, b, c, d = row.split(",")
        print "aaa",
        for row2 in INMRI:
            NAME, X, Y, Z, A, B, C, D = row2.split(",")
            if name == NAME:
                print "aaa"

The results are shown below:

    NONAME NONAME Cella NONAME Chiasm NONAME FMagnum NONAME Inion NONAME LEAM NONAME LTM NONAME Leye NONAME Nose NONAME Nz NONAME REAM NONAME RTM NONAME Reye Cella Chiasm FMagnum Iz LEAM LEAM LPAF LTM Leye Nz Reye

I was a MATLAB user and am really confused by what is happening. I wish someone could help me with this intro problem and perhaps indicate a convenient way to do pattern matching. Thanks!
Re: [Tutor] the art of testing
Serdar Tumgoren wrote:
>> That's a good start. You're missing one requirement that I think needs to be explicit. Presumably you're requiring that the XML be well-formed. This refers to things like matching <xxx> and </xxx> nodes, and proper use of quotes and escaping within strings. Most DOM parsers won't even give you a tree if the file isn't well-formed.
>
> I actually hadn't been checking for well-formedness on the assumption that ElementTree's parse method did that behind the scenes. Is that not correct? (I didn't see any specifics on that subject in the docs: http://docs.python.org/library/xml.etree.elementtree.html)

I also would assume that ElementTree would do the check. But the point is: it's part of the spec, and needs to be explicitly handled in your list of errors: "file yyy.xml was rejected because ...". I am not saying you need to separately test for it in your validator, but effectively it's the second test you'll be doing. (The first is: the file exists and is readable.)

>> But most importantly, you can divide the rules into two groups: those that say "if the data looks like X, the file is rejected", versus those that say "if the data looks like Y, we'll pretend it's actually Z, and keep going". An example of that last might be what to do if somebody specifies March 35. You might just pretend March 31, and keep going.
>
> Ok, so if I'm understanding -- I should convert invalid data to sensible defaults where possible (like setting blank fields to 0); otherwise if the data is clearly invalid and the default is unknowable, I should flag the field for editing, deletion or some other type of handling.

Exactly. As you said in one of your other messages, human intervention required. Then the humans may decide to modify the spec to reduce the number of cases needing human intervention. So I see the spec and the validator as a matched pair that will evolve.

Note that none of this says anything about testing your code. You'll need a controlled suite of test data to help with that. The word test is heavily overloaded (and heavily underdone) in our industry.

DaveA
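A sketch of reporting a file that is not well-formed, for a Python version in which ElementTree provides the ParseError exception (Python 3 here). The label yyy.xml echoes the example file name in the message; the function name and the tab-separated report layout are hypothetical choices.

```python
import xml.etree.ElementTree as ET


def load_tree(xml_text, label="yyy.xml"):
    """Return (root, error); error is a one-line, tab-separated report."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as exc:
        # exc.position is a (line, column) tuple supplied by the parser.
        line, column = exc.position
        return None, "REJECTED\t%s\tline %d\tcolumn %d\t%s" % (label, line, column, exc)


good_root, good_error = load_tree("<PublicFiling><Filing/></PublicFiling>")
bad_root, bad_error = load_tree("<PublicFiling><Filing></PublicFiling>")
```

So the parser does perform the check behind the scenes; the validator's job is only to catch the failure and turn it into an entry in the error list.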
Re: [Tutor] the art of testing
As well as the other replies, consider that you are doing unit testing: http://en.wikipedia.org/wiki/Unit_test One method is black-box testing, which is where the thing (class, function, module) you are testing is treated as a black box, something that takes input and returns output, and how it does it are irrelevant from the tester's perspective. You remove the implementation of the thing from the test of the thing, which allows you to focus on the tests. In fact, you can write the tests before you code up the black box, which is a valid development technique that has many proponents. You throw all sorts of data at the black box, particularly invalid, boundary and corner-case data, and test that the black box returns consistent, valid output in all cases, and that it never crashes or triggers an exception (unless part of the design). Once you have run your tests and have produced a set of test failures, you have some leads on what within the black box is broken, and you take off your tester's hat, put on your developer hat, and go fix them. Repeat. A pre-built test suite makes this easier and gives consistency. You know that your input data is the same as it was before your changes and you can test for consistency of output over your development cycle so that you know you haven't inadvertently introduced new bugs. This process is regression testing. Clearly, ensuring your test data covers all possible cases is important. This is one reason for designing the test suite before building the black box. You won't have inadvertently tainted your mind-set with preconceived notions on what the data contains, since you haven't examined it yet. (You've only examined the specs to find out what it *should* contain; your black box implementation will handle everything else gracefully, returning output when it should and triggering an exception when it should.) This frees you up to create all sorts of invalid, i.e. 
non-specification, and boundary test data, without preconceived ideas. Once you are passing your test data, throw lots of samples of real-life data at it. If your test data was comprehensive, real-life data should be a piece of cake. Obviously I'm a fan of unit-testing. Sure, the first time they're a bit of work to build up, but you'll find you can re-use them over and over with a small amount of editing. Many conditions are the same for any program, such as (off the top of my head) file-not-found, file-has-one-record-only, file-has-a-trillion-records, string-where-int-expected, int-out-of- expected-range, and so on. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
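A few of the reusable conditions listed above (string-where-int-expected, int-out-of-expected-range) look like this with the standard unittest module. This is a sketch for Python 3; `parse_year` and its range limits are a hypothetical black box invented for the example.

```python
import unittest


def parse_year(text, low=1990, high=2100):
    """Hypothetical black box: return an int year or raise ValueError."""
    year = int(text)  # raises ValueError for non-numeric strings
    if not low <= year <= high:
        raise ValueError("year %d outside %d..%d" % (year, low, high))
    return year


class ParseYearTest(unittest.TestCase):
    def test_valid_year(self):
        self.assertEqual(parse_year("2009"), 2009)

    def test_string_where_int_expected(self):
        self.assertRaises(ValueError, parse_year, "two thousand nine")

    def test_int_out_of_expected_range(self):
        self.assertRaises(ValueError, parse_year, "1066")

    def test_boundaries(self):
        self.assertEqual(parse_year("1990"), 1990)
        self.assertRaises(ValueError, parse_year, "1989")


# Run the suite programmatically so the outcome can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ParseYearTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The tests only know the inputs and the promised behaviour, so they could have been written before parse_year existed, and they keep working as a regression suite if its implementation changes.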
Re: [Tutor] Nested loop of I/O tasks
Bo Li wrote:
> Dear Python
>
> I am new to Python and having questions about its usage. Currently I have to read two .csv files INCT and INMRI [...]
>
> My job is to match the name on the two files and combine the first three attributes together. So far I tried to read two files. But when I tried to match the pattern using nested loop, but Python stops me after 1 iteration. [...]

What's happening is that you are iterating over the first file, and on the first line of that file you start iterating over the second file. Once the second file has been completely looped through it is 'empty', so your further iterations over file 1 can't loop through file 2.

If your data is going to be sorted like that, so you know NONAME will be on the same line in both files, what you can do is:

    INCT = open('something.csv', 'r')
    INMRI = open('something_else.csv', 'r')

    rec_INCT = INCT.readline()
    rec_INMRI = INMRI.readline()
    while rec_INCT and rec_INMRI:
        name, x, y, z, a, b, c, d = rec_INCT.split(',')
        NAME, X, Y, Z, A, B, C, D = rec_INMRI.split(',')
        if name == NAME:
            print 'Matches'
        rec_INCT = INCT.readline()
        rec_INMRI = INMRI.readline()

    INCT.close()
    INMRI.close()

What will happen is that you open the files, read the first line of each, and then start the while loop. The loop only runs as long as both the INCT and INMRI files have more lines to read; if one of them runs out, the loop exits. Inside the loop it does the splitting and checks whether the names match, at which point you can do your further processing, and after that it reads another line from each file.

Of course, if the files are not sorted then you would have to process them a little differently. If the file sizes are small, you can use one of the files to build a dictionary, the key being the `name` and the value being the rest of your data, and then iterate over the second file checking whether each name is in the dictionary. That would also work for this scenario of perfect data.

Hope that helps.

--
Kind Regards,
Christian Witts
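The dictionary approach could look roughly like this in Python 3. It assumes the files really are comma-separated and that "the first three attributes" means the three values after the name; the sample rows are made up, and the function names are hypothetical.

```python
import csv
import io

# Made-up, comma-separated samples standing in for the two files.
INCT_TEXT = (
    "NONAME,121.57,34.71,14.81,1.35,0,0,1\n"
    "Cella,129.25,100.31,27.25,1.35,1,1,1\n"
    "Nz,121.57,34.71,14.81,1.35,1,1,1\n"
)
INMRI_TEXT = (
    "Cella,129.3,100.4,27.3,1.35,1,1,1\n"
    "NONAME,121.6,34.7,14.8,1.35,0,0,1\n"
)


def index_by_name(handle):
    """Map each name to its first three values (x, y, z)."""
    table = {}
    for row in csv.reader(handle):
        if row:
            # A repeated name (LEAM appears twice in the post) would
            # overwrite the earlier row here.
            table[row[0]] = row[1:4]
    return table


def combine(inct_handle, inmri_handle):
    """Return {name: ct_xyz + mri_xyz} for names present in both inputs."""
    mri = index_by_name(inmri_handle)
    merged = {}
    for row in csv.reader(inct_handle):
        if row and row[0] in mri:
            merged[row[0]] = row[1:4] + mri[row[0]]
    return merged


merged = combine(io.StringIO(INCT_TEXT), io.StringIO(INMRI_TEXT))
```

With real files, the StringIO objects would be replaced by open(path, newline=""). Each file is read exactly once, and the order of rows in either file no longer matters.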
Re: [Tutor] Nested loop of I/O tasks
On Tue, Nov 24, 2009 at 2:42 PM, Bo Li boli1...@gmail.com wrote: I am new to Python and having questions about its usage. Currently I have to read two .csv files INCT and INMRI which are similar to this: [...] I was a MATLAB user and am really confused by what happens with me. I wish someone could help me with this intro problem and probably indicate a convenient way for pattern matching. Thanks! greetings and welcome to Python! the problem you are experiencing is due to the fact that you do not read in and cache your data first. you are iterating over the data in both files once, which is what enables your first pass to work. however, on the second pass, INMRI does not return any more data because you have already exhausted all lines of the file on the first pass. if you intend on reiterating over the file, then you must read in all of the data first and just use that data structure rather than the actual file as you have. hope this helps! --wesley - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - Core Python Programming, Prentice Hall, (c)2007,2001 Python Fundamentals, Prentice Hall, (c)2009 http://corepython.com wesley.j.chun :: wescpy-at-gmail.com python training and technical consulting cyberweb.consulting : silicon valley, ca http://cyberwebconsulting.com ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
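The exhaustion described above is easy to see in isolation. A small sketch in Python 3, using a temporary file with made-up contents:

```python
import os
import tempfile

# Temporary file with three made-up lines, standing in for INMRI.
path = os.path.join(tempfile.mkdtemp(), "inmri.csv")
with open(path, "w") as out:
    out.write("NONAME,1\nCella,2\nChiasm,3\n")

with open(path) as handle:
    first_pass = [line.split(",")[0] for line in handle]
    second_pass = [line.split(",")[0] for line in handle]  # already at EOF
    handle.seek(0)  # rewind to the start of the file
    after_rewind = [line.split(",")[0] for line in handle]

# Caching the lines once avoids rereading the file at all:
# a list can be looped over as many times as needed.
with open(path) as handle:
    cached = handle.readlines()
cached_twice = [[line.split(",")[0] for line in cached] for _ in range(2)]
```

The second pass over the open file yields nothing, which is why the inner loop in the original code only ran for the first outer row.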
Re: [Tutor] Nested loop of I/O tasks
Bo Li wrote:
> Dear Python
>
> I am new to Python and having questions about its usage. Currently I have to read two .csv files INCT and INMRI [...]
>
> I was a MATLAB user and am really confused by what happens with me. I wish someone could help me with this intro problem and probably indicate a convenient way for pattern matching. Thanks!

I'm wondering how Christian's quote of your message was formatted so much better. Your csv contents are word-wrapped when I see your email. Did you perhaps send it using html mail, instead of text?

The other thing I note (and this is the same with Christian's version of your message) is that the code you show wouldn't run, and also wouldn't produce the output you supplied, so you must have retyped it instead of copy/pasting it. That makes the job harder for anybody trying to help.

Christian's analysis of your problem was spot-on. Files can only be iterated once, and thus the inner loop will fail the second time through the outer loop. However, there are two possible fixes that are both closer to what you have, and therefore perhaps more desirable.

Simplest change is to do a readlines() on the second file. This means you have to have enough memory for the whole file, stored as a list.

    INCT = open('file1.csv')
    INMRIlist = open('file2.csv').readlines()
    for row in INCT:
        name, x, y, z, a, b, c, d = row.split(",")
        print name,
        for row2 in INMRIlist:
            NAME, X, Y, Z, A, B, C, D = row2.split(",")
            print NAME,
            if name == NAME:
                print "---matched---"

The other choice, somewhat slower, but saving of memory, is to reopen the second file on every pass:

    INCT = open('file1.csv')
    #INMRI = open('file2.csv')
    for row in INCT:
        name, x, y, z, a, b, c, d = row.split(",")
        print name,
        for row2 in open('file2.csv'):
            NAME, X, Y, Z, A, B, C, D = row2.split(",")
            print NAME,
            if name == NAME:
                print "---matched---"

There are many other things I would change (probably eventually going to the dictionary that Christian mentioned), but these are the minimum changes to let you continue down the path you've envisioned.

(all code untested, I just typed it directly into the email, assuming Python 2.6)

DaveA