date:20091124

Re: [Tutor] understanding the behaviour of parameter 'key' in sort

2009-11-24 Thread Shashwat Anand

it was a bit tricky, thanks :)

On Tue, Nov 24, 2009 at 9:00 AM, Lie Ryan lie.1...@gmail.com wrote:

 Shashwat Anand wrote:

 I intended to sort a list which sorts according to user-defined custom
 sorting-order.
 For example: If sorting-order is zabc...wxy, then the output will be in
 lexicographically sorted order but z will always be given priority over rest
 others.
 as a test case i took sorting order as reverse of normal sorting, hence i
 defined user_key as string.ascii_lowercase. It should sort in reverse manner
 but I'm not getting expected output.
 (Even though it could have just been sorted as reverse=True, but my
 intention is to generalize it for this is just a test-case). I'm not able to
 find where the bug lies nor am i exactly sure how the key function works,
 even though i use it in a regular fashion. Can you guys help me out ?


 Your code is not wrong. It's your expected output (or your need) that's
 different from a typical definition of lexicographical sorting. In a
 typical lexicographical sorting 'a' comes before 'ab' since 'a' is shorter
 than 'ab'.


 So, if you want this:

  expected output: ['cba', 'cab', 'abc', 'ab', 'aa', 'a']

 you must use a custom cmp= argument to reverse the shorter-substring case,
 like this:

  import string

  def my_cmp(s1, s2):
      # a string sorts before any of its own prefixes
      if s1.startswith(s2):
          return -1
      elif s2.startswith(s1):
          return 1
      else:
          return cmp(s1, s2)


  def userdef_sort(l, user_key):
      table = string.maketrans(''.join(sorted(user_key)), user_key)
      trans = lambda x: x.translate(table)
      return sorted(l, cmp=my_cmp, key=trans)


  #user_key = raw_input()
  user_key = string.ascii_lowercase[::-1]
  l = ['a', 'aa', 'ab', 'abc', 'cba', 'cab']

  print userdef_sort(l, user_key)
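
For anyone reading this on Python 3, where sorted() no longer accepts cmp=, here is a rough sketch of the same idea using functools.cmp_to_key. Two things in it are my assumptions and not part of the code above: the explicit equality check in my_cmp, and the direction of the translation table. I map user_key onto the sorted alphabet, which I believe is what a non-symmetric ordering such as zabc...wxy needs. For the reversed alphabet both directions produce the same table, so the test case above cannot tell them apart.

```python
import string
from functools import cmp_to_key


def my_cmp(s1, s2):
    """Compare normally, except that a string sorts before its own prefixes."""
    if s1 == s2:
        return 0
    if s1.startswith(s2):
        return -1
    if s2.startswith(s1):
        return 1
    return (s1 > s2) - (s1 < s2)  # Python 3 stand-in for the old cmp()


def userdef_sort(words, user_key):
    # Assumption: user_key lists the alphabet in the desired priority order.
    # Each letter of user_key is mapped to the letter of the same rank in the
    # normal alphabet, so plain string comparison then follows the custom order.
    table = str.maketrans(user_key, "".join(sorted(user_key)))
    as_key = cmp_to_key(my_cmp)
    return sorted(words, key=lambda w: as_key(w.translate(table)))


words = ['a', 'aa', 'ab', 'abc', 'cba', 'cab']
print(userdef_sort(words, string.ascii_lowercase[::-1]))
# prints ['cba', 'cab', 'abc', 'ab', 'aa', 'a']
```

cmp_to_key wraps each translated string in an object whose comparisons call my_cmp, which is how a key= function can still carry a custom comparison.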

 ___
 Tutor maillist  -  Tutor@python.org
 To unsubscribe or change subscription options:
 http://mail.python.org/mailman/listinfo/tutor


[Tutor] Alternatives to get IP address of a computer : which one should I use ?

2009-11-24 Thread Shashwat Anand

I was going through a Python script, woof (
http://www.home.unix-ag.org/simon/woof ), and there was a portion of the code
dedicated to getting the IP address:

def find_ip ():
   if sys.platform == "cygwin":
      ipcfg = os.popen("ipconfig").readlines()
      for l in ipcfg:
         try:
            candidat = l.split(":")[1].strip()
            if candidat[0].isdigit():
               break
         except:
            pass
      return candidat

   os.environ["PATH"] = "/sbin:/usr/sbin:/usr/local/sbin:" + os.environ["PATH"]
   platform = os.uname()[0];

   if platform == "Linux":
      netstat = commands.getoutput ("LC_MESSAGES=C netstat -rn")
      defiface = [i.split ()[-1] for i in netstat.split ('\n')
                  if i.split ()[0] == "0.0.0.0"]
   elif platform in ("Darwin", "FreeBSD", "NetBSD"):
      netstat = commands.getoutput ("LC_MESSAGES=C netstat -rn")
      defiface = [i.split ()[-1] for i in netstat.split ('\n')
                  if len(i) > 2 and i.split ()[0] == "default"]
   elif platform == "SunOS":
      netstat = commands.getoutput ("LC_MESSAGES=C netstat -arn")
      defiface = [i.split ()[-1] for i in netstat.split ('\n')
                  if len(i) > 2 and i.split ()[0] == "0.0.0.0"]
   else:
      print >>sys.stderr, "Unsupported platform; please add support for your platform in find_ip().";
      return None

   if not defiface:
      return None

   if platform == "Linux":
      ifcfg = commands.getoutput ("LC_MESSAGES=C ifconfig "
                                  + defiface[0]).split ("inet addr:")
   elif platform in ("Darwin", "FreeBSD", "SunOS", "NetBSD"):
      ifcfg = commands.getoutput ("LC_MESSAGES=C ifconfig "
                                  + defiface[0]).split ("inet ")

   if len (ifcfg) != 2:
      return None
   ip_addr = ifcfg[1].split ()[0]

   # sanity check
   try:
      ints = [ i for i in ip_addr.split (".") if 0 <= int(i) <= 255]
      if len (ints) != 4:
         return None
   except ValueError:
      return None

   return ip_addr


It gets the OS name, runs netstat -rn, gets the default interface name from
its output ('en' in my case, i.e. Darwin), then runs ifconfig, splits the
output on 'inet ', takes the IP and does a sanity check. Nice !!
However, if I want to get my IP I can get it via:
socket.gethostbyname(socket.gethostname())

I want to know why the above approach is followed. Is it because of
a check via the network?
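
One likely reason, though nothing in the script itself confirms it: gethostbyname(gethostname()) only reports whatever the resolver maps the hostname to, and on many machines that is a loopback address such as 127.0.0.1, not the address of the interface that carries the default route. The sketch below (Python 3) compares that one-liner with a different, commonly used trick that is not what woof does: connecting a UDP socket and asking which local address the OS picked. The function names and the probe address 192.0.2.1 (a documentation-only address) are my own choices for illustration.

```python
import socket


def ip_from_hostname():
    """Resolve this machine's hostname; may well return a loopback address."""
    try:
        return socket.gethostbyname(socket.gethostname())
    except OSError:
        return None  # hostname could not be resolved


def ip_from_default_route(probe=("192.0.2.1", 9)):
    """Ask the OS which local address it would use to reach `probe`.

    Connecting a UDP socket sends no packet; it only selects a route.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.connect(probe)
        return sock.getsockname()[0]
    except OSError:
        return None  # no usable route, e.g. the machine is offline
    finally:
        sock.close()


print("via hostname:     ", ip_from_hostname())
print("via default route:", ip_from_default_route())
```

If the two lines differ on your machine, that difference is probably why the script goes through the routing table.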

Re: [Tutor] Sorting Data in Databases

2009-11-24 Thread Wayne Werner

On Mon, Nov 23, 2009 at 11:26 PM, Ken G. beach...@insightbb.com wrote:

 I am getting more and more surprised of what Python can do.  Very
 comprehensive.


It's a much safer bet to assume that Python can do ( http://xkcd.com/353/ )
anything ( http://xkcd.com/413/ ).

You just have to install the right libraries ;)

-Wayne


-- 
To be considered stupid and to be told so is more painful than being called
gluttonous, mendacious, violent, lascivious, lazy, cowardly: every weakness,
every vice, has found its defenders, its rhetoric, its ennoblement and
exaltation, but stupidity hasn’t. - Primo Levi

[Tutor] value of 'e'

2009-11-24 Thread Shashwat Anand

Following this discussion on Hacker News
( http://news.ycombinator.com/item?id=958323 ) I checked this link
( http://www.isotf.org/?page_value=13223 ) and was wondering how to
calculate the value of 'e' ( http://mathworld.wolfram.com/e.html ) to a
large number of digits.

As e = 1/0! + 1/1! + 1/2! and so on...
I wrote this:
>>> sum(1.0 / math.factorial(i) for i in range(100))
2.7182818284590455

It was not giving the precision that I wanted, so I tried the decimal module,
which I was not very familiar with.

>>> decimal.getcontext().prec = 100
>>> sum(decimal.Decimal(str(1./math.factorial(decimal.Decimal(i)))) for i in
... range(100))
Decimal('2.718281828459409995603699925637255290043107782360218523330012825771122202286299367023903783933889309')

Until now no problem.
I compared the value of 'e' with the one from here
( http://dl.dropbox.com/u/59605/ten_million_e.txt ), which claims to have
10 million digits of e. The first few digits of 'e' from there don't match
the result I got:
2.71828182845904523536028747135266249775724709369995957496696762772407

so i tried,
>>> sum(decimal.Decimal(str(1./math.factorial(decimal.Decimal(i)))) for i in
... range(1000))
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "<input>", line 2, in <genexpr>
OverflowError: long int too large to convert to float

And then i went clueless !!
How can it be done ?

Re: [Tutor] value of 'e'

2009-11-24 Thread Wayne Werner

On Tue, Nov 24, 2009 at 5:47 AM, Shashwat Anand anand.shash...@gmail.com wrote:


 And then i went clueless !!
 How can it be done ?


Well, upon inspection it seems that "The math module consists mostly of thin
wrappers around the platform C math library functions" - one would presume
those are accurate, but I don't know to how many places. You might try
writing your own factorial function that works with the decimal type and
compare with the result you get from using the math library.

If you find a discrepancy I'm sure there are places to file a bug report.

HTH,
Wayne

(Of course it's also possible that your source who claims to have the
correct digits is faulty! See if you can verify by other sources)

[Tutor] Is pydoc the right API docs?

2009-11-24 Thread Nick

I'm not sure I'm using pydoc correctly.  I only seem to get abbreviated 
help rather than full documentation.  It happens often enough that I think 
I'm doing something wrong.

For example, I want to upgrade my scripts to use .format() from using %.

   $ pydoc format
   Help on built-in function format in module __builtin__:

   format(...)
       format(value[, format_spec]) -> string

       Returns value.__format__(format_spec)
       format_spec defaults to ""

Well, that just tells me that there is an entity called format_spec.  I 
want to know what format_spec actually IS so I can use it.  I try:

   $ pydoc -k format_spec
   $

Nothing.

I found the answer using google, but that won't work if I'm offline.  Am I 
using pydoc correctly?  Or is there a more complete language spec?


Re: [Tutor] value of 'e'

2009-11-24 Thread Lie Ryan


Shashwat Anand wrote:

How can it be done ?


>>> import decimal, math
>>> D = decimal.Decimal
>>> decimal.getcontext().prec = 100
>>> sum(D(1) / D(math.factorial(i)) for i in range(1000))
Decimal('2.718281828459045235360287471352662497757247093699959574966967627724076630353547594571382178525166428')


Re: [Tutor] value of 'e'

2009-11-24 Thread Kent Johnson

On Tue, Nov 24, 2009 at 6:47 AM, Shashwat Anand
anand.shash...@gmail.com wrote:
 Followed by this discussion on Hacker News I checked this link and was
 wondering how to calculate value of 'e' to a large extent

 as e = 1/0! + 1/1! +1/2!  and so on...
 so i wrote this:
 sum(1.0 / math.factorial(i) for i in range(100))
 2.7182818284590455

 It was not giving the precision that I wanted so I tried decimal module of
 which I was not much aware of.

 >>> decimal.getcontext().prec = 100
 >>> sum(decimal.Decimal(str(1./math.factorial(decimal.Decimal(i)))) for i in
 ... range(100))

You are using floating point division here. The argument to
math.factorial() is an integer, so the conversion of i to Decimal is
not doing anything - it is converted back to an integer. Then you
compute 1./some large integer. This will use floating point math and
will fail when the factorial is too large to represent in floating
point.

You should convert the result of factorial() to Decimal and compute
Decimal(1)/Decimal(factorial). This should give you additional
precision as well.
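
To make the difference concrete, here is a small sketch (Python 3) that computes the series both ways. The function names, the 10 guard digits and the choice of 100 terms are my assumptions, picked only for illustration.

```python
import decimal
import math


def e_float(terms):
    """Sum the series with float division: limited to about 16 digits."""
    return sum(1.0 / math.factorial(i) for i in range(terms))


def e_decimal(terms, digits):
    """Sum the series with Decimal division, rounded to `digits` digits."""
    one = decimal.Decimal(1)
    with decimal.localcontext() as ctx:
        ctx.prec = digits + 10  # guard digits absorb accumulated rounding
        total = sum(one / decimal.Decimal(math.factorial(i))
                    for i in range(terms))
    with decimal.localcontext() as ctx:
        ctx.prec = digits
        return +total  # unary plus rounds to the current precision


print(e_float(100))
print(e_decimal(100, 60))
```

The factorial is computed as an exact integer in both functions. Only the division differs, and that is where the float version throws the extra digits away.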

Kent

Re: [Tutor] Is pydoc the right API docs?

2009-11-24 Thread Kent Johnson

On Tue, Nov 24, 2009 at 5:06 AM, Nick
reply_to_is_va...@nowhere.com.invalid wrote:
 I'm not sure I'm using pydoc correctly.  I only seem to get abbreviated
 help rather than full documentation.  It happens often enough that I think
 I'm doing something wrong.

 I found the answer using google, but that won't work if I'm offline.  Am I
 using pydoc correctly?  Or is there a more complete language spec?

pydoc just shows the docstrings. It does not include the full text of
the documentation. For that see
http://python.org/doc/

You can download the docs for offline use, see the above link.
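
As for what format_spec actually is: it is the little formatting code that follows the colon in a replacement field, and it is described in the "Format Specification Mini-Language" section of the library docs. A few examples (Python 3 syntax, values picked arbitrarily) give the flavour:

```python
# format(value, format_spec) and "{0:format_spec}".format(value) share the
# same mini-language: [[fill]align][sign][#][0][width][,][.precision][type]
two_places = format(3.14159, ".2f")     # fixed point, 2 decimal places
right_aligned = format(42, ">6")        # right-align in a field 6 wide
hex_with_prefix = format(255, "#x")     # hexadecimal with the 0x prefix
same_via_method = "{0:.2f}".format(3.14159)

print(two_places)        # 3.14
print(right_aligned)     # '    42' (four leading spaces)
print(hex_with_prefix)   # 0xff
print(same_via_method)   # 3.14
```

Some Python installs also ship help topics that pydoc can show offline (for example `pydoc FORMATTING`), but whether that topic is available depends on your version and install, so treat it as something to try, not a promise.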

Kent

Re: [Tutor] value of 'e'

2009-11-24 Thread Kent Johnson

On Tue, Nov 24, 2009 at 7:01 AM, Wayne Werner waynejwer...@gmail.com wrote:

 Well, upon inspection it seems that The math module consists mostly of thin
 wrappers around the platform C math library functions - one would presume
 those are accurate, but I don't know to how many places. You might try
 writing your own factorial function that works with the decimal type and
 compare with the result you get from using the math library.

I don't think there is anything wrong with math.factorial(). The
problem is that he is using floating-point (limited precision)
division.

Kent

Re: [Tutor] value of 'e'

2009-11-24 Thread Lie Ryan


Wayne Werner wrote:

You might try writing your own factorial function that works with the
decimal type and compare with the result you get from using the math
library.


There is no need for that, math.factorial will use python int/long 
object instead of the platform's integer as necessary.
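
A quick way to see this for yourself (a Python 3 sketch; the variable names are mine): the factorial comes back as an exact integer of any size, and the trouble only starts when that integer is forced into a float.

```python
import math
from decimal import Decimal

big = math.factorial(1000)   # exact integer, far beyond the float range
digit_count = len(str(big))

try:
    float(big)               # the conversion that 1./factorial(...) needs
    overflowed = False
except OverflowError:
    overflowed = True

term = Decimal(1) / Decimal(big)   # Decimal copes with the huge divisor

print(digit_count, overflowed)
print(term)
```

So the OverflowError in the original post comes from the float division, not from math.factorial().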




[Tutor] Class understanding

2009-11-24 Thread Joe Bennett

Hi all... Have been attempting to understand classes... Been getting
along without them for a while now and feel it's time to jump in

What I want to do is start a log with the logging module... I have
this working without classes, but want to try... Here is a snippet of
the code that I am hacking on:


class logger():

    import logging
    logging.basicConfig(level=logging.INFO,
                        format='%(asctime)s - %(name)-12s - %(levelname)-8s - %(message)s',
                        #format='%(asctime)s %(levelname)s %(message)s',
                        filename='T2Notify.log',
                        filemode='a')
    logging_output = logging.getLogger('logging_output.core')
    print "Log set up"

    def write2log(log_info):

        logging_output.info("log started")
        print "written to log"
        return()


logger()
logger.write2log(log_info)



What I want to do is be able to set up the log, but have something
outside the class be able to write log updates to write2log under the
logger class... the logger.write2log() is not working :)... Any ideas,
encouragement, or pointers to good docs would be helpful... I've done
a lot of searching via Google on classes, and it's all confusing to
me...




-Joe

Re: [Tutor] Is pydoc the right API docs?

2009-11-24 Thread Rich Lovely

11/24 Kent Johnson ken...@tds.net:
 On Tue, Nov 24, 2009 at 5:06 AM, Nick
 reply_to_is_va...@nowhere.com.invalid wrote:
 I'm not sure I'm using pydoc correctly.  I only seem to get abbreviated
 help rather than full documentation.  It happens often enough that I think
 I'm doing something wrong.

 I found the answer using google, but that won't work if I'm offline.  Am I
 using pydoc correctly?  Or is there a more complete language spec?

 pydoc just shows the docstrings. It does not include the full text of
 the documentation. For that see
 http://python.org/doc/

 You can download the docs for offline use, see the above link.

 Kent


Although, iirc the online docs are generated by pydoc.

It tells you that it calls value.__format__, so try $pydoc some_value.__format__

For example:

$pydoc str.__format__

I don't have python installed here, so can't check it, but it might
give you some more information.   The same will happen if you try
looking at the pydocs for, for example, str or repr:  they are
wrappers for magic methods on the object it is called with.

-- 
Rich Roadie Rich Lovely

There are 10 types of people in the world: those who know binary,
those who do not, and those who are off by one.

Re: [Tutor] Is pydoc the right API docs?

2009-11-24 Thread Kent Johnson

On Tue, Nov 24, 2009 at 11:43 AM, Rich Lovely roadier...@googlemail.com wrote:
 11/24 Kent Johnson ken...@tds.net:

 pydoc just shows the docstrings. It does not include the full text of
 the documentation. For that see
 http://python.org/doc/

 Although, iirc the online docs are generated by pydoc.

No, they are created with Sphinx from reStructuredText source. Click
the Show Source link in the sidebar of any docs page to see.

The online docs do include much of the same text as the doc strings
but they are separately written and generated.

Kent

Re: [Tutor] Sorting Data in Databases

2009-11-24 Thread Che M



 That is a surprise to me.  I did not know that Python would work with
SQLite.   

Sure, as someone else said, Python comes with a LOT of libraries built right
in when you download Python.  This is known as "batteries included", that is,
what comes with the standard distribution of Python.

 I will look into Alan's tutorial on DB. 

It is of course thorough and will really provide understanding.  But just to
emphasize how simple creating a SQLite database in Python is, I recommend
you do these 4 simple steps:

1. Download and install the very nice SQLite Database Browser application, 
from here:
http://sourceforge.net/projects/sqlitebrowser/

It's simple and good.  

2. Now open IDLE (which also comes with Python), do File > New Window,
and paste this simple Python code into that window:

#--
#get SQLite into Python...it's that simple!
import sqlite3

#Make a connection to a database...if it doesn't exist yet, we'll create it.
conn = sqlite3.connect('my_database.db')   

#Create a cursor, a kind of pen that writes into the database.
cur = conn.cursor()

#Write a table, called here MyTable, into the database, and give it two fields,
# name and address.
cur.execute('''CREATE TABLE if not exists MyTable (name, address)''')

#Now actually write some data into the table you made:
cur.execute('INSERT INTO MyTable VALUES(?,?)',('John','Chicago'))

#Always have to commit any changes--or they don't stick!
conn.commit()

#You're done!
#--

Without the comments, (which explain a bit about why it is written
as it is) this is just this small an amount of Python code--6 lines:

import sqlite3

conn = sqlite3.connect('my_database.db')   

cur = conn.cursor()

cur.execute('''CREATE TABLE if not exists MyTable (name, address)''')

cur.execute('INSERT INTO MyTable VALUES(?,?)',('John','Chicago'))

conn.commit()


3. Run your program in IDLE (Run > Run Module...or just hit F5).  Save
it to your Desktop.

4. Now view your handiwork in the SQLite Database Browser.  Open
it and then do File > Open Database, then find a file on your Desktop
called my_database.db.  Open it.  Now you are looking at the database
you just made.  Click on the Browse Data tab and you are now seeing
that John lives in Chicago.

It's that simple to at least get started.  Thanks, Python.
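
And since this thread is about sorting: once the rows are in the table, the database can hand them back sorted for you. A small sketch (Python 3) along the same lines as the code above. The extra rows for Alice and Zed are made up for the example, and ':memory:' is used so that no file is left behind.

```python
import sqlite3

# ':memory:' gives a throwaway database; use a filename to keep the data.
conn = sqlite3.connect(':memory:')
cur = conn.cursor()

cur.execute('CREATE TABLE IF NOT EXISTS MyTable (name, address)')
cur.executemany('INSERT INTO MyTable VALUES (?, ?)',
                [('John', 'Chicago'),
                 ('Alice', 'Boston'),    # made-up sample row
                 ('Zed', 'Austin')])     # made-up sample row
conn.commit()

# ORDER BY makes SQLite do the sorting before Python ever sees the rows.
cur.execute('SELECT name, address FROM MyTable ORDER BY name')
rows = cur.fetchall()
for name, address in rows:
    print(name, address)

conn.close()
```

Swap `ORDER BY name` for `ORDER BY address DESC` to sort on the other field in reverse.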

Che










 I am getting more
and more surprised of what Python can do.  Very comprehensive.   Thanks
all.




Ken 
  

Re: [Tutor] Class understanding

2009-11-24 Thread Che M

 Date: Tue, 24 Nov 2009 10:27:05 -0600
 From: jammer10...@gmail.com
 To: tutor@python.org
 Subject: [Tutor] Class understanding

 Hi all... Have been attempting to understand classes... Been getting
 along without them for a while now and feel it's time to jump in

 What I want to do it start a log with the logging module... I have
 this working without classes, but want to try... Here is a snippet of
 the code that I am hacking on:

I'm sure the better explainers will jump in presently, but let me try
a few tips...

 class logger():

The convention in Python is to make class names capitalized.  It is
not necessary, but it is a good habit to get into, so class Logger().

 import logging

Imports are traditionally done at the top of a Python file, not within
a class. 

 logger()

This calls the class but doesn't create a name for an instance of
the class, so you won't be able to access it later.  Instead, try
(assuming you rename logger() to Logger() ),

logger_instance = Logger()

Now you have a name for that instance of the class, and so
can access the goodies inside the class.  

 logger.write2log(log_info)

So that would now be:

logger_instance.write2log(log_info)

 encouragement, or pointers to good docs would be helpful... I've done
 a lot of searching via Google on classes, and it's all confusing to
 me...

Keep trying.  There have to be tons of good tutorials on classes.
They fall under the heading of Object Oriented Programming.   I tend
to think of a class as a container that has all the stuff you will need
to do a certain set of actions.  It can contain data (facts) and it can 
contain methods (functions).  You can create one or more instances
of any class (a traditional example being that Dog() is a class whereas
fluffy is an instance of a dog, and therefore has all the traditional dog
methods, like bark(), wag(), etc.)
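
In code, that Dog idea might look roughly like this (Python 3; the method bodies and the second dog are invented purely for illustration):

```python
class Dog(object):
    """A class bundles data (the name) with methods (bark, wag)."""

    def __init__(self, name):
        # __init__ runs once for each new instance and stores its data
        self.name = name

    def bark(self):
        return "%s says woof" % self.name

    def wag(self):
        return "%s wags its tail" % self.name


# Two separate instances of the same class, each with its own data.
fluffy = Dog("Fluffy")
rex = Dog("Rex")

print(fluffy.bark())   # Fluffy says woof
print(rex.wag())       # Rex wags its tail
```

Every method takes self as its first parameter. That is how fluffy.bark() knows it is Fluffy doing the barking and not Rex.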

CM


[Tutor] the art of testing

2009-11-24 Thread Serdar Tumgoren

Hi everyone,
The list recently discussed the virtues of unit testing, and I was
hoping someone could offer some high-level advice and further
resources as I try to apply the TDD methodology.

I'm trying to develop an application that regularly downloads some
government data (in XML), parses the data and then updates a database.
Simple enough in theory, but the problem I'm hitting is where to begin
with tests on data that is ALL over the place.

The agency in question performs little data validation, so a given
field can have a huge range of possible inputs (e.g. - a Boolean field
should be 0 or 1, but might be blank, have a negative number or even
strings like the word 'None').

In such a case, should I be writing test cases for *expected* inputs
and then coding the parser portion of my program to handle the
myriad of possible bad data?

Or given the range of possible inputs, should I simply skip testing
for valid data at the parser level, and instead worry about flagging
(or otherwise handling) invalid input at the database-insertion layer
(and therefore write tests at that layer)?

Or should I not be testing data values at all, but rather the results
of actions performed on that data?

It seems like these questions must be a subset of the issues in the
realm of testing. Can anyone recommend a resource on the types of
tests that should be applied to the various tasks and stages of the
development process?

A friend recommended The Art of Software Testing -- is that the type
of book that covers these issues? If so, can anyone recommend a
suitable alternative that costs less than $100?

As always, I appreciate the advice.

Regards,
Serdar

Re: [Tutor] (no subject)

2009-11-24 Thread Alan Gauld


OkaMthembo zebr...@gmail.com wrote

 When i started off i had pretty much the same questions. I think you need
 to start with the Python tutorial as it will show you the basics

Unfortunately it won't if the OP is a complete beginner - and from his
email it sounds like he is. The standard tutorial assumes quite a lot of
knowledge about programming, assuming you know at least one other language.

That's why there are several absolute beginners tutorials - because for
many python programmers it is their first exposure and the standard
tutorial is not ideal for them.

OTOH, If you have ever done any programming before then the standard
tutorial is excellent.


keywords and how to define and use functions, classes, modules etc).


This is a good example. The standard tutorial assumes readers know
what a function is and why you'd want to use one.
The section on classes starts with a fairly detailed description of
namespaces and scopes and the fine differences between
them - completely meaningless to a complete beginner.

And of course it doesn't describe IDLE - which is what the OP says
he has available.

--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



Re: [Tutor] the art of testing

2009-11-24 Thread Kent Johnson

On Tue, Nov 24, 2009 at 2:02 PM, Serdar Tumgoren zstumgo...@gmail.com wrote:
 Hi everyone,
 The list recently discussed the virtues of unit testing, and I was
 hoping someone could offer some high-level advice and further
 resources as I try to apply the TDD methodology.

 I'm trying to develop an application that regularly downloads some
 government data (in XML), parses the data and then updates a database.
 Simple enough in theory, but the problem I'm hitting is where to begin
 with tests on data that is ALL over the place.

 The agency in question performs little data validation, so a given
 field can have a huge range of possible inputs (e.g. - a Boolean field
 should be 0 or 1, but might be blank, have a negative number or even
 strings like the word 'None').

 In such a case, should I be writing test cases for *expected* inputs
 and then coding the parser portion of my program to handle the
 myriad of possible bad data?

Yes. The parser needs to handle the bad data in some appropriate way,
unless you are filtering out the bad data before it reaches the
parser. The tests should cover the range of expected inputs, both good
and bad data. If you want to test the parser, you should write tests
that ensure that it behaves appropriately for the full range of
expected data. So your tests should include the full range of good
data and some sample bad data.

The book "Pragmatic Unit Testing" has a lot of guidelines about what
to test. The examples are in Java (or C#) but JUnit and Python's
unittest are pretty similar and the ideas certainly apply to any
language.

Kent

Re: [Tutor] the art of testing

2009-11-24 Thread Lie Ryan


Serdar Tumgoren wrote:

Hi everyone,
The list recently discussed the virtues of unit testing, and I was
hoping someone could offer some high-level advice and further
resources as I try to apply the TDD methodology.


TDD is different from data validation. TDD ensures program correctness. 
Data validation ensures input correctness.



In such a case, should I be writing test cases for *expected* inputs
and then coding the parser portion of my program to handle the
myriad of possible bad data?


Yes, the parser should handle all bad data and respond in appropriate 
manner (raise an error or flag for manual check by programmer). Input 
should be sanitized as early as possible.


If you want to apply TDD here, you will be checking that the parser
correctly normalizes all bad data into the proper form (e.g. every 0,
None, False or empty value in the Boolean field is properly normalized
to False; I assume there is no difference between the different
representations of False?).
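
A rough sketch of what such tests could look like with unittest (Python 3). The function name normalize_bool, the exact sets of accepted spellings, and the choice to raise ValueError for anything unrecognized are all my assumptions. The real rules have to come from looking at the agency's data.

```python
import unittest

# Assumed spellings; extend these as new variants turn up in the raw data.
TRUE_VALUES = {"1", "true"}
FALSE_VALUES = {"0", "", "none", "false"}


def normalize_bool(raw):
    """Turn a raw Boolean field into True/False, or raise ValueError."""
    text = "" if raw is None else str(raw).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError("unrecognized Boolean field value: %r" % (raw,))


class NormalizeBoolTest(unittest.TestCase):
    def test_expected_values(self):
        self.assertIs(normalize_bool("1"), True)
        self.assertIs(normalize_bool("0"), False)

    def test_known_bad_values_become_false(self):
        for raw in ("", "  ", "None", "False", None):
            self.assertIs(normalize_bool(raw), False)

    def test_unknown_values_are_flagged(self):
        for raw in ("-1", "maybe"):
            self.assertRaises(ValueError, normalize_bool, raw)
```

Saved in a file, this can be run with `python -m unittest <module name>`. Each new oddity found in the data then becomes one more line in one of the test methods, written before the parser is changed to handle it.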



Re: [Tutor] the art of testing

2009-11-24 Thread Serdar Tumgoren

Lie and Kent,

Thanks for the quick replies.

I've started writing some requirements, and combined with your
advice, am starting to feel a bit more confident on how to approach
this project.

Below is an excerpt of my requirements -- basically what I've learned
from reviewing the raw data using ElementTree at the command line.

Are these the types of requirements that are appropriate for the
problem at hand? Or am I not quite hitting the mark for the data
validation angle?

I figured once I write down these low-level rules about my input, I
can start coding up the test cases...Is that correct?

 requirements snippet

Root node of every XML file is PublicFiling
Every PublicFiling node must contain at least one Filing node
Every Filing must contain 'Type' attribute
Every Filing must contain 'Year' attribute, etc.
Filing node must be either a Registration or activity Report
Filing is a Registration when 'Type' attribute equals 'Registration'
or 'Registration Amendment'
Registration must not have an 'Amount' attribute
Registration must not have an 'is_state_or_local_attrib'

 end requirements
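
To give an idea of how rules like these can be turned into something testable, here is a sketch using ElementTree (Python 3). It covers only some of the rules above. The function name, the assumption that Filing nodes are direct children of the root, and the sample XML are all hypothetical.

```python
import xml.etree.ElementTree as ET

REGISTRATION_TYPES = ("Registration", "Registration Amendment")


def check_filing_document(xml_text):
    """Return a list of rule violations; an empty list means all is well."""
    problems = []
    root = ET.fromstring(xml_text)
    if root.tag != "PublicFiling":
        problems.append("root node is %r, not 'PublicFiling'" % root.tag)

    filings = root.findall("Filing")  # assumes Filing is a direct child
    if not filings:
        problems.append("no Filing node found")

    for index, filing in enumerate(filings):
        for attr in ("Type", "Year"):
            if attr not in filing.attrib:
                problems.append("Filing %d has no %r attribute" % (index, attr))
        is_registration = filing.get("Type") in REGISTRATION_TYPES
        if is_registration and "Amount" in filing.attrib:
            problems.append("Filing %d is a Registration with an 'Amount'" % index)
    return problems


good = '<PublicFiling><Filing Type="Registration" Year="2009"/></PublicFiling>'
bad = '<PublicFiling><Filing Type="Registration" Amount="5000"/></PublicFiling>'
print(check_filing_document(good))
print(check_filing_document(bad))
```

Each requirement then maps onto a test case that feeds in a tiny hand-written document and checks the list that comes back.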

Re: [Tutor] Class understanding

2009-11-24 Thread Dave Angel

Che M wrote:

Date: Tue, 24 Nov 2009 10:27:05 -0600
From: jammer10...@gmail.com
To: tutor@python.org
Subject: [Tutor] Class understanding

Hi all... Have been attempting to understand classes... Been getting
along without them for a while now and feel it's time to jump in

What I want to do it start a log with the logging module... I have
this working without classes, but want to try... Here is a snippet of
the code that I am hacking on:

I'm sure the better explainers will jump in presently, but let me try
a few tips...

class logger():

The convention in Python is to make class names capitalized.  It is
not necessary, but it is a good habit to get into, so class Logger().

import logging

Imports are traditionally done at the top of a Python file, not within
a class. 

logger()

This calls the class but doesn't create a name for an instance of
the class, so you won't be able to access it later.  Instead, try
(assuming you rename logger() to Logger() ),

logger_instance = Logger()

Now you have a name for that instance of the class, and so
can access the goodies inside the class.  

logger.write2log(log_info)

So that would now be:

logger_instance.write2log(log_info)

encouragement, or pointers to good docs would be helpful... I've done
a lot of searching via Google on classes, and it's all confusing to
me...

Keep trying.  There have to be tons of good tutorials on classes.
They fall under the heading of Object Oriented Programming.   I tend
to think of a class as a container that has all the stuff you will need
to do a certain set of actions.  It can contain data (facts) and it can 
contain methods (functions).  You can create one or more instances

of any class (a traditional example being that Dog() is a class whereas
fluffy is an instance of a dog, and therefore has all the traditional dog
methods, like bark(), wag(), etc.)

CM

For my first class, I'd have picked something self-contained, and
probably something dumb & simple, so as not to be confused between the
stuff in the imports and the problems in understanding how class
instances, methods, and attributes work.  Anyway, you probably
understand the logging module better than I;  you certainly couldn't
understand less.

Also, probably because you used tabs, the current code is heavily 
indented, and pretty hard to follow.  The def line is indented about 26 
columns, where I'd expect four.

CM has pointed out several important things. 

In addition, I need to point out that you need a self parameter on 
your method(s). 

And that if you use the same name for the argument as you used in the 
parameter, you can get confused as to who is doing what. 

Also, you want to derive new classes from object, for reasons that 
probably won't matter now, but when they do, it's easier if you've 
already got the habit. 

And finally I don't think you were planning to return an empty tuple.  
Probably you used syntax from other languages.  In Python, to return 
nothing, use one of three forms:  1) fall off the end of the 
function/method  2) return with no argument 3) return None

So your code would become:

import logging

class Logger(object):
    # ... some initialization logic, which I don't know about...
    def write2log(self, log_msg):
        print "writing to log", log_msg
        # ... some logging stuff...
        return

inst = Logger()
log_info = "This is first msg"
inst.write2log(log_info)

I'm not sure why this is a class, unless you want to be able to have 
multiple loggers (instances of Logger).  And in that case, you 
presumably would need another method, the Python constructor, which is 
called __init__()
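
A minimal sketch of what such a constructor could look like. The `name` parameter and the `messages` list are made up for illustration and are not part of the original code; the point is only that each instance gets its own state, set up once in __init__(). It is written so that it also runs on Python 3.

```python
class Logger(object):
    """Toy logger: each instance keeps its own name and its own messages."""

    def __init__(self, name):
        # __init__ runs once per instance, right after the instance is created.
        self.name = name
        self.messages = []

    def write2log(self, log_msg):
        # 'self' is the instance the method was called on.
        line = "[%s] %s" % (self.name, log_msg)
        self.messages.append(line)
        return line


# Two independent loggers built from the same class.
app_log = Logger("app")
db_log = Logger("db")
app_log.write2log("This is first msg")
db_log.write2log("connected")
```

With only one logger in the whole program, a plain function would probably do just as well.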

DaveA

[Tutor] Difficulty with csv files - line breaks

2009-11-24 Thread Tim Goddard

What I'm trying to do is store a bunch of information into a .csv
file.  Each row will contain a date, webpage, etc of a job
application.

My difficulty is that something I am doing seems not to be recording
the line breaks.  I've read that \r\n is the default line terminator in
the csv module, but so far I cannot seem to use it successfully.  So
every time I enter a new list of data it just gets appended to the end
of the last row in the .csv file.

It should read:
date, company, title, site, site url, ref_ID
date, company, title, site, site url, ref_ID

but instead it does this:
date, company, title, site, site url, ref_IDdate,
company, title, site, site url, ref_ID and so forth

Here's the segment of code responsible for my troubles.

import csv

date = raw_input("Enter the date applied: ")
company = raw_input("Enter the company applied to: ")
job_title = raw_input("Enter the job title: ")
site = raw_input("Enter the website used: ")
site_url = raw_input("Paste the URL here: ")
ref_ID = raw_input("Enter the reference ID: ")
entry_list = [date, company, job_title, site, site_url, ref_ID]
print "Are you sure you want to add\n",
for entry in entry_list:
    print entry
print "to the file?"
answer = yes_and_no()
if answer == "y":
    append_file(entry_list, filename)

def append_file(list, filename):
    text_file = open(filename, "a")
    writer = csv.writer(text_file, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerow(list)
    text_file.close()

Re: [Tutor] the art of testing

2009-11-24 Thread Kent Johnson

On Tue, Nov 24, 2009 at 3:41 PM, Serdar Tumgoren zstumgo...@gmail.com wrote:

 I've started writing some requirements, and combined with your
 advice, am starting to feel a bit more confident on how to approach
 this project.

 Below is an excerpt of my requirements -- basically what I've learned
 from reviewing the raw data using ElementTree at the command line.

 Are these the types of requirements that are appropriate for the
 problem at hand? Or am I not quite hitting the mark for the data
 validation angle?

I'm not really sure where you are going with this? This looks like a
data specification, but you said the data is poorly specified and not
under your control. So is this a specification of a data validator?

 I figured once I write down these low-level rules about my input, I
 can start coding up the test cases...Is that correct?

Yes...but I'm not really clear what it is you want to test. What does
your code do? What if a Filing does not have a 'Type' attribute?

Kent

  requirements snippet

 Root node of every XML file is PublicFiling
 Every PublicFiling node must contain at least one Filing node
 Every Filing must contain 'Type' attribute
 Every Filing must contain 'Year' attribute, etc.
 Filing node must be either a Registration or activity Report
 Filing is a Registration when 'Type' attribute equals 'Registration'
 or 'Registration Amendment'
 Registration must not have an 'Amount' attribute
 Registration must not have an 'is_state_or_local_attrib'

  end requirements


Re: [Tutor] the art of testing

2009-11-24 Thread Dave Angel


Serdar Tumgoren wrote:

Lie and Kent,

Thanks for the quick replies.

I've started writing some requirements, and combined with your
advice, am starting to feel a bit more confident on how to approach
this project.

Below is an excerpt of my requirements -- basically what I've learned
from reviewing the raw data using ElementTree at the command line.

Are these the types of requirements that are appropriate for the
problem at hand? Or am I not quite hitting the mark for the data
validation angle?

I figured once I write down these low-level rules about my input, I
can start coding up the test cases...Is that correct?

 requirements snippet

Root node of every XML file is PublicFiling
Every PublicFiling node must contain at least one Filing node
Every Filing must contain 'Type' attribute
Every Filing must contain 'Year' attribute, etc.
Filing node must be either a Registration or activity Report
Filing is a Registration when 'Type' attribute equals 'Registration'
or 'Registration Amendment'
Registration must not have an 'Amount' attribute
Registration must not have an 'is_state_or_local_attrib'

 end requirements

  
That's a good start.  You're missing one requirement that I think needs 
to be explicit.  Presumably you're requiring that the XML be 
well-formed.  This refers to things like matching <xxx> and </xxx> 
nodes, and proper use of quotes and escaping within strings.  Most DOM 
parsers won't even give you a tree if the file isn't well-formed.


In addition, you want to state just how flexible each field is.  You 
mentioned booleans could be 0, 1, blank, ...  You might want ranges on 
numerics, or validation on specific fields such as Year, month, day, 
where if the month is 2, the day cannot be 30.


But most importantly, you can divide the rules into those where you say 
"if the data looks like <xxx>, the file is rejected", versus "if the data 
looks like <yyy>, we'll pretend it's actually <zzz>, and keep going".  An 
example of that last might be what to do if somebody specifies March 35.  
You might just pretend March 31, and keep going.


Don't forget that errors and warnings for the input data need to be 
output in a parseable form, at least if you expect more than one or two 
cases per run.
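
One way the reject-versus-repair split might look for the day-of-month case, as a rough sketch only. The function name and the tab-separated message layout are assumptions made here, chosen so that a later script can parse the warnings; calendar.monthrange() is the standard-library call that gives the last valid day of a month.

```python
import calendar


def check_day(year, month, day):
    """Return (day_to_use, message); message is None when the input was fine.

    Raises ValueError for values that cannot be repaired sensibly.
    """
    if not 1 <= month <= 12:
        raise ValueError("REJECT\tmonth\t%r" % (month,))
    if day < 1:
        raise ValueError("REJECT\tday\t%r" % (day,))
    last_day = calendar.monthrange(year, month)[1]
    if day > last_day:
        # Repairable: pretend the last valid day was meant, but say so.
        return last_day, "WARN\tday\t%d\tclamped to %d" % (day, last_day)
    return day, None


for y, m, d in [(2009, 3, 35), (2009, 2, 30), (2009, 11, 24)]:
    fixed, message = check_day(y, m, d)
    if message:
        print(message)
```

Whether clamping is the right repair is a decision for the spec; the code only shows that the two kinds of rule end up on different code paths.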


DaveA


Re: [Tutor] the art of testing

2009-11-24 Thread Serdar Tumgoren

 I'm not really sure where you are going with this? This looks like a
 data specification, but you said the data is poorly specified and not
 under your control. So is this a specification of a data validator?

The short answer -- yes, these are specs for a data validator.

And I should have been more specific about my problem domain. I'm
cobbling together a specification using various government manuals and
a *very* limited data definition.

For instance, the agency states that a lobbyist's status must be
active (0), terminated (1), administratively terminated (2)
or undetermined (3). So I know the expected inputs for that field.
However, the agency does not validate that data and it's possible for
that field to be blank or even contain gobbledygook strings such as
'Enter Lobbyist Status' (residue from software built atop the
agency's automated filing service).

In other cases, based on working with the raw data, I've ascertained
that every Filing has at least a unique ID, and seems to have a Year,
etc. So it's a mish-mash of pre-defined specs (as best as I can
ascertain from the government), and patterns I'm discerning in the
data.


 Yes...but I'm not really clear what it is you want to test. What does
 your code do? What if a Filing does not have a 'Type' attribute?

At this stage, I'm just trying to perform some basic validation on the input.

'Type' is an attribute that I'd expect every filing to contain (and if
it does not, my code would have to log the record for human review).
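
A small sketch of that kind of basic validation with ElementTree. The attribute names 'Status' and 'ID' and the sample XML are assumptions made for the example (the real files may keep the status somewhere else); only 'Type', 'Year' and the 0-3 status codes come from the description above.

```python
import xml.etree.ElementTree as ET

VALID_STATUS = {"0", "1", "2", "3"}


def review_notes(filing):
    """Return a list of problems found on one Filing element (empty if none)."""
    notes = []
    for required in ("Type", "Year"):
        if filing.get(required) is None:
            notes.append("missing %s attribute" % required)
    status = filing.get("Status")  # hypothetical attribute name
    if status is not None and status.strip() not in VALID_STATUS:
        notes.append("unexpected Status value %r" % status)
    return notes


sample = ET.fromstring(
    "<PublicFiling>"
    "<Filing ID='a1' Type='Registration' Year='2009' Status='0'/>"
    "<Filing ID='a2' Year='2009' Status='Enter Lobbyist Status'/>"
    "</PublicFiling>"
)
# Anything with notes gets logged for a human to look at.
for filing in sample.findall("Filing"):
    for note in review_notes(filing):
        print("REVIEW %s: %s" % (filing.get("ID"), note))
```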

Re: [Tutor] the art of testing

2009-11-24 Thread Serdar Tumgoren

 That's a good start.  You're missing one requirement that I think needs to
 be explicit.  Presumably you're requiring that the XML be well-formed.  This
 refers to things like matching <xxx> and </xxx> nodes, and proper use of
 quotes and escaping within strings.  Most DOM parsers won't even give you a
 tree if the file isn't well-formed.

I actually hadn't been checking for well-formedness on the assumption
that ElementTree's parse method did that behind the scenes. Is that
not correct?

(I didn't see any specifics on that subject in the docs:
http://docs.python.org/library/xml.etree.elementtree.html)

 But most importantly, you can divide the rules into those where you say "if
 the data looks like <xxx>, the file is rejected", versus "if the data looks
 like <yyy>, we'll pretend it's actually <zzz>, and keep going".  An example of
 that last might be what to do if somebody specifies March 35.  You might just
 pretend March 31, and keep going.

Ok, so if I'm understanding -- I should convert invalid data to
sensible defaults where possible (like setting blank fields to 0);
otherwise if the data is clearly invalid and the default is
unknowable, I should flag the field for editing, deletion or some
other type of handling.

Re: [Tutor] Difficulty with csv files - line breaks

2009-11-24 Thread Albert Sweigart

Tim,

I've checked your code and it seems to work as far as using newlines
for the line terminator. The default line terminator is \r\n, which
might not show up correctly in some text editors.

Otherwise, try checking to see if you've specified a blank line for
the line terminator. You can set it explicitly when you create your
csv.writer:

writer = csv.writer(text_file, quoting=csv.QUOTE_NONNUMERIC,
lineterminator='\r\n')
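
For comparison, a self-contained sketch of appending rows that can be run and then inspected. It is written for Python 3, where the file is opened with newline="" so the csv module controls the line endings itself; on Python 2 the csv documentation asks for binary mode ("ab") instead. The file name and the sample rows are invented for the demonstration.

```python
import csv
import os
import tempfile


def append_row(row, filename):
    # newline="" (Python 3) hands line endings to the csv module untouched;
    # under Python 2 the equivalent is opening the file in "ab" mode.
    with open(filename, "a", newline="") as csv_file:
        writer = csv.writer(csv_file, quoting=csv.QUOTE_NONNUMERIC,
                            lineterminator="\r\n")
        writer.writerow(row)


demo_path = os.path.join(tempfile.mkdtemp(), "applications.csv")
append_row(["2009-11-24", "Acme", "Engineer", "Monster",
            "http://example.com/1", "A1"], demo_path)
append_row(["2009-11-25", "Initech", "Analyst", "Dice",
            "http://example.com/2", "B2"], demo_path)

# Read the file back to confirm that each call produced its own row.
with open(demo_path, newline="") as csv_file:
    rows = list(csv.reader(csv_file))
print(len(rows))
```

If this prints 2 but an editor still shows one long line, the editor's handling of \r\n is the more likely culprit.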


-Al
You should check out my free beginner's Python book here:
http://inventwithpython.com

Re: [Tutor] the art of testing

2009-11-24 Thread spir

Serdar Tumgoren zstumgo...@gmail.com wrote:

 Lie and Kent,
 
 Thanks for the quick replies.
 
 I've started writing some requirements, and combined with your
 advice, am starting to feel a bit more confident on how to approach
 this project.
 
 Below is an excerpt of my requirements -- basically what I've learned
 from reviewing the raw data using ElementTree at the command line.
 
 Are these the types of requirements that are appropriate for the
 problem at hand? Or am I not quite hitting the mark for the data
 validation angle?
 
 I figured once I write down these low-level rules about my input, I
 can start coding up the test cases...Is that correct?
 
  requirements snippet
 
 Root node of every XML file is PublicFiling
 Every PublicFiling node must contain at least one Filing node
 Every Filing must contain 'Type' attribute
 Every Filing must contain 'Year' attribute, etc.
 Filing node must be either a Registration or activity Report
 Filing is a Registration when 'Type' attribute equals 'Registration'
 or 'Registration Amendment'
 Registration must not have an 'Amount' attribute
 Registration must not have an 'is_state_or_local_attrib'
 
  end requirements

This is a semantic schema (see wikipedia), meaning the specification of data 
structures describing something meaningfully.
It seems the major part of your program's task is checking correctness of 
parsed data (semantic validation).
Then the specification of your program should be the description of what it is 
supposed to do when processing valid and (most importantly) invalid data of all 
sorts. From this, you can directly write tests: in a sense, tests are a 
rewriting of a program's specification (*).
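
To make "tests are a rewriting of the specification" concrete, here is a hedged sketch that turns the two "Registration must not have ..." rules from the snippet into unittest cases. The function name and the 'Report' type value are assumptions for the example; the attribute names are the ones listed in the requirements.

```python
import unittest
import xml.etree.ElementTree as ET

REGISTRATION_TYPES = ("Registration", "Registration Amendment")


def registration_problems(filing):
    """Check the two 'Registration must not have ...' rules from the snippet."""
    if filing.get("Type") not in REGISTRATION_TYPES:
        return []
    return ["Registration has %s" % name
            for name in ("Amount", "is_state_or_local_attrib")
            if filing.get(name) is not None]


class RegistrationRules(unittest.TestCase):
    # Each test restates one sentence of the requirements.
    def test_clean_registration_passes(self):
        filing = ET.fromstring("<Filing Type='Registration' Year='2009'/>")
        self.assertEqual(registration_problems(filing), [])

    def test_amount_on_registration_is_reported(self):
        filing = ET.fromstring(
            "<Filing Type='Registration Amendment' Amount='5000'/>")
        self.assertEqual(registration_problems(filing),
                         ["Registration has Amount"])

    def test_amount_on_report_is_allowed(self):
        filing = ET.fromstring("<Filing Type='Report' Amount='5000'/>")
        self.assertEqual(registration_problems(filing), [])
```

Saved in a file, these can be run with `python -m unittest <module name>`.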

Denis

(*) This is why, when a program is fully specified, one can write tests 
before starting to code. But afaik this works only for trivial apps, or inside a 
limited domain we know well, because we usually discover the app better as we 
develop it, which in turn changes its definition, often even dramatically.


la vita e estrany

http://spir.wikidot.com/



Re: [Tutor] Alternatives to get IP address of a computer : which one should I use ?

2009-11-24 Thread Eike Welk

On Tuesday 24 November 2009, Shashwat Anand wrote:
On my openSuse 11.0 machine your method doesn't work as intended:


e...@lixie:~ python
Python 2.5.2 (r252:60911, Dec  1 2008, 18:10:01)
[GCC 4.3.1 20080507 (prerelease) [gcc-4_3-branch revision 135036]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import socket
>>> socket.gethostbyname(socket.gethostname())
'127.0.0.2'



It's a valid IP of my computer, but not the one you wanted. You could 
have written this one from memory (well I would have written 
127.0.0.1, which is also valid).  

Parsing the output from ifconfig would work for my computer. 
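
Another commonly used alternative, not taken from this thread, is to ask the operating system which local address it would use for outgoing traffic. The sketch below does that with a UDP socket; the probe address 192.0.2.1 is a documentation-only address picked here as an assumption, and the fallback to 127.0.0.1 is likewise a choice made for the example.

```python
import socket


def outbound_ip(probe_host="192.0.2.1", probe_port=9):
    """Best-effort guess at the address used for outgoing traffic."""
    probe = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # connect() on a UDP socket sends nothing; it only asks the OS
        # which local interface it would route through.
        probe.connect((probe_host, probe_port))
        return probe.getsockname()[0]
    except socket.error:
        # No usable route (for example, no network at all).
        return "127.0.0.1"
    finally:
        probe.close()


print(outbound_ip())
```

On a machine with several interfaces this reports only the one on the default route, so it is not a complete replacement for reading the interface list.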


Kind regards,
Eike.

[Tutor] How to get new messages from maildir?

2009-11-24 Thread chombee

I'm using the standard mailbox module to read a maildir, but it seems to 
be quite difficult to do some simple things. Is there any way to 
identify a message as new, unread, unseen or something similar? What 
about finding the most recent message?

My aim is to write a program that will print out the From: and Subject: 
headers of new (or unread, or unseen, whatever I can get) messages, in 
chronological order. Or failing that, just print out all messages in 
chronological order.

As far as I can tell there's no way to do the first, and to do the 
second you would have to use the date strings in the messages, 
converting them to datetimes with strptime first, although on my system 
there doesn't seem to be a valid strptime format for python that matches 
the date strings in my emails. They end like + (GMT), which I 
believe is %z (%Z) in strftime, but python will not accept the %z in 
the strptime pattern.
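
A possible approach, sketched under a few assumptions: the mailbox module's MaildirMessage objects carry the Maildir flags, so a message without the "S" (seen) flag can be treated as unread, and email.utils.parsedate_tz() parses RFC 2822 date headers, including a trailing "(GMT)" comment, without any strptime pattern. The function name and the throwaway demo maildir are invented for illustration.

```python
import email.utils
import mailbox
import os
import tempfile


def unseen_messages(maildir_path):
    """Return (timestamp, from, subject) tuples for unseen mail, oldest first."""
    box = mailbox.Maildir(maildir_path, factory=None, create=False)
    found = []
    for message in box:
        # 'S' is the Maildir "seen" flag; mail still in new/ has no flags yet.
        if "S" in message.get_flags():
            continue
        parsed = email.utils.parsedate_tz(message["Date"] or "")
        if parsed is not None:
            when = email.utils.mktime_tz(parsed)
        else:
            when = message.get_date()  # fall back to the delivery time
        found.append((when, message["From"], message["Subject"]))
    found.sort(key=lambda item: item[0])
    return found


# Small demonstration on a throwaway maildir.
demo_dir = os.path.join(tempfile.mkdtemp(), "Maildir")
demo_box = mailbox.Maildir(demo_dir, factory=None, create=True)
samples = [
    ("second", "Tue, 24 Nov 2009 12:00:00 +0000 (GMT)", False),
    ("first", "Tue, 24 Nov 2009 10:00:00 +0000 (GMT)", False),
    ("already read", "Mon, 23 Nov 2009 09:00:00 +0000 (GMT)", True),
]
for subject, date, seen in samples:
    sample = mailbox.MaildirMessage()
    sample["From"] = "someone@example.com"
    sample["Subject"] = subject
    sample["Date"] = date
    sample.set_payload("body text\n")
    if seen:
        sample.set_subdir("cur")
        sample.set_flags("S")
    demo_box.add(sample)

for when, sender, subject in unseen_messages(demo_dir):
    print("%s  %s" % (sender, subject))
```

Passing factory=None matters on Python 2, where the default factory returns plain rfc822 messages that have no get_flags() method.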


Re: [Tutor] the art of testing

2009-11-24 Thread Alan Gauld

Serdar Tumgoren zstumgo...@gmail.com wrote 


Simple enough in theory, but the problem I'm hitting is where to begin
with tests on data that is ALL over the place.


I've just spent the last 2 days in a workshop with 30 of our company's 
end to end test team. These guys are professional testers, all they do 
is test software systems. Mostly they work on large scale systems 
comprising many millions of lines of code on multiple OS and physical 
servers/networks. It was very interesting working with them and learning 
more about the techniques they use. Some of these involve very esoteric
(and expensive!) tools. However much of it is applicable to smaller 
systems.


Two key principles they apply:

1) Define the System Under Test (SUT) and treat it as a black box.
This involves defining your test boundary and working out what all the 
inputs are and all the outputs. Then you create a matrix mapping every set 
of inputs (the input vector) to every corresponding set of outputs
(the output vector). The set of functions which maps the input to 
output is known as the transfer function matrix, and if you can define 
it mathematically it becomes possible to automate a complete test 
cycle. Unfortunately it's virtually never definable in real terms, so we 
come to point 2...


2) Use risk based testing
This means look at what is most likely to break and focus effort on those 
areas. Common breakage areas are poor data quality and faulty 
interfaces. So test the inputs and outputs thoroughly.



In such a case, should I be writing test cases for *expected* inputs
and then coding the parser portion of my program to handle the
myriad of possible bad data?

Or given the range of possible inputs, should I simply skip testing
for valid data at the parser level, and instead worry about flagging
(or otherwise handling) invalid input at the database-insertion layer
(and therefore write tests at that layer)?


The typical way of testing inputs with ranges is to test
just below the lower boundary, just above the lower boundary, the mid 
point, just below the upper boundary, just above the upper boundary, known 
invalid values, and wildly implausible values.


Thus for an input that can accept values between 1 and 100 you 
would test 0, 1, 50, 100, 101, -50 and 'five', say.


It's not exhaustive but it covers a range of valid and invalid data points.
You could also test very large data values such as
12165231862471893479073407147235801787789578917897
which will check for buffer and overflow type problems.

But the point is you apply intelligence to determine the most likely 
forms of data error and test those values, not every possible value.
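
The 1-to-100 example can be written down as a small table-driven check. This is only a sketch: the accepts() function stands in for whatever the real system does with the value, and its name and behaviour are assumptions made for the illustration.

```python
def accepts(value, low=1, high=100):
    """Return True only for whole numbers inside [low, high]."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return low <= value <= high


# (input, expected result) pairs taken from the list of test points above.
cases = [
    (0, False),       # just below the lower boundary
    (1, True),        # on the lower boundary
    (50, True),       # mid point
    (100, True),      # on the upper boundary
    (101, False),     # just above the upper boundary
    (-50, False),     # known invalid
    ("five", False),  # wildly implausible
    (12165231862471893479073407147235801787789578917897, False),  # very large
]
failures = [(value, expected) for value, expected in cases
            if accepts(value) != expected]
print(failures)
```

An empty list means every chosen data point behaved as expected; it says nothing about the values that were not chosen, which is exactly the trade-off being described.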



Or should I not be testing data values at all, but rather the results
of actions performed on that data?


Data errors are a high risk area and therefore should always be tested.
Look to automate the testing if at all possible and write a program 
to generate the data sets used and ideally to generate the expected 
output data too - but that's hard since presumably you need the SUT 
to do that!



It seems like these questions must be a subset of the issues in the
realm of testing. Can anyone recommend a resource on the types of
tests that should be applied to the various tasks and stages of the
development process?

A friend recommended The Art of Software Testing -- is that the type
of book that covers these issues? 


Yes, and it is one of the standard texts.
But most general software engineering texts cover testing issues.
For example try Sommerville, Pressman, McConnell etc.


suitable alternative that costs less than $100?


Most of the mentioned authors have at least 1 chapter on testing.

HTH,

PS. It was also interesting to hear how testing has moved on from 
the days I spent as a tester at the beginning of my career in 
software engineering. In particular the challenges of Agile techniques 
for E2E testing and the move away from blanket testing to risk based 
testing, as well as the change in emphasis from "try to break it" - the 
dominant advice in my day - to "check it breaks cleanly under real loads".


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


[Tutor] Nested loop of I/O tasks

2009-11-24 Thread Bo Li

Dear Python

I am new to Python and having questions about its usage. Currently I have to
read two .csv files INCT and INMRI which are similar to this

INCT
  NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1
1 1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35
1 1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0  Nz 121.57
34.71 14.81 1.35 1 1 1  Reye 91.04 57.59 6.98 1.35 0 1 0
INMRI
NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1 1
1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35 1
1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0
My job is to match the name on the two files and combine the first three
attributes together. So far I tried to read two files. But when I tried to
match the pattern using nested loop, but Python stops me after 1 iteration.
Here is what I got so far.

INCT = open(' *.csv')
INMRI = open(' *.csv')

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(",")
    print aaa,
    for row2 in INMRI:
        NAME, X, Y, Z, A, B, C, D = row2.split(",")
        if name == NAME:
            print aaa


The results are shown below

NONAME NONAME Cella  NONAME Chiasm NONAME FMagnum NONAME
Inion NONAME LEAM NONAME LTM NONAME Leye NONAME Nose
NONAME Nz NONAME REAM NONAME RTM NONAME Reye Cella
Chiasm FMagnum Iz LEAM LEAM LPAF LTM Leye Nz Reye


I was a MATLAB user and am really confused by what happens with me. I wish
someone could help me with this intro problem and probably indicate a
convenient way for pattern matching. Thanks!

Re: [Tutor] the art of testing

2009-11-24 Thread Dave Angel




Serdar Tumgoren wrote:

That's a good start.  You're missing one requirement that I think needs to
be explicit.  Presumably you're requiring that the XML be well-formed.  This
refers to things like matching <xxx> and </xxx> nodes, and proper use of
quotes and escaping within strings.  Most DOM parsers won't even give you a
tree if the file isn't well-formed.



I actually hadn't been checking for well-formedness on the assumption
that ElementTree's parse method did that behind the scenes. Is that
not correct?

(I didn't see any specifics on that subject in the docs:
http://docs.python.org/library/xml.etree.elementtree.html)

  
I also would assume that ElementTree would do the check.  But the point 
is:  it's part of the spec, and needs to be explicitly handled in your 
list of errors:

file yyy.xml  was rejected because .

I am not saying you need to separately test for it in your validator, 
but effectively it's the second test you'll be doing.  (The first is:  
the file exists and is readable)
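
As a rough sketch of those first two checks: ElementTree's parse() raises an exception on input that is not well-formed, so the validator can catch it and report a rejection instead of crashing. The function name, message wording and sample files are assumptions for illustration. ET.ParseError exists from Python 2.7 on; older versions raise xml.parsers.expat.ExpatError instead.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def load_or_explain(path):
    """Return (tree, None) on success or (None, reason) when rejected."""
    try:
        return ET.parse(path), None
    except (IOError, OSError) as err:
        # First test: the file exists and is readable.
        return None, "file %s was rejected because it could not be read: %s" % (path, err)
    except ET.ParseError as err:
        # Second test: the XML is well-formed.
        return None, "file %s was rejected because it is not well-formed: %s" % (path, err)


# Demonstration with one good file, one broken file and one missing file.
demo_dir = tempfile.mkdtemp()
good_path = os.path.join(demo_dir, "good.xml")
bad_path = os.path.join(demo_dir, "bad.xml")
missing_path = os.path.join(demo_dir, "missing.xml")
with open(good_path, "w") as handle:
    handle.write("<PublicFiling><Filing/></PublicFiling>")
with open(bad_path, "w") as handle:
    handle.write("<PublicFiling><Filing></PublicFiling>")

for path in (good_path, bad_path, missing_path):
    tree, reason = load_or_explain(path)
    print(reason or "file %s parsed cleanly" % path)
```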

But most importantly, you can divide the rules into those where you say "if
the data looks like <xxx>, the file is rejected", versus "if the data looks
like <yyy>, we'll pretend it's actually <zzz>, and keep going".  An example of
that last might be what to do if somebody specifies March 35.  You might just
pretend March 31, and keep going.



Ok, so if I'm understanding -- I should convert invalid data to
sensible defaults where possible (like setting blank fields to 0);
otherwise if the data is clearly invalid and the default is
unknowable, I should flag the field for editing, deletion or some
other type of handling.

  


Exactly.  As you said in one of your other messages, human intervention 
required.  Then the humans may decide to modify the spec to reduce the 
number of cases needing human intervention.  So I see the spec and the 
validator as a matched pair that will evolve.


Note that none of this says anything about testing your code.  You'll 
need a controlled suite of test data to help with that.  The word test 
is heavily overloaded (and heavily underdone) in our industry.


DaveA


Re: [Tutor] the art of testing

2009-11-24 Thread Nick

As well as the other replies, consider that you are doing unit testing:
http://en.wikipedia.org/wiki/Unit_test

One method is black-box testing, which is where the thing (class, 
function, module) you are testing is treated as a black box, something 
that takes input and returns output, and how it does it is irrelevant 
from the tester's perspective.  You remove the implementation of the thing 
from the test of the thing, which allows you to focus on the tests.  In 
fact, you can write the tests before you code up the black box, which is a 
valid development technique that has many proponents.

You throw all sorts of data at the black box, particularly invalid, 
boundary and corner-case data, and test that the black box returns 
consistent, valid output in all cases, and that it never crashes or 
triggers an exception (unless part of the design).

Once you have run your tests and have produced a set of test failures, you 
have some leads on what within the black box is broken, and you take off 
your tester's hat, put on your developer hat, and go fix them.  Repeat.  A 
pre-built test suite makes this easier and gives consistency.  You know 
that your input data is the same as it was before your changes and you can 
test for consistency of output over your development cycle so that you 
know you haven't inadvertently introduced new bugs.  This process is 
regression testing.  

Clearly, ensuring your test data covers all possible cases is important.  
This is one reason for designing the test suite before building the black 
box.  You won't have inadvertently tainted your mind-set with preconceived 
notions on what the data contains, since you haven't examined it yet.  
(You've only examined the specs to find out what it *should* contain; your 
black box implementation will handle everything else gracefully, returning 
output when it should and triggering an exception when it should.)  This 
frees you up to create all sorts of invalid, i.e. non-specification, and 
boundary test data, without preconceived ideas.

Once you are passing your test data, throw lots of samples of real-life 
data at it.  If your test data was comprehensive, real-life data should be 
a piece of cake.

Obviously I'm a fan of unit-testing. Sure, the first time they're a bit of 
work to build up, but you'll find you can re-use them over and over with a 
small amount of editing.  Many conditions are the same for any program, 
such as (off the top of my head) file-not-found, file-has-one-record-only, 
file-has-a-trillion-records, string-where-int-expected, 
int-out-of-expected-range, and so on.
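
Two of those reusable conditions, string-where-int-expected and int-out-of-expected-range, might look like this as a unittest suite. The parse_year() function and its 1990-2100 range are invented for the example; the exceptions are part of its design, so the tests assert that they are raised.

```python
import unittest


def parse_year(text, low=1990, high=2100):
    """Convert a Year field to int, raising ValueError on anything odd."""
    year = int(text.strip())  # ValueError for "", "abc", "20x9"
    if not low <= year <= high:
        raise ValueError("year out of range: %d" % year)
    return year


class ParseYearTests(unittest.TestCase):
    def test_typical_value(self):
        self.assertEqual(parse_year(" 2009 "), 2009)

    def test_boundaries(self):
        self.assertEqual(parse_year("1990"), 1990)
        self.assertEqual(parse_year("2100"), 2100)

    def test_string_where_int_expected(self):
        self.assertRaises(ValueError, parse_year, "Enter Year")

    def test_int_out_of_expected_range(self):
        self.assertRaises(ValueError, parse_year, "1989")
        self.assertRaises(ValueError, parse_year, "2101")
```

Re-running the same suite after every change to parse_year() is the regression testing described above.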


Re: [Tutor] Nested loop of I/O tasks

2009-11-24 Thread Christian Witts


Bo Li wrote:

Dear Python

I am new to Python and having questions about its usage. Currently I 
have to read two .csv files INCT and INMRI which are similar to this


INCT
NONAME   121.57  34.71   14.81   1.35  0  0  1
Cella    129.25  100.31  27.25   1.35  1  1  1
Chiasm   130.3   98.49   26.05   1.35  1  1  1
FMagnum  114.89  144.94  -15.74  1.35  1  1  1
Iz       121.57  198.52  30.76   1.35  1  1  1
LEAM     160.53  127.6   -1.14   1.35  1  1  1
LEAM     55.2    124.66  12.32   1.35  1  1  1
LPAF     180.67  128.26  -9.05   1.35  1  1  1
LTM      77.44   124.17  15.95   1.35  1  1  1
Leye     146.77  59.17   -2.63   1.35  1  0  0
Nz       121.57  34.71   14.81   1.35  1  1  1
Reye     91.04   57.59   6.98    1.35  0  1  0


INMRI
NONAME   121.57  34.71   14.81   1.35  0  0  1
Cella    129.25  100.31  27.25   1.35  1  1  1
Chiasm   130.3   98.49   26.05   1.35  1  1  1
FMagnum  114.89  144.94  -15.74  1.35  1  1  1
Iz       121.57  198.52  30.76   1.35  1  1  1
LEAM     160.53  127.6   -1.14   1.35  1  1  1
LEAM     55.2    124.66  12.32   1.35  1  1  1
LPAF     180.67  128.26  -9.05   1.35  1  1  1
LTM      77.44   124.17  15.95   1.35  1  1  1
Leye     146.77  59.17   -2.63   1.35  1  0  0


My job is to match the name on the two files and combine the first 
three attributes together. So far I tried to read two files. But when 
I tried to match the pattern using nested loop, but Python stops me 
after 1 iteration. Here is what I got so far.


INCT = open(' *.csv')
INMRI = open(' *.csv')

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(",")
    print aaa,
    for row2 in INMRI:
        NAME, X, Y, Z, A, B, C, D = row2.split(",")
        if name == NAME:
            print aaa


The results are shown below

NONAME NONAME Cella  NONAME Chiasm NONAME FMagnum 
NONAME Inion NONAME LEAM NONAME LTM NONAME Leye 
NONAME Nose NONAME Nz NONAME REAM NONAME RTM NONAME 
Reye Cella Chiasm FMagnum Iz LEAM LEAM LPAF LTM 
Leye Nz Reye



I was a MATLAB user and am really confused by what happens with me. I 
wish someone could help me with this intro problem and probably 
indicate a convenient way for pattern matching. Thanks!



  
What's happening is you are iterating over the first file and on the 
first line on that file you start iterating over the second file.  Once 
the second file has been completely looped through it is 'empty' so your 
further iterations over file 1 can't loop through file 2.


If your files are going to be sorted like that, so you know NONAME will 
be on the same line in both files, what you can do is


INCT = open('something.csv', 'r')
INMRI = open('something_else.csv', 'r')

rec_INCT = INCT.readline()
rec_INMRI = INMRI.readline()

while rec_INCT and rec_INMRI:
    name, x, y, z, a, b, c, d = rec_INCT.split(',')
    NAME, X, Y, Z, A, B, C, D = rec_INMRI.split(',')

    if name == NAME:
        print 'Matches'

    rec_INCT = INCT.readline()
    rec_INMRI = INMRI.readline()

INCT.close()
INMRI.close()

What will happen is that you open the files, read the first line of each 
and then start with the while loop.  It will only run the while as long 
as both the INCT and INMRI files have more lines to read, if one of them 
runs out then it will exit the loop.  It then does the splitting, checks 
to see if it matches at which point you can do your further processing 
and after that read another line of each file.


Of course if the files are not sorted then you would have to process them 
a little differently.  If the file sizes are small you can use one of 
the files to build a dictionary, key being the `name` and value being 
the rest of your data, and then iterate over the second file checking to 
see if the name is in the dictionary.  It would work for this scenario 
of perfect data as well.
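
A minimal sketch of the dictionary approach. The in-memory io.StringIO objects stand in for the two real files, and the comma delimiter is assumed from the split(',') calls above; the sample rows are a shortened copy of the posted data.

```python
import io

# Stand-ins for the two files; real code would use open("...csv") instead.
inct_file = io.StringIO(
    u"NONAME,121.57,34.71,14.81,1.35,0,0,1\n"
    u"Cella,129.25,100.31,27.25,1.35,1,1,1\n"
    u"Nz,121.57,34.71,14.81,1.35,1,1,1\n"
)
inmri_file = io.StringIO(
    u"Cella,129.25,100.31,27.25,1.35,1,1,1\n"
    u"NONAME,121.57,34.71,14.81,1.35,0,0,1\n"
)

# Pass 1: index one file by name (a repeated name keeps only its last row).
inct_by_name = {}
for line in inct_file:
    fields = line.strip().split(",")
    inct_by_name[fields[0]] = fields[1:4]  # x, y, z

# Pass 2: walk the other file once and look each name up.
combined = []
for line in inmri_file:
    fields = line.strip().split(",")
    name = fields[0]
    if name in inct_by_name:
        combined.append([name] + inct_by_name[name] + fields[1:4])

for row in combined:
    print(" ".join(row))
```

Each file is read exactly once and the row order no longer matters. Note that the posted data lists LEAM twice, so a plain dictionary would silently drop one of those rows.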


Hope that helps.  


--
Kind Regards,
Christian Witts



Re: [Tutor] Nested loop of I/O tasks

2009-11-24 Thread wesley chun

On Tue, Nov 24, 2009 at 2:42 PM, Bo Li boli1...@gmail.com wrote:

 I am new to Python and having questions about its usage. Currently I have to 
 read two .csv files INCT and INMRI which are similar to this:
 [...]
 I was a MATLAB user and am really confused by what happens with me. I wish 
 someone could help me with this intro problem and probably indicate a 
 convenient way for pattern matching. Thanks!


greetings and welcome to Python!

the problem you are experiencing is due to the fact that you do not
read in and cache your data first. you are iterating over the data in
both files once, which is what enables your first pass to work.

however, on the second pass, INMRI does not return any more data
because you have already exhausted all lines of the file on the first
pass. if you intend on reiterating over the file, then you must read
in all of the data first and just use that data structure rather than
the actual file as you have.
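
The effect is easy to reproduce without any files on disk. In this sketch io.StringIO objects stand in for the opened files, and the names are a shortened version of the posted data.

```python
import io

inct = io.StringIO(u"NONAME\nCella\nChiasm\n")
inmri = io.StringIO(u"NONAME\nCella\n")

# A file object is a one-shot iterator: the second loop finds nothing left.
first_pass = [line.strip() for line in inmri]
second_pass = [line.strip() for line in inmri]

# Reading the rows into a list once gives something that can be re-scanned.
inmri.seek(0)
inmri_names = [line.strip() for line in inmri]
matches = []
for line in inct:
    name = line.strip()
    for other in inmri_names:  # the list can be looped over repeatedly
        if name == other:
            matches.append(name)

print(first_pass, second_pass, matches)
```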

hope this helps!
--wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Core Python Programming, Prentice Hall, (c)2007,2001
Python Fundamentals, Prentice Hall, (c)2009
   http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com

Re: [Tutor] Nested loop of I/O tasks

2009-11-24 Thread Dave Angel




Bo Li wrote:

Dear Python

I am new to Python and having questions about its usage. Currently I have to
read two .csv files INCT and INMRI which are similar to this

INCT
  NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1
1 1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35
1 1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0  Nz 121.57
34.71 14.81 1.35 1 1 1  Reye 91.04 57.59 6.98 1.35 0 1 0
INMRI
NONAME 121.57 34.71 14.81 1.35 0 0 1  Cella 129.25 100.31 27.25 1.35 1 1
1  Chiasm 130.3 98.49 26.05 1.35 1 1 1  FMagnum 114.89 144.94 -15.74 1.35 1
1 1  Iz 121.57 198.52 30.76 1.35 1 1 1  LEAM 160.53 127.6 -1.14 1.35 1 1 1
LEAM 55.2 124.66 12.32 1.35 1 1 1  LPAF 180.67 128.26 -9.05 1.35 1 1 1  LTM
77.44 124.17 15.95 1.35 1 1 1  Leye 146.77 59.17 -2.63 1.35 1 0 0
My job is to match the names in the two files and combine the first three
attributes. So far I have tried to read the two files, but when I try to
match the names using a nested loop, Python stops after one iteration.
Here is what I have so far.

INCT = open(' *.csv')
INMRI = open(' *.csv')

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(',')
    print aaa,
    for row2 in INMRI:
        NAME, X, Y, Z, A, B, C, D = row2.split(',')
        if name == NAME:
            print aaa


The results are shown below

NONAME NONAME Cella  NONAME Chiasm NONAME FMagnum NONAME
Inion NONAME LEAM NONAME LTM NONAME Leye NONAME Nose
NONAME Nz NONAME REAM NONAME RTM NONAME Reye Cella
Chiasm FMagnum Iz LEAM LEAM LPAF LTM Leye Nz Reye


I was a MATLAB user and am really confused by what happens with me. I wish
someone could help me with this intro problem and probably indicate a
convenient way for pattern matching. Thanks!

  
I'm wondering how Christian's quote of your message was formatted so 
much better.  Your csv contents are word-wrapped when I see your email.  
Did you perhaps send it using html mail, instead of text?


The other thing I note (and this is the same with Christian's version of 
your message), is that the code you show wouldn't run, and also wouldn't 
produce the output you supplied, so you must have retyped it instead of 
copy/pasting it.  That makes the job harder, for anybody trying to help.


Christian's analysis of your problem was spot-on.  An open file object can
only be iterated once (unless it is rewound or reopened), so the inner loop
finds nothing left to read the second time through the outer loop.  However,
there are two possible fixes that are both closer to what you have, and
therefore perhaps more desirable.


Simplest change is to do a readlines() on the second file.  This means 
you have to have enough memory for the whole file, stored as a list.


INCT = open('file1.csv')
INMRIlist = open('file2.csv').readlines()

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(',')
    print name,
    # INMRIlist is a list, so it can be looped over on every outer pass
    for row2 in INMRIlist:
        NAME, X, Y, Z, A, B, C, D = row2.split(',')
        print NAME,
        if name == NAME:
            print "---matched---"



The other choice, somewhat slower but lighter on memory, is to reopen the
second file on every pass of the outer loop:


INCT = open('file1.csv')
#INMRI = open('file2.csv')

for row in INCT:
    name, x, y, z, a, b, c, d = row.split(',')
    print name,
    # reopening the file gives a fresh iterator on each outer pass
    for row2 in open('file2.csv'):
        NAME, X, Y, Z, A, B, C, D = row2.split(',')
        print NAME,
        if name == NAME:
            print "---matched---"
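
A variant of the reopen approach, offered only as a sketch: instead of opening the file again, rewind the existing file object with seek(0) before each inner loop. This assumes the file is seekable (ordinary disk files are). The example uses Python 3 syntax, io.StringIO as a stand-in for the two files, and comma-separated rows taken from the sample data.

```python
import io

# Stand-ins for the two opened files; the comma separator is an assumption.
inct = io.StringIO(
    "Cella,129.25,100.31,27.25,1.35,1,1,1\n"
    "Iz,121.57,198.52,30.76,1.35,1,1,1\n"
)
inmri = io.StringIO(
    "NONAME,121.57,34.71,14.81,1.35,0,0,1\n"
    "Cella,129.25,100.31,27.25,1.35,1,1,1\n"
    "Iz,121.57,198.52,30.76,1.35,1,1,1\n"
)

matches = []
for row in inct:
    name = row.split(",")[0]
    inmri.seek(0)  # rewind so the inner loop starts at the first line again
    for row2 in inmri:
        if row2.split(",")[0] == name:
            matches.append(name)

print(matches)  # ['Cella', 'Iz']
```

Like reopening, this rereads the second file once per row of the first, so it trades speed for memory.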

There are many other things I would change (probably eventually going to 
the dictionary that Christian mentioned), but these are the minimum 
changes to let you continue down the path you've envisioned.
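
For reference, one way a dictionary-based version might look. This is a sketch, not Christian's code: it is written in Python 3 syntax, index_by_name is a hypothetical helper, io.StringIO stands in for the two files, and the comma separator is assumed. It reads each file exactly once and looks names up in a dict instead of using a nested loop.

```python
import io

def index_by_name(lines):
    """Map each name to its first three numeric attributes (x, y, z)."""
    table = {}
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 4:
            continue  # skip blank or short lines
        table[fields[0]] = [float(value) for value in fields[1:4]]
    return table

# Stand-ins for the two files, using rows from the sample data.
inct = io.StringIO(
    "Cella,129.25,100.31,27.25,1.35,1,1,1\n"
    "Nz,121.57,34.71,14.81,1.35,1,1,1\n"
)
inmri = io.StringIO(
    "NONAME,121.57,34.71,14.81,1.35,0,0,1\n"
    "Cella,129.25,100.31,27.25,1.35,1,1,1\n"
)

ct = index_by_name(inct)
mri = index_by_name(inmri)

# Keep only names present in both files, joining the two (x, y, z) triples.
combined = {}
for name, xyz in ct.items():
    if name in mri:
        combined[name] = xyz + mri[name]

print(combined)
```

Note that a dict keeps one entry per name, so a name that occurs twice in a file (LEAM does in the sample) would keep only its last row; handling that case would need a list per name.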



(all code untested, I just typed it directly into the email, assuming
Python 2.6)



DaveA

