Re: Unwanted Spaces and Iterative Loop

2014-01-27 Thread Mark Lawrence

On 27/01/2014 01:58, matt.s.maro...@gmail.com wrote:

On Sunday, 26 January 2014 20:56:01 UTC-5, Chris Angelico  wrote:

On Mon, Jan 27, 2014 at 12:15 PM,  matt.s.maro...@gmail.com wrote:


I'm not reading and writing to the same file, I just changed the actual paths
to directory.




For next time, say directory1 and directory2 to preserve the fact

that they're different. Though if they're file names, I'd use file1

and file2 - calling them directory implies that they are, well,

directories :)



ChrisA


Thanks, but any chance you could help me out with my question of removing the 
FarmID from the postal code?



Any chance that you could read and action this 
https://wiki.python.org/moin/GoogleGroupsPython to prevent us seeing the 
double line spacing above?


--
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.


Mark Lawrence

--
https://mail.python.org/mailman/listinfo/python-list


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread Mark Lawrence

On 26/01/2014 21:46, matt.s.maro...@gmail.com wrote:

I have been working on a python script that separates mailing addresses into 
different components.

Here is my code:

inFile = "directory"
outFile = "directory"
inHandler = open(inFile, 'r')
outHandler = open(outFile, 'w')
outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
for line in inHandler:
    str = line.replace("FarmID\tAddress", " ")
    outHandler.write(str[0:-1])

    str = str.replace(" ", "\t", 1)
    str = str.replace(" Rd,", "\tRd\t\t")
    str = str.replace(" Rd", "\tRd\t")
    str = str.replace("Ave,", "\tAve\t\t")
    str = str.replace("Ave ", "\tAve\t\t")
    str = str.replace("St ", "\tSt\t\t")
    str = str.replace("St,", "\tSt\t\t")
    str = str.replace("Dr,", "\tDr\t\t")
    str = str.replace("Lane,", "\tLane\t\t")
    str = str.replace("Pky,", "\tPky\t\t")
    str = str.replace(" Sq,", "\tSq\t\t")
    str = str.replace(" Pl,", "\tPl\t\t")

    str = str.replace("\tE,", "E\t")
    str = str.replace("\tN,", "N\t")
    str = str.replace("\tS,", "S\t")
    str = str.replace("\tW,", "W\t")
    str = str.replace(",", "\t")
    str = str.replace(" ON", "ON\t")


    outHandler.write(str)
inHandler.close()

The text file that this manipulates has 91 addresses, so I'll just paste 5 of 
them in here to get the idea:

FarmID  Address
1   1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
2   4260 Mountainview Rd, Lincoln, ON L0R 1B2
3   25 Hunter Rd, Grimsby, ON L3M 4A3
4   1091 Hutchinson Rd, Haldimand, ON N0A 1K0

My issue is that in the output file, there is a space before each city and each 
postal code that I do not want there.

Furthermore, the FarmID is being added on to the end of the postal code under 
the original address column for each address.  This also is not supposed to be 
happening, and I am having trouble designing an iterative loop to 
remove/prevent that from happening.

Any help is greatly appreciated!



Make your life easier by using the csv module to read and write your
data, then write using the excel-tab dialect, see
http://docs.python.org/3/library/csv.html#module-csv
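
A minimal sketch of what the csv suggestion could look like, assuming the input really is tab-separated with a FarmID column and an Address column. The sample text, the copy_rows helper and the in-memory io.StringIO handles are illustrative stand-ins, not the original poster's files:

```python
import csv
import io

# Hypothetical sample standing in for the real input file.
sample = (
    "FarmID\tAddress\n"
    "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"
    "2\t4260 Mountainview Rd, Lincoln, ON L0R 1B2\n"
)


def copy_rows(in_handle, out_handle):
    """Read tab-separated rows and write them back, stripping stray spaces."""
    reader = csv.reader(in_handle, dialect="excel-tab")
    # lineterminator is set explicitly; the dialect default is "\r\n".
    writer = csv.writer(out_handle, dialect="excel-tab", lineterminator="\n")
    for row in reader:
        writer.writerow([field.strip() for field in row])


out = io.StringIO()
copy_rows(io.StringIO(sample), out)
result = out.getvalue()
```

With real files, the csv documentation recommends opening them with newline='' so the module controls line endings itself.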




Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread MRAB

On 2014-01-26 21:46, matt.s.maro...@gmail.com wrote:

I have been working on a python script that separates mailing addresses into 
different components.

Here is my code:

inFile = "directory"
outFile = "directory"
inHandler = open(inFile, 'r')
outHandler = open(outFile, 'w')


Shouldn't you be writing a '\n' at the end of the line?


outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")
for line in inHandler:


This is being done on every single line of the file:


    str = line.replace("FarmID\tAddress", " ")
    outHandler.write(str[0:-1])

    str = str.replace(" ", "\t", 1)
    str = str.replace(" Rd,", "\tRd\t\t")
    str = str.replace(" Rd", "\tRd\t")
    str = str.replace("Ave,", "\tAve\t\t")
    str = str.replace("Ave ", "\tAve\t\t")
    str = str.replace("St ", "\tSt\t\t")
    str = str.replace("St,", "\tSt\t\t")
    str = str.replace("Dr,", "\tDr\t\t")
    str = str.replace("Lane,", "\tLane\t\t")
    str = str.replace("Pky,", "\tPky\t\t")
    str = str.replace(" Sq,", "\tSq\t\t")
    str = str.replace(" Pl,", "\tPl\t\t")

    str = str.replace("\tE,", "E\t")
    str = str.replace("\tN,", "N\t")
    str = str.replace("\tS,", "S\t")
    str = str.replace("\tW,", "W\t")
    str = str.replace(",", "\t")
    str = str.replace(" ON", "ON\t")


    outHandler.write(str)
inHandler.close()

The text file that this manipulates has 91 addresses, so I'll just paste 5 of 
them in here to get the idea:

FarmID  Address
1   1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
2   4260 Mountainview Rd, Lincoln, ON L0R 1B2
3   25 Hunter Rd, Grimsby, ON L3M 4A3
4   1091 Hutchinson Rd, Haldimand, ON N0A 1K0

My issue is that in the output file, there is a space before each city and each 
postal code that I do not want there.


You could try splitting on '\t', stripping the leading and trailing
whitespace on each part, and then joining them together again with
'\t'. (Make sure that you also write the '\n' at the end of line.)
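
The split, strip and join idea might be sketched like this. The function name clean_tabbed_line and the messy sample line are made up for illustration; the sample only mimics the stray spaces described in the question:

```python
def clean_tabbed_line(line):
    """Strip whitespace around each tab-separated part and end with a newline."""
    parts = line.rstrip("\n").split("\t")
    return "\t".join(part.strip() for part in parts) + "\n"


# A line with an unwanted space before the city and before the postal code.
messy = "1\t1067\tNiagara Stone\tRd\t\t Niagara-On-The-Lake\tON\t L0S 1J0\n"
cleaned = clean_tabbed_line(messy)
```

Empty fields (the two tabs after Rd) survive the round trip, so the column count stays the same.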


Furthermore, the FarmID is being added on to the end of the postal code under 
the original address column for each address.  This also is not supposed to be 
happening, and I am having trouble designing an iterative loop to 
remove/prevent that from happening.

Any help is greatly appreciated!


As Mark said, you could also use the CSV module.



Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread Jason Friedman


 outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")

 ...


 FarmID  Address
 1   1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
 2   4260 Mountainview Rd, Lincoln, ON L0R 1B2
 3   25 Hunter Rd, Grimsby, ON L3M 4A3
 4   1091 Hutchinson Rd, Haldimand, ON N0A 1K0


You are wanting to produce tab-separated output, with an Address field
plus the Address split into fields for Street Number, Street Name, Suffix
Type, Direction?

The four lines you have pasted are examples of your input?  If yes,
Direction is a single letter?


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread matt . s . marotta
On Sunday, 26 January 2014 18:44:16 UTC-5, Jason Friedman  wrote:
 outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")

 ...

 FarmID  Address
 1       1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0
 2       4260 Mountainview Rd, Lincoln, ON L0R 1B2
 3       25 Hunter Rd, Grimsby, ON L3M 4A3
 4       1091 Hutchinson Rd, Haldimand, ON N0A 1K0

 You are wanting to produce tab-separated output, with an Address field
 plus the Address split into fields for Street Number, Street Name, Suffix
 Type, Direction?

 The four lines you have pasted are examples of your input?  If yes,
 Direction is a single letter?

Yes to your first question.  Yes, the four lines I have pasted are examples of
input.  Direction is a single letter (there are some records that are 'King St.
E,'). I have solved the problem with the spaces, but still cannot figure out
the iterative loop to get rid of the farm ID in the address column that isn't
split.


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread Steven D'Aprano
On Sun, 26 Jan 2014 13:46:21 -0800, matt.s.marotta wrote:

 I have been working on a python script that separates mailing addresses
 into different components.
 
 Here is my code:
 
 inFile = "directory"
 outFile = "directory"
 inHandler = open(inFile, 'r')
 outHandler = open(outFile, 'w')

Are you *really* opening the same file for reading and writing at the 
same time?

Even if your operating system allows that, surely it's not a good idea. 
You might get away with it for small files, but at some point you're 
going to run into weird, hard-to-diagnose bugs.
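
One concrete way this can go wrong, sketched under the assumption of a typical POSIX system: opening a file in 'w' mode truncates it at once, so a read handle on the same path finds nothing left. The temporary file and its name are invented for the demonstration:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "addresses.txt")  # hypothetical file name
    with open(path, "w") as f:
        f.write("FarmID\tAddress\n1\t25 Hunter Rd, Grimsby, ON L3M 4A3\n")

    in_handler = open(path, "r")
    out_handler = open(path, "w")  # 'w' truncates the file immediately
    lines_seen = in_handler.readlines()  # nothing is left to read
    in_handler.close()
    out_handler.close()
```

Using two distinct paths for input and output avoids the problem entirely.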


 outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode")

This looks like a CSV file using tabs as the separator. You really ought 
to use the csv module.

http://docs.python.org/3/library/csv.html
http://docs.python.org/2/library/csv.html

http://pymotw.com/2/csv/


 for line in inHandler:
     str = line.replace("FarmID\tAddress", " ")
     outHandler.write(str[0:-1])
     str = str.replace(" ", "\t", 1)
     str = str.replace(" Rd,", "\tRd\t\t")
     str = str.replace(" Rd", "\tRd\t")
     str = str.replace("Ave,", "\tAve\t\t")
     str = str.replace("Ave ", "\tAve\t\t")
     str = str.replace("St ", "\tSt\t\t")
     str = str.replace("St,", "\tSt\t\t")
     str = str.replace("Dr,", "\tDr\t\t")
     [snip additional string manipulations]
     str = str.replace(",", "\t")
     str = str.replace(" ON", "ON\t")
     outHandler.write(str)


Aiy aiy aiy, what a mess! I get a headache just trying to understand it!

The first question that comes to mind is that you appear to be writing 
each input line *twice*, first after a very minimal set of string 
manipulations (you convert the literal string FarmID\tAddress to a 
space, then write the whole line out), the second time after a whole mess 
of string replacements. Why?
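
A small sketch of what the double write produces, which appears to be why the FarmID ends up glued to the postal code. It uses an in-memory io.StringIO instead of the real output file, and a single replace() as a stand-in for the full chain of replacements; the last two lines show one possible repair, not necessarily what the assignment expects:

```python
import io

line = "1\t1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0\n"
out = io.StringIO()

# First write: the untouched line minus its trailing newline.
out.write(line[0:-1])
# Second write: the processed line, which still starts with the FarmID.
processed = line.replace(" ", "\t", 1)  # stand-in for the whole replace chain
out.write(processed)

written = out.getvalue()  # "...L0S 1J0" is followed directly by "1\t1067..."

# One possible repair: separate the two copies with a tab and drop the
# FarmID from the second copy.
farm_id, address = line.rstrip("\n").split("\t", 1)
fixed = farm_id + "\t" + address + "\t" + processed.split("\t", 1)[1]
```

Because nothing separates the two writes, the second copy's leading FarmID lands right after the postal code of the first copy.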

If the sample data you show below is accurate, I *think* what you are 
trying to do is simply suppress the header line. The first line in the 
input file is:

FarmID  Address

and rather than write that you want to write a space. I don't know why 
you want the output file to begin with a space, but this would be better:

for line in inHandler:
    line = line.strip()  # Remove any leading and trailing whitespace,
    # including the trailing newline. Later, we'll add a newline
    # back in.
    if line == "FarmID\tAddress":
        outHandler.write(" ")  # Write a mysterious space.
        continue  # And skip to the next line.
    # Now process the non-header lines.
# Now process the non-header lines.


Now, as far as the non-header lines, you do a whole lot of complex string 
manipulations, replacing chunks of text with or without tabs or commas to 
the same text with or without tabs but in a different order. The logic of 
these manipulations completely escapes me: what are you actually trying to
do here?

I *strongly* suggest that you don't try to implement your program logic 
in the form of string manipulations. According to your sample data, your 
data looks like this:

1   1067 Niagara Stone Rd, Niagara-On-The-Lake, ON L0S 1J0

i.e. 

farmId TAB address COMMA district COMMA postcode

It is much better to pull the line apart into named components, 
manipulate the components directly, then put it back together in the 
order you want. This makes the code more understandable, and easier to 
change if you ever need to change things.

for line in inHandler:
    line = line.strip()
    if line == "FarmID\tAddress":
        outHandler.write(" ")  # Write a mysterious space.
        continue
    # Now process the non-header lines.
    farmid, address = line.split("\t")
    farmid = farmid.strip()
    address, district, postcode = address.split(",")
    address = address.strip()
    district = district.strip()
    postcode = postcode.strip()
    # Now process the fields however you like.
    parts_of_address = address.split(" ")
    street_number = parts_of_address[0]  # first part
    street_type = parts_of_address[-1]  # last part
    street_name = parts_of_address[1:-1]  # everything else
    street_name = " ".join(street_name)

and so on for the post code. Then, at the very end, assemble the parts 
you want to write out, join them with tabs, and write:

    fields = [farmid, street_number, street_name, street_type, ... ]
    outHandler.write("\t".join(fields))
    outHandler.write("\n")


Or use the csv module to do the actual writing. It will handle escaping 
anything that needs escaping, newlines, tabs, etc.
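
Putting the pieces together, a self-contained sketch of the split-into-named-components approach might look like the following. The function name split_address_line is hypothetical, the Direction column is left out, and the code assumes every line has exactly the shape of the sample data (one tab, two commas, province before the postal code); records such as "King St. E" would need extra handling:

```python
def split_address_line(line):
    """Split 'farmid TAB street, city, province postcode' into named fields."""
    farmid, full_address = line.strip().split("\t")
    street, city, region = [part.strip() for part in full_address.split(",")]
    province, postcode = region.split(" ", 1)  # "ON L0S 1J0" -> "ON", "L0S 1J0"
    street_parts = street.split(" ")
    street_number = street_parts[0]            # first part
    street_type = street_parts[-1]             # last part
    street_name = " ".join(street_parts[1:-1])  # everything in between
    return [farmid, full_address, street_number, street_name,
            street_type, city, province, postcode]


fields = split_address_line("3\t25 Hunter Rd, Grimsby, ON L3M 4A3\n")
row = "\t".join(fields) + "\n"  # ready for outHandler.write(row)
```

Because each field is stripped before joining, no stray spaces appear before the city or the postal code, and the FarmID appears only once per row.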



-- 
Steven


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread matt . s . marotta
On Sunday, 26 January 2014 19:40:26 UTC-5, Steven D'Aprano  wrote:
 [snip]

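I'm not reading and writing to the same file, I just changed the actual paths
to directory.

This is for a school assignment, and we haven't been taught any of the
stuff you're talking about.  Although I appreciate your help, everything
needs to stay as is and I just need to create the loop to get rid of the
farmID from the end of the postal codes.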

Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread Chris Angelico
On Mon, Jan 27, 2014 at 12:15 PM,  matt.s.maro...@gmail.com wrote:
 I'm not reading and writing to the same file, I just changed the actual paths
 to directory.

For next time, say directory1 and directory2 to preserve the fact
that they're different. Though if they're file names, I'd use file1
and file2 - calling them directory implies that they are, well,
directories :)

ChrisA


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread matt . s . marotta
On Sunday, 26 January 2014 20:56:01 UTC-5, Chris Angelico  wrote:
 On Mon, Jan 27, 2014 at 12:15 PM,  matt.s.maro...@gmail.com wrote:

  I'm not reading and writing to the same file, I just changed the actual
  paths to directory.

 For next time, say directory1 and directory2 to preserve the fact
 that they're different. Though if they're file names, I'd use file1
 and file2 - calling them directory implies that they are, well,
 directories :)

 ChrisA

Thanks, but any chance you could help me out with my question of removing the 
FarmID from the postal code?


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread Jason Friedman

 I'm not reading and writing to the same file, I just changed the actual
 paths to directory.

 This is for a school assignment, and we haven't been taught any of the
 stuff you're talking about.  Although I appreciate your help, everything
 needs to stay as is and I just need to create the loop to get rid of the
 farmID from the end of the postal codes.


If you are allowed to use if/then this seems to work:

inFile = "data"
outFile = "processed"
inHandler = open(inFile, 'r')
outHandler = open(outFile, 'w')
for line in inHandler:
    if line.startswith("FarmID"):
        outHandler.write("FarmID\tAddress\tStreetNum\tStreetName\tSufType\tDir\tCity\tProvince\tPostalCode\n")
    else:
        line = line.replace(" ", "\t", 1)
        line = line.replace(" Rd,", "\tRd\t\t")
        line = line.replace(" Rd", "\tRd\t")
        line = line.replace("Ave,", "\tAve\t\t")
        line = line.replace("Ave ", "\tAve\t\t")
        line = line.replace("St ", "\tSt\t\t")
        line = line.replace("St,", "\tSt\t\t")
        line = line.replace("Dr,", "\tDr\t\t")
        line = line.replace("Lane,", "\tLane\t\t")
        line = line.replace("Pky,", "\tPky\t\t")
        line = line.replace(" Sq,", "\tSq\t\t")
        line = line.replace(" Pl,", "\tPl\t\t")

        line = line.replace("\tE,", "E\t")
        line = line.replace("\tN,", "N\t")
        line = line.replace("\tS,", "S\t")
        line = line.replace("\tW,", "W\t")
        line = line.replace(",", "\t")
        line = line.replace(" ON", "ON\t")

        outHandler.write(line)
inHandler.close()


Re: Unwanted Spaces and Iterative Loop

2014-01-26 Thread matt . s . marotta
On Sunday, 26 January 2014 21:00:35 UTC-5, Jason Friedman  wrote:
 [snip]

Unfortunately this did not work - the columns get messed up and there is no 
column for the full address.