On 2016-05-06 20:10, DFS wrote:
getAddresses.py

Scrapes addresses from www.usdirectory.com and stores them in a SQLite
database, or writes them to text files for mailing labels, etc

Now, just by typing 'fast food Taco Bell <city> 10 db all' you can find
out how many Taco Bells are within 10 miles of you, and store all the
addresses in your own address database.

No more convoluted Googling, or hitting the 'Next Page' button, or
fumbling with the Yellow Pages...

Note: the db structure is flat on purpose, and the .csv files aren't
quote delimited.

Put the program in its own directory.  It creates the SQLite database
there, and writes files there, too.

Reviews of code, bug reports, criticisms, suggestions for improvement,
etc are all welcome.

OK, you asked for it... :-)

1. It's shorter and clearer not to compare with True or False:

       if verbose:

   and:

       if not dupeRow:

2. You can print a blank line with an empty print statement:

       print

3. When looking for unique items, a set is a better choice than a list:

       addrCheck = set()

       def addrUnique(addr):
           if addr not in addrCheck:
               x = True
               addrCheck.add(addr)
           else:
               x = False
           return x

4. Try string formatting instead multiple concatenation:

       print "%s arguments" % argCnt

5. Strings have a .join method, and when you combine it with string slicing:

       keyw = "+".join(sys.argv[1 : argCnt - 5])

6. Another example of string formatting:

       search = "%s %s %s %s %s" % (keyw, cityzip, state, miles, addrWant)

7. It's recommended to use the 'with' statement when handling files:

       with open(webfile, "w") as f:
           if store == "csv":
               f.write("Name,Address,CityStateZip\n")

If you don't want to use the 'with' statement, note that closing the file is:

           f.close()

   It needs the "()"!

8. When using SQL, you shouldn't try to insert the values yourself; you should use parametrised queries:

       cSQL = "DELETE FROM addresses WHERE datasrc = ? AND search = ?;"
       if verbose:
           print cSQL
       db.execute(cSQL, (datasrc, search))
       conn.commit()

It'll insert the values where the "?" are and will do any necessary quoting itself. (Actually, some drivers use "?", others use "%s", so if it doesn't work with one, try the other.)

The way you wrote it, it would fail if a value contained a "'". It's that kind of thing that leads to SQL injection attacks.

--
https://mail.python.org/mailman/listinfo/python-list

Reply via email to