Gilles Ganault wrote:

> I have a working Python script that SELECTs rows from a database to
> fetch a company's name from a web-based database.
>
> Since this list is quite big and the site is the bottleneck, I'd like
> to run multiple instances of this script, and figured a solution would
> be to pick rows at random from the dataset, check in my local database
> if this item has already been taken care of, and if not, download
> details from the remote web site.
>
> If someone's done this before, should I perform the randomization in
> the SQL query (SQLite using the APSW wrapper
> http://code.google.com/p/apsw/), or in Python?
>
> Thank you.
>
> Here's some simplified code:
>
> sql = 'SELECT id,label FROM companies WHERE activity=1'
> rows = list(cursor.execute(sql))
> for row in rows:
>     id = row[0]
>     label = row[1]
>
>     print strftime("%H:%M")
>     url = "http://www.acme.com/details.php?id=%s" % id
>     req = urllib2.Request(url, None, headers)
>     response = urllib2.urlopen(req).read()
>
>     name = re_name.search(response)
>     if name:
>         name = name.group(1)
>         sql = 'UPDATE companies SET name=? WHERE id=?'
>         cursor.execute(sql, (name, id))

I don't think you need to randomize the requests. Instead you could
control a pool of worker processes using
http://docs.python.org/library/multiprocessing.html
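For instance, something along these lines (an untested sketch, not a
drop-in replacement: the headers, the regex, the database filename and
the pool size are placeholders, and it assumes Python 2.6+ for
multiprocessing plus APSW as in your snippet):

import re
import urllib2
from multiprocessing import Pool

import apsw  # the wrapper you mentioned

# Placeholders -- substitute whatever your real script uses:
headers = {"User-Agent": "Mozilla/5.0"}
re_name = re.compile(r"<h1>(.*?)</h1>")

def fetch(row):
    # Runs in a worker process: download one detail page and
    # pull out the company name (None if the pattern misses).
    id, label = row
    url = "http://www.acme.com/details.php?id=%s" % id
    req = urllib2.Request(url, None, headers)
    response = urllib2.urlopen(req).read()
    name = re_name.search(response)
    return id, name.group(1) if name else None

if __name__ == "__main__":
    cursor = apsw.Connection("companies.db").cursor()
    rows = list(cursor.execute(
        "SELECT id, label FROM companies WHERE activity=1"))

    # Ten concurrent downloads; tune to taste. imap_unordered
    # yields each result as soon as any worker finishes.
    pool = Pool(processes=10)
    for id, name in pool.imap_unordered(fetch, rows):
        if name is not None:
            cursor.execute("UPDATE companies SET name=? WHERE id=?",
                           (name, id))
    pool.close()
    pool.join()

Note that only the parent process touches SQLite here; the workers do
nothing but HTTP and regex work, so you avoid concurrent writers
fighting over the database file -- and you no longer need the "has this
row been taken care of yet" bookkeeping at all.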
Peter