On Sun, 25 May 2008 22:42:06 -0300, <[EMAIL PROTECTED]> wrote:
> def joinSets(set1, set2):
>     for i in set2:
>         set1.add(i)
>     return set1
Use the | operator, or |=
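For instance, the whole helper collapses to a single in-place union (untested sketch):

def joinSets(set1, set2):
    # |= is the in-place union operator, equivalent to set1.update(set2)
    set1 |= set2
    return set1

Or, when you want a new set without modifying either operand, just write set1 | set2.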
> Traceback (most recent call last):
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 47, in <module>
>     x = scrapeSites("http://www.yahoo.com")
>   File "C:/Python25/Progs/WebCrawler/spider2.py", line 31, in scrapeSites
>     site = iterator.next()
> RuntimeError: Set changed size during iteration
You will need two sets: the one you're iterating over, and another collecting the
new URLs. Once you finish iterating over the first, continue with the new ones; stop
when there are no new ones left.
> def scrapeSites(startAddress):
>     site = startAddress
>     sites = set()
>     iterator = iter(sites)
>     pos = 0
>     while pos < 10:  #len(sites):
>         newsites = scrapeSite(site)
>         joinSets(sites, newsites)
>         pos += 1
>         site = iterator.next()
>     return sites
Try this (untested):
def scrapeSites(startAddress):
    allsites = set()               # all links found so far
    pending = set([startAddress])  # pending sites to examine
    while pending:
        newsites = set()           # new links
        for site in pending:
            newsites |= scrapeSite(site)
        pending = newsites - allsites
        allsites |= newsites
    return allsites
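This assumes scrapeSite(url) returns a set of absolute URLs found on that page; the
helper below is only a rough illustration of that assumption (untested), using urllib2
and a crude regex rather than a real HTML parser:

import re
import urllib2

def scrapeSite(url):
    # Download the page and collect every absolute http link in it.
    # A real crawler would use an HTML parser, resolve relative links,
    # and handle network errors instead of this quick regex.
    html = urllib2.urlopen(url).read()
    return set(re.findall(r'href="(http[^"]+)"', html))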
> wtf? im not multithreading or anything so how can the size change here?
You modified the set you were iterating over. Another example of the same
problem:
d = {'a': 1, 'b': 2, 'c': 3}
for key in d:
    d[key + key] = 0
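If you really need to add or remove entries while looping, iterate over a snapshot
of the keys instead; in Python 2, keys() builds a list up front, so this is safe:

d = {'a': 1, 'b': 2, 'c': 3}
for key in d.keys():    # keys() returns a list, so the dict may change freely
    d[key + key] = 0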
--
Gabriel Genellina