Dear all,

I am new to this list, and even though I tried to go through the archives, I couldn't find this problem mentioned before.
Summary:
-------
Plucker Desktop has a link-traversal/MAXDEPTH problem that 1.1.13 does not exhibit.

Possible cause/Suggested solution: [for developers, really... :-]
---------------------------------
It seems to be caused by doing a depth-first traversal instead of a breadth-first one. Using a FIFO queue instead of a LIFO stack for parsed URLs could solve the problem. But I haven't looked at the source - I'm not really comfortable with Python ;-)

Detail:
------
Say MAXDEPTH=3. A link appears on one page at level 2, but *also* (through a different path) on another page at level 3. If Plucker arrives at the "level 2" page before the "level 3" page, it will fetch that link. If, however, it arrives at the "level 3" page first, it does not fetch it. In other words, the order in which pages are visited, not just their level, determines what gets fetched.

Simple example:
--------------
http://www.wired.com/news_drop/palmpilot/index.html is a low-bandwidth site. The main (level 1) page has 5 links (Top Stories, Business, Culture, Technology, and Politics; I will abbreviate them as TS, B, C, T, and P in my description below). Each "level 2" page has 1 or more article summaries, each of which links to the corresponding full article.

The problem is that Plucker will fetch full articles *only* for the Politics page, no others!

Take *special note* of the bottom "nav bar" on each of the 5 level 2 pages - the same links (TS, B, C, T, and P) appear there. They are part of the problem :-)

Here's what Plucker does (remember MAXDEPTH = 3):

--> depth 1
    fetch MAIN page
    save 5 links TS, B, C, T, P
--> depth 2
    fetch P (Politics) [Plucker appears to put them in LIFO order, so P gets picked up first]
    push URLs for article headers to stack
    push URLs for bottom "nav bar" (same TS, B, C, T, and P) to stack
--> depth 3
    fetch pages (TS, B, C, T) from the nav bar links stored just now
    DO NOT recurse, because you are already at level 3
    (Actually, they are fetched in the order T, C, B, and TS.)
--> depth 3
    fetch articles for article headers parsed from the "P" page
    (now done with the P page)
--> depth 2
    look at stack and see "T" as the next entry
    DISCARD it because it's already been parsed

PROBLEM: article bodies for article headers in T (and likewise in C, B, and TS) never get fetched.

I hope that makes sense. The end result is that I can see detailed articles ONLY for the Politics page, none of the others.

If my description of the problem is unclear, please go to http://www.wired.com/news_drop/palmpilot/index.html and browse around a bit, then try to download it via Plucker Desktop at MAXDEPTH=3.

Thanks for a great product!

Sitaram

_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
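For anyone who wants to see the difference between the two orders, here is a minimal Python sketch. It is not Plucker's actual spider code. The `LINKS` site model, the `crawl` function and the `-article` page names are hypothetical. It also assumes, as the trace above suggests, that a URL is marked as parsed when it is fetched, and that any later entry for that URL is discarded whatever its depth. For brevity each section page links to a single article.

```python
from collections import deque

# Hypothetical model of the Wired site described above: each section page
# links to one article (the real pages have one or more) plus the same
# five-link nav bar. Article pages are assumed to have no links of interest.
SECTIONS = ["TS", "B", "C", "T", "P"]
LINKS = {"MAIN": list(SECTIONS)}
for s in SECTIONS:
    LINKS[s] = [s + "-article"] + SECTIONS  # article headers first, then nav bar


def crawl(start, max_depth, fifo):
    """Depth-limited crawl; returns pages in the order they were fetched.

    fifo=True takes entries from the front (a queue, breadth first);
    fifo=False takes them from the back (a stack, depth first).
    A URL is marked as seen when it is fetched, and any later entry
    for it is discarded, even if that entry is at a shallower depth.
    """
    pending = deque([(start, 1)])  # (url, depth); the main page is depth 1
    seen = set()
    fetched = []
    while pending:
        url, depth = pending.popleft() if fifo else pending.pop()
        if url in seen:
            continue  # already parsed: DISCARD it
        seen.add(url)
        fetched.append(url)
        if depth < max_depth:  # pages at MAXDEPTH are fetched but not recursed
            for link in LINKS.get(url, []):
                pending.append((link, depth + 1))
    return fetched


def articles(pages):
    return [p for p in pages if p.endswith("-article")]


print(articles(crawl("MAIN", 3, fifo=False)))  # ['P-article']
print(articles(crawl("MAIN", 3, fifo=True)))
# ['TS-article', 'B-article', 'C-article', 'T-article', 'P-article']
```

With the stack, the depth-first run fetches MAIN, P, T, C, B, TS and then only P's article, which matches the trace. With the queue, every section page is taken at depth 2 before any depth-3 entry is looked at. Each URL is therefore first parsed at its shallowest depth, and all five sets of articles get fetched.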

