Ah yes, I only explained this part of my algorithm briefly. I said that I filter out "...links that are associated with menus, images, forms, or that go outside of our website...", but I neglected to describe the full algorithm. I have a sub-procedure that gets all of the links on a page, leaves out the unwanted ones, and returns the rest as an array, which I then store in the BIG array. Here are the checks it uses:
Links are "deleted" from my array by just not adding them in :) A link is only added if all of these are true:

1. if tempLink.src.empty?
2. if (tempLink.id.to_s)["menu"].nil?
3. if (tempLink.id.to_s)["breadcrumb"].nil?
4. if (tempLink.href.to_s)["javascript"].nil?
5. if not (tempLink.href.to_s).empty?
6. if (tempLink.href.to_s)["@"].nil?
7. if not (tempLink.href.to_s)["vehix"].nil?

So a little explanation now. This is actually one long "if" statement, and the conditions are all joined by and's.

Number 1 makes sure that the link isn't attached to an image. Number 2 makes sure that, in our case, the id doesn't tell us the link is in a menu - our HTML ids and class names are very descriptive like this for the most part. Number 3 makes sure that the link isn't part of a breadcrumb (which is like a traceback of pages that are associated with each other, like in a menu hierarchy). Number 4 makes sure the link isn't a javascript link. Number 5 makes sure the href of the link has some text in it. Number 6 makes sure that the link isn't a mailto, because those suck, especially when you are actually running something like Outlook or Thunderbird! And finally, number 7 makes sure that the link contains our domain name, because all of the links I care about do.

So there you have it. I realized a lot of the same things as you did :) Thanks for pointing those out though. It took me about a week to develop this. Now it's all up to my logging to tell me what's wrong.

Nathan
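
For anyone reading along, the seven checks described above could be combined roughly like this. This is only a sketch, not the actual code from the post: FakeLink is a made-up stand-in for a Watir link object so the example runs without a browser, keep_link? and filter_links are hypothetical names, and the sample hrefs are invented test data. It uses String#include? where the post uses str["..."].nil?, which tests the same thing.

```ruby
# Hypothetical stand-in for a Watir link object (src, id, href only).
FakeLink = Struct.new(:src, :id, :href)

# True when a link passes all seven checks, joined by "and".
# The "vehix" default is the domain substring mentioned in the post.
def keep_link?(link, domain = "vehix")
  id   = link.id.to_s
  href = link.href.to_s

  link.src.to_s.empty? &&            # 1. not attached to an image
    !id.include?("menu") &&          # 2. id does not mark a menu link
    !id.include?("breadcrumb") &&    # 3. id does not mark a breadcrumb link
    !href.include?("javascript") &&  # 4. not a javascript link
    !href.empty? &&                  # 5. href has some text in it
    !href.include?("@") &&           # 6. not a mailto link
    href.include?(domain)            # 7. stays on our own domain
end

# Builds the per-page array by simply not adding rejected links.
def filter_links(links)
  links.select { |link| keep_link?(link) }
end

# Invented sample data, one link per kind of rejection plus one keeper.
links = [
  FakeLink.new("", "content_1", "http://www.vehix.com/research"),
  FakeLink.new("logo.gif", "", "http://www.vehix.com/"),
  FakeLink.new("", "menu_home", "http://www.vehix.com/home"),
  FakeLink.new("", "", "javascript:void(0)"),
  FakeLink.new("", "", "mailto:someone@vehix.com"),
  FakeLink.new("", "", "http://www.example.com/")
]

kept = filter_links(links)
puts kept.map(&:href)
```

Because the checks are chained with &&, Ruby stops at the first one that fails, so the cheap tests on src and id run before the href tests.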
