On Thu, 13 Jun 2002, Bill Janssen wrote:

> > That was going to be another question. What is the advantage of using
> > Sitescooper if you are already using Plucker.
>
> In general, Plucker has focussed on producing good documents from
> available Web pages, rather than on all the technicalities of fetching
> Web pages and the manifold various impediments placed in the way of
> that fetching. Sitescooper, on the other hand, has put much more work
> into fetching remote pages and storing them locally.

Also, sitescooper has a mature library of URLs and post-processing
scripts, which plucker is still in the process of building.

Sitescooper also lets a sitefile specify that bits and pieces be
stripped from the pages. For instance:

URL: http://www.mozillazine.org/contents.rdf
Name: MozillaZine
Description: Your source for Mozilla news, advocacy, interviews, builds, and more!
ContentsFormat: rss
StoryURL: /talkback\.html\?article=\d+
# You may also want to add a StoryStart and StoryEnd line to
# clean up the stories. Here's sample lines (you need to edit them):
# StoryStart: --features--
StoryEnd: form method="post" action

Cheers, Andy!
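
P.S. To make the StoryURL line in the sitefile above a bit more concrete, here is a rough Python sketch of what that regular expression accepts and rejects. It is not sitescooper's own matching code (sitescooper is written in Perl and may anchor the pattern, or resolve it against the site's base URL, differently). The is_story_link helper and the sample links are made up purely for illustration.

```python
import re

# Pattern copied verbatim from the StoryURL line of the sitefile.
STORY_URL = re.compile(r"/talkback\.html\?article=\d+")


def is_story_link(href: str) -> bool:
    """Hypothetical helper: True if href contains a StoryURL match."""
    return STORY_URL.search(href) is not None


# Hypothetical links, not taken from the real MozillaZine feed.
links = [
    "/talkback.html?article=2345",  # digits after article= : accepted
    "/talkback.html?article=",      # no digits: rejected
    "/about.html",                  # different page: rejected
]

stories = [href for href in links if is_story_link(href)]
print(stories)
```

The escaped `\.` and `\?` matter here: unescaped, the dot would match any character and the question mark would make the preceding "l" optional, so the pattern would accept links it was never meant to.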
