[EMAIL PROTECTED] said:

> I think that the problem is that the story tags in the contents page start
> on one line and end on another.
> 
> I think that the same problem happened when I tried to scoop some Linux
> documents from http://www.linuxdoc.org, because all DocBook-generated
> documents are a mess. Take a look at the source of some HOWTOs at linuxdoc.
> 
> I'm attaching the .site file for the Brazilian News site so you can take a
> look at it.

Hi Rodrigo -- 

that is a mess! whitespace in the URLs themselves!! 2 tips:

  1. if you use "\s+" in the patterns, that will match newlines and any
  whitespace at all.

  2. if the story URL is picked up by sitescooper as containing spaces,
  which it might (Sitescooper as far as I know is totally right to do
  this!) then you might have to use a substitution on the URL, using
  URLProcess, to fix it.
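
To make the two tips concrete, here is a rough sketch of the idea in plain
Python. It is not Sitescooper .site syntax and not real URLProcess code; the
HTML fragment, the example.org URL and the variable names are all made up for
illustration. It only shows that "\s+" matches across a line break and that a
simple substitution strips whitespace captured inside a URL.

```python
import re

# Hypothetical contents-page fragment: both the link tag and the URL are
# broken across lines, similar to the generated pages described above.
html = '<a\n   href="http://www.example.org/HOWTO/\n   Some-HOWTO.html">Some HOWTO</a>'

# Tip 1: \s+ matches any run of whitespace, newlines included, so the
# pattern still matches even though the tag spans several lines.
link = re.compile(r'<a\s+href="([^"]+)"')
raw_url = link.search(html).group(1)

# Tip 2: the captured URL still contains the line break and indentation,
# so substitute all whitespace away before using it.
clean_url = re.sub(r'\s+', '', raw_url)

print(clean_url)
```

In a real .site file the same substitution would be written as Perl inside
URLProcess; check the Sitescooper documentation for the exact form.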

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/sitescooper-talk
