--- In [email protected], "dleidinger" <dleidin...@...> wrote:
>
> Hi Sheri,
> 
> > Did you have to edit your cookies.txt file?
> 
> No, there had been only a few from cinemazone - for two reasons:
> 1. I use Opera by default - not IE
> 2. on my machine all temporary files are stored in a ramdisk - so always 
> clean after reboot.

I still can't do it with wget. In fact I wasn't getting any output from your 
scriptlet, so I tried to run it from a command line. I got an error that said I 
needed to contact wget's development team.

> 
> > Your regex gives me an error from pcre when applied to one of the
> > files I previously downloaded by other means: "PCRE exec
> > failedMatching error -8 backtracking limit exceeded"
> 
> Got the same error for page 9. It seems there are some limitations 
> for some regex-features.

Unanchored patterns can be very inefficient, and backtracking can take a lot of 
time. The default backtracking limit is 10 million: PCRE will try up to 10 
million combinations to find a match from a single starting position! 
Depending on the pattern, the larger the subject string, the more combinations 
are possible. So it is usually a blessing to get an error before PCRE grinds 
through multiple matches of such a pattern. However, the pcre services in the 
regex plugin (including regex.pcrematchall) do have an optional argument where 
you can specify your own MatchLimit.
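To see the kind of blow-up the limit guards against, here is a small made-up worst case (sketched in Python, whose re engine backtracks much like PCRE; the pattern is a textbook example, not the one from your script):

```python
import re

# Nested quantifiers let the engine try every way of splitting the
# 'a's between the inner and outer group before giving up -- roughly
# 2^N attempts per starting position on a subject of N 'a's, and an
# unanchored search repeats that from every starting position.
catastrophic = re.compile(r'(a+)+b')

# An equivalent pattern without the nested quantifier fails in linear
# time: there is only one way to consume the run of 'a's.
linear = re.compile(r'a+b')

subject = 'a' * 20 + 'c'   # no 'b' anywhere, so no match is possible
assert linear.search(subject) is None          # returns immediately
assert linear.search('xxaaab').group() == 'aaab'
# catastrophic.search(subject) would grind through millions of
# combinations -- exactly the work a MatchLimit cuts short.
```

The moral is the usual one: rewrite the pattern so there is only one way to divide up the subject, rather than raising the limit.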

> I also noticed some performance-issues, when running my regex on a 
> big file. After some testing I modified my script:
> - download each page one after another
> - extract each relevant block (<tr>..</tr>) via regex into a vector
> - extract required information for each vector-element via several regex

I'm sure that was much better.
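For anyone following along, that staged approach might look roughly like this (a minimal Python sketch with made-up markup and field names; the real script is PowerPro and the actual cinemazone page structure will differ):

```python
import re

page = """<table>
<tr><td class="title">Movie One</td><td>2001</td></tr>
<tr><td class="title">Movie Two</td><td>2004</td></tr>
</table>"""

# Stage 1: pull each <tr>..</tr> block into a list.  The non-greedy
# .*? keeps a match from spanning several rows, and the small
# per-row subjects keep the later matches cheap.
rows = re.findall(r'<tr>.*?</tr>', page, re.S)

# Stage 2: run several simple regexes against each small block.
records = []
for row in rows:
    title = re.search(r'class="title">([^<]*)<', row)
    year = re.search(r'<td>(\d{4})</td>', row)
    if title and year:
        records.append((title.group(1), year.group(1)))

# records == [('Movie One', '2001'), ('Movie Two', '2004')]
```

Splitting the work this way keeps every subject string tiny, so backtracking never has much room to explode.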

There is some info in the optional pcre.chm file you may find helpful -- see 
the page titled "pcreperform".

Regards,
Sheri
