--- In [email protected], "dleidinger" <dleidin...@...> wrote:
>
> Hi Sheri,
>
> > Did you have to edit your cookies.txt file?
>
> No, there had been only a few from cinemazone - for two reasons:
> 1. I use Opera by default - not IE
> 2. on my machine all temporary files are stored in a ramdisk - so always
> clean after reboot.
I still can't do it with wget. In fact, I wasn't getting any output from your scriptlet, so I tried to run it from a command line. I got an error that said I needed to contact wget's development team.

> > > Your regex gives me an error from pcre when applied to one of the
> > > files I previously downloaded by other means: "PCRE exec
> > > failedMatching error -8 backtracking limit exceeded"
>
> Got the same error for page 9. It seems there are some limitations
> for some regex-features.

Unanchored patterns can be very inefficient, and backtracking can take a lot of time. The default backtracking limit is 10 million. That means PCRE will try up to 10 million combinations to find a match from a single starting position! Depending on the pattern, the larger the subject string, the more combinations are possible. So it is usually a blessing to get an error before PCRE tries to locate many multiple matches of such a pattern. However, the pcre services in the regex plugin (including regex.pcrematchall) do have an optional argument where you can specify your own MatchLimit.

> I also noticed some performance-issues, when running my regex on a
> big file. After some testing I modified my script:
> - download each page one after another
> - extract each relevant block (<tr>..</tr>) via regex into a vector
> - extract required information for each vector-element via several regex

I'm sure that was much better. There is some info in the optional pcre.chm file you may find helpful -- see the page titled "pcreperform".

Regards,

Sheri
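To see where numbers like "10 million combinations" come from, here is a small Python sketch (a simplified counting model, not the real PCRE engine, whose exact step counts differ): before a nested quantifier like (a+)+ gives up on a subject that ultimately fails to match, the engine has to try every way of carving n letters into non-empty groups -- the compositions of n, of which there are 2**(n-1).

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compositions(n):
    """Count the ways (a+)+ can split n letters into non-empty runs."""
    if n == 0:
        return 1  # one way to split nothing: no groups at all
    # the first group takes k letters; the rest is a smaller subproblem
    return sum(compositions(n - k) for k in range(1, n + 1))

print(compositions(4))   # 8  (= 2**3)
print(compositions(24))  # 8388608 (= 2**23)
```

Only 24 letters already give about 8.4 million groupings, right at the scale of the 10-million default limit, which is why a modest-looking pattern on a big page can hit it.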
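The two-stage approach in your revised script can be sketched in Python's re module like so (the page markup and field names below are invented for illustration -- your real patterns will differ): one cheap non-greedy regex pulls each <tr>..</tr> block into a list, and the per-field regexes then only ever see a short block instead of the whole page.

```python
import re

page = """
<table>
<tr><td class="title">Alpha</td><td class="year">2001</td></tr>
<tr><td class="title">Beta</td><td class="year">2003</td></tr>
</table>
"""

# Stage 1: extract each relevant block into a list ("vector")
rows = re.findall(r"<tr>.*?</tr>", page, re.DOTALL)

# Stage 2: extract the required information from each element
records = []
for row in rows:
    title = re.search(r'class="title">([^<]+)<', row)
    year = re.search(r'class="year">([^<]+)<', row)
    records.append((title.group(1), year.group(1)))

print(records)  # [('Alpha', '2001'), ('Beta', '2003')]
```

Because each field pattern runs against a tiny string, a failed match backtracks over a few dozen characters rather than the whole file, which is exactly why this layout sidesteps both the slowdown and the limit error.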
