[power-pro] Re: Hello. Question for webpage copy

Sheri Sun, 05 Apr 2009 09:27:10 -0700

--- In [email protected], "dleidinger" <dleidin...@...> wrote:
>
> Hi Sheri,
> 
> Sorry for answering so late, but because I am definitely not a
> wget expert it took me a while to get the hang of how to
> download web pages that require a logon. In the end the
> solution I found (there may be others) wasn't so difficult. You
> have to log on to such a page with a browser (I used IE in that
> case) and then export the browser cookie files containing the
> login values to a text file.
> 
> This text file is usually called "cookies.txt" and can be loaded
> by wget. I completed the template script with those wget options,
> including a regex to extract the requested values. (As I am also
> not a regex expert, I guess there are better regular expressions
> for doing that.)


Hi Detlef,

Did you have to edit your cookies.txt file?

The other day when I was trying to get the files with wget, I also exported my
cookies file from IE (because I couldn't see how to do it from Firefox). Mine
was a large file. When I first tried to use it, I got debug output from wget
that showed the first few lines of that file (which were unrelated to the
website I was trying to get). So I edited the cookie file and kept only the
lines from cinemazone (there were a few).
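That manual trimming step can be sketched in Python, assuming the exported file uses the usual Netscape cookies.txt layout (one cookie per line, tab-separated fields, domain in the first field). The function names and the "cinemazone" substring are hypothetical; header and comment lines are passed through unchanged so the file still looks like a cookies.txt file.

```python
def filter_cookies(lines, domain_substring):
    """Keep blank/comment lines and cookie lines whose domain field contains domain_substring.

    Assumes the Netscape cookies.txt layout: tab-separated fields, domain first.
    """
    kept = []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            # Header ("# Netscape HTTP Cookie File"), comments and blank lines pass through.
            kept.append(line)
            continue
        domain = line.split("\t", 1)[0]
        if domain_substring in domain:
            kept.append(line)
    return kept


def filter_cookie_file(src, dst, domain_substring):
    """Hypothetical helper: write a trimmed copy of src to dst."""
    with open(src, encoding="utf-8", errors="replace") as f:
        lines = f.readlines()
    with open(dst, "w", encoding="utf-8") as f:
        f.writelines(filter_cookies(lines, domain_substring))
```

For example, filter_cookie_file("cookies.txt", "cinemazone_cookies.txt", "cinemazone") writes a trimmed copy that could then be handed to wget through its --load-cookies option.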

Maybe I will try it all again later, just for my own wget experience.

Your regex gives me an error from pcre when applied to one of the files I 
previously downloaded by other means: "PCRE exec failed Matching error -8 
backtracking limit exceeded"

When I extracted the info the other day I took a different approach.

I downloaded pages 1-10 and appended them all together. I did a pcre_matchall
that captured the 10 whole tables containing the member data. Then, in the
matched result, I think I replaced the </td> tags with tabs and all the other
tags with "".
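Here is a rough Python equivalent of those steps, using the standard re module in place of the PowerPro pcre plugin. It assumes the member tables are not nested inside other tables and can be recognised by some marker string inside them; the marker (here a class name) and the function name are hypothetical, since the real page markup isn't shown.

```python
import re

# One whole <table>...</table>, matched lazily; assumes member tables are not nested.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.IGNORECASE | re.DOTALL)
CELL_END_RE = re.compile(r"</td\s*>", re.IGNORECASE)
ANY_TAG_RE = re.compile(r"<[^>]*>")


def extract_member_tables(pages, marker):
    """Join pages, keep tables containing marker, turn </td> into tabs, drop other tags."""
    combined = "\n".join(pages)  # step 1: append all downloaded pages together
    # step 2: "match all" whole tables, keeping only the ones with member data
    tables = [t for t in TABLE_RE.findall(combined) if marker in t]
    results = []
    for table in tables:
        text = CELL_END_RE.sub("\t", table)  # step 3a: each </td> becomes a tab
        text = ANY_TAG_RE.sub("", text)      # step 3b: every other tag is removed
        results.append(text)
    return results
```

Line breaks already present in the HTML between rows are left alone, so each table row usually ends up on its own line of tab-separated values.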

Regards,
Sheri
