--- In [email protected], "dleidinger" <dleidin...@...> wrote:
>
> Hi Sheri,
>
> Sorry for answering so late, but because I am definitely not a
> wget expert it took me a while to get the hang of downloading
> web pages that require a logon. In the end, the solution I found
> (there may be others) wasn't so difficult. You have to log on to
> such a page with a browser (I used IE in that case) and then
> export the browser cookie files containing the login values to a
> text file.
>
> This text file is usually called "cookies.txt" and can be loaded
> by wget. I completed the template script with those wget options,
> including a regex to extract the requested values. (As I am also
> not a regex expert, I guess there are better regular expressions
> for doing that.)

Hi Detlef,

Did you have to edit your cookies.txt file? The other day, when I was trying to get the files with wget, I also exported my cookies file from IE (because I couldn't see how to do it from Firefox). Mine was a large file. When I first tried to use it, I got debug output from wget that showed the first few lines of that file (which were unrelated to the website I was trying to get). So I edited the cookie file and kept only the lines from cinemazone (there were a few). Maybe I will try it all again later, just for my own wget experience.

Your regex gives me an error from pcre when applied to one of the files I previously downloaded by other means:

"PCRE exec failed Matching error -8 backtracking limit exceeded"

When I extracted the info the other day I took a different approach. I downloaded pages 1-10 and appended them all together. I did a pcre_matchall that captured the 10 whole tables that contained the member data. Then, in the matched result, I think I replaced the </td> tags with tabs and all the other tags with "".

Regards,
Sheri
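
For anyone wanting to reproduce the two manual steps described above (trimming cookies.txt to one site, then turning the member tables into tab-separated text), here is a minimal sketch in Python. It is not the script used in this thread. The function names `filter_cookies` and `tables_to_tsv` are made up for illustration, the cookie file is assumed to be in the tab-separated Netscape format with the domain in the first field, and turning `</tr>` into a newline is an added assumption so that each table row ends up on its own line.

```python
import html
import re


def filter_cookies(cookie_text, keyword):
    """Keep comment/blank lines plus cookie lines whose domain contains keyword.

    Assumes the Netscape cookies.txt layout: tab-separated fields with the
    domain in the first column.
    """
    kept = []
    for line in cookie_text.splitlines():
        if not line.strip() or line.startswith("#"):
            kept.append(line)  # header, comments and blank lines pass through
            continue
        domain = line.split("\t", 1)[0]
        if keyword.lower() in domain.lower():
            kept.append(line)
    return "\n".join(kept) + "\n"


# Non-greedy match of each whole table; nested tables are not handled.
TABLE_RE = re.compile(r"<table\b[^>]*>(.*?)</table>", re.IGNORECASE | re.DOTALL)
CELL_END_RE = re.compile(r"</t[dh]\s*>", re.IGNORECASE)
ROW_END_RE = re.compile(r"</tr\s*>", re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def tables_to_tsv(page_html):
    """Capture every table, turn cell ends into tabs, and strip other tags."""
    rows = []
    for table in TABLE_RE.findall(page_html):
        text = " ".join(table.split())      # collapse newlines and indentation
        text = CELL_END_RE.sub("\t", text)  # </td> or </th> becomes a tab
        text = ROW_END_RE.sub("\n", text)   # assumption: </tr> ends a row
        text = TAG_RE.sub("", text)         # every other tag becomes ""
        for line in text.split("\n"):
            cells = [html.unescape(c).strip() for c in line.split("\t")]
            # Drop trailing empty cells (the tab after the last </td> leaves one).
            while cells and cells[-1] == "":
                cells.pop()
            if cells:
                rows.append("\t".join(cells))
    return "\n".join(rows)
```

To mirror the "append pages 1-10 together" step, the downloaded pages can be concatenated into one string and passed to `tables_to_tsv` in a single call. The trimmed cookie text can be written back to a file and given to wget with its `--load-cookies` option.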
