Re: Web page source using wget?
Thanks Hrvoje, using http://.../InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query in IE worked like a charm. I didn't have to follow links. I am now trying to automate this using wget 1.8.2 (Windows). There are two steps involved:

1). Log in to the customer's web site. I was able to create the following link after I looked at the form section in the source, as explained to me earlier by Hrvoje:

wget "http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & Canada)&action-Submit=Login"

2). Execute:

wget "http://customer.website.com/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query"

I tried different ways to get this working, but so far have been unsuccessful. Any ideas?

Thanks,
Suhas

- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, October 07, 2003 6:12 PM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> There are two steps involved:
> 1). Log in to the customer's web site. I was able to create the
> following link after I looked at the form section in the source, as
> explained to me earlier by Hrvoje.
> wget "http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & Canada)&action-Submit=Login"

Did you add --save-cookies=FILE? By default Wget will use cookies, but will not save them to an external file and they will therefore be lost.

> 2). Execute:
> wget "http://customer.website.com/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query"

For this step, add --load-cookies=FILE, where FILE is the same file you specified to --save-cookies above.
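Put together, the two steps might look like the sketch below. The host, credentials, and cookie file name are placeholders taken from this thread, and the commands are only echoed (a dry run) so the sketch can be inspected without hitting the network; note the `&` separators between form fields and the quotes around each URL:

```shell
#!/bin/sh
# Hypothetical two-step session: log in while saving cookies, then
# query while loading them back.  Echoed rather than executed.
BASE='http://customer.website.com'
LOGIN="$BASE/?UserAccount=USER&AccessCode=PASSWORD&action-Submit=Login"
QUERY="$BASE/InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query"

# Step 1: log in; --save-cookies writes any permanent cookies to a file.
echo wget --save-cookies=cookies.txt "$LOGIN"

# Step 2: reuse the saved cookies for the actual query.
echo wget --load-cookies=cookies.txt "$QUERY"
```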
Re: Web page source using wget?
I tried, but it doesn't seem to have worked. This is what I did:

wget --save-cookies=cookies.txt "http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & Canada)&action-Submit=Login"

wget --load-cookies=cookies.txt "http://customer.website.com/supplyweb/smi/inventorystatus.asp?cboSupplier=4541-134289&status=all&action-select=Query" --http-user=4542-134289

After executing the above two lines, it creates two files:
1). [EMAIL PROTECTED]: I can see that this file contains a message (among other things): "Your session has expired due to a period of inactivity"
2). [EMAIL PROTECTED]

Thanks,
Suhas

- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 13, 2003 11:37 AM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
A slight correction, the first wget should read:

wget --save-cookies=cookies.txt "http://customer.website.com/supplyweb/general/default.asp?UserAccount=USER&AccessCode=PASSWORD&Locale=en-us&TimeZone=EST:-300&action-Submit=Login"

I tried this link in IE, but it comes back to the same login screen. No error messages are displayed at this point. Am I missing something? I have attached the source for the login page.

Thanks,
Suhas

- Original Message -
From: Suhas Tembe [EMAIL PROTECTED]
To: Hrvoje Niksic [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 13, 2003 11:53 AM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Hi Suhas!

Well, I am by no means an expert, but I think that wget closes the connection after the first retrieval. The SSL server realizes this and decides that wget has no right to log in for the second retrieval, even though the cookie is there. I think that is correct behaviour for a secure server, isn't it? Does this make sense?

Jens

- Original Message -
From: Suhas Tembe [EMAIL PROTECTED]
To: Hrvoje Niksic [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 13, 2003 11:53 AM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> I tried, but it doesn't seem to have worked. This is what I did:
> wget --save-cookies=cookies.txt "http://customer.website.com?UserAccount=USER&AccessCode=PASSWORD&Locale=English (United States)&TimeZone=(GMT-5:00) Eastern Standard Time (USA & Canada)&action-Submit=Login"

Hopefully you used quotes to protect the spaces in the URLs from the shell? After the first command, does `cookies.txt' contain what looks like a valid cookie?
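The quoting matters because the login URL contains literal spaces (and `&` characters, which the shell would otherwise treat specially). A quick demonstration of the word-splitting, using a shortened placeholder URL:

```shell
#!/bin/sh
# Unquoted, the shell splits the URL at the spaces, so wget would see
# only the fragment before the first space plus two stray arguments.
url='http://customer.website.com/?Locale=English (United States)&action-Submit=Login'

set -- $url            # unquoted expansion: word-split into 3 arguments
echo "unquoted: $# arguments"

set -- "$url"          # quoted expansion: 1 argument, as intended
echo "quoted: $# arguments"
```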
Re: Web page source using wget?
Jens Rösner [EMAIL PROTECTED] writes:

> Well, I am by no means an expert, but I think that wget closes the
> connection after the first retrieval. The SSL server realizes this
> and decides that wget has no right to log in for the second
> retrieval, even though the cookie is there. I think that is correct
> behaviour for a secure server, isn't it?

Why would it be correct? Persistent connections are a mere optimization; a new connection should work as well as the old one, as long as the credentials (usually provided by a cookie) are provided. There are security mechanisms that authorize on a per-connection basis, and they require a new login for each new connection (I believe NTLM is like this), but this should not be the case here. Even if it were the case, you could tell Wget to use the same connection, like this:

wget http://URL1... http://URL2...

In that case you shouldn't even have to bother with `--save-cookies' and `--load-cookies'. But maybe something else is going wrong for Suhas; I really don't know.
Re: Web page source using wget?
So, is there a way I can get to the page I want after logging into a secure server using wget? Can I keep the SSL connection open for the second retrieval to work?

The other thing I noticed is that the first URL (to log in) does not seem to work, because when I use that same URL in IE, it brings me back to the login screen (see attached source of the login page). I don't get logged in. I am not quite sure if it is the URL that is incorrect or it is something else.

Thanks,
Suhas

- Original Message -
From: Jens Rösner [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 13, 2003 12:51 PM
Subject: Re: Web page source using wget?
<html xmlns:bml="urn:brainna.com:bml:2002">
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>SupplyWEB Login</title>
</head>
<script language="JavaScript1.1" type="text/javascript">
var amSymbol = "AM";
var pmSymbol = "PM";
var negativeSymbol = "-";
var dateSeparator = "/";
var dateFormat = "M/dd/";
var timeSeparator = ":";
var timeFormat = "h:mm:ss t";
var decimalSeparator = ".";
function setIcon(icon, required, valid) {
  if (!valid) {
    icon.alt = "X";
    icon.src = "../images/error.gif";
  } else if (required) {
    icon.alt = "*";
    icon.src = "../images/required.gif";
  } else {
    icon.alt = "";
    icon.src = "../images/blank.gif";
  }
}
function login_UserAccount_validate() {
  var valid = true;
  setIcon(document.login.UserAccount_icon, true, valid);
  return valid;
}
function login_AccessCode_validate() {
  var valid = true;
  setIcon(document.login.AccessCode_icon, true, valid);
  return valid;
}
function login_Locale_validate() {
  var valid = true;
  if (valid) valid = login_Locale_custom_validate(document.login.Locale);
  setIcon(document.login.Locale_icon, true, valid);
  return valid;
}
function
Re: Web page source using wget?
Cookies.txt looks like this:

# HTTP cookie file.
# Generated by Wget on 2003-10-13 13:19:26.
# Edit at your own risk.

There is nothing after the 3rd line. So, it doesn't look like a valid cookie file.

- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 13, 2003 12:57 PM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> Cookies.txt looks like this:
> # HTTP cookie file.
> # Generated by Wget on 2003-10-13 13:19:26.
> # Edit at your own risk.
> There is nothing after the 3rd line. So, it doesn't look like a
> valid cookie file.

It's valid all right, but there are no cookies inside. The thing is, Wget will only save cookies that are marked as permanent through an expiry date in the future. Currently there is no way to force saving non-permanent cookies. You can, however, run both URLs in the same Wget invocation by providing them both on the command line. That way cookies should be shared.
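A sketch of that single-invocation approach (host and credentials are placeholders from earlier in the thread; the command is echoed as a dry run). Because one wget process fetches both URLs, a session cookie set by the first response is sent with the second request, with no cookie file involved:

```shell
#!/bin/sh
# One wget invocation, two URLs: session cookies from the first
# response carry over to the second request.  Dry run via echo.
LOGIN='http://customer.website.com/supplyweb/general/default.asp?UserAccount=USER&AccessCode=PASSWORD&action-Submit=Login'
QUERY='http://customer.website.com/supplyweb/smi/inventorystatus.asp?cboSupplier=4541-134289&status=all&action-select=Query'

echo wget "$LOGIN" "$QUERY"
```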
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> The other thing I noticed is that the first URL (to log in) does not
> seem to work, because when I use that same URL in IE, it brings me
> back to the login screen (see attached source of the login page). I
> don't get logged in.

Why are you using that URL if it is confirmed that it doesn't work? The <form> tag in the login page specifies the POST method. Therefore it is quite possible that the login script requires the use of POST. If that is the case, you'll need to get Wget 1.9-beta and provide the login information with the `--post-data' option. I'm sorry I don't have better news for you. Web services can be a real pain.
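If POST is indeed required, the login step might look like this sketch (wget 1.9 or later; the field names are taken from the login form quoted earlier in the thread, while the host and credentials are placeholders; the command is echoed as a dry run):

```shell
#!/bin/sh
# Hypothetical POST login with wget 1.9's --post-data.  The form
# fields are joined with '&', the way a browser would post them.
DATA='UserAccount=USER&AccessCode=PASSWORD&Locale=en-us&TimeZone=EST:-300&action-Submit=Login'

echo wget --save-cookies=cookies.txt --post-data "$DATA" \
  'http://customer.website.com/supplyweb/general/default.asp'
```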
Re: Web page source using wget?
Hi Hrvoje!

> > retrieval, even though the cookie is there. I think that is
> > correct behaviour for a secure server, isn't it?
> Why would it be correct?

Sorry, I seem to have been misled by my own (limited) experience: from the few secure sites I use, most will not let you log in again after you closed and restarted your browser or redialed your connection. That's what reminded me of Suhas' problem.

> Even if it were the case, you could tell Wget to use the same
> connection, like this:
> wget http://URL1... http://URL2...

Right, I always forget that, thanks!

Cya
Jens
Re: Web page source using wget?
Jens Rösner [EMAIL PROTECTED] writes:

> > > retrieval, even though the cookie is there. I think that is
> > > correct behaviour for a secure server, isn't it?
> > Why would it be correct?
> Sorry, I seem to have been misled by my own (limited) experience:
> from the few secure sites I use, most will not let you log in again
> after you closed and restarted your browser

That merely means that the cookie is marked non-permanent -- which is probably the case here as well. A site that banned reconnecting would effectively ban all HTTP/1.0 browsers, which would probably be going too far.
Re: Web page source using wget?
Thanks everyone for the replies so far. The problem I am having is that the customer is using ASP and JavaScript. The URL stays the same as I click through the links. So, using `wget URL' for the page I want may not work (I may be wrong). Any suggestions on how I can tackle this?

Thanks,
Suhas

- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 06, 2003 5:19 PM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> Thanks everyone for the replies so far. The problem I am having is
> that the customer is using ASP and JavaScript. The URL stays the
> same as I click through the links.

The URL staying the same is usually a sign of the use of frames, not of ASP and JavaScript. Instead of looking at the URL entry field, try using "copy link to clipboard" instead of clicking on the last link. Then use Wget on that.
Re: Web page source using wget?
Got it! Thanks! So far so good. After logging in, I was able to get to the page I am interested in. There was one thing that I forgot to mention in my earlier posts (I apologize)... this page contains a drop-down list of our customer's locations. At present, I choose one location from the drop-down list and click "submit" to get the data, which is displayed in a report format. I right-click, then choose "view source" and save the source to a file. I then choose the next location from the drop-down list and click "submit" again. I again do a "view source" and save the source to another file, and so on for all their locations. I am not quite sure how to automate this process! How can I do this non-interactively, especially the "submit" portion of the page? Is this possible using wget?

Thanks,
Suhas

- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, October 07, 2003 5:02 PM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> this page contains a drop-down list of our customer's locations. At
> present, I choose one location from the drop-down list and click
> "submit" to get the data, which is displayed in a report format. I
> right-click, then choose "view source" and save the source to a
> file. I then choose the next location from the drop-down list and
> click "submit" again. I again do a "view source" and save the source
> to another file, and so on for all their locations.

It's possible to automate this, but it requires some knowledge of HTML. Basically, you need to look at the <form>...</form> part of the page and find the <select> tag that defines the drop-down. Assuming that the form looks like this:

<form action="http://foo.com/customer" method="GET">
  <select name="location">
    <option value="ca">California</option>
    <option value="ma">Massachusetts</option>
    ...
  </select>
</form>

you'd automate getting the locations by doing something like:

for loc in ca ma ...
do
  wget "http://foo.com/customer?location=$loc"
done

Wget will save the respective sources in files named "customer?location=ca", "customer?location=ma", etc. But this was only an example. The actual process depends on what's in the form, and it might be considerably more complex than this.
Re: Web page source using wget?
It does look a little complicated! This is how it looks:

<form action="InventoryStatus.asp" method="post" name="select" onsubmit="return select_validate();" style="margin:0">
<div style="margin-top:10px">
<table border="1" bordercolor="#d9d9d9" bordercolordark="#ff" bordercolorlight="#d9d9d9" cellpadding="3" cellspacing="0" width="100%">
<tr>
<td style="font-weight:bold;color:black;background-color:#CC;text-align:right" width="20%"><nobr>Supplier&nbsp;</nobr></td>
<td style="color:black;background-color:#F0;text-align:left" colspan="2"><nobr><select name="cboSupplier"><option value="4541-134289">454A</option> <option value="4542-134289" selected>454B</option></select> <img id="cboSupplier_icon" name="cboSupplier_icon" src="../images/required.gif" alt="*"></nobr></td>
</tr>
<tr>
<td style="font-weight:bold;color:black;background-color:#CC;text-align:right" width="20%"><nobr>Quantity Status&nbsp;</nobr></td>
<td style="color:black;background-color:#F0;text-align:left" colspan="2">
<table border="0" cellpadding="0" cellspacing="0">
<tr>
<td>
<table border="0">
<tr>
<td width="1"><input id="choice_IDAMCB3B" name="status" type="radio" value="over"></td>
<td style="color:black;background-color:#F0;text-align:left"><span onclick="choice_IDAMCB3B.checked=true;">Over</span></td>
<td width="1"><input id="choice_IDARCB3B" name="status" type="radio" value="under"></td>
<td style="color:black;background-color:#F0;text-align:left"><span onclick="choice_IDARCB3B.checked=true;">Under</span></td>
<td width="1"><input id="choice_IDAWCB3B" name="status" type="radio" value="both"></td>
<td style="color:black;background-color:#F0;text-align:left"><span onclick="choice_IDAWCB3B.checked=true;">Both</span></td>
<td width="1"><input id="choice_IDA1CB3B" name="status" type="radio" value="all" checked></td>
<td style="color:black;background-color:#F0;text-align:left"><span onclick="choice_IDA1CB3B.checked=true;">All</span></td>
</tr>
</table>
</td>
<td><img id="status_icon" name="status_icon" src="../images/blank.gif" alt=""></td>
</tr>
</table>
</td>
</tr>
<tr>
<td style="font-weight:bold;color:black;background-color:#CC">&nbsp;</td>
<td colspan="2" style="font-weight:bold;color:black;background-color:#CC;text-align:left"><input type="submit" name="action-select" value="Query" onclick="doValidate = true;"></td>
</tr>
</table>
</div>
</form>

I don't see any specific URL that would get the relevant data after I hit submit. Maybe I am missing something...

Thanks,
Suhas

- Original Message -
From: Hrvoje Niksic [EMAIL PROTECTED]
To: Suhas Tembe [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Tuesday, October 07, 2003 5:24 PM
Subject: Re: Web page source using wget?
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> It does look a little complicated! This is how it looks:
> <form action="InventoryStatus.asp" method="post">
[...]
[...]
> <select name="cboSupplier">
> <option value="4541-134289">454A</option>
> <option value="4542-134289" selected>454B</option>
> </select>

Those are the important parts. It's not hard to submit this form. With Wget 1.9, you can even use the POST method, e.g.:

wget http://.../InventoryStatus.asp --post-data \
    'cboSupplier=4541-134289&status=all&action-select=Query' \
    -O InventoryStatus1.asp
wget http://.../InventoryStatus.asp --post-data \
    'cboSupplier=4542-134289&status=all&action-select=Query' \
    -O InventoryStatus2.asp

It might even work to simply use GET, and retrieve

http://.../InventoryStatus.asp?cboSupplier=4541-134289&status=all&action-select=Query

without the need for `--post-data' or `-O', but that depends on the ASP script that does the processing. The harder part is to automate this process for *any* values in the drop-down list. You might need to use an intermediary Perl script that extracts all the <option value=...> tags from the HTML source of the page with the drop-down. Then, from the output of the Perl script, you call Wget as shown above. It's doable, but it takes some work. Unfortunately, I don't know of a (command-line) tool that would make this easier.
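The extraction step doesn't strictly need Perl; a small shell sketch along these lines could do it. The file name and sample markup below are hypothetical, the sed pattern assumes the option values are quoted and appear one per line, and the wget commands are echoed rather than executed:

```shell
#!/bin/sh
# Sketch: harvest every <option value="..."> from a saved copy of the
# page with the drop-down, then print one wget command per value.
# A small sample page is inlined so the sketch runs stand-alone.
cat > dropdown.html <<'EOF'
<select name="cboSupplier">
<option value="4541-134289">454A</option>
<option value="4542-134289" selected>454B</option>
</select>
EOF

# Pull out just the value="..." strings, one per line.
values=$(sed -n 's/.*<option value="\([^"]*\)".*/\1/p' dropdown.html)

# Emit one query per supplier value (dry run).
for v in $values; do
  echo wget "http://.../InventoryStatus.asp?cboSupplier=$v&status=all&action-select=Query"
done

rm -f dropdown.html
```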
Web page source using wget?
Hello Everyone,

I am new to this wget utility, so pardon my ignorance. Here is a brief explanation of what I am currently doing:
1). I go to our customer's website every day and log in using a User Name and Password.
2). I click on 3 links before I get to the page I want.
3). I right-click on the page and choose "view source". It opens it up in Notepad.
4). I save the source to a file and subsequently perform various tasks on that file.

As you can see, it is a manual process. What I would like to do is automate this process of obtaining the source of a page using wget. Is this possible? Maybe you can give me some suggestions.

Thanks in advance.
Suhas
Re: Web page source using wget?
Suhas Tembe wrote:

> 1). I go to our customer's website every day and log in using a User
> Name and Password.
[snip]
> 4). I save the source to a file and subsequently perform various
> tasks on that file.
> What I would like to do is automate this process of obtaining the
> source of a page using wget. Is this possible?

That depends on how you enter your user name and password. If it's via an HTTP user ID and password, that's pretty easy:

wget http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS

If you supply your user ID and password via a web form, it will be tricky (if not impossible) because wget doesn't POST forms (unless someone added that option while I wasn't looking. :-)

Tony
Re: Web page source using wget?
Tony Lewis [EMAIL PROTECTED] writes:

> wget http://www.custsite.com/some/page.html --http-user=USER --http-passwd=PASS
>
> If you supply your user ID and password via a web form, it will be
> tricky (if not impossible) because wget doesn't POST forms (unless
> someone added that option while I wasn't looking. :-)

Wget 1.9 can send POST data. But there's a simpler way to handle web sites that use cookies for authorization: make Wget use the site's own cookie. Export cookies as explained in the manual, and specify:

wget --load-cookies=COOKIE-FILE http://...

Here is an excerpt from the manual section that explains how to export cookies.

`--load-cookies FILE'
     Load cookies from FILE before the first HTTP retrieval. FILE is
     a textual file in the format originally used by Netscape's
     `cookies.txt' file.

     You will typically use this option when mirroring sites that
     require that you be logged in to access some or all of their
     content. The login process typically works by the web server
     issuing an HTTP cookie upon receiving and verifying your
     credentials. The cookie is then resent by the browser when
     accessing that part of the site, and so proves your identity.

     Mirroring such a site requires Wget to send the same cookies
     your browser sends when communicating with the site. This is
     achieved by `--load-cookies'--simply point Wget to the location
     of the `cookies.txt' file, and it will send the same cookies
     your browser would send in the same situation. Different
     browsers keep textual cookie files in different locations:

     Netscape 4.x: the cookies are in `~/.netscape/cookies.txt'.

     Mozilla and Netscape 6.x: Mozilla's cookie file is also named
     `cookies.txt', located somewhere under `~/.mozilla', in the
     directory of your profile. The full path usually ends up looking
     somewhat like `~/.mozilla/default/SOME-WEIRD-STRING/cookies.txt'.

     Internet Explorer: you can produce a cookie file Wget can use by
     using the File menu, Import and Export, Export Cookies. This has
     been tested with Internet Explorer 5; it is not guaranteed to
     work with earlier versions.

     Other browsers: if you are using a different browser to create
     your cookies, `--load-cookies' will only work if you can locate
     or produce a cookie file in the Netscape format that Wget
     expects.

If you cannot use `--load-cookies', there might still be an alternative. If your browser supports a cookie manager, you can use it to view the cookies used when accessing the site you're mirroring. Write down the name and value of the cookie, and manually instruct Wget to send those cookies, bypassing the "official" cookie support:

wget --cookies=off --header "Cookie: NAME=VALUE"
Re: Web page source using wget?
Suhas Tembe [EMAIL PROTECTED] writes:

> Hello Everyone,
> I am new to this wget utility, so pardon my ignorance. Here is a
> brief explanation of what I am currently doing:
> 1). I go to our customer's website every day and log in using a User
> Name and Password.
> 2). I click on 3 links before I get to the page I want.
> 3). I right-click on the page and choose "view source". It opens it
> up in Notepad.
> 4). I save the source to a file and subsequently perform various
> tasks on that file.
> As you can see, it is a manual process. What I would like to do is
> automate this process of obtaining the source of a page using wget.
> Is this possible? Maybe you can give me some suggestions.

It's possible; in fact, it's what Wget does in its most basic form. Disregarding authentication, the recipe would be:

1) Write down the URL.
2) Type `wget URL' and you get the source of the page in a file named SOMETHING.html, where SOMETHING is the file name that the URL ends with.

Of course, you will also have to specify the credentials to the page, and Tony explained how to do that.